Invited Speakers

  • Wray Buntine (Monash University, Australia)
  • Kun Zhang (Carnegie Mellon University, USA)
  • Marco Scutari (University of Oxford, UK)
  • Taisuke Sato (NII/AIST-AIRC*, Japan)
  • John T. Halloran (UC Davis, USA)
  • Tomi Silander (NAVER LABS Europe)
* National Institute of Advanced Industrial Science and Technology, Artificial Intelligence Research Center; Professor Emeritus, Tokyo Institute of Technology


Wray Buntine (Monash University, Australia)

Backoff methods for estimating parameters of a Bayesian network

Abstract: Various authors have highlighted inadequacies of BDeu-type scores, and the same problem arises in parameter estimation. Basically, Laplace estimates work poorly, not least because setting the prior concentration is challenging. In 1997, Friedman et al. suggested a simple backoff approach for Bayesian network classifiers (BNCs). Backoff methods dominate in n-gram language models, with modified Kneser-Ney smoothing being the best known; a Bayesian variant exists in the form of Teh's 2006 Pitman-Yor process language models. In this talk we will present some results on using backoff methods for Bayesian network classifiers and for Bayesian networks generally. For BNCs at least, the improvements are dramatic, and they alleviate some of the issues of choosing too dense a network.
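As a loose illustration of the backoff idea (a minimal sketch, not the speaker's actual method; the function name, data layout, and the single concentration parameter alpha are all assumptions), one level of backoff for a conditional probability table might smooth each conditional row toward the child's marginal:

```python
from collections import Counter

def backoff_cpt(child_vals, parent_vals, alpha=1.0):
    """Estimate P(child | parent) with a one-level backoff: each
    conditional distribution is smoothed toward the child's marginal,
    and alpha controls how strongly sparse rows fall back on it."""
    n = len(child_vals)
    states = sorted(set(child_vals))
    marginal = {x: child_vals.count(x) / n for x in states}
    pair_counts = Counter(zip(parent_vals, child_vals))
    parent_counts = Counter(parent_vals)
    return {pa: {x: (pair_counts[(pa, x)] + alpha * marginal[x]) / (n_pa + alpha)
                 for x in states}
            for pa, n_pa in parent_counts.items()}

# A parent value seen only once still gets a sensible, marginal-like row.
print(backoff_cpt(list("ttftft"), list("aabbbc")))
```

Hierarchical variants (as in Pitman-Yor process language models) chain this construction through several levels; the one-level version above only conveys the flavour.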

Biography: Wray Buntine has been a full professor at Monash University since February 2014, following seven years at NICTA in Canberra, Australia. At Monash he is director of the Master of Data Science, the Faculty of IT's newest and most in-demand degree, and was founding director of the innovative online Graduate Diploma of Data Science. He was previously at NICTA (Australia), the Helsinki Institute for Information Technology, NASA Ames Research Center, the University of California, Berkeley, and Google. He is known for his theoretical and applied work on probabilistic methods for document and text analysis, social networks, data mining and machine learning. His recent focus has been on non-parametric methods in these areas. He was programme co-chair of ECML-PKDD 2009 in Bled, Slovenia, programme co-chair of ACML 2012 in Singapore and programme chair of ACML 2013 in Canberra. He reviews for conferences such as ACML, ECIR, ECML-PKDD, ICML, NIPS, UAI and KDD, and is on the editorial board of Data Mining and Knowledge Discovery.


Kun Zhang (Carnegie Mellon University, USA)

Causal Learning and Machine Learning

Abstract: Can we find the causal direction between two variables? How can we make optimal predictions in the presence of distribution shift? We are often faced with such causal modeling or prediction problems. Recently, with the rapid accumulation of huge volumes of data, both causal discovery, i.e., learning causal information from purely observational data, and machine learning are seeing exciting opportunities as well as great challenges. This talk will focus on recent advances in causal discovery and on how causal information facilitates understanding and solving certain problems of learning from heterogeneous data. In particular, I will talk about basic approaches to causal discovery and address practical issues, including nonstationarity or heterogeneity of the data and the existence of measurement error. Finally, I will discuss why and how underlying causal knowledge helps in learning from heterogeneous data when the i.i.d. assumption is dropped, with transfer learning as a particular example.
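As a toy illustration of the two-variable question the abstract opens with, here is a sketch of the additive-noise-model heuristic: regress each way and prefer the direction whose residuals look independent of the input. Everything here is an illustrative assumption, not the speaker's method, and the binned mutual-information score is a crude stand-in for the kernel independence tests used in practice.

```python
import numpy as np

def mutual_info(a, b, bins=10):
    """Plug-in mutual information from a 2-D histogram: a crude,
    dependency-free measure of statistical dependence."""
    pxy, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def anm_direction(x, y, deg=3):
    """Additive-noise-model heuristic: fit polynomial regressions both
    ways and pick the direction with more input-independent residuals."""
    res_fwd = y - np.polyval(np.polyfit(x, y, deg), x)
    res_bwd = x - np.polyval(np.polyfit(y, x, deg), y)
    return 'x->y' if mutual_info(x, res_fwd) < mutual_info(y, res_bwd) else 'y->x'

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 2000)
y = x**3 + rng.normal(0, 1, 2000)   # ground truth: x causes y
print(anm_direction(x, y))           # typically prints 'x->y'
```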

Biography: Kun Zhang is an assistant professor in the philosophy department and an affiliate faculty member in the machine learning department of Carnegie Mellon University (CMU), USA. Before joining CMU, he was a senior research scientist at Max Planck Institute for Intelligent Systems, Germany, and a lead scientist at Information Sciences Institute of University of Southern California. His research interests lie in machine learning and artificial intelligence, especially in causal discovery and causality-based learning. He has served as a senior program committee member or area chair for a number of conferences in machine learning or artificial intelligence, and organized various academic activities to foster interdisciplinary research in causality.


Marco Scutari (University of Oxford, UK)

Bayesian Dirichlet Bayesian Network Scores and the Maximum Entropy Principle

Abstract: A classic approach for learning Bayesian networks from data is to select the maximum a posteriori (MAP) network. In the case of discrete Bayesian networks, the MAP network is selected by maximising one of several possible Bayesian Dirichlet (BD) scores; the most famous is the Bayesian Dirichlet equivalent uniform (BDeu) score from Heckerman et al. (1995). The key properties of BDeu arise from its underlying uniform prior, which makes structure learning computationally efficient, does not require the elicitation of prior knowledge from experts, and satisfies score equivalence. In this talk we will discuss the impact of this uniform prior on structure learning from an information-theoretic perspective, showing how BDeu may violate the maximum entropy principle when applied to sparse data, and how it may also be problematic from a Bayesian model selection perspective. On the other hand, the BDs score proposed in Scutari (2016) arises from a piecewise prior and does not appear to violate the maximum entropy principle, even though it is asymptotically equivalent to BDeu.
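To see the two priors side by side, here is a minimal sketch of a BD-style local score, assuming the usual BDeu uniform prior and a BDs-style alternative that spreads the imaginary sample only over parent configurations actually observed in the data. This is our reading of Scutari (2016), not the paper's reference implementation; names and counts are illustrative.

```python
from math import lgamma
import numpy as np

def bd_local(counts, ess, sparse=False):
    """Log Bayesian-Dirichlet local score for one child, given a
    (parent-configs x child-states) count matrix. sparse=False gives
    BDeu (imaginary sample spread over ALL q parent configurations);
    sparse=True gives a BDs-style prior that spreads it only over the
    configurations seen in the data."""
    q, r = counts.shape
    seen = [row for row in counts if row.sum() > 0]
    q_eff = len(seen) if sparse else q
    a_jk = ess / (q_eff * r)
    score = 0.0
    for row in seen:   # unseen configurations contribute 0 either way
        score += lgamma(r * a_jk) - lgamma(r * a_jk + row.sum())
        score += sum(lgamma(a_jk + n) - lgamma(a_jk) for n in row)
    return score

# Sparse data: only 2 of 4 parent configurations ever occur.
counts = np.array([[9, 1], [1, 9], [0, 0], [0, 0]])
print(bd_local(counts, ess=1.0, sparse=False))  # BDeu
print(bd_local(counts, ess=1.0, sparse=True))   # BDs-style
```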

Biography: Marco Scutari is a Lecturer in Statistics at the Department of Statistics, University of Oxford. He is the author and maintainer of the bnlearn R package, and of the books "Bayesian Networks in R: with Applications in Systems Biology" (Springer) and "Bayesian Networks: with Examples in R" (CRC). His research focuses on the theory of Bayesian networks, and in particular on structure learning, using both test statistics and network scores, and on computational aspects such as scalability, parallel computing and efficient software implementations. He has been a member of the PGM programme committee since 2014 and regularly reviews papers on graphical models for the Journal of Statistical Software, JMLR, Statistics & Computing and the Journal of the Royal Statistical Society. His favoured application field, after spending many years at UCL's Genetics Institute, is systems biology and plant and animal genetics. In that context he uses Bayesian networks to analyse sequence and expression data to perform association studies and to implement genomic selection breeding programs.


Taisuke Sato (NII/AIST-AIRC, Japan)

Learning probability by comparison

Abstract: Learning probabilities by probabilistic modeling is a major task in statistical machine learning, and it has traditionally been supported by maximum likelihood estimation applied to generative models or by local maximisation applied to discriminative models. In this talk, we introduce a third, innovative approach that learns probabilities by comparing probabilistic events. In our approach, the user supplies a ranking of probabilistic events and the system learns a probability distribution under which that ranking is well respected. We implemented this approach in PRISM, a logic-based probabilistic programming language, and conducted learning experiments on real data with models described by PRISM programs.
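A minimal sketch of the comparison idea, with the caveat that this is a generic pairwise-ranking estimator over a categorical distribution, not PRISM's actual learning algorithm; the function name, loss, and hyperparameters are all illustrative assumptions.

```python
import numpy as np

def learn_from_ranking(n_events, prefs, lr=0.5, steps=2000):
    """Fit a categorical distribution p = softmax(theta) so that every
    preferred event in prefs [(more_probable, less_probable), ...]
    ends up ranked above its partner, via a logistic pairwise loss."""
    theta = np.zeros(n_events)
    for _ in range(steps):
        grad = np.zeros(n_events)
        for w, l in prefs:
            # gradient of log(1 + exp(-(theta[w] - theta[l])))
            s = 1.0 / (1.0 + np.exp(theta[w] - theta[l]))
            grad[w] -= s
            grad[l] += s
        theta -= lr * grad / len(prefs)
    p = np.exp(theta - theta.max())
    return p / p.sum()

# Three events ranked 0 > 1 > 2: the learned distribution respects it.
print(learn_from_ranking(3, [(0, 1), (1, 2), (0, 2)]))
```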

Biography: Taisuke Sato is an emeritus professor at Tokyo Institute of Technology and an invited senior researcher at the AI Research Center of the National Institute of Advanced Industrial Science and Technology (AIST) in Japan. He received his M.S. in Electrical Engineering in 1975 and his Ph.D. in Computer Science in 1987, both from Tokyo Institute of Technology. His early work includes program transformation and synthesis in logic programming. Since then he has worked on the integration of logical and probabilistic reasoning. In particular, he has been developing PRISM (PRogramming In Statistical Modeling), a logic-based probabilistic programming language that offers a variety of learning methods for generative modeling (MLE, Viterbi training, variational Bayes, MCMC) and for discriminative modeling such as CRFs. Recently his focus has been on formalizing and implementing logical inference in vector spaces.


John T. Halloran (UC Davis, USA)

Analyzing Tandem Mass Spectra: A Graphical Models Perspective

Abstract: In the past two decades, the field of proteomics has seen explosive growth, largely due to the development of tandem mass spectrometry (MS/MS). With a complex biological sample as input, a typical MS/MS experiment quickly produces a large collection of spectra (often numbering in the hundreds of thousands) representative of the proteins present in the original sample. Most widely used methods for searching and identifying MS/MS spectra rely on scoring functions with static, hand-selected parameters, rather than learning parameters and adapting to the widely varying characteristics of MS/MS data. In this talk, we discuss recent work utilizing dynamic Bayesian networks (DBNs) to identify MS/MS spectra. In particular, we discuss the recently proposed DBN for Rapid Identification of Peptides (DRIP) which, in contrast to popular scoring functions, allows efficient generative and discriminative learning of parameters to achieve state-of-the-art spectrum-identification accuracy. Furthermore, facilitated by DRIP's generative nature, we present current innovations that leverage DBNs to significantly enhance many other aspects of MS/MS analysis, such as improving downstream discriminative classification via detailed feature extraction and speeding up identification runtime using trellises and approximate inference.
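DRIP itself is not reproduced here, but the trellis-based inference the abstract mentions can be illustrated with a generic max-product (Viterbi) pass over a chain-structured model; this is a sketch assuming a plain HMM-style DBN, with all names illustrative.

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit, obs):
    """Max-product inference on a chain-structured graphical model:
    trellis[t, s] holds the best log score of any state sequence that
    ends in state s after emitting obs[0..t]."""
    T, S = len(obs), len(log_init)
    trellis = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    trellis[0] = log_init + log_emit[:, obs[0]]
    for t in range(1, T):
        cand = trellis[t - 1][:, None] + log_trans   # S x S predecessor scores
        back[t] = cand.argmax(axis=0)
        trellis[t] = cand.max(axis=0) + log_emit[:, obs[t]]
    path = [int(trellis[-1].argmax())]
    for t in range(T - 1, 0, -1):                    # trace back the best path
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

Roughly speaking, in a spectrum-identification setting the chain would run over spectrum observations and the hidden state would track position in a peptide's theoretical spectrum, but the trellis mechanics are the same.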

Biography: John T. Halloran is a postdoctoral researcher at the University of California, Davis. He completed his PhD in electrical engineering at the University of Washington, Seattle, in 2016. At the University of Washington, he worked in the MELODI (MachinE Learning for Optimization and Data Interpretation) and Noble Labs under the joint supervision of Jeff Bilmes and William Noble. He obtained his MS in electrical engineering in 2010 at the University of Hawaii, Manoa, and received a BS in electrical engineering and a BS in mathematics from Seattle University in 2008. His current research interests lie in utilizing machine learning methods to analyze proteomics data, with particular focus on the development of graphical models which may be effectively trained (either generatively or discriminatively). His previous work has involved developing graphical models for cancer genomics, automatic speech recognition, and wireless communications. He is also broadly interested in machine learning applications and problems in computational biology.


Tomi Silander (NAVER LABS Europe)

Hyperparameter sensitivity revisited

Abstract: The BDeu scoring criterion for learning Bayesian network structures is known to be very sensitive to its equivalent sample size hyperparameter. Recently, some authors have suggested alternative Bayesian scoring criteria that appear to behave better than BDeu. So is the problem solved? We will review the problem and the suggested solutions, and present an empirical assessment of the current situation.
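As a small illustration of the sensitivity in question (the count matrix and score helper are illustrative, not the speaker's benchmarks), one can score a one-parent model against an empty model on the same data and watch the margin between them move as the equivalent sample size changes:

```python
import numpy as np
from math import lgamma

def bdeu_table(counts, ess):
    """Log BDeu local score for one child, given a (parent-configs x
    child-states) count matrix with the uniform prior a_jk = ess/(q*r)."""
    q, r = counts.shape
    a_j, a_jk = ess / q, ess / (q * r)
    score = 0.0
    for row in counts:
        score += lgamma(a_j) - lgamma(a_j + row.sum())
        score += sum(lgamma(a_jk + n) - lgamma(a_jk) for n in row)
    return score

# Two candidate models for the same 21 observations of a binary child:
with_parent = np.array([[5, 5], [5, 6]])            # split by a binary parent
no_parent = with_parent.sum(axis=0, keepdims=True)  # parent ignored
for ess in (0.1, 1.0, 10.0, 100.0):
    margin = bdeu_table(with_parent, ess) - bdeu_table(no_parent, ess)
    print(f"ess={ess:6.1f}  log BDeu(with parent) - log BDeu(no parent) = {margin:+.2f}")
```

The margin swings by several nats across this range and can change sign, so the MAP structure itself can depend on the hyperparameter, which is the sensitivity the talk revisits.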

Biography: Tomi Silander is a senior research scientist at NAVER LABS Europe (NLE) in Grenoble, France. The author of B-course, the first online Bayesian network learning tool, he received his PhD from the University of Helsinki. He is known for developing exact structure-learning methods, with which he has studied the problems in the BDeu model selection criterion. He has also worked on alternative, information-theoretic model selection criteria for structure learning, such as the factorized normalized maximum likelihood criterion (fNML) and the quotient normalized maximum likelihood criterion (qNML). Having worked in both academic (University of Helsinki and the National University of Singapore) and industrial (Nokia Research Centre, the A*STAR Institute of High Performance Computing, and Xerox Research Centre Europe) research organizations, he is an active reviewer for many machine learning conferences and journals.