
S16: Causal discovery with a view to the life sciences

Time:

Monday, 04/Sept/2023:

4:10pm - 5:50pm

Session Chairs: Giusi Moffa, Mikko Koivisto

Location: Lecture Room U1.131 (hybrid)

Presentations

4:10pm - 4:50pm

Causal discovery: Benchmarking algorithms and Bayesian analyses in the life sciences

Jack Kuipers

ETH Zurich, Switzerland

Probabilistic and causal graphical models are powerful tools to help understand and characterise complex mechanisms. Given a causal diagram, there are ways to estimate the effect of one variable on another from data. Causal discovery, a broad and fast-moving field, aims to find plausible causal relationships without prior knowledge of the graphical structure. Selecting the most appropriate method from the plethora of available algorithms can be a daunting task. After surveying well-established approaches to structure learning, we present a workflow for large-scale benchmarking in a systematic and reproducible manner with the Benchpress platform. Its strengths include being fully modular and easily extended to include new algorithms. In the life sciences, graphical models and structure learning may aid communication in complex scenarios and facilitate decision-making. For illustration, we discuss two case studies focusing on sampling-based methods, as they enable fully Bayesian analyses. These naturally characterise the uncertainty in the estimates of both network structures and their parameters. The first case study deals with uncovering patterns of genetic mutations in large-scale oncology datasets. As generative models, Bayesian networks offer a framework for model-based clustering. Integrating clinical covariates with the genomic profiles in the causal graphical model may provide more informative patient stratifications for developing personalised therapeutics. The second case study comes from psychiatric epidemiology, where we wish to estimate putative intervention effects from psychological survey data to better inform study design for interventional trials.
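The idea of a fully Bayesian analysis that quantifies uncertainty over network structures can be illustrated on a toy problem. The sketch below is not Benchpress or the talk's actual method; it is a minimal stand-in that enumerates all DAGs over three variables on synthetic chain data, scores each with a BIC score as a proxy for a marginal likelihood, and normalises the scores into posterior edge probabilities. All variable names and the data-generating chain are illustrative assumptions.

```python
# Toy fully Bayesian structure learning (illustrative sketch, NOT Benchpress):
# enumerate all DAGs over 3 nodes, score with BIC as a marginal-likelihood
# stand-in, and read off posterior adjacency probabilities.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 3
# Synthetic data from the ground-truth chain X0 -> X1 -> X2 (an assumption).
X = np.zeros((n, d))
X[:, 0] = rng.normal(size=n)
X[:, 1] = 0.8 * X[:, 0] + rng.normal(size=n)
X[:, 2] = 0.8 * X[:, 1] + rng.normal(size=n)

def is_acyclic(parents):
    """Check acyclicity by repeatedly removing nodes with no remaining parents."""
    remaining = set(range(len(parents)))
    while remaining:
        roots = [v for v in remaining if not (parents[v] & remaining)]
        if not roots:
            return False
        remaining.difference_update(roots)
    return True

def bic_score(parents):
    """BIC of a linear-Gaussian DAG given each node's parent set."""
    score = 0.0
    for j, pa in enumerate(parents):
        if pa:
            A = X[:, sorted(pa)]
            beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
            resid = X[:, j] - A @ beta
        else:
            resid = X[:, j]
        sigma2 = resid @ resid / n
        score += -0.5 * n * np.log(sigma2) - 0.5 * (len(pa) + 1) * np.log(n)
    return score

# Enumerate every DAG over d nodes as a tuple of parent sets.
edges = [(i, j) for i in range(d) for j in range(d) if i != j]
dags = []
for k in range(len(edges) + 1):
    for subset in itertools.combinations(edges, k):
        parents = tuple(frozenset(i for i, j in subset if j == v) for v in range(d))
        if is_acyclic(parents):
            dags.append(parents)

# Normalise scores into a posterior, subtracting the max for numerical stability.
scores = np.array([bic_score(p) for p in dags])
weights = np.exp(scores - scores.max())
posterior = weights / weights.sum()

def adjacency_prob(i, j):
    """Posterior probability that i and j are adjacent (in either direction)."""
    return sum(w for p, w in zip(dags, posterior) if i in p[j] or j in p[i])

print(len(dags), adjacency_prob(0, 1), adjacency_prob(0, 2))
```

On this toy example the posterior concentrates on the adjacencies 0-1 and 1-2 while spreading mass over the Markov-equivalent orientations, which is exactly the kind of structural uncertainty a sampling-based Bayesian analysis characterises.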

4:50pm - 5:10pm

Consistent and efficient mixed integer programming for causal discovery

Ali Shojaie

University of Washington, United States of America

Learning the structure of directed acyclic graphs (DAGs), known as causal discovery, is computationally and statistically challenging. We cast the problem as a mixed-integer program with an objective function composed of a convex quadratic loss function and a regularization penalty subject to linear constraints. The optimal solution to this mathematical program is known to have desirable statistical properties under certain conditions. However, state-of-the-art optimization solvers cannot obtain provably optimal solutions to the existing mathematical formulations for medium-sized problems within reasonable computational time. To address this difficulty, we tackle the problem from both computational and statistical perspectives. Computationally, we propose an efficient mixed-integer quadratic optimization (MIQO) model, the layered network formulation. In addition to improving on existing approaches, the new formulation can also take advantage of easily obtainable super-structures, such as the moral graph, to reduce the number of possible DAGs. Statistically, we propose an early stopping criterion to terminate the branch-and-bound process in order to obtain a near-optimal solution to the mixed-integer program, and establish the consistency of this approximate solution.
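The objective and the layering idea can be made concrete on a tiny instance. The sketch below is not the talk's MIQO formulation or a solver call; it is a brute-force stand-in that searches over node orderings (the layering idea: an edge i -> j is only allowed when i precedes j, which rules out cycles) and minimises a quadratic loss plus an L0-style penalty per edge. The data, penalty, and names are illustrative assumptions.

```python
# Brute-force stand-in for the MIQO DAG search on a tiny problem (a sketch,
# not the actual layered-network formulation, which a MIP solver would handle).
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, d = 300, 3
# Ground truth (an assumption): X0 -> X1 -> X2, linear-Gaussian mechanisms.
X = np.zeros((n, d))
X[:, 0] = rng.normal(size=n)
X[:, 1] = 1.0 * X[:, 0] + 0.5 * rng.normal(size=n)
X[:, 2] = -0.7 * X[:, 1] + 0.5 * rng.normal(size=n)

lam = np.log(n)  # L0-style penalty per edge, mimicking the regularization term

def node_cost(j, allowed):
    """Penalized quadratic loss for node j, minimized over subsets of allowed parents."""
    best_cost, best_pa = None, None
    for k in range(len(allowed) + 1):
        for pa in itertools.combinations(allowed, k):
            if pa:
                A = X[:, list(pa)]
                beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
                rss = float(np.sum((X[:, j] - A @ beta) ** 2))
            else:
                rss = float(np.sum(X[:, j] ** 2))
            cost = 0.5 * n * np.log(rss / n) + lam * len(pa)
            if best_cost is None or cost < best_cost:
                best_cost, best_pa = cost, pa
    return best_cost, best_pa

# Layering as an acyclicity device: edges may only point from earlier to later
# positions in the ordering, so every candidate graph is automatically a DAG.
best_total, best_parents = None, None
for order in itertools.permutations(range(d)):
    total, parents = 0.0, {}
    for pos, j in enumerate(order):
        cost, pa = node_cost(j, order[:pos])
        total += cost
        parents[j] = set(pa)
    if best_total is None or total < best_total:
        best_total, best_parents = total, parents

skeleton = {frozenset((i, j)) for j, pa in best_parents.items() for i in pa}
print(skeleton)
```

A real MIQO formulation encodes the same layer constraint with integer layer variables and lets branch-and-bound prune the search; a super-structure such as the moral graph would simply shrink the `allowed` parent sets above.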

5:10pm - 5:30pm

Causality-inspired ML: what can causality do for ML?

Sara Magliacane

University of Amsterdam, The Netherlands

Applying machine learning to real-world cases often requires methods that are robust to heterogeneity, data that are missing not at random or corrupted, selection bias, non-i.i.d. data, etc., and that can generalize across different domains. Moreover, many tasks inherently try to answer causal questions and gather actionable insights, for which correlations are usually not enough. Several of these issues are addressed in the rich causal inference literature. On the other hand, classical causal inference methods often require either complete knowledge of a causal graph or enough experimental data (interventions) to estimate it accurately. Recently, a new line of research has focused on causality-inspired machine learning, i.e. on applying ideas from causal inference to machine learning methods without necessarily knowing, or even trying to estimate, the complete causal graph.

In this talk, I will present an example of this line of research in the unsupervised domain adaptation case, in which we have labelled data in a set of source domains and unlabelled data in a target domain ("zero-shot"), for which we want to predict the labels. In particular, given certain assumptions, our approach is able to select a set of provably "stable" features (a separating set), for which the generalization error can be bounded, even in the case of arbitrarily large distribution shifts. As opposed to other works, it also exploits the information in the unlabelled target data, allowing for some shifts unseen in the source domains. While using ideas from causal inference, our method never aims at reconstructing the causal graph or even the Markov equivalence class, showing that causal inference ideas can help machine learning even in this more relaxed setting.
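The intuition behind stable-feature selection can be shown on a toy problem. The sketch below is not the talk's algorithm; it is a simplified stand-in in which one feature has an invariant (causal) relationship with the label across source domains while another feature's (anti-causal) relationship shifts, and we keep only features whose per-domain regression coefficient is nearly invariant before predicting in a shifted target domain. The data-generating process, threshold, and names are all illustrative assumptions.

```python
# Toy stable-feature selection for domain adaptation (a simplified sketch,
# not the talk's separating-set method).
import numpy as np

rng = np.random.default_rng(2)

def make_domain(n, shift):
    """One domain: Y <- X1 is stable; X2 <- Y is anti-causal and shifts per domain."""
    x1 = rng.normal(size=n)
    y = 1.5 * x1 + 0.3 * rng.normal(size=n)
    x2 = shift * y + 0.3 * rng.normal(size=n)
    return np.column_stack([x1, x2]), y

sources = [make_domain(500, s) for s in (0.5, 1.0, 1.5)]
Xt, yt = make_domain(500, -2.0)  # target: a large unseen shift; labels only for evaluation

# Keep features whose per-domain univariate coefficient is (nearly) invariant.
coefs = np.array([[(X[:, i] @ y) / (X[:, i] @ X[:, i]) for i in range(2)]
                  for X, y in sources])
stable = [i for i in range(2) if np.ptp(coefs[:, i]) < 0.2]

Xs = np.vstack([X for X, _ in sources])
ys = np.concatenate([y for _, y in sources])

def target_mse(cols):
    """Fit on the pooled sources using the given columns, evaluate on the target."""
    beta, *_ = np.linalg.lstsq(Xs[:, cols], ys, rcond=None)
    return float(np.mean((Xt[:, cols] @ beta - yt) ** 2))

mse_stable, mse_all = target_mse(stable), target_mse([0, 1])
print(stable, mse_stable, mse_all)
```

The predictor built on the stable feature keeps its accuracy under the large target shift, while the pooled predictor that also uses the unstable feature degrades, which is the generalization guarantee the separating-set idea formalises.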

5:30pm - 5:50pm

Bayesian inference of causal graphs: where we are and where we should go

Mikko Koivisto

University of Helsinki, Finland

Causal discovery aims at inferring cause–effect relationships between variables from observational data. Recently, there has been notable progress in Bayesian inference of causal graphs, which holds the promise of fully quantifying the uncertainty over competing causal hypotheses. In this talk, we will highlight the power of the Bayesian paradigm for modeling and inference when the models of interest are only partially identifiable from data. On the other hand, we will also critically examine the assumptions currently needed for statistically and computationally efficient Bayesian inference, including the assumption that we have measured all the common causes of the measured variables. Finally, we will ask how causal discovery in the life sciences differs from that in the physical or social sciences.
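Partial identifiability, the central difficulty the abstract points to, can be demonstrated in a few lines: for two linear-Gaussian variables, the Markov-equivalent models X -> Y and Y -> X achieve exactly the same maximized likelihood, so observational data alone cannot favour either direction and any honest posterior must split its mass between them. The sketch below is an illustrative assumption-laden toy, not a method from the talk.

```python
# Markov equivalence in two variables: both causal directions fit a
# linear-Gaussian dataset equally well (a minimal identifiability demo).
import numpy as np

rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=n)
y = 0.9 * x + rng.normal(size=n)

def max_loglik(a, b):
    """Maximized Gaussian log-likelihood of the DAG a -> b (additive constants dropped)."""
    beta = (a @ b) / (a @ a)
    resid = b - beta * a
    s2_root = a @ a / n           # MLE variance of the root node
    s2_child = resid @ resid / n  # MLE residual variance of the child
    return -0.5 * n * (np.log(s2_root) + np.log(s2_child))

ll_xy = max_loglik(x, y)  # model X -> Y
ll_yx = max_loglik(y, x)  # model Y -> X
print(ll_xy, ll_yx)
```

The two log-likelihoods agree because both factorizations reproduce the same bivariate Gaussian; breaking such ties requires interventions or extra assumptions, which is where the Bayesian treatment of the remaining uncertainty earns its keep.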