LPCMCI: Causal Discovery in Time Series with Latent Confounders

The quest to understand cause and effect relationships lies at the basis of the scientific enterprise. In cases where the classical approach of controlled experimentation is not feasible, methods from the modern framework of causal discovery provide an alternative way to learn about cause and effect from observational, i.e., non-experimental, data. Recent years have seen an increasing interest in these methods from various scientific fields, for example in the climate and Earth system sciences (where large-scale experimentation is often infeasible) as well as in machine learning and artificial intelligence (where models based on an understanding of cause and effect promise to be more robust under changing conditions).

In this contribution we present the novel LPCMCI algorithm for learning the cause and effect relationships in multivariate time series. The algorithm is specifically adapted to several challenges that are prevalent in time series considered in the climate and Earth system sciences, for example strong autocorrelations, combinations of time-lagged and contemporaneous causal relationships, and nonlinearities. It moreover allows for the existence of latent confounders, i.e., unobserved common causes. While this complication arises in most realistic scenarios, especially when investigating a system as complex as Earth's climate system, it is nevertheless assumed away in many existing algorithms. We demonstrate applications of LPCMCI to examples from a climate context and compare its performance to competing methods.

Related reference:
Gerhardus, Andreas and Runge, Jakob (2020). High-recall causal discovery for autocorrelated time series with latent confounders. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020).

(1) Problem setting
Learn the causal structure of a multivariate time series, in the form of a causal graph, from observational data, while allowing for arbitrary nonlinear functional relationships and latent (hidden) confounders. A toy simulated example of such a setting follows below.
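To make the setting concrete, here is a minimal simulated sketch of the kind of process the problem statement describes: a stationary, autocorrelated linear process with both lagged and contemporaneous links, partly driven by a latent confounder. All variable names and coefficients are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def generate_data(T=500, seed=42):
    """Simulate a toy stationary process with a latent confounder.

    L is an unobserved common driver of X and Y; only X, Y, Z are
    returned. Coefficients are illustrative, not from the paper.
    """
    rng = np.random.default_rng(seed)
    L, X, Y, Z = (np.zeros(T) for _ in range(4))
    for t in range(1, T):
        L[t] = 0.7 * L[t - 1] + rng.standard_normal()  # autocorrelated latent driver
        X[t] = 0.5 * X[t - 1] + 0.6 * L[t - 1] + rng.standard_normal()
        Y[t] = 0.5 * Y[t - 1] + 0.6 * L[t - 1] + 0.4 * X[t - 1] + rng.standard_normal()
        Z[t] = 0.5 * Z[t - 1] + 0.3 * Y[t] + rng.standard_normal()  # contemporaneous link
    # L is dropped: the observed data are latently confounded.
    return np.column_stack([X, Y, Z])

data = generate_data()
print(data.shape)  # (500, 3)
```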

Andreas Gerhardus & Jakob Runge
Institute of Data Science, German Aerospace Center (DLR)

(2) Our contribution
Our novel LPCMCI algorithm strongly outperforms competing methods in terms of detection power [1].
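LPCMCI is available in the open-source tigramite package. The following minimal usage sketch is an assumption-laden outline rather than a verified recipe: import paths and argument defaults differ between tigramite versions, and `data` is the simulated array from the sketch in (1).

```python
from tigramite import data_processing as pp
from tigramite.lpcmci import LPCMCI
# Note: in older tigramite versions this import is
# `from tigramite.independence_tests import ParCorr`.
from tigramite.independence_tests.parcorr import ParCorr

# Wrap the observed (latently confounded) data from the sketch in (1).
dataframe = pp.DataFrame(data, var_names=['X', 'Y', 'Z'])

# ParCorr assumes linear relationships; for the nonlinear case a
# nonparametric test (e.g. GPDC or CMIknn) can be substituted.
lpcmci = LPCMCI(dataframe=dataframe, cond_ind_test=ParCorr())

# tau_max: maximum time lag considered; pc_alpha: significance level
# of the conditional independence tests.
results = lpcmci.run_lpcmci(tau_max=2, pc_alpha=0.01)

# The returned graph uses edge marks that can indicate possible
# latent confounding (e.g. bidirected edges).
print(results['graph'])
```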

(4) Real data applications
An application to a river discharge dataset is demonstrated in our paper [1]; further applications are the subject of future work.

Introduction
Motivation: the complex dynamics of the climate system.
Goal: contribute to a better understanding of Earth's complex weather and climate system.

The framework of causal inference:
• Defines notions of cause and effect in a mathematical framework.
• Casts causal questions within this framework.
• Specifies assumptions and develops methods for answering these questions.

Important sub-field: causal discovery
• Specifies assumptions and develops methods for learning cause and effect relationships from observational data.

Textbooks: [Pearl, 2000; Spirtes et al., 2000; Peters et al., 2017].

Why is causal knowledge important?
Scientific understanding: Knowledge of cause and effect relationships is an essential part of the physical understanding of natural processes.

Robust prediction & forecasting:
Predictive systems consistent with the underlying causal structures are thought to be more robust under changing environmental conditions.

Evaluating the effects of actions:
Questions of the type "What will happen if we do ...?" are causal in nature (a simulated sketch follows after these points).

Attribution:
Questions of the type "Why did this event happen?" are causal in nature.
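A tiny simulated example (illustrative, not from the poster) of why "What will happen if we do X?" is a causal rather than a purely statistical question: under a latent confounder, the observational regression slope of Y on X differs from the true interventional effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Structural model: latent L confounds X and Y.
# The true causal effect of X on Y is 0.5.
L = rng.standard_normal(n)
X = 0.8 * L + rng.standard_normal(n)
Y = 0.5 * X + 0.8 * L + rng.standard_normal(n)

# Observational answer: the regression slope of Y on X mixes the
# causal effect with the confounding path through L.
beta_obs = np.cov(X, Y)[0, 1] / np.var(X)

# Interventional answer: do(X = x) cuts the L -> X link, i.e. X is
# set independently of L.
X_do = rng.standard_normal(n)
Y_do = 0.5 * X_do + 0.8 * L + rng.standard_normal(n)
beta_do = np.cov(X_do, Y_do)[0, 1] / np.var(X_do)

print(f"observational slope: {beta_obs:.2f}")   # approx. 0.89 (biased)
print(f"interventional slope: {beta_do:.2f}")   # approx. 0.50 (true effect)
```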

Learning causal relationships from statistical independencies

Independence-based causal discovery: learn the causal graph from statistical tests of (conditional) independencies in observational data, hence CI-based causal discovery. See [Pearl, 2000; Spirtes et al., 2000; Peters et al., 2017] for more details. A simplified conditional independence test is sketched below.
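As a concrete, deliberately simplified instance of such a test (the function and its name are hypothetical): a partial correlation test of X ⊥ Y | Z regresses Z out of both variables and tests the correlation of the residuals. It assumes linear relationships and roughly Gaussian noise.

```python
import numpy as np
from scipy import stats

def parcorr_test(x, y, z):
    """Test X ⊥ Y | Z via partial correlation (assumes linearity).

    Returns the residual correlation and its two-sided p-value.
    """
    Z = np.column_stack([z, np.ones_like(z)])  # regressors incl. intercept
    # Residuals of X and Y after linear regression on Z.
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return stats.pearsonr(rx, ry)

# Common cause Z -> X, Z -> Y: X and Y are marginally dependent,
# but independent given Z.
rng = np.random.default_rng(1)
z = rng.standard_normal(2000)
x = 0.8 * z + rng.standard_normal(2000)
y = 0.8 * z + rng.standard_normal(2000)
print(stats.pearsonr(x, y))    # clearly nonzero correlation
print(parcorr_test(x, y, z))   # near-zero correlation, large p-value
```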

Causal graphs and (conditional) independencies

Fact: The structure of the causal graph often has observable implications in terms of (conditional) independencies in the observed data.
Intuition: statistical dependencies derive from causal relationships.
General rule: d-separation, a graphical criterion to read off all (conditional) independencies implied by the structure of a given causal graph [Pearl, 1985; Pearl, 1988] (a simulated illustration follows after this block).
Assumption of no accidental independencies: all independencies in the data reflect the graph structure (faithfulness).

Particularities of the time series setting:
• Variables are resolved in time
• Autocorrelation
Additional assumption: stationary causal structure
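A small simulated illustration of the d-separation rule (same linear-Gaussian toy assumptions as above) for a chain X → Y → Z: Y blocks the only path, so X and Z are dependent marginally but independent given Y.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 5000

# Chain graph X -> Y -> Z.
x = rng.standard_normal(n)
y = 0.8 * x + rng.standard_normal(n)
z = 0.8 * y + rng.standard_normal(n)

# Marginally: the path X -> Y -> Z is open, so X and Z are dependent.
print(stats.pearsonr(x, z))  # clearly nonzero correlation

# Conditioning on Y blocks the path (d-separation): the residual
# correlation after regressing out Y vanishes.
Y = np.column_stack([y, np.ones(n)])
rx = x - Y @ np.linalg.lstsq(Y, x, rcond=None)[0]
rz = z - Y @ np.linalg.lstsq(Y, z, rcond=None)[0]
print(stats.pearsonr(rx, rz))  # near-zero correlation, large p-value
```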

LPCMCI achieves strong gains in recall

Results of numerical experiments:
For autocorrelated continuous data, LPCMCI shows strong gains in recall compared to the current state-of-the-art algorithm SVAR-FCI [Malinsky and Spirtes, 2018]. A toy sketch of the recall computation follows.
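For orientation, recall here is the fraction of true links that the algorithm detects. The following sketch uses a hypothetical adjacency format (boolean arrays indexed by source variable, target variable, and lag), not the paper's evaluation code.

```python
import numpy as np

def adjacency_recall(true_adj, est_adj):
    """Fraction of true links that are detected (recall)."""
    true_links = true_adj.astype(bool)
    detected = true_links & est_adj.astype(bool)
    return detected.sum() / max(true_links.sum(), 1)

# Toy example: 3 variables, lags 0..2.
true_adj = np.zeros((3, 3, 3), dtype=bool)
true_adj[0, 1, 1] = True   # X(t-1) -> Y(t)
true_adj[1, 2, 0] = True   # Y(t)   -> Z(t)
est_adj = np.zeros_like(true_adj)
est_adj[0, 1, 1] = True    # only one of the two true links found
print(adjacency_recall(true_adj, est_adj))  # 0.5
```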