Causal Discovery for Climate Time Series in the Presence of Unobserved Variables

Scientific inquiry seeks to understand natural phenomena by uncovering their underlying processes, i.e., by identifying cause and effect. Beyond scientific curiosity, an understanding of cause-and-effect relationships is necessary to predict the effect of changing dynamical regimes and to attribute extreme events to potential causes. An important question is thus how, in cases where controlled experiments are not feasible, causation can still be inferred from the statistical dependencies in observed time series.

A central obstacle for such an inference is the potential existence of unobserved causally relevant variables. Arguably, this is more likely to be the case than not; consider, for example, unmeasured deep-ocean variables influencing atmospheric processes. Unobserved variables can act as confounders (meaning they are a common cause of two or more observed variables) and thus introduce spurious, i.e., non-causal, dependencies. Despite these complications, the last three decades have seen the development of so-called causal discovery algorithms (an example being FCI by Spirtes et al., 1999) that are often able to identify spurious associations and to distinguish them from genuine causation. This opens the possibility of a data-driven approach to inferring cause-and-effect relationships among climate variables, thereby contributing to a better understanding of Earth's complex climate system. These methods are, however, not yet well adapted to some specific challenges that climate time series often come with, e.g., strong autocorrelation, time lags, and nonlinearities. To close this methodological gap, we generalize the ideas of the recent PCMCI causal discovery algorithm to time series in which unobserved causally relevant variables may exist (in contrast, PCMCI assumed the absence of confounding). Further, we present preliminary applications to modes of climate variability.

Andreas Gerhardus and Jakob Runge
German Aerospace Center, Institute of Data Science, Jena
EGU2020: Sharing Geoscience Online

• Bypassing a major philosophical debate, we adopt the following definition of causality:

X is a cause of Y if changing the value of X, while keeping all other conditions the same, leads to a different value of Y
• The classical method of empirically inferring causal relationships is experimentation: Set up an experiment that changes the value of X without affecting any other variable. If the value of Y changes when X changes, then X is a cause of Y
• Example: X = turning on the light, Y = the room being illuminated. Turning the light on illuminates the room, so X is a cause of Y
• Y is not a cause of X: Illuminating the room by, say, a flashlight does not turn on the light

Causal Relationships and Their Inference by Experimentation
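The light-switch example can be sketched numerically. The following is a minimal toy simulation of our own (the model, variable names, and the deterministic switch mechanics are illustrative assumptions, not from the presentation): intervening on X changes Y, while intervening on Y leaves X untouched.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

def observe(set_x=None, set_y=None):
    # Toy mechanism X -> Y (switch -> illumination); set_x/set_y model
    # an experimental intervention that fixes a variable's value.
    x = rng.integers(0, 2, size=n) if set_x is None else np.full(n, set_x)
    y = x if set_y is None else np.full(n, set_y)  # Y copies X unless we intervene
    return x, y

# Intervening on X changes Y ...
_, y_on = observe(set_x=1)
_, y_off = observe(set_x=0)
print(y_on.mean(), y_off.mean())      # the two distributions differ -> X causes Y

# ... but intervening on Y (say, with a flashlight) leaves X untouched
x_flash, _ = observe(set_y=1)
x_dark, _ = observe(set_y=0)
print(x_flash.mean(), x_dark.mean())  # the same distribution -> Y does not cause X
```

The asymmetry between the two interventions is exactly what the definition above captures.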
• Causal discovery aims to infer causal relationships from observational data ¹
• Given the above definition of causality, this task comes with the following fundamental challenge: The data is already there; it has been generated without us controlling the experimental conditions

[Figure: example causal graph with edges W → X, X → Z, Y → Z]
• Assumption 1:

The observed data was generated by a process that is expressible as a structural causal model (SCM)
• Discussion: Equilibrium states of ordinary differential equations and of random differential equations can be described by SCMs ¹ ²
• Consequence: The structure of the corresponding causal graph implies statistical independencies
• Example cont.: W is conditionally independent of Z given X (the causal influence is mediated by X); X and W are marginally independent of Y (the colliding arrows at Z block the influence)
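These implied independencies can be checked on simulated data. Below is a minimal linear-Gaussian sketch of our own for the example graph W → X → Z ← Y (the coefficients and sample size are illustrative assumptions, not from the presentation):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Linear-Gaussian SCM for the example graph W -> X -> Z <- Y
# (coefficients are illustrative choices).
W = rng.normal(size=n)
X = 0.8 * W + rng.normal(size=n)
Y = rng.normal(size=n)
Z = 0.7 * X + 0.6 * Y + rng.normal(size=n)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

def partial_corr(a, b, c):
    # Correlation of residuals after linearly regressing out c.
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return corr(ra, rb)

print(round(corr(W, Z), 2))             # clearly nonzero: dependence mediated by X
print(round(partial_corr(W, Z, X), 2))  # ~0: W independent of Z given X
print(round(corr(W, Y), 2))             # ~0: the collider at Z blocks the path
```

The three printed numbers match the independencies read off the graph: only the unconditional W–Z dependence survives.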

[Figure: example causal graph with edges W → X, X → Z, Y → Z]
• Assumption 2:

All statistical independencies are implied by d-separation on the causal graph ¹
• Discussion: Intuitively, this excludes "accidental" independencies due to fine-tuned parameters; weaker forms of this assumption exist ²
• Consequence: Statistical independencies constrain the structure of the causal graph
• Constraint-based causal discovery:

Perform tests of statistical (in-)dependence in the observed data to constrain the causal graph as much as possible, thereby inferring causal relationships
Causal Graphs and Statistical Independencies: Part 2
¹ See the notions of "minimality" in Pearl, J., Causality: Models, Reasoning, and Inference, and of "faithfulness" in Spirtes, P., Glymour, C., and Scheines, R., Causation, Prediction, and Search.
² For example: Ramsey, J., Spirtes, P., and Zhang, J., Adjacency-Faithfulness and Conservative Causal Inference.
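A toy version of this constraint-based procedure can be written in a few lines. The sketch below recovers the skeleton of the example graph W → X → Z ← Y by removing an edge whenever some (conditional) independence is found; using a fixed partial-correlation threshold in place of a proper significance test, and conditioning sets of size at most one, are simplifications of ours, not part of algorithms such as PC or FCI.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
n = 100_000

# Same illustrative linear-Gaussian SCM: W -> X -> Z <- Y
W = rng.normal(size=n)
X = 0.8 * W + rng.normal(size=n)
Y = rng.normal(size=n)
Z = 0.7 * X + 0.6 * Y + rng.normal(size=n)
data = {"W": W, "X": X, "Y": Y, "Z": Z}

def partial_corr(a, b, conds):
    # Residual-based partial correlation (linear-Gaussian assumption).
    if conds:
        C = np.column_stack(conds + [np.ones(len(a))])
        a = a - C @ np.linalg.lstsq(C, a, rcond=None)[0]
        b = b - C @ np.linalg.lstsq(C, b, rcond=None)[0]
    return np.corrcoef(a, b)[0, 1]

# Skeleton search: keep an edge only if no tested (in-)dependence removes it.
edges = set()
for u, v in combinations(data, 2):
    others = [data[k] for k in data if k not in (u, v)]
    tests = [partial_corr(data[u], data[v], [])]
    tests += [partial_corr(data[u], data[v], [o]) for o in others]
    if all(abs(t) > 0.05 for t in tests):  # no independence found -> keep edge
        edges.add(frozenset((u, v)))

print(sorted("".join(sorted(e)) for e in edges))  # skeleton: W-X, X-Z, Y-Z
```

On this data the spurious W–Z association is removed by conditioning on X, while the three true adjacencies survive every test.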
• In practice, we won't observe every single variable that is involved in the physical process under investigation
• However, some of the unobserved variables may be causally relevant:

If Z is unobserved, it is called a hidden confounder or a hidden common cause
• This complicates the inference of causal relationships for the following reason: Say we observe a statistical dependence between X and Y, and this dependence cannot be blocked off by conditioning on any other observed variables
• If there are no hidden confounders, we can conclude that X causes Y or vice versa
• If there are hidden confounders, we cannot draw this conclusion
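A short simulation makes the problem concrete (the linear toy model and coefficients are our own illustration): X and Y are dependent purely through the hidden Z, and an intervention on X reveals that neither causes the other.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Z is a hidden common cause of X and Y; neither X nor Y causes the other.
Z = rng.normal(size=n)
X = 0.8 * Z + rng.normal(size=n)
Y = 0.8 * Z + rng.normal(size=n)

# Observationally, X and Y are clearly dependent ...
print(np.corrcoef(X, Y)[0, 1])      # spurious association via the hidden Z

# ... yet intervening on X (replacing it with fresh values) breaks the link:
X_do = rng.normal(size=n)           # do(X): X no longer listens to Z
print(np.corrcoef(X_do, Y)[0, 1])   # ~0: X has no causal effect on Y
```

Without access to Z, no conditioning on observed variables can remove the X–Y dependence, which is exactly the situation described above.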