Causal Inference and Digital Causality Lab
University of Hamburg
Notebooks available from the DCL website
Introductory videos on Lecture2Go (link available in STiNE)
Q & A in in-person meetings with Oliver
Topics and Tools
Always keep your head up: Don’t get frustrated too easily
Course intended for bachelor students (4th or 6th semester)
Course website:
Examination
Digital Causality Lab
Bonus of 0.3 points possible
Collaborative data product development
Final presentation at the end of the semester
Bonus criteria
We must learn to analyze data and assess causal claims — a skill that is increasingly important for business and government leaders.
Source: https://hbr.org/2021/11/leaders-stop-confusing-correlation-with-causation
Example 1.2.1
We record the recovery rates of 700 patients who were given access to the drug. A total of 350 patients chose to take the drug and 350 patients did not. The results of the study are shown in Table 1.1.
| | Drug | No drug |
|---|---|---|
| Men | 81 out of 87 recovered (93%) | 234 out of 270 recovered (87%) |
| Women | 192 out of 263 recovered (73%) | 55 out of 80 recovered (69%) |
| Combined data | 273 out of 350 recovered (78%) | 289 out of 350 recovered (83%) |
Source: Glymour, Pearl, and Jewell (2016)
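The reversal in Table 1.1 can be checked directly from the counts. The following is a minimal sketch that only uses the numbers in the table; the dictionary layout and the helper names `rate` and `combined` are illustrative choices, not part of the source.

```python
# (recovered, total) counts copied from Table 1.1
table = {
    "Men": {"Drug": (81, 87), "No drug": (234, 270)},
    "Women": {"Drug": (192, 263), "No drug": (55, 80)},
}


def rate(recovered, total):
    """Recovery rate as a fraction between 0 and 1."""
    return recovered / total


def combined(arm):
    """Pool the counts of one arm ("Drug" or "No drug") over all groups."""
    recovered = sum(table[group][arm][0] for group in table)
    total = sum(table[group][arm][1] for group in table)
    return recovered, total


# Within each group, compare the recovery rates of the two arms.
drug_better_in_each_group = all(
    rate(*table[group]["Drug"]) > rate(*table[group]["No drug"])
    for group in table
)

# Pooled over both groups, compare the same two arms again.
drug_better_combined = rate(*combined("Drug")) > rate(*combined("No drug"))

for group, arms in table.items():
    print(group, {arm: round(rate(*counts), 3) for arm, counts in arms.items()})
print("Combined", {arm: round(rate(*combined(arm)), 3) for arm in ("Drug", "No drug")})
print("Drug better within each group:", drug_better_in_each_group)
print("Drug better in combined data:", drug_better_combined)
```

The drug arm has the higher recovery rate within each group, yet the lower rate once the groups are pooled, because the group sizes differ strongly between the two arms.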
Figure 1.1 Results of the exercise–cholesterol study, segregated by age
Source: Glymour, Pearl, and Jewell (2016)
Figure 1.2 Results of the exercise–cholesterol study, unsegregated. The data points are identical to those of Figure 1.1, except the boundaries between the various age groups are not shown
Source: Glymour, Pearl, and Jewell (2016)
App: https://simpsons-paradox.herokuapp.com/
Source: Glymour, Pearl, and Jewell (2016)
2.1 The Selection Problem
Specifically, it [the NHIS] includes a question “During the past 12 months, was the respondent a patient in a hospital overnight?” which we can use to identify recent hospital visitors. The NHIS also asks “Would you say your health in general is excellent, very good, good, fair, poor?” The following table displays the mean health status (assigning a 1 to excellent health and a 5 to poor health) among those who have been hospitalized and those who have not (tabulated from the 2005 NHIS):
Source: Angrist and Pischke (2009)
Group | Sample Size | Mean health status | Std. Error |
---|---|---|---|
Hospital | 7774 | 2.79 | 0.014 |
No Hospital | 90049 | 2.07 | 0.003 |
The difference in the means is 0.71, a large and highly significant contrast in favor of the non-hospitalized, with a \(t\)-statistic of 58.9.
Source: Angrist and Pischke (2009)
Figure 1.2: Wright’s graphical demonstration of the identification problem. Figure from Wright et al. (1928).
The price elasticity of demand is the solution to the following equation: \[\epsilon = \frac{\partial \log Q}{\partial \log P}\]
One possible example of an econometric model would be a linear demand function: \[\log Q_d = \alpha+\delta \log P+\gamma X + u\] where \(\alpha\) is the intercept, \(\delta\) is the elasticity of demand, \(X\) is a matrix of factors that determine demand like the prices of other goods or income, \(\gamma\) is the coefficient on the relationship between \(X\) and \(Q_d\), and \(u\) is the error term.
Source: Cunningham (2021), https://mixtape.scunning.com/01-introduction#example-identifying-price-elasticity-of-demand
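To make the interpretation of \(\delta\) concrete, the sketch below evaluates the log-linear demand function numerically. The parameter values `ALPHA` and `DELTA` are hypothetical, and the terms \(\gamma X\) and \(u\) are set to zero purely for illustration. It does not address the identification problem; it only shows that in this functional form the derivative of \(\log Q\) with respect to \(\log P\) equals \(\delta\).

```python
import math

ALPHA = 2.0   # hypothetical intercept
DELTA = -0.5  # hypothetical price elasticity of demand


def log_demand(log_p, alpha=ALPHA, delta=DELTA):
    """log Q_d = alpha + delta * log P, with gamma * X and u set to zero."""
    return alpha + delta * log_p


def elasticity(log_p, h=1e-6):
    """Central finite-difference approximation of d log Q / d log P."""
    return (log_demand(log_p + h) - log_demand(log_p - h)) / (2 * h)


# Effect of a 1% price increase on the quantity demanded.
p0, p1 = 10.0, 10.1
q0 = math.exp(log_demand(math.log(p0)))
q1 = math.exp(log_demand(math.log(p1)))
pct_change_q = 100 * (q1 - q0) / q0

print("numerical elasticity:", round(elasticity(math.log(p0)), 4))
print("percent change in Q for a 1% price increase:", round(pct_change_q, 3))
```

With these assumed values, a 1% price increase lowers the quantity demanded by roughly 0.5%, which matches the reading of \(\delta\) as an elasticity.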
Figure 2.1.1: The causal effect of the treatment on the outcome
Source: Huber (2023), Chapter 2.1
Messerli (2012) reports that there is a significant correlation between a country’s chocolate consumption (per capita) and the number of Nobel prizes awarded to its citizens (also per capita), see Figure 1.1.
Figure 1.1: The left figure is slightly modified from Messerli (2012), it shows a significant correlation between a country’s consumption of chocolate and the number of Nobel prizes (averaged per person). The right figure shows a similar result for coffee consumption; the data are based on Wikipedia (2013b,a).
Source: Peters (2015)
These correlations are properties of some observational distribution \(\mathbb{P}^{X}\). We must be careful with drawing conclusions like “Eating chocolate produces Nobel prize.” or “Geniuses are more likely to eat lots of chocolate” (see Figure 1.2), because these statements are “causal”.
Figure 1.2: Two online articles (downloaded from confectionarynews.com and forbes.com on Jan 29th 2013) drawing causal conclusions from the observed correlation between chocolate consumption and Nobel prizes, see Figure 1.1.
Source: Peters (2015)
Introduction to methods and theoretical concepts
How can we assess causal relationships based on data?
Learn the language and tools of causal analysis
Source: “Map of Causality.” (2023)
Overarching goal: Does a treatment or intervention, \(D\), have a causal effect on an outcome variable of interest, \(Y\)?
Key question: What would happen in the absence of the treatment?
Fundamental problem of causal inference: For the same unit, we can never observe the outcome both with and without a particular intervention
Randomized controlled trials (RCTs) are often considered the gold standard of causal inference
In RCTs, the treatment can be assigned randomly to individuals
Under certain assumptions, random assignment allows us to estimate the average causal effect as the difference between the mean outcomes of the treatment and control groups
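The role of random assignment can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis of real data: the outcome distribution, the constant effect `TRUE_EFFECT`, and the 50/50 assignment probability are all hypothetical. Each simulated unit has two potential outcomes, but only the one corresponding to its assigned treatment status is recorded.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible
N = 20_000
TRUE_EFFECT = 2.0  # hypothetical constant treatment effect

treated, control = [], []
for _ in range(N):
    y0 = random.gauss(10.0, 3.0)  # potential outcome without treatment
    y1 = y0 + TRUE_EFFECT         # potential outcome with treatment
    d = random.random() < 0.5     # coin flip, independent of (y0, y1)
    # Only one of the two potential outcomes is observed per unit.
    if d:
        treated.append(y1)
    else:
        control.append(y0)

# Difference between the mean outcomes of the treatment and control groups.
ate_hat = statistics.mean(treated) - statistics.mean(control)
print("estimated average effect:", round(ate_hat, 3))
print("true average effect:", TRUE_EFFECT)
```

Because the coin flip is unrelated to the potential outcomes, the two groups are comparable on average, and the difference in means lands close to the assumed effect.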
Optimization Makes Everything Endogenous
In observational studies, individuals self-select into the treatment, i.e., they choose the treatment status that gives them the best potential outcome
Hence, a simple comparison of treatment vs. control group does not generally reveal the causal effect
The resulting bias is also called confounding bias, omitted variable bias or treatment selection bias (Huber 2023)
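The same kind of simulation shows what changes under self-selection. Again, this is only a sketch: the effect distribution and all parameter values are hypothetical. Here each unit picks the treatment status with the better potential outcome, so the treatment status depends on the potential outcomes themselves.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible
N = 20_000

effects, treated, control = [], [], []
for _ in range(N):
    y0 = random.gauss(10.0, 3.0)     # potential outcome without treatment
    effect = random.gauss(0.5, 2.0)  # hypothetical unit-specific effect
    y1 = y0 + effect                 # potential outcome with treatment
    d = y1 > y0                      # self-selection: take the better outcome
    effects.append(effect)
    if d:
        treated.append(y1)
    else:
        control.append(y0)

true_ate = statistics.mean(effects)  # average effect over all units
naive_diff = statistics.mean(treated) - statistics.mean(control)

print("average causal effect in the simulated population:", round(true_ate, 3))
print("naive treated-vs-control difference:", round(naive_diff, 3))
```

In this setup the naive comparison is clearly larger than the average causal effect, because only units that benefit from the treatment end up in the treatment group.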
Solutions:
Graphical methods to illustrate causal relationships
Graphs allow us to make statements about statistical relationships between variables, e.g., stochastic independence
Two example DAGs
Figure 2.1.1: The causal effect of the treatment on the outcome
Figure 2.2.2: Treatment selection bias
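The two example DAGs can also be written down as simple data structures. The sketch below assumes that the second graph contains a single confounder, here called `X`, that affects both the treatment `D` and the outcome `Y`; this node name and the exact edge set are assumptions, since the figures themselves are not reproduced here. The helper `common_causes` is a simplified check for shared ancestors, not a full implementation of d-separation or the backdoor criterion.

```python
# Each key is a node; its value lists the node's direct causes (parents).
# "X" is a hypothetical confounder name, not taken from the figures.
dag_effect_only = {"D": [], "Y": ["D"]}
dag_selection = {"X": [], "D": ["X"], "Y": ["D", "X"]}


def ancestors(dag, node):
    """Return all nodes that have a directed path into `node`."""
    found = set()
    stack = list(dag[node])
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(dag[parent])
    return found


def common_causes(dag, treatment, outcome):
    """Ancestors of the treatment that also reach the outcome
    along a directed path that does not pass through the treatment."""
    # Drop all edges leaving the treatment, then look for ancestors of the outcome.
    without_treatment = {
        node: [p for p in parents if p != treatment]
        for node, parents in dag.items()
    }
    return ancestors(dag, treatment) & ancestors(without_treatment, outcome)


print(common_causes(dag_effect_only, "D", "Y"))  # no shared cause
print(common_causes(dag_selection, "D", "Y"))    # the assumed confounder
```

A non-empty result indicates that treatment and outcome share a cause, which is the situation in which a simple comparison of treatment and control groups is generally biased.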
In this course, we will use the books