Digital Causality Lab

Kick-off Data Product Phase

Philipp Bach

University of Hamburg

Outline

Digital Causality Lab

About the Digital Causality Lab

Digital Causality Lab (1 SWS, i.e., one weekly teaching hour; Oliver Schacht)
- New: replaces the former tutorial
- Focus on practice, implementation and tools
- Data literacy skills
- Independent learning and collaboration
- Teaching Phase and Data Product Phase
- Wednesday, 2 pm and 3 pm, WiWi 2079
- Questions to oliver.schacht@uni-hamburg.de

1. Teaching Phase (done)

Thorough introduction to tools for causal analysis and data literacy
1. Recap: Statistics
2. Introduction to R
3. Introduction to GitHub and Git
4. Causal Inference in Practice
5. Data Products with Quarto
Weeks 1 to 4

2. Data Product Phase

Independent development of a data product on causality
Solve a case study in a group of students
Creative solution: It’s all up to you, from concept to implementation
Three milestones: 1. Concept, 2. Prototype, 3. Final
Weeks 4/5 to 13

Case Study Topics
1. Illustration of causal phenomena
2. Graphical approaches
3. Real-data examples
4. Illustration of estimation approaches
5. Causal estimation in practice
List of case studies available on GitHub

Outlook

Plan for the course

Plan for the DCL data product phase

Workflow: Example Data Product

Consider the collider bias app as an example of a data product
The general goal of the case study was:

Generate an app that illustrates the collider bias in an example

Hence, it fits into category 1, Illustration of causal phenomena

Data product development: 3 phases

1. Conception - Topic, ideas, result (🏴 Concept)
2. Implementation - From basic R scripts to a data product (🏴 Prototype)
3. Launch - Presentation, release, publish code (🏴 Final)

1. Conception

Content/topic:
- What is the data product/case study about?
Choice of data product type:
- What kind of data product is suitable to illustrate the collider bias?
- Medium: Shiny app, blog post / Quarto notebook, R function/package, \(\ldots\)
- Components: Raw data set (table), scatter plot, DAG, regression output, \(\ldots\)

Which example could be used in our data product?
- Real data or simulated data?
- Data generating process (DGP)
Collider bias example:
- Data product type: Interactive app (shiny) with scatter plot, DAG, regression output
- Content/topic: Movie star or discrimination example?
- Data/DGP: Movie star example from Section 3.6 of Cunningham (2021), simulated data
This phase ends with the first round of feedback
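
For the movie star example, the DGP behind the simulated data can be written down formally. The notation below is our own, reconstructed from the simulation code used in the implementation phase (2,500 actors, stars selected at the 85th percentile of the score):

\[
\begin{aligned}
\text{Beauty}_i &\sim \mathcal{N}(0,1), \qquad \text{Talent}_i \sim \mathcal{N}(0,1), \qquad \text{Beauty}_i \perp \text{Talent}_i,\\
\text{Score}_i &= \text{Beauty}_i + \text{Talent}_i,\\
\text{Star}_i &= \mathbf{1}\{\text{Score}_i \ge c_{85}\}, \qquad c_{85} = \text{85th percentile of } \text{Score}.
\end{aligned}
\]

Star is a collider on the path Beauty → Star ← Talent. In the full population, Beauty and Talent are independent. Conditioning on Star = 1 (looking only at movie stars) induces a negative association between them.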

2. Implementation

Start with basic R scripts
1. Implement a DGP
2. Basic visualization: DAGs, scatter plots, regression output
Iterate on the example
- Does the result reflect the goal of the case study in a concise and convincing way?
- What can be improved?
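
To make iterating on the example easier, the DGP can be wrapped in a function whose arguments are the settings you might want to vary. This is a minimal sketch. The function name `simulate_star_data()` and its arguments are hypothetical and not part of the course code. It is written in base R so that it runs without the tidyverse, and it mirrors the tibble code of the collider bias example:

```r
# Hypothetical helper (not part of the original course code): wraps the
# movie star DGP in a function so that sample size and selection cutoff
# can be varied while iterating on the example.
simulate_star_data <- function(n = 2500, cutoff_quantile = 0.85) {
  beauty <- rnorm(n)            # independent of talent by construction
  talent <- rnorm(n)
  score  <- beauty + talent     # combined score
  cutoff <- quantile(score, cutoff_quantile, names = FALSE)
  data.frame(
    beauty = beauty,
    talent = talent,
    score  = score,
    star   = as.integer(score >= cutoff)  # 1 = selected as movie star
  )
}

set.seed(42)
sim <- simulate_star_data(n = 2500, cutoff_quantile = 0.85)
head(sim)
mean(sim$star)  # share of stars, i.e. 1 - cutoff_quantile
```

Because every setting is an argument, you can try a different cutoff with a single call, for example `simulate_star_data(cutoff_quantile = 0.95)`. The same arguments can later be mapped to inputs of an interactive app.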

Improvements
- Can we improve the code by cleaning up, using specific packages and/or speeding up calculations?
- Can we present the content of the case study in a better way?
This phase ends with the second round of feedback
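
A common way to clean up and speed up R code is to replace explicit loops with vectorized operations. This is a minimal sketch that uses the star indicator from the collider bias DGP as an example. The sample size of one million is an arbitrary choice made so that the timing difference is visible:

```r
set.seed(42)
score  <- rnorm(1e6) + rnorm(1e6)
cutoff <- quantile(score, 0.85, names = FALSE)

# slow: element-by-element loop with an if-statement
star_loop <- integer(length(score))
t_loop <- system.time(
  for (i in seq_along(score)) {
    if (score[i] >= cutoff) star_loop[i] <- 1L
  }
)

# fast and shorter: one vectorized comparison, coerced to 0/1
t_vec <- system.time(star_vec <- as.integer(score >= cutoff))

identical(star_loop, star_vec)  # same result
t_loop["elapsed"]               # timings depend on your machine
t_vec["elapsed"]
```

The vectorized version is also easier to read, which matters as much as speed once the code moves into a data product.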

Collider bias example:
- Data simulated with existing code by S. Cunningham (Cunningham 2021)

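# fix the random seed for reproducibility
set.seed(42)

# load libraries (run install.packages("[libraryname]") once to install)
library(tidyverse)
library(gtsummary)

# simulate 2,500 actors: beauty and talent are independent by construction
star_is_born <- tibble(
  beauty = rnorm(2500),
  talent = rnorm(2500),
  score = beauty + talent,           # combined score
  c85 = quantile(score, .85),        # 85th percentile of the score
  star = ifelse(score >= c85, 1, 0)  # top 15% become movie stars (collider)
)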
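
The regression output component can be sketched in base R. This sketch regenerates the movie star data without the tidyverse, using the same seed and the same order of draws, and compares the slope of beauty on talent in the full sample with the slope among stars only:

```r
set.seed(42)
n <- 2500
beauty <- rnorm(n)
talent <- rnorm(n)
score  <- beauty + talent
star   <- as.integer(score >= quantile(score, 0.85, names = FALSE))
df <- data.frame(beauty, talent, star)

# full sample: slope close to zero (beauty and talent are independent)
fit_all   <- lm(beauty ~ talent, data = df)
# stars only: conditioning on the collider induces a negative slope
fit_stars <- lm(beauty ~ talent, data = subset(df, star == 1))

round(c(all   = coef(fit_all)[["talent"]],
        stars = coef(fit_stars)[["talent"]]), 2)
```

In tidyverse style, the stars-only model can be written as `star_is_born %>% filter(star == 1) %>% lm(beauty ~ talent, data = .)`. `gtsummary::tbl_regression()` turns a fitted model into a formatted regression table.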

Collider bias example:
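- Visualization with ggplot2

# scatter plot of beauty against talent with a linear fit (full sample)
star_is_born %>%
  ggplot(aes(x = talent, y = beauty)) +
  geom_point(size = 0.5, shape = 23) +                     # one diamond per actor
  stat_smooth(method = "lm", formula = y ~ x, col = "red") + # OLS regression line
  xlim(-4, 4) +
  ylim(-4, 4)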

Collider bias example:
- Improvement: Customize the visualization, integrate it into a Shiny app, \(\ldots\)
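
One way to customize the visualization is to colour actors by star status and fit separate regression lines, so that the collider bias is visible in a single plot. This is a sketch with ggplot2. It assumes the same simulated data, regenerated here in base R so that the block runs on its own. Colours, labels and theme are arbitrary choices:

```r
library(ggplot2)

set.seed(42)
n <- 2500
beauty <- rnorm(n)
talent <- rnorm(n)
score  <- beauty + talent
star   <- score >= quantile(score, 0.85, names = FALSE)
df <- data.frame(beauty, talent,
                 group = factor(ifelse(star, "Star", "Non-star")))

p <- ggplot(df, aes(x = talent, y = beauty, colour = group)) +
  geom_point(size = 0.5, alpha = 0.5) +
  # one regression line per group (stars vs. non-stars)
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # dashed black line: regression on the full sample
  geom_smooth(aes(group = 1), method = "lm", formula = y ~ x, se = FALSE,
              colour = "black", linetype = "dashed") +
  coord_cartesian(xlim = c(-4, 4), ylim = c(-4, 4)) +
  labs(x = "Talent", y = "Beauty", colour = NULL,
       title = "Collider bias: conditioning on star status") +
  theme_minimal()
```

Display the plot with `print(p)`. The sketch uses `coord_cartesian()` instead of `xlim()`/`ylim()` because it zooms without dropping observations. `xlim()`/`ylim()` remove points outside the limits before the regression lines are fitted.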

3. Launch

Incorporate feedback (from the first and second rounds)
Develop the final version of the data product
Present the data product in the lecture
Deploy/launch the product, e.g., in the DCL Gallery
Critical reflection and feedback

Collider bias example:
- Adapt the code to run in a Shiny app
- Optimize the presentation (figures, text description)
- Improve speed
- Deploy the Shiny app with Docker and Heroku (technical)
- Integrate into the DCL website
- Publish the source code on GitHub
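
This is a rough sketch of what adapting the code to a Shiny app can look like. The minimal app below assumes a slider for the selection quantile and one for the sample size. Both are hypothetical design choices, and this is not the code of the actual collider bias app. Deployment with Docker and Heroku is not covered here.

```r
library(shiny)
library(ggplot2)

# Hypothetical minimal app: the user chooses the selection cutoff and
# sees how the beauty-talent relationship among stars changes.
ui <- fluidPage(
  titlePanel("Collider bias: A star is born"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("cutoff", "Selection quantile for stars",
                  min = 0.5, max = 0.95, value = 0.85, step = 0.05),
      sliderInput("n", "Sample size",
                  min = 500, max = 5000, value = 2500, step = 500)
    ),
    mainPanel(
      plotOutput("scatter"),
      verbatimTextOutput("coefs")
    )
  )
)

server <- function(input, output, session) {
  # re-simulate only when an input changes; fixed seed so that
  # differences come from the inputs, not from new random draws
  sim <- reactive({
    set.seed(42)
    beauty <- rnorm(input$n)
    talent <- rnorm(input$n)
    score  <- beauty + talent
    star   <- score >= quantile(score, input$cutoff, names = FALSE)
    data.frame(beauty, talent,
               group = ifelse(star, "Star", "Non-star"))
  })

  output$scatter <- renderPlot({
    ggplot(sim(), aes(talent, beauty, colour = group)) +
      geom_point(size = 0.5, alpha = 0.5) +
      geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
  })

  # regression of beauty on talent among stars only
  output$coefs <- renderPrint({
    stars <- subset(sim(), group == "Star")
    coef(lm(beauty ~ talent, data = stars))
  })
}

app <- shinyApp(ui, server)
# run locally with: runApp(app)
```

Wrapping the simulation in `reactive()` means that the plot and the regression output share one simulated data set and are recomputed only when a slider moves. This relates directly to the "improve speed" step.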

Data product development: 3 phases

The workflow will depend on your case study
The most important step is to understand what your data product is all about
Be creative and find a way to illustrate the core idea of your case study
Read our notebooks, books, blogs and package documentation, and proceed step by step
Collaborate with your fellow students and ask for help (we are here to help)
Manage expectations: we know that time is limited and that you have to learn a lot about the data product tools

References

Cunningham, Scott. 2021. Causal Inference: The Mixtape. Yale University Press.