Digital Causality Lab

Kick-off Data Product Phase

Philipp Bach

University of Hamburg

Outline

Digital Causality Lab


About the Digital Causality Lab

  • Digital Causality Lab (1 SWS, Oliver Schacht)
    • New - replaces the former tutorial
    • Focus on practice, implementation and tools
    • Data Literacy skills
    • Independent learning and collaboration
    • Teaching Phase and Data Product Phase
    • Wednesday, 2 pm and 3 pm, WiWi 2079
    • Questions to oliver.schacht@uni-hamburg.de

Digital Causality Lab


1. Teaching Phase (done)

  • Thorough introduction of tools for causal analysis and data literacy
      1. Recap: Statistics
      2. Introduction to R
      3. Introduction to GitHub and Git
      4. Causal Inference in Practice
      5. Data Products with Quarto
  • Week 1 to 4

Digital Causality Lab


2. Data Product Phase

  • Independent development of a data product on causality
  • Solve a case study in a group of students
  • Creative solution: It’s all up to you, from the concept to the implementation
  • Three milestones: 1. Concept, 2. Prototype, 3. Final
  • Week 4/5 to week 13

Digital Causality Lab


2. Data Product Phase

  • Case Study Topics
    1. Illustration of causal phenomena
    2. Graphical approaches
    3. Real-data examples
    4. Illustration of estimation approaches
    5. Causal estimation in practice
  • List of case studies available on GitHub

Outlook

Plan for the course

Outlook

Plan for the DCL data product phase

Workflow: Example Data Product


Generate an app that illustrates the collider bias in an example

  • Hence this would fit into the category 1. Illustration of causal phenomena

Workflow: Example Data Product


Data product development: 3 phases

  • 1. Conception - Topic, ideas, result (🏴 Concept)

  • 2. Implementation - From basic R scripts to a data product (🏴 Prototype)

  • 3. Launch - Presentation, release, publish code (🏴 Final)

Workflow: Example Data Product

1. Conception

  • Content/topic:
    • What is the data product/case study about?
  • Choice of data product type:
    • What kind of data product is suitable to illustrate the collider bias?
    • Medium: Shiny app, blog post / Quarto notebook, R function/package, ...
    • Components: Raw data set (table), scatter plot, DAG, regression output, ...

Workflow: Example Data Product

1. Conception

  • Which example could be used in our data product?
    • Real data or simulated data?
    • Data generating process
  • Collider bias example:
    • Data product type: Interactive app (Shiny) with scatter plot, DAG, regression output
    • Content/topic: Movie star or discrimination example?
    • Data/DGP: Movie star example from Section 3.6 of Cunningham (2021), simulated data
  • Ends with first-round feedback

Workflow: Example Data Product

2. Implementation

  • Start with basic R scripts
      1. Implement a DGP
      2. Basic visualization: DAGs, scatter plots, regression output
  • Iteration on example
    • Does the result reflect the goal of the case study in a concise and convincing way?
    • What can be improved?
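
As a first basic script for the DAG component, the collider structure of the movie star example could be written down and checked as follows. This is a minimal sketch that assumes the `dagitty` package is installed; the node names `beauty`, `talent` and `star` are chosen here for illustration.

```r
library(dagitty)

# Beauty and talent both cause stardom, so `star` is a collider
g <- dagitty("dag { beauty -> star <- talent }")

# Without conditioning, the path beauty -> star <- talent is blocked
sep_marginal <- dseparated(g, "beauty", "talent", list())

# Conditioning on the collider opens the path
sep_given_star <- dseparated(g, "beauty", "talent", "star")

print(c(marginal = sep_marginal, given_star = sep_given_star))

# Draw the DAG with an automatically generated layout
plot(graphLayout(g))
```

The two d-separation checks state the core message of the case study: beauty and talent are unrelated in the full population, but become associated once we look only at stars.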

Workflow: Example Data Product

2. Implementation

  • Improvements
    • Can we improve the code by cleaning up, using specific packages and/or speeding up calculations?
    • Can we present the content of the case study in a better way?
  • Ends with second-round feedback

Workflow: Example Data Product

2. Implementation

set.seed(42)

# load libraries (run install.packages("[libraryname]") to install a missing package)
library(tidyverse)
library(gtsummary)

# simulate the movie star DGP: beauty and talent are independent,
# and the top 15% by combined score become stars
star_is_born <- tibble(
  beauty = rnorm(2500),
  talent = rnorm(2500),
  score = beauty + talent,
  c85 = quantile(score, .85),
  star = ifelse(score >= c85, 1, 0)
)
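
To see the collider bias in regression output, one option is to compare the slope of beauty on talent in the full sample with the slope among stars only. The sketch below re-simulates the same DGP in base R so that it runs on its own; the object names (`sim`, `fit_all`, `fit_stars`) are illustrative.

```r
set.seed(42)

# Same DGP as above, written with base R only
n <- 2500
beauty <- rnorm(n)
talent <- rnorm(n)
score <- beauty + talent
star <- as.integer(score >= quantile(score, 0.85))
sim <- data.frame(beauty, talent, star)

# Full sample: beauty and talent are independent by construction
fit_all <- lm(beauty ~ talent, data = sim)

# Stars only: selecting on the collider induces a negative association
fit_stars <- lm(beauty ~ talent, data = subset(sim, star == 1))

slope_all <- unname(coef(fit_all)["talent"])
slope_stars <- unname(coef(fit_stars)["talent"])
print(round(c(all = slope_all, stars = slope_stars), 3))
```

The slope in the full sample should be close to zero, while the slope among stars should be clearly negative, even though talent has no causal effect on beauty.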

Workflow: Example Data Product

2. Implementation

  • Collider bias example:
    • Visualization with ggplot
star_is_born %>%
  ggplot(aes(x = talent, y = beauty)) +
  geom_point(size = 0.5, shape = 23) +
  stat_smooth(method = "lm", col = "red") +
  xlim(-4, 4) +
  ylim(-4, 4)

Workflow: Example Data Product

2. Implementation

  • Collider bias example:
    • Improvement: Customize visualization, integrate in Shiny app, ...
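
One possible customization is to color the points by star status and fit a separate regression line per group. This is only a sketch of one design option, assuming `ggplot2` is installed; it re-simulates the data so that it is self-contained, and the labels and theme are arbitrary choices.

```r
library(ggplot2)

set.seed(42)

# Re-simulate the movie star DGP
n <- 2500
sim <- data.frame(beauty = rnorm(n), talent = rnorm(n))
sim$score <- sim$beauty + sim$talent
sim$star <- factor(
  ifelse(sim$score >= quantile(sim$score, 0.85), "Star", "Not a star"),
  levels = c("Not a star", "Star")
)

# Scatter plot with one regression line per group
p <- ggplot(sim, aes(x = talent, y = beauty, colour = star)) +
  geom_point(size = 0.5, alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  coord_cartesian(xlim = c(-4, 4), ylim = c(-4, 4)) +
  labs(
    x = "Talent", y = "Beauty", colour = NULL,
    title = "Collider bias: beauty and talent by star status"
  ) +
  theme_minimal()

print(p)
```

Here `coord_cartesian()` zooms the axes without dropping observations, whereas `xlim()` and `ylim()` remove points outside the range before the regression lines are fitted.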

Workflow: Example Data Product

3. Launch

  • Incorporate feedback (from first and second round)

  • Develop final version of the data product

  • Present data product in lecture

  • Deploy/launch product, e.g. in DCL Gallery

  • Critical reflection and feedback

Workflow: Example Data Product

3. Launch

  • Collider bias example:
    • Adjust code to work in Shiny app
    • Optimize presentation (figures, text description)
    • Improve speed
    • Deploy Shiny app with Docker and Heroku (technical)
    • Integrate in DCL website
    • Publish source code on GitHub
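
To give an idea of what "adjusting the code to work in a Shiny app" can look like, here is a minimal sketch of an app for the collider bias example. It assumes the `shiny` package is installed; the helper `simulate_stars()`, the input names `n` and `cutoff`, and the layout are hypothetical choices for illustration, not the actual DCL app.

```r
library(shiny)

# Hypothetical helper: movie star DGP with adjustable sample size and cutoff
simulate_stars <- function(n, cutoff) {
  beauty <- rnorm(n)
  talent <- rnorm(n)
  score <- beauty + talent
  data.frame(
    beauty, talent,
    star = as.integer(score >= quantile(score, cutoff))
  )
}

ui <- fluidPage(
  titlePanel("Collider bias: the movie star example"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("n", "Sample size", min = 100, max = 5000,
                  value = 2500, step = 100),
      sliderInput("cutoff", "Star cutoff (quantile of score)",
                  min = 0.5, max = 0.99, value = 0.85, step = 0.01)
    ),
    mainPanel(plotOutput("scatter"), verbatimTextOutput("regression"))
  )
)

server <- function(input, output, session) {
  # Re-simulate whenever one of the sliders changes
  sim <- reactive(simulate_stars(input$n, input$cutoff))

  output$scatter <- renderPlot({
    d <- sim()
    plot(d$talent, d$beauty, pch = 20,
         col = ifelse(d$star == 1, "red", "grey60"),
         xlab = "Talent", ylab = "Beauty")
    # Regression line among stars only
    abline(lm(beauty ~ talent, data = d[d$star == 1, ]), col = "red")
  })

  output$regression <- renderPrint({
    summary(lm(beauty ~ talent, data = subset(sim(), star == 1)))$coefficients
  })
}

app <- shinyApp(ui, server)
# Start the app interactively with: runApp(app)
```

Wrapping the DGP in a function keeps the simulation separate from the user interface, so the same code can be reused in scripts, notebooks and the app.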

Workflow: Example Data Product

Data product development: 3 phases

  • The workflow will depend on your case study

  • The most important step is to understand what the data product is all about

  • Be creative and try to find a way that illustrates the core idea of your case study

  • Read our notebooks, books, blogs, and package documentation, and proceed step by step

  • Collaborate with your colleagues, ask for help (we are here to help)

  • Manage expectations: We know that time is limited and that you have to learn a lot about the data product tools

References


Cunningham, Scott. 2021. Causal Inference: The Mixtape. Yale University Press.