Staggered Difference-in-Differences

Modern Methods for Heterogeneous Treatment Timing

The DiD Revolution

Difference-in-differences (DiD) is the workhorse of policy evaluation. But recent research has revealed that the standard two-way fixed effects (TWFE) estimator can produce severely biased estimates when treatment timing varies across units (Goodman-Bacon 2021; Callaway and Sant’Anna 2021; Roth et al. 2023).

This module covers the “credibility revolution” in DiD—understanding why TWFE fails with staggered adoption and how modern estimators fix the problem.

The Classic Setup

In the canonical 2×2 DiD, we have:
- Two groups: treated and control
- Two periods: before and after treatment
- Treatment happens to all treated units at the same time

The estimand, written in terms of group means: \[ \text{ATT} = \underbrace{(\bar{Y}_{treated,post} - \bar{Y}_{treated,pre})}_{\text{treated change}} - \underbrace{(\bar{Y}_{control,post} - \bar{Y}_{control,pre})}_{\text{control change}} \]

This works beautifully when treatment timing is uniform. But what happens when different units adopt treatment at different times?

Classic 2×2 DiD Review

The Regression

\[ y_{it} = \alpha + \beta_1 \cdot \text{Treat}_i + \beta_2 \cdot \text{Post}_t + \beta_3 \cdot (\text{Treat}_i \times \text{Post}_t) + \varepsilon_{it} \]

\(\beta_3\) = Average Treatment Effect on the Treated (ATT)
Identification: Parallel trends assumption
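As a quick check on the claim that \(\beta_3\) is the difference-in-differences, the sketch below computes the four cell means by hand and compares the result with the interaction coefficient. The data frame `toy` and its values are made up for illustration.

```r
# Toy 2x2 data: two observations per cell (values are illustrative)
toy <- data.frame(
  treat = c(0, 0, 0, 0, 1, 1, 1, 1),
  post  = c(0, 0, 1, 1, 0, 0, 1, 1),
  y     = c(1.0, 1.2, 2.4, 2.6, 1.6, 1.8, 5.1, 5.3)
)

# Four cell means, indexed by treat (rows) and post (columns)
cell_means <- tapply(toy$y, list(treat = toy$treat, post = toy$post), mean)

# Difference-in-differences by hand
did_by_hand <- (cell_means["1", "1"] - cell_means["1", "0"]) -
  (cell_means["0", "1"] - cell_means["0", "0"])

# Interaction coefficient from the saturated regression
beta3 <- coef(lm(y ~ treat * post, data = toy))[["treat:post"]]

c(by_hand = did_by_hand, regression = beta3)
```

Because the regression is saturated in the four cells, the two numbers agree up to floating-point error.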

# Packages used throughout this module
library(dplyr)
library(ggplot2)
library(fixest)

# Simulate classic 2x2 DiD
N <- 100
T_periods <- 2

did_classic <- expand.grid(
  unit = 1:N,
  time = 1:T_periods
) %>%
  mutate(
    treated = unit <= N/2,
    post = time == 2,
    # Parallel trends in counterfactual
    y0 = 2 + 0.5 * as.numeric(treated) + 1.5 * as.numeric(post) + rnorm(n(), 0, 0.5),
    # Treatment effect = 2
    y1 = y0 + 2,
    y = ifelse(treated & post, y1, y0)
  )

# Visualize
did_means <- did_classic %>%
  group_by(treated, post) %>%
  summarize(y = mean(y), .groups = "drop") %>%
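  mutate(
    group = ifelse(treated, "Treated", "Control"),
    # Order the factor so "Pre" is plotted before "Post" (the default is alphabetical)
    period = factor(ifelse(post, "Post", "Pre"), levels = c("Pre", "Post"))
  )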

ggplot(did_means, aes(x = period, y = y, color = group, group = group)) +
  geom_point(size = 4) +
  geom_line(size = 1.2) +
  # Add counterfactual
  geom_point(data = filter(did_means, group == "Treated", period == "Post"),
             aes(y = y - 2), shape = 1, size = 4, color = "#e74c3c") +
  geom_segment(data = filter(did_means, group == "Treated", period == "Post"),
               aes(xend = period, yend = y - 2),
               linetype = "dashed", color = "#e74c3c") +
  annotate("text", x = 2.15, y = mean(c(did_means$y[4], did_means$y[4] - 2)),
           label = "ATT = 2", hjust = 0, size = 4) +
  scale_color_manual(values = c("Control" = "#3498db", "Treated" = "#e74c3c")) +
  labs(title = "Classic 2×2 Difference-in-Differences",
       subtitle = "Dashed line shows counterfactual; gap is the treatment effect",
       x = "Period", y = "Outcome",
       color = "Group")

Classic 2×2 DiD: Treatment effect is the difference-in-differences

# Estimate
model_classic <- lm(y ~ treated * post, data = did_classic)
cat("Classic DiD Estimate:\n")

Classic DiD Estimate:

cat("ATT =", round(coef(model_classic)["treatedTRUE:postTRUE"], 3), "\n")

ATT = 2.004

cat("True effect = 2\n")

True effect = 2

The Parallel Trends Assumption

Key assumption: In the absence of treatment, treated and control groups would have followed parallel paths.

\[ E[Y_{it}(0) - Y_{it-1}(0) | D_i = 1] = E[Y_{it}(0) - Y_{it-1}(0) | D_i = 0] \]

This is untestable for the post-treatment period (we don’t observe the counterfactual). We can only check whether trends were parallel before treatment.

The Staggered Treatment Problem

When Treatment Timing Varies

In practice, policies roll out gradually:
- Different states adopt minimum wage increases at different times
- Countries implement inflation targeting in different years
- Firms adopt new technologies in waves

# Simulate staggered adoption
N_units <- 30
T_max <- 10

# Treatment cohorts: some treated at t=4, some at t=7, some never
staggered_data <- expand.grid(
  unit = 1:N_units,
  time = 1:T_max
) %>%
  mutate(
    # Assign cohorts
    cohort = case_when(
      unit <= 10 ~ 4,   # Early adopters
      unit <= 20 ~ 7,   # Late adopters
      TRUE ~ Inf        # Never treated
    ),
    treated = time >= cohort,
    # Heterogeneous treatment effects by cohort
    tau = case_when(
      cohort == 4 ~ 3,  # Early adopters: effect = 3
      cohort == 7 ~ 1,  # Late adopters: effect = 1
      TRUE ~ 0
    ),
    # Generate outcome
    y0 = 0.5 * unit/N_units + 0.3 * time + rnorm(n(), 0, 0.5),
    y = y0 + tau * as.numeric(treated)
  )

# Visualize treatment timing
treatment_plot <- staggered_data %>%
  mutate(cohort_label = case_when(
    cohort == 4 ~ "Cohort 4 (early)",
    cohort == 7 ~ "Cohort 7 (late)",
    TRUE ~ "Never treated"
  ))

ggplot(treatment_plot, aes(x = time, y = factor(unit), fill = treated)) +
  geom_tile(color = "white", size = 0.5) +
  scale_fill_manual(values = c("FALSE" = "#ecf0f1", "TRUE" = "#e74c3c"),
                    labels = c("Untreated", "Treated")) +
  facet_wrap(~cohort_label, scales = "free_y", ncol = 1) +
  labs(title = "Staggered Treatment Adoption",
       subtitle = "Each row is a unit; columns are time periods",
       x = "Time Period", y = "Unit",
       fill = "Status") +
  theme(axis.text.y = element_blank(),
        legend.position = "bottom")

Staggered treatment: units adopt at different times

The TWFE Estimator

The natural approach: add unit and time fixed effects.

\[ y_{it} = \alpha_i + \delta_t + \beta \cdot D_{it} + \varepsilon_{it} \]

where \(D_{it} = 1\) if unit \(i\) is treated at time \(t\).
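To make the fixed effects concrete: in a balanced panel, the TWFE coefficient can be obtained by two-way demeaning the outcome and the treatment indicator, then running a one-variable regression. The sketch below checks this against a regression with explicit unit and time dummies. The panel, the seed, and the helper `demean2` are assumptions made for illustration.

```r
# Small balanced panel (hypothetical numbers, fixed seed for reproducibility)
set.seed(1)
panel <- expand.grid(unit = 1:6, time = 1:5)
panel$first_treat <- c(3, 3, 4, 4, Inf, Inf)[panel$unit]   # staggered adoption
panel$d <- as.numeric(panel$time >= panel$first_treat)     # D_it
panel$y <- panel$unit + 0.5 * panel$time + 2 * panel$d + rnorm(nrow(panel))

# Two-way demeaning: x_it - mean_i(x) - mean_t(x) + mean(x)
# (exact only for balanced panels)
demean2 <- function(x, i, t) x - ave(x, i) - ave(x, t) + mean(x)

y_dm <- demean2(panel$y, panel$unit, panel$time)
d_dm <- demean2(panel$d, panel$unit, panel$time)

# TWFE slope from the demeaned variables
beta_within <- sum(d_dm * y_dm) / sum(d_dm^2)

# Same slope from explicit unit and time dummies
beta_dummies <- coef(lm(y ~ d + factor(unit) + factor(time), data = panel))[["d"]]

c(within = beta_within, dummies = beta_dummies)
```

The demeaned treatment `d_dm` is the quantity that determines how much weight each observation receives in the estimate.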

What could go wrong?

# TWFE estimation
model_twfe <- feols(y ~ treated | unit + time, data = staggered_data, vcov = ~unit)

cat("TWFE Estimate:", round(coef(model_twfe)["treatedTRUE"], 3), "\n")

TWFE Estimate: 1.938

cat("True ATT (equal-weighted across treated units):", round(mean(c(rep(3, 10), rep(1, 10))), 3), "\n")

True ATT (equal-weighted across treated units): 2

The TWFE estimate (1.938) looks reasonable next to the benchmark of 2, but it is actually a weighted average of many different 2×2 comparisons, some of which are problematic.

Why TWFE Fails: The Goodman-Bacon Decomposition

The Key Insight

Goodman-Bacon (2021) showed that the TWFE estimator is a weighted average of all possible 2×2 DiD comparisons:

\[ \hat{\beta}^{TWFE} = \sum_k \sum_{l \neq k} w_{kl} \cdot \hat{\beta}_{kl}^{2x2} \]

The problem: Some comparisons use already-treated units as controls.

Three Types of Comparisons

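Good: Early-treated vs. never-treated (never-treated units are the controls)
Good: Late-treated vs. never-treated (never-treated units are the controls)
Bad: Late-treated vs. already-treated (early adopters act as the “control” during their own post-treatment periods)

The decomposition also contains a clean early-vs-late comparison in which late adopters serve as controls before they are treated. The trouble arises in the reverse case: when early-treated units serve as controls, any change in their treatment effect over the comparison window is differenced out as if it were part of the counterfactual trend, and that contaminates the comparison.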
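A small noise-free calculation shows how the contamination enters a late-vs-early comparison. All numbers are assumptions chosen for illustration: a common trend of 0.3 per period, a late cohort first treated at t = 7 with a true effect of 1, and an early cohort whose effect grows from 2.0 to 2.5 between t = 6 and t = 7.

```r
# Noise-free untreated outcome: common trend of 0.3 per period (assumed)
y0 <- function(t) 0.3 * t

# Early cohort is already treated in both periods; its effect grows by 0.5
early <- c(pre = y0(6) + 2.0, post = y0(7) + 2.5)
# Late cohort is untreated at t = 6 and treated at t = 7 (true effect = 1)
late  <- c(pre = y0(6),       post = y0(7) + 1.0)

# 2x2 DiD that uses the already-treated early cohort as the "control"
did_late_vs_early <- unname((late["post"] - late["pre"]) -
                            (early["post"] - early["pre"]))

# Same comparison if the early cohort's effect had stayed at 2.0
early_flat <- c(pre = y0(6) + 2.0, post = y0(7) + 2.0)
did_flat <- unname((late["post"] - late["pre"]) -
                   (early_flat["post"] - early_flat["pre"]))

c(growing_early_effect = did_late_vs_early, constant_early_effect = did_flat)
```

With a growing early effect the comparison returns 0.5, which is the true late effect of 1 minus the 0.5 growth in the early cohort's effect. With a constant early effect it returns 1, so in this setup the bias comes from the change in the control group's effect, not from its level.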

# Illustrate the decomposition conceptually
# (weights and estimates are hypothetical, not computed from staggered_data)
decomp_data <- data.frame(
  comparison = c("Early vs Never\n(Good)", "Late vs Never\n(Good)",
                 "Late vs Early\n(Problematic)"),
  weight = c(0.35, 0.25, 0.40),
  estimate = c(3.0, 1.0, 1.0 - 3.0),  # Late vs Early is contaminated
  type = c("Clean", "Clean", "Contaminated")
)

ggplot(decomp_data, aes(x = comparison, y = estimate, fill = type)) +
  geom_col(width = 0.6) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_text(aes(label = paste0("Weight: ", scales::percent(weight))),
            vjust = -0.5, size = 3.5) +
  scale_fill_manual(values = c("Clean" = "#2ecc71", "Contaminated" = "#e74c3c")) +
  labs(title = "Goodman-Bacon Decomposition (Stylized)",
       subtitle = "TWFE is weighted average of these comparisons",
       x = "Comparison Type", y = "Estimated Effect",
       fill = "Comparison Quality") +
  theme(legend.position = "bottom")

The three types of 2×2 comparisons in staggered DiD

Negative Weights

When treatment effects are heterogeneous across cohorts or over time, TWFE can produce:
- Negative weights on some group-time ATTs
- Estimates with the wrong sign
- Bias even when parallel trends holds perfectly

The Core Problem

TWFE implicitly assumes treatment effects are constant across groups and over time. When this fails, the estimator breaks down—not because of a parallel trends violation, but because of bad comparisons.
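The smallest case that produces a negative weight has two units and three periods. The sketch below is noise-free and sets \(Y_{it}(0) = 0\), so parallel trends holds exactly. The effect sizes are assumptions chosen to make the sign flip visible.

```r
# Two units, three periods, Y(0) = 0 everywhere (noise-free, values assumed)
toy <- data.frame(
  unit = rep(c("early", "late"), each = 3),
  time = rep(1:3, times = 2),
  d    = c(0, 1, 1,   0, 0, 1),   # early treated from t = 2, late from t = 3
  tau  = c(0, 1, 4,   0, 0, 1)    # every treatment effect is positive
)
toy$y <- toy$tau * toy$d

# TWFE coefficient on d
beta_twfe <- coef(lm(y ~ d + factor(unit) + factor(time), data = toy))[["d"]]

# Implicit weights: residualized treatment divided by its sum of squares
d_resid <- resid(lm(d ~ factor(unit) + factor(time), data = toy))
toy$weight <- as.numeric(d_resid / sum(d_resid^2))

toy[toy$d == 1, c("unit", "time", "tau", "weight")]
beta_twfe
```

The three treated cells receive weights 1, -0.5, and 0.5, so the estimate is 1(1) - 0.5(4) + 0.5(1) = -0.5 even though every effect is positive. The negative weight falls on the early unit in period 3, where it is implicitly used as a control for the late unit.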

Modern Solutions

The DiD literature has developed several solutions, all sharing a common principle: avoid bad comparisons.

Callaway & Sant’Anna (2021)

Key idea: Estimate a separate ATT for each group-time combination, then aggregate.

Group-Time ATTs

\[ ATT(g, t) = E[Y_{it} - Y_{it}(0) \mid G_i = g] \]

where \(g\) is the treatment cohort (period of first treatment).

For each \((g, t)\) pair:
- Compare cohort \(g\) in period \(t\) to a clean control group
- Control group: never-treated OR not-yet-treated

# Manual implementation of C&S intuition
# For each cohort, estimate ATT using never-treated as control.
# Simplification: the baseline is the average over all pre-periods;
# Callaway & Sant'Anna use the single period g - 1 as the base period.

cs_manual <- function(data, cohort_val, post_periods) {
  # Pre-period means
  pre_periods <- 1:(cohort_val - 1)
  treated_pre <- data %>% filter(cohort == cohort_val, time %in% pre_periods) %>%
    summarize(y = mean(y)) %>% pull(y)
  control_pre <- data %>% filter(is.infinite(cohort), time %in% pre_periods) %>%
    summarize(y = mean(y)) %>% pull(y)

  # DiD for each post period
  results <- lapply(post_periods, function(t) {
    treated_post <- data %>% filter(cohort == cohort_val, time == t) %>%
      summarize(y = mean(y)) %>% pull(y)
    control_post <- data %>% filter(is.infinite(cohort), time == t) %>%
      summarize(y = mean(y)) %>% pull(y)

    att <- (treated_post - treated_pre) - (control_post - control_pre)
    data.frame(cohort = cohort_val, time = t, att = att)
  })

  bind_rows(results)
}

# Estimate for each cohort
att_g4 <- cs_manual(staggered_data, 4, 4:T_max)
att_g7 <- cs_manual(staggered_data, 7, 7:T_max)
att_all <- bind_rows(att_g4, att_g7) %>%
  mutate(rel_time = time - cohort,
         cohort_label = paste("Cohort", cohort))

ggplot(att_all, aes(x = rel_time, y = att, color = cohort_label)) +
  geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
  geom_point(size = 3) +
  geom_line(size = 1) +
  # Add true effects
  geom_hline(yintercept = 3, linetype = "dotted", color = "#e74c3c", alpha = 0.7) +
  geom_hline(yintercept = 1, linetype = "dotted", color = "#3498db", alpha = 0.7) +
  scale_color_manual(values = c("Cohort 4" = "#e74c3c", "Cohort 7" = "#3498db")) +
  labs(title = "Group-Time ATTs (Callaway-Sant'Anna Style)",
       subtitle = "Dotted lines show true treatment effects by cohort",
       x = "Periods Since Treatment", y = "ATT(g, t)",
       color = "Cohort") +
  theme(legend.position = "top")

Aggregation

Group-time ATTs can be aggregated in multiple ways:

Aggregation	Formula	Use Case
Event-study	\(ATT(e) = \sum_g w_g \cdot ATT(g, g+e)\)	Dynamic effects
Overall	\(ATT = \sum_{g,t} w_{g,t} \cdot ATT(g,t)\)	Single summary
By cohort	\(ATT(g) = \sum_t w_t \cdot ATT(g,t)\)	Cohort heterogeneity
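The three aggregations in the table can be sketched with a small table of group-time ATTs. The values in `att_gt_tbl`, the cohort sizes, and the choice of cohort size as the weight are assumptions for illustration. Packages such as `did` implement their own weighting schemes.

```r
# Hypothetical group-time ATTs for two cohorts (values are illustrative)
att_gt_tbl <- data.frame(
  g   = c(4, 4, 4, 4, 7, 7),
  t   = c(4, 5, 6, 7, 7, 8),
  att = c(3.0, 3.2, 2.9, 3.1, 1.0, 1.2)
)
cohort_size <- c("4" = 10, "7" = 20)           # assumed number of units per cohort
att_gt_tbl$n <- unname(cohort_size[as.character(att_gt_tbl$g)])
att_gt_tbl$e <- att_gt_tbl$t - att_gt_tbl$g    # event time e = t - g

# Weighted mean of att within each level of `by`, weights = cohort size
agg <- function(df, by) {
  sapply(split(df, df[[by]]), function(s) weighted.mean(s$att, s$n))
}

att_event  <- agg(att_gt_tbl, "e")   # event-study aggregation, ATT(e)
att_cohort <- agg(att_gt_tbl, "g")   # by-cohort aggregation, ATT(g)

# One simple overall summary: size-weighted mean over all (g, t) cells
att_overall <- weighted.mean(att_gt_tbl$att, att_gt_tbl$n)

att_event
att_cohort
att_overall
```

Event times 2 and 3 are observed only for cohort 4 in this toy table, so those event-study entries reflect a different mix of cohorts than event times 0 and 1.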

Sun & Abraham (2021)

Key idea: Interaction-weighted estimator using cohort × relative-time interactions.

\[ y_{it} = \alpha_i + \delta_t + \sum_{g \neq \infty} \sum_{l \neq -1} \beta_{g,l} \cdot \mathbf{1}\{G_i = g\} \cdot D_{it}^l + \varepsilon_{it} \]

Then aggregate: \(\hat{\beta}_l = \sum_g w_g \cdot \hat{\beta}_{g,l}\)

Advantage: Can be implemented directly in fixest with sunab().
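For intuition about what the interacted regression estimates, here is a base-R sketch that builds the cohort × relative-time dummies by hand on a noise-free panel. It is not the `sunab()` implementation. The panel, the effect sizes, and the cohort-size weights are assumptions for illustration.

```r
# Noise-free staggered panel (all numbers assumed for illustration)
pan <- expand.grid(unit = 1:6, time = 1:6)
pan$g   <- c(3, 3, 5, 5, Inf, Inf)[pan$unit]       # cohort; Inf = never treated
pan$rel <- pan$time - pan$g                        # relative time l
tau     <- ifelse(pan$g == 3, 3, ifelse(pan$g == 5, 1, 0))
pan$y   <- pan$unit + 0.3 * pan$time + tau * (pan$rel >= 0)

# One dummy per (cohort, relative time) cell; never-treated units and
# l = -1 together form the reference category
pan$cell <- ifelse(is.finite(pan$g) & pan$rel != -1,
                   paste0("g", pan$g, "_l", pan$rel), "ref")
pan$cell <- relevel(factor(pan$cell), ref = "ref")

fit  <- lm(y ~ cell + factor(unit) + factor(time), data = pan)
beta <- coef(fit)[grepl("^cell", names(coef(fit)))]
names(beta) <- sub("^cell", "", names(beta))

# Aggregate l = 0 across cohorts, weighting by cohort size (2 units each here)
beta_l0 <- weighted.mean(beta[c("g3_l0", "g5_l0")], w = c(2, 2))

round(beta, 3)
beta_l0
```

Because the data are noise-free, each post-treatment coefficient equals its cohort's true effect (3 or 1) and each pre-treatment coefficient is zero. The aggregated effect at \(l = 0\) is their size-weighted average.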

# For Sun-Abraham, we need data with:
# - cohort variable (0 for never-treated)
# - time variable

# Conceptual illustration only: this block does not call sunab().
# It averages the manual ATT(g, t) estimates by relative time, which mimics
# the Sun-Abraham aggregation step (equal cohort weights here, because the
# two treated cohorts are the same size).
sa_coefs <- att_all %>%
  group_by(rel_time) %>%
  summarize(
    estimate = mean(att),
    .groups = "drop"
  ) %>%
  # Add illustrative pre-treatment values (hard-coded placeholders, not estimates)
  bind_rows(
    data.frame(rel_time = c(-3, -2, -1), estimate = c(0.1, -0.05, 0))
  ) %>%
  arrange(rel_time) %>%
  distinct(rel_time, .keep_all = TRUE)

ggplot(sa_coefs, aes(x = rel_time, y = estimate)) +
  geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
  geom_vline(xintercept = -0.5, linetype = "dashed", alpha = 0.5) +
  geom_point(size = 3, color = "#9b59b6") +
  geom_line(size = 1, color = "#9b59b6") +
  labs(title = "Sun-Abraham Event Study",
       subtitle = "Aggregated across cohorts with proper weights",
       x = "Periods Relative to Treatment", y = "ATT") +
  annotate("text", x = -2, y = max(sa_coefs$estimate, na.rm = TRUE) * 0.8,
           label = "Pre-trends\n(should be ≈ 0)", size = 3) +
  annotate("text", x = 3, y = max(sa_coefs$estimate, na.rm = TRUE) * 0.8,
           label = "Treatment\neffects", size = 3)

de Chaisemartin & D’Haultfoeuille (2020)

Key idea: Focus on “switchers”—units that change treatment status.

\[ DID_M = \sum_{(i,t): D_{it}=1, D_{i,t-1}=0} w_{it} \cdot DID_{it} \]

Advantage: Handles treatments that turn on AND off.
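Finding the switchers is a matter of comparing each unit's treatment status with its own previous period. The sketch below does only that classification step and does not compute the estimator. The toy panel `sw` is made up, and the labels "joiner" and "leaver" refer to units switching into and out of treatment.

```r
# Toy panel in which treatment can switch on and off (values are illustrative)
sw <- data.frame(
  unit = rep(1:3, each = 4),
  time = rep(1:4, times = 3),
  d    = c(0, 1, 1, 0,    # unit 1: switches on at t = 2, off at t = 4
           0, 0, 1, 1,    # unit 2: switches on at t = 3
           0, 0, 0, 0)    # unit 3: never treated
)
sw <- sw[order(sw$unit, sw$time), ]

# Lagged treatment within unit (NA in each unit's first period)
sw$d_lag <- ave(sw$d, sw$unit, FUN = function(x) c(NA, head(x, -1)))

# Classify each unit-period by how treatment status changed
sw$switch <- ifelse(is.na(sw$d_lag), "first period",
             ifelse(sw$d > sw$d_lag, "joiner",
             ifelse(sw$d < sw$d_lag, "leaver", "stayer")))

table(sw$switch)
```

Stayers whose treatment status does not change between the two periods are the natural comparison group for switchers in the same pair of periods.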

Borusyak, Jaravel & Spiess (2024)

Key idea: Imputation-based approach.

1. Estimate unit and time FE using only untreated observations
2. Predict \(\hat{Y}_{it}(0)\) for treated observations
3. Treatment effect = \(Y_{it} - \hat{Y}_{it}(0)\)

Advantage: Clean, intuitive, efficient. Easy to add covariates.
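The three imputation steps can be sketched in base R with `lm()` and `predict()`. This is a minimal illustration on a noise-free panel, not the authors' implementation. The panel and effect sizes are assumptions.

```r
# Noise-free staggered panel (numbers assumed for illustration)
pan <- expand.grid(unit = 1:6, time = 1:6)
pan$g <- c(3, 3, 5, 5, Inf, Inf)[pan$unit]          # first treated period
pan$d <- pan$time >= pan$g                          # treated observation?
tau_true <- ifelse(pan$g == 3, 3, ifelse(pan$g == 5, 1, 0))
pan$y <- pan$unit + 0.3 * pan$time + tau_true * pan$d
pan$unit_f <- factor(pan$unit)
pan$time_f <- factor(pan$time)

# Step 1: fit unit and time fixed effects on untreated observations only
fit0 <- lm(y ~ unit_f + time_f, data = pan[!pan$d, ])

# Step 2: impute Y(0) for the treated observations
treated <- pan[pan$d, ]
treated$y0_hat <- predict(fit0, newdata = treated)

# Step 3: observation-level effects, then average them
treated$tau_hat <- treated$y - treated$y0_hat
att_by_cohort <- tapply(treated$tau_hat, treated$g, mean)
att_overall   <- mean(treated$tau_hat)

att_by_cohort
att_overall
```

Step 1 only works if every unit and every period appears at least once among the untreated observations. Here the never-treated units cover all periods and each treated unit has pre-treatment rows.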

Event Studies and Pre-Trends

The Event Study Design

Generalize DiD to trace out dynamic effects:

\[ y_{it} = \alpha_i + \delta_t + \sum_{k \neq -1} \beta_k \cdot D_{it}^k + \varepsilon_{it} \]

where \(D_{it}^k = 1\) if unit \(i\) is \(k\) periods from treatment at time \(t\).

Normalize: \(\beta_{-1} = 0\) (period just before treatment).

# Create event study data
es_data <- staggered_data %>%
  filter(!is.infinite(cohort)) %>%
  mutate(
    rel_time = time - cohort,
    rel_time_factor = factor(rel_time)
  )

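# Estimate event study using fixest's i() function
# Note: with never-treated units dropped, the relative-time dummies are collinear
# with the unit and time fixed effects, so fixest drops one additional coefficient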
model_es <- feols(y ~ i(rel_time, ref = -1) | unit + time,
                   data = es_data, vcov = ~unit)

# Extract coefficients from fixest model properly
coef_names <- names(coef(model_es))
coef_vals <- coef(model_es)
se_vals <- se(model_es)

# Parse relative times from coefficient names (format: "rel_time::X")
rel_times <- as.numeric(gsub("rel_time::", "", coef_names))

# Build data frame
es_coefs <- data.frame(
  rel_time = rel_times,
  estimate = coef_vals,
  se = se_vals,
  row.names = NULL
) %>%
  # Add reference period (t = -1)
  bind_rows(data.frame(rel_time = -1, estimate = 0, se = 0)) %>%
  arrange(rel_time) %>%
  mutate(
    ci_low = estimate - 1.96 * se,
    ci_high = estimate + 1.96 * se,
    period_type = ifelse(rel_time < 0, "Pre-treatment", "Post-treatment")
  )

ggplot(es_coefs, aes(x = rel_time, y = estimate)) +
  geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
  geom_vline(xintercept = -0.5, linetype = "dashed", alpha = 0.5) +
  geom_ribbon(aes(ymin = ci_low, ymax = ci_high, fill = period_type), alpha = 0.2) +
  geom_line(aes(color = period_type), size = 1) +
  geom_point(aes(color = period_type), size = 3) +
  scale_color_manual(values = c("Pre-treatment" = "#95a5a6", "Post-treatment" = "#e74c3c")) +
  scale_fill_manual(values = c("Pre-treatment" = "#95a5a6", "Post-treatment" = "#e74c3c")) +
  labs(title = "Event Study Design",
       subtitle = "Pre-treatment coefficients test parallel trends; post-treatment show dynamic effects",
       x = "Periods Relative to Treatment", y = "Effect Relative to t = -1",
       color = "Period", fill = "Period") +
  theme(legend.position = "bottom")

Pre-Trends Testing

What we want: Pre-treatment coefficients close to zero.

The problem: Pre-trend tests have low power (Roth, 2022).

Absence of significant pre-trends ≠ parallel trends holds
Pre-trends might exist but be too small to detect
Conditioning on passing the pre-test can introduce bias

Pre-Trends Are Necessary But Not Sufficient

Passing a pre-trends test provides some comfort but doesn’t guarantee identification. Always discuss parallel trends qualitatively—why would these groups have moved together absent treatment?

# Simulate: true pre-trend exists but might not be detected
simulate_pretrend_test <- function(n_sim = 1000, true_pretrend = 0.3, n_units = 50, n_pre = 4) {
  rejections <- 0

  for (i in 1:n_sim) {
    # Generate data with pre-trend
    pre_data <- data.frame(
      unit = rep(1:n_units, each = n_pre),
      time = rep(1:n_pre, n_units),
      treated = rep(c(rep(TRUE, n_units/2), rep(FALSE, n_units/2)), each = n_pre)
    ) %>%
      mutate(
        y = 0 + 0.5 * time + true_pretrend * time * treated + rnorm(n(), 0, 1)
      )

    # Test for differential pre-trend
    model <- lm(y ~ time * treated, data = pre_data)
    p_val <- summary(model)$coefficients["time:treatedTRUE", "Pr(>|t|)"]

    if (p_val < 0.05) rejections <- rejections + 1
  }

  rejections / n_sim
}

# Power at different pre-trend magnitudes
pretrend_sizes <- seq(0, 0.5, 0.1)
power <- sapply(pretrend_sizes, function(pt) simulate_pretrend_test(n_sim = 200, true_pretrend = pt))

power_df <- data.frame(
  pretrend = pretrend_sizes,
  power = power
)

ggplot(power_df, aes(x = pretrend, y = power)) +
  geom_line(size = 1.2, color = "#3498db") +
  geom_point(size = 3, color = "#3498db") +
  geom_hline(yintercept = 0.05, linetype = "dashed", color = "#e74c3c") +
  geom_hline(yintercept = 0.8, linetype = "dashed", color = "#2ecc71") +
  annotate("text", x = 0.4, y = 0.1, label = "Size (α = 0.05)", color = "#e74c3c") +
  annotate("text", x = 0.4, y = 0.85, label = "Conventional power (80%)", color = "#2ecc71") +
  labs(title = "Power of Pre-Trends Test",
       subtitle = "Even moderate pre-trends are hard to detect",
       x = "True Pre-Trend Magnitude", y = "Rejection Rate") +
  scale_y_continuous(labels = scales::percent)

Pre-trends tests have low power to detect violations

Practical Implementation

Using fixest for Sun-Abraham

library(fixest)

# Prepare data
# cohort: period of first treatment (0 or Inf for never-treated)
# time: calendar time

# Sun-Abraham estimator
model_sa <- feols(
  y ~ sunab(cohort, time) | unit + time,
  data = panel_data,
  vcov = ~unit
)

# Summary with different aggregations
summary(model_sa, agg = "ATT")      # Overall ATT
summary(model_sa, agg = "cohort")   # By treatment cohort

# Event study plot
iplot(model_sa)

Using did for Callaway-Sant’Anna

library(did)

# Estimate group-time ATTs
cs_result <- att_gt(
  yname = "y",                    # outcome
  tname = "time",                 # time variable
  idname = "unit",                # unit identifier
  gname = "first_treat",          # treatment cohort (0 = never treated)
  data = panel_data,

  # Control group choice
  control_group = "nevertreated", # or "notyettreated"

  # Estimation method
  est_method = "dr",              # doubly robust

  # Covariates (optional)
  xformla = ~ x1 + x2,

  # Inference
  bstrap = TRUE,
  cband = TRUE,
  clustervars = "unit"
)

# Aggregations
es <- aggte(cs_result, type = "dynamic")  # Event study
overall <- aggte(cs_result, type = "simple")  # Overall ATT
by_group <- aggte(cs_result, type = "group")  # By cohort

# Plots
ggdid(es)
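The simulated data earlier in this module code never-treated units as `Inf`, while the `gname` comment above expects 0 for never-treated units. A minimal sketch of the recoding, using a made-up four-unit data frame `toy`:

```r
# Toy data using the Inf convention for never-treated units
toy <- data.frame(unit = 1:4, cohort = c(4, 7, Inf, Inf))

# Recode never-treated units to 0, the convention noted for gname above
toy$first_treat <- ifelse(is.infinite(toy$cohort), 0, toy$cohort)

toy
```

The same one-line recoding would apply to `staggered_data` before passing it as `data`, with `gname = "first_treat"`.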

Choosing an Estimator

Situation	Recommended Estimator
Classic 2×2 (uniform timing)	Standard TWFE
Staggered, suspect heterogeneity	Callaway-Sant’Anna or Sun-Abraham
Want regression framework	Sun-Abraham via `fixest::sunab()`
Want flexible aggregations	Callaway-Sant’Anna via `did`
Treatment turns on AND off	de Chaisemartin-D’Haultfoeuille
Few treated units	Consider synthetic control
Want imputation intuition	Borusyak-Jaravel-Spiess

Diagnostics

Checking for TWFE Problems

Before using modern estimators, diagnose whether TWFE is problematic:

# Compare TWFE to cohort-specific estimates
twfe_est <- coef(model_twfe)["treatedTRUE"]

# For cohort-specific estimates, compare mean outcomes pre/post treatment
# vs never-treated (this is the core DiD comparison)
never_treated <- staggered_data %>% filter(is.infinite(cohort))

cohort_specific <- es_data %>%
  group_by(cohort) %>%
  summarize(
    # Mean outcome before and after treatment for this cohort
    pre_mean = mean(y[time < cohort]),
    post_mean = mean(y[time >= cohort]),
    n_units = n_distinct(unit),
    .groups = "drop"
  ) %>%
  mutate(
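    # Compare to never-treated, computed separately for each cohort
    # (a bare `time < cohort` would recycle the cohort vector across rows)
    control_pre = sapply(cohort, function(g) mean(never_treated$y[never_treated$time < g])),
    control_post = sapply(cohort, function(g) mean(never_treated$y[never_treated$time >= g])),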
    # DiD estimate
    estimate = (post_mean - pre_mean) - (control_post - control_pre),
    # Approximate SE (simplified for illustration)
    se = 0.3,  # Placeholder - real analysis would bootstrap
    cohort_label = paste("Cohort", cohort)
  )

comparison <- cohort_specific %>%
  mutate(
    ci_low = estimate - 1.96 * se,
    ci_high = estimate + 1.96 * se
  )

ggplot(comparison, aes(x = cohort_label, y = estimate)) +
  geom_hline(yintercept = twfe_est, linetype = "dashed", color = "#e74c3c", size = 1) +
  geom_errorbar(aes(ymin = ci_low, ymax = ci_high), width = 0.2, size = 1) +
  geom_point(size = 4, color = "#3498db") +
  # Add true effects
  geom_point(data = data.frame(cohort_label = c("Cohort 4", "Cohort 7"),
                                true_effect = c(3, 1)),
             aes(y = true_effect), shape = 4, size = 4, color = "#2ecc71") +
  annotate("text", x = 0.6, y = twfe_est + 0.3, label = "TWFE estimate",
           color = "#e74c3c", hjust = 0) +
  annotate("text", x = 2.4, y = 1.3, label = "× = True effects",
           color = "#2ecc71", hjust = 0, size = 3) +
  labs(title = "Cohort-Specific Effects vs. TWFE",
       subtitle = "Large differences suggest TWFE is problematic",
       x = "Treatment Cohort", y = "Estimated ATT")

Check for treatment effect heterogeneity across cohorts

When TWFE is Fine

TWFE works well when:
1. Treatment effects are homogeneous across cohorts and over time
2. All treatment groups have similar weights in the estimator
3. There’s a large never-treated group
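The first condition can be checked on a noise-free staggered panel: with one common effect TWFE recovers it exactly, while a growing effect in the early cohort pulls the estimate away from the true ATT. The panel and effect sizes below are assumptions, and `twfe` is a small helper defined here for illustration.

```r
# Noise-free staggered panel (numbers assumed for illustration)
pan <- expand.grid(unit = 1:6, time = 1:6)
pan$g <- c(3, 3, 5, 5, Inf, Inf)[pan$unit]     # first treated period
pan$d <- as.numeric(pan$time >= pan$g)         # D_it

# Helper: TWFE coefficient on d for a given outcome vector
twfe <- function(y) {
  coef(lm(y ~ d + factor(unit) + factor(time), data = pan))[["d"]]
}

# Case 1: homogeneous effect of 2 for every treated observation
y_homog <- pan$unit + 0.3 * pan$time + 2 * pan$d

# Case 2: the early cohort's effect grows by 1 each period after adoption
effect_dyn <- ifelse(pan$g == 3, 2 + (pan$time - 3), 2) * pan$d
y_dyn <- pan$unit + 0.3 * pan$time + effect_dyn

c(twfe_homogeneous = twfe(y_homog),
  twfe_dynamic     = twfe(y_dyn),
  true_att_dynamic = mean(effect_dyn[pan$d == 1]))
```

In the homogeneous case TWFE returns 2. In the dynamic case it returns 2.25 while the average effect over treated observations is 3, because the early cohort's later periods, where the effect is largest, receive zero weight in this design.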

Summary

Key takeaways from this module:

TWFE can fail with staggered treatment because already-treated units serve as controls
Goodman-Bacon decomposition reveals the problematic comparisons hidden in TWFE
Modern estimators (C&S, Sun-Abraham, etc.) avoid bad comparisons by design
Event studies show dynamic effects, but pre-trends tests have low power
Practical guidance:
- With staggered timing → use modern estimators
- With homogeneous effects → TWFE is fine
- Always check for heterogeneity across cohorts
Software: fixest::sunab() for regression approach, did package for C&S

Decision Tree

Is treatment timing uniform?
├── Yes → Standard TWFE is fine
└── No (staggered) →
    ├── Do you suspect heterogeneous effects?
    │   ├── Yes → Use C&S or Sun-Abraham
    │   └── No → TWFE might be OK, but check
    └── Does treatment turn on AND off?
        ├── Yes → Use de Chaisemartin-D'Haultfoeuille
        └── No → C&S or Sun-Abraham

Next: Module 6: Synthetic Control