Bayesian Panel Methods

Hierarchical Models, Panel VARs, and Cross-Sectional Dependence

The Panel Data Challenge

Cross-country macroeconomic panels present a fundamental tension:

Heterogeneity: Countries differ in institutions, structures, and responses Limited Data: Each country has finite time series (T = 84 quarters) Efficiency: Ignoring cross-country information wastes data

NoteThe Core Question

How do we pool information across countries while respecting heterogeneity?

Three Approaches

Approach Assumption Problem
Pooled All countries identical Ignores heterogeneity
Country-by-country Countries unrelated Inefficient, noisy
Hierarchical Bayes Countries related but different Best of both worlds

With N = 46 countries and T = 84 quarters, hierarchical Bayesian methods offer the natural solution.

Hierarchical Models: The Framework

The Multilevel Structure

Consider a simple panel regression: \[ y_{it} = x_{it}'\beta_i + \varepsilon_{it}, \quad \varepsilon_{it} \sim N(0, \sigma^2_i) \]

Level 1 (Within-Country): \[ \beta_i | \mu, \Omega \sim N(\mu, \Omega) \]

Level 2 (Across-Countries): \[ \mu \sim N(\mu_0, V_0), \quad \Omega \sim \text{Inverse-Wishart}(\Omega_0, \nu_0) \]

The country-specific coefficients \(\beta_i\) are drawn from a common distribution with unknown mean \(\mu\) and variance \(\Omega\).

Shrinkage and Partial Pooling

The posterior for \(\beta_i\) combines:

  1. Country-specific data (likelihood)
  2. Cross-country information (hierarchical prior)

\[ \hat{\beta}_i^{\text{Bayes}} = \lambda_i \hat{\beta}_i^{\text{OLS}} + (1 - \lambda_i) \bar{\beta} \]

where \(\lambda_i \in [0,1]\) depends on: - Precision of country data: More data → less shrinkage - Cross-country heterogeneity: More variance in \(\Omega\) → less shrinkage

Code
# Simulate hierarchical data
N <- 20  # countries
T_obs <- 50  # periods per country

# True hierarchical structure
mu_true <- 0.5
omega_true <- 0.3^2
sigma_true <- 1

# Country-specific coefficients
beta_true <- rnorm(N, mu_true, sqrt(omega_true))

# Generate data
data_hier <- map_dfr(1:N, function(i) {
  x <- rnorm(T_obs)
  y <- beta_true[i] * x + rnorm(T_obs, 0, sigma_true)
  tibble(country = i, x = x, y = y)
})

# OLS estimates by country
ols_estimates <- data_hier %>%
  group_by(country) %>%
  summarise(
    beta_ols = coef(lm(y ~ x - 1))[1],
    se_ols = summary(lm(y ~ x - 1))$coefficients[1, 2],
    n = n()
  )

# Hierarchical shrinkage (empirical Bayes approximation)
grand_mean <- mean(ols_estimates$beta_ols)
between_var <- var(ols_estimates$beta_ols)
within_var <- mean(ols_estimates$se_ols^2)

shrinkage_factor <- between_var / (between_var + within_var / T_obs)

ols_estimates <- ols_estimates %>%
  mutate(
    beta_hier = shrinkage_factor * beta_ols + (1 - shrinkage_factor) * grand_mean,
    beta_true = beta_true
  )

# Plot
ggplot(ols_estimates) +
  geom_segment(aes(x = beta_ols, xend = beta_hier, y = country, yend = country),
               arrow = arrow(length = unit(0.1, "cm")), color = "gray50") +
  geom_point(aes(x = beta_ols, y = country), color = "#e74c3c", size = 2) +
  geom_point(aes(x = beta_hier, y = country), color = "#3498db", size = 2) +
  geom_point(aes(x = beta_true, y = country), shape = 4, size = 2, color = "#2ecc71") +
  geom_vline(xintercept = grand_mean, linetype = "dashed") +
  annotate("text", x = grand_mean + 0.1, y = N + 1, label = "Global mean") +
  labs(title = "Hierarchical Shrinkage in Action",
       subtitle = "Red = OLS, Blue = Hierarchical, Green × = True",
       x = expression(beta[i]), y = "Country") +
  theme_minimal()

Hierarchical shrinkage: country estimates pulled toward global mean

Why Shrinkage Helps

Key Insight: Countries with less precise estimates get shrunk more toward the global mean.

Country Type Data Quality Shrinkage Result
Data-rich (US, Germany) High Low Near OLS
Data-poor (small EMs) Low High Near global mean

This is not bias — it’s optimal use of information under the hierarchical structure.

Gibbs Sampler for Hierarchical Panel

The Algorithm

For the hierarchical linear model:

Block 1: Draw \(\beta_i | y_i, \mu, \Omega, \sigma^2_i\) for each country \[ \beta_i | \cdot \sim N\left( V_i^{-1}(X_i'y_i/\sigma^2_i + \Omega^{-1}\mu), V_i^{-1} \right) \] where \(V_i = X_i'X_i/\sigma^2_i + \Omega^{-1}\)

Block 2: Draw \(\sigma^2_i | y_i, \beta_i\) \[ \sigma^2_i | \cdot \sim \text{Inverse-Gamma}\left( \frac{T_i}{2}, \frac{(y_i - X_i\beta_i)'(y_i - X_i\beta_i)}{2} \right) \]

Block 3: Draw \(\mu | \beta_1, \ldots, \beta_N, \Omega\) \[ \mu | \cdot \sim N\left( (N\Omega^{-1} + V_0^{-1})^{-1}(N\Omega^{-1}\bar{\beta} + V_0^{-1}\mu_0), (N\Omega^{-1} + V_0^{-1})^{-1} \right) \]

Block 4: Draw \(\Omega | \beta_1, \ldots, \beta_N, \mu\) \[ \Omega | \cdot \sim \text{Inverse-Wishart}\left( \Omega_0 + \sum_{i=1}^N (\beta_i - \mu)(\beta_i - \mu)', \nu_0 + N \right) \]

Code
# Hierarchical Gibbs sampler for panel regression
hierarchical_panel_gibbs <- function(data, n_draw = 5000, n_burn = 1000) {

  countries <- unique(data$country)
  N <- length(countries)
  K <- 1  # number of regressors (simplified)

  # Organize data by country
  country_data <- map(countries, function(c) {
    d <- filter(data, country == c)
    list(y = d$y, X = matrix(d$x, ncol = 1), T_i = nrow(d))
  })

  # Priors
  mu0 <- 0
  V0 <- 10
  omega0 <- 1
  nu0 <- 3
  a0 <- 0.01; b0 <- 0.01  # for sigma^2

  # Storage
  beta_draws <- array(0, dim = c(N, K, n_draw))
  mu_draws <- matrix(0, n_draw, K)
  omega_draws <- numeric(n_draw)
  sigma2_draws <- matrix(0, n_draw, N)

  # Initialize
  beta_curr <- rep(0, N)
  sigma2_curr <- rep(1, N)
  mu_curr <- 0
  omega_curr <- 1

  for (g in 1:(n_draw + n_burn)) {

    # Block 1: Draw beta_i | rest
    for (i in 1:N) {
      X_i <- country_data[[i]]$X
      y_i <- country_data[[i]]$y
      T_i <- country_data[[i]]$T_i

      V_i <- as.numeric(crossprod(X_i) / sigma2_curr[i] + 1/omega_curr)
      m_i <- (crossprod(X_i, y_i) / sigma2_curr[i] + mu_curr/omega_curr) / V_i
      beta_curr[i] <- rnorm(1, m_i, sqrt(1/V_i))
    }

    # Block 2: Draw sigma2_i | rest
    for (i in 1:N) {
      X_i <- country_data[[i]]$X
      y_i <- country_data[[i]]$y
      T_i <- country_data[[i]]$T_i

      resid <- y_i - X_i * beta_curr[i]
      a_post <- a0 + T_i/2
      b_post <- b0 + sum(resid^2)/2
      sigma2_curr[i] <- 1/rgamma(1, a_post, b_post)
    }

    # Block 3: Draw mu | rest
    V_mu <- 1 / (N/omega_curr + 1/V0)
    m_mu <- V_mu * (sum(beta_curr)/omega_curr + mu0/V0)
    mu_curr <- rnorm(1, m_mu, sqrt(V_mu))

    # Block 4: Draw omega | rest
    nu_post <- nu0 + N
    S_post <- omega0 + sum((beta_curr - mu_curr)^2)
    omega_curr <- 1/rgamma(1, nu_post/2, S_post/2)

    # Store
    if (g > n_burn) {
      idx <- g - n_burn
      beta_draws[, , idx] <- beta_curr
      mu_draws[idx, ] <- mu_curr
      omega_draws[idx] <- omega_curr
      sigma2_draws[idx, ] <- sigma2_curr
    }
  }

  list(beta = beta_draws, mu = mu_draws, omega = omega_draws, sigma2 = sigma2_draws)
}

# Run on simulated data
hier_result <- hierarchical_panel_gibbs(data_hier, n_draw = 3000, n_burn = 1000)

# Posterior summary
beta_post_mean <- apply(hier_result$beta, 1, mean)
beta_post_sd <- apply(hier_result$beta, 1, sd)

# Compare to truth
comparison <- tibble(
  country = 1:N,
  true = beta_true,
  ols = ols_estimates$beta_ols,
  bayes_mean = beta_post_mean,
  bayes_sd = beta_post_sd
)

# Plot posterior distributions for selected countries
selected <- c(1, 5, 10, 15)
post_df <- map_dfr(selected, function(i) {
  tibble(
    country = paste("Country", i),
    value = hier_result$beta[i, 1, ],
    true = beta_true[i],
    ols = ols_estimates$beta_ols[i]
  )
})

ggplot(post_df, aes(x = value)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30, fill = "#3498db", alpha = 0.7) +
  geom_vline(aes(xintercept = true), color = "#2ecc71", linetype = "dashed", linewidth = 1) +
  geom_vline(aes(xintercept = ols), color = "#e74c3c", linetype = "dotted", linewidth = 1) +
  facet_wrap(~country, scales = "free_y") +
  labs(title = "Posterior Distributions of Country-Specific Coefficients",
       subtitle = "Green dashed = True, Red dotted = OLS",
       x = expression(beta[i]), y = "Density") +
  theme_minimal()

Posterior distributions from hierarchical panel model

Bayesian Panel VAR

The Model Structure

For N countries, each with a K-variable VAR(p):

\[ y_{it} = c_i + A_{i,1} y_{i,t-1} + \cdots + A_{i,p} y_{i,t-p} + u_{it} \]

Hierarchical Prior: \[ \text{vec}(B_i) | \mu_B, \Omega_B \sim N(\mu_B, \Omega_B) \]

where \(B_i = [c_i, A_{i,1}, \ldots, A_{i,p}]'\) stacks all VAR coefficients for country \(i\).

Why Panel VAR?

Feature Separate VARs Pooled VAR Panel VAR
Heterogeneity Yes No Yes
Sample size T per country N×T N×T (shared info)
Curse of dimensionality Severe Moderate Mitigated
Interpretation Country-specific Global average Both

With K = 3 variables, p = 2 lags, and T = 84: - Parameters per country: 3 × (1 + 3×2) = 21 - Observations per country: 84 - 2 = 82 - Degrees of freedom: Tight!

Hierarchical pooling provides effective sample size boost without forcing homogeneity.

Mean-Group Prior Structure

Following Canova & Ciccarelli (2009), we can decompose:

\[ B_i = \underbrace{\bar{B}}_{\text{common}} + \underbrace{\tilde{B}_i}_{\text{country-specific}} + \underbrace{Z_i \gamma}_{\text{observed heterogeneity}} \]

where: - \(\bar{B}\): Common component (global dynamics) - \(\tilde{B}_i\): Idiosyncratic deviation - \(Z_i\): Country characteristics (GDP per capita, openness, institutions)

Code
# Illustrate panel VAR structure
panel_var_diagram <- tibble(
  level = c("Data", "Data", "Country VAR", "Country VAR", "Hierarchical", "Hyperprior"),
  component = c("y_it (Country 1)", "y_it (Country N)",
                "B_1 ~ N(mu_B, Omega)", "B_N ~ N(mu_B, Omega)",
                "mu_B, Omega", "mu_0, V_0, Omega_0, nu_0"),
  x = c(1, 3, 1, 3, 2, 2),
  y = c(1, 1, 2, 2, 3, 4)
)

ggplot(panel_var_diagram, aes(x = x, y = y, label = component)) +
  geom_point(size = 10, color = "#3498db", alpha = 0.3) +
  geom_text(size = 3) +
  geom_segment(aes(x = 1, xend = 1, y = 1.2, yend = 1.8),
               arrow = arrow(length = unit(0.2, "cm"))) +
  geom_segment(aes(x = 3, xend = 3, y = 1.2, yend = 1.8),
               arrow = arrow(length = unit(0.2, "cm"))) +
  geom_segment(aes(x = 1, xend = 1.8, y = 2.2, yend = 2.8),
               arrow = arrow(length = unit(0.2, "cm"))) +
  geom_segment(aes(x = 3, xend = 2.2, y = 2.2, yend = 2.8),
               arrow = arrow(length = unit(0.2, "cm"))) +
  geom_segment(aes(x = 2, xend = 2, y = 3.2, yend = 3.8),
               arrow = arrow(length = unit(0.2, "cm"))) +
  xlim(0, 4) + ylim(0.5, 4.5) +
  labs(title = "Hierarchical Panel VAR Structure",
       subtitle = "Information flows up (estimation) and down (shrinkage)") +
  theme_void()

Panel VAR structure for cross-country macro analysis

Gibbs Sampler for Panel VAR

Code
# Simplified Panel VAR Gibbs sampler
# For K=2 variables, p=1 lag, demonstration purposes

panel_var_gibbs <- function(Y_list, p = 1, n_draw = 3000, n_burn = 1000,
                             lambda1 = 0.2) {

  N <- length(Y_list)  # number of countries
  K <- ncol(Y_list[[1]])  # number of variables
  M <- K * p + 1  # regressors per equation

  # Build data matrices for each country
  data_list <- map(Y_list, function(Y) {
    T_full <- nrow(Y)
    y <- Y[(p+1):T_full, , drop = FALSE]
    X <- matrix(1, nrow = T_full - p, ncol = 1)
    for (j in 1:p) {
      X <- cbind(X, Y[(p+1-j):(T_full-j), , drop = FALSE])
    }
    list(y = y, X = X, T_eff = nrow(y))
  })

  # Hierarchical priors
  # Level 2: mu_B ~ N(0, 10*I), Omega ~ IW(I, K+2)
  mu_B_prior <- rep(0, M * K)
  V_mu_prior <- diag(10, M * K)
  Omega_prior <- diag(1, M * K)
  nu_Omega_prior <- M * K + 2

  # Storage
  B_draws <- array(0, dim = c(M, K, N, n_draw))
  mu_B_draws <- matrix(0, n_draw, M * K)

  # Initialize
  B_curr <- array(0, dim = c(M, K, N))
  for (i in 1:N) {
    B_curr[,,i] <- solve(crossprod(data_list[[i]]$X)) %*%
                   crossprod(data_list[[i]]$X, data_list[[i]]$y)
  }
  Sigma_curr <- lapply(1:N, function(i) {
    U <- data_list[[i]]$y - data_list[[i]]$X %*% B_curr[,,i]
    crossprod(U) / data_list[[i]]$T_eff
  })
  mu_B_curr <- apply(B_curr, c(1,2), mean) %>% as.vector()
  Omega_curr <- diag(lambda1^2, M * K)

  for (g in 1:(n_draw + n_burn)) {

    # Block 1: Draw B_i | Sigma_i, mu_B, Omega for each country
    for (i in 1:N) {
      X_i <- data_list[[i]]$X
      y_i <- data_list[[i]]$y
      Sigma_inv <- solve(Sigma_curr[[i]])
      Omega_inv <- solve(Omega_curr)

      V_post_inv <- Omega_inv + kronecker(Sigma_inv, crossprod(X_i))
      V_post <- solve(V_post_inv)
      b_post <- V_post %*% (Omega_inv %*% mu_B_curr +
                            kronecker(Sigma_inv, t(X_i)) %*% as.vector(y_i))
      B_curr[,,i] <- matrix(rmvnorm(1, b_post, V_post), M, K)
    }

    # Block 2: Draw Sigma_i | B_i for each country
    for (i in 1:N) {
      U <- data_list[[i]]$y - data_list[[i]]$X %*% B_curr[,,i]
      S_post <- diag(K) + crossprod(U)
      Sigma_curr[[i]] <- solve(rWishart(1, K + 2 + data_list[[i]]$T_eff,
                                         solve(S_post))[,,1])
    }

    # Block 3: Draw mu_B | B_1,...,B_N, Omega
    B_vec <- sapply(1:N, function(i) as.vector(B_curr[,,i]))
    B_mean <- rowMeans(B_vec)
    Omega_inv <- solve(Omega_curr)
    V_mu_inv <- solve(V_mu_prior)
    V_mu_post <- solve(N * Omega_inv + V_mu_inv)
    m_mu_post <- V_mu_post %*% (N * Omega_inv %*% B_mean + V_mu_inv %*% mu_B_prior)
    mu_B_curr <- as.vector(rmvnorm(1, m_mu_post, V_mu_post))

    # Block 4: Draw Omega | B_1,...,B_N, mu_B
    S_Omega <- Omega_prior
    for (i in 1:N) {
      b_i <- as.vector(B_curr[,,i])
      S_Omega <- S_Omega + outer(b_i - mu_B_curr, b_i - mu_B_curr)
    }
    Omega_curr <- solve(rWishart(1, nu_Omega_prior + N, solve(S_Omega))[,,1])

    # Store
    if (g > n_burn) {
      idx <- g - n_burn
      B_draws[,,,idx] <- B_curr
      mu_B_draws[idx,] <- mu_B_curr
    }

    if (g %% 500 == 0) cat("Draw", g, "of", n_draw + n_burn, "\n")
  }

  list(B = B_draws, mu_B = mu_B_draws, K = K, p = p, M = M, N = N)
}

# Generate synthetic panel VAR data
set.seed(123)
N_countries <- 10
T_obs <- 60
K <- 2

# Common VAR structure with heterogeneity
A_common <- matrix(c(0.7, 0.1, 0.05, 0.6), K, K)
Sigma_common <- matrix(c(1, 0.3, 0.3, 1), K, K)

Y_list <- map(1:N_countries, function(i) {
  # Country-specific deviation
  A_i <- A_common + matrix(rnorm(K^2, 0, 0.1), K, K)

  Y <- matrix(0, T_obs, K)
  Y[1, ] <- rnorm(K)
  for (t in 2:T_obs) {
    Y[t, ] <- Y[t-1, ] %*% t(A_i) + rmvnorm(1, sigma = Sigma_common)
  }
  Y
})

# Estimate panel VAR
cat("Estimating Panel VAR...\n")
Estimating Panel VAR...
Code
pvar_result <- panel_var_gibbs(Y_list, p = 1, n_draw = 2000, n_burn = 500)
Draw 500 of 2500 
Draw 1000 of 2500 
Draw 1500 of 2500 
Draw 2000 of 2500 
Draw 2500 of 2500 
Code
# Extract A[1,1] coefficient for each country
A11_draws <- pvar_result$B[2, 1, , ]  # position [2,1] is A[1,1] after constant

# Posterior summaries
A11_summary <- tibble(
  country = 1:N_countries,
  mean = apply(A11_draws, 1, mean),
  lower = apply(A11_draws, 1, quantile, 0.16),
  upper = apply(A11_draws, 1, quantile, 0.84)
)

# Add global mean
mu_A11 <- mean(pvar_result$mu_B[, 2])  # position 2 in vec(B) is A[1,1]

ggplot(A11_summary, aes(x = factor(country), y = mean)) +
  geom_pointrange(aes(ymin = lower, ymax = upper), color = "#3498db") +
  geom_hline(yintercept = mu_A11, linetype = "dashed", color = "#e74c3c") +
  geom_hline(yintercept = A_common[1,1], linetype = "dotted", color = "#2ecc71") +
  annotate("text", x = 9, y = mu_A11 + 0.05, label = "Hierarchical mean", color = "#e74c3c") +
  annotate("text", x = 9, y = A_common[1,1] - 0.05, label = "True common", color = "#2ecc71") +
  labs(title = "Country-Specific VAR Coefficients with Hierarchical Shrinkage",
       subtitle = expression(paste("Posterior of ", A[11], " across countries")),
       x = "Country", y = expression(A[11])) +
  theme_minimal()

Panel VAR coefficient posteriors across countries

Random Coefficients Models

Swamy (1970) Random Coefficients

The classic frequentist approach:

\[ y_{it} = x_{it}'\beta_i + \varepsilon_{it} \] \[ \beta_i = \bar{\beta} + v_i, \quad v_i \sim N(0, \Omega) \]

Bayesian version adds priors on \((\bar{\beta}, \Omega)\) and uses Gibbs sampling.

Correlated Random Coefficients

Allow \(\beta_i\) to correlate with observables:

\[ \beta_i = \bar{\beta} + Z_i'\gamma + v_i \]

where \(Z_i\) are country characteristics (GDP per capita, openness, institutions).

Application: Countries with higher institutional quality might have different Taylor rule coefficients.

Code
# Simulate correlated random coefficients
N <- 30
T_obs <- 40

# Country characteristic
z_i <- rnorm(N)  # e.g., log GDP per capita

# Coefficient depends on characteristic
gamma <- 0.3
beta_i <- 0.5 + gamma * z_i + rnorm(N, 0, 0.2)

# Generate panel
rc_data <- map_dfr(1:N, function(i) {
  x <- rnorm(T_obs)
  y <- beta_i[i] * x + rnorm(T_obs, 0, 1)
  tibble(country = i, x = x, y = y, z = z_i[i])
})

# OLS by country
ols_rc <- rc_data %>%
  group_by(country) %>%
  summarise(beta_ols = coef(lm(y ~ x - 1))[1], z = first(z))

# Plot relationship
ggplot(ols_rc, aes(x = z, y = beta_ols)) +
  geom_point(size = 3, alpha = 0.7) +
  geom_smooth(method = "lm", se = TRUE, color = "#3498db") +
  geom_abline(intercept = 0.5, slope = gamma, linetype = "dashed", color = "#e74c3c") +
  annotate("text", x = 1.5, y = 0.5 + gamma * 1.5 + 0.1,
           label = "True relationship", color = "#e74c3c") +
  labs(title = "Correlated Random Coefficients",
       subtitle = "Country characteristic explains coefficient heterogeneity",
       x = "Country characteristic (z)", y = expression(hat(beta)[i])) +
  theme_minimal()

Random coefficients with observable heterogeneity

Cross-Sectional Dependence

The Problem

Standard panel methods assume: \[ E[\varepsilon_{it} \varepsilon_{js}] = 0 \quad \text{for } i \neq j \]

But in macro panels, global shocks (oil prices, US monetary policy, COVID) create cross-sectional dependence.

Factor Structure Approach

Model the error as: \[ \varepsilon_{it} = \lambda_i' f_t + e_{it} \]

where: - \(f_t\): Common factors (K_f × 1) - \(\lambda_i\): Country-specific factor loadings - \(e_{it}\): Idiosyncratic error

Interactive Fixed Effects (Bai 2009)

\[ y_{it} = x_{it}'\beta + \lambda_i' f_t + e_{it} \]

Jointly estimates \(\beta\), \(\Lambda = [\lambda_1, \ldots, \lambda_N]'\), and \(F = [f_1, \ldots, f_T]'\).

Bayesian Factor Model

Prior structure: \[ f_t | \Sigma_f \sim N(0, \Sigma_f), \quad \lambda_i \sim N(0, \Omega_\lambda) \]

Identification: Impose lower-triangular structure on loadings (first K_f countries).

Code
# Simulate panel with common factor
N <- 20
T_obs <- 50
K_f <- 1  # one common factor

# Common factor (e.g., global financial cycle)
f_t <- arima.sim(model = list(ar = 0.8), n = T_obs) %>% as.vector()

# Country loadings
lambda_i <- rnorm(N, mean = 0.5, sd = 0.3)

# Generate correlated errors
errors <- outer(f_t, lambda_i) + matrix(rnorm(T_obs * N, 0, 0.5), T_obs, N)

# True coefficient
beta_true <- 0.8

# Generate data
factor_data <- map_dfr(1:N, function(i) {
  x <- rnorm(T_obs)
  y <- beta_true * x + errors[, i]
  tibble(country = i, t = 1:T_obs, x = x, y = y, factor = f_t)
})

# Show cross-correlation of residuals
resids <- factor_data %>%
  group_by(country) %>%
  mutate(resid = residuals(lm(y ~ x))) %>%
  select(country, t, resid) %>%
  pivot_wider(names_from = country, values_from = resid, names_prefix = "c")

# Correlation matrix of residuals
resid_mat <- as.matrix(resids[, -1])
cor_mat <- cor(resid_mat)

# Visualize
cor_df <- expand_grid(i = 1:N, j = 1:N) %>%
  mutate(correlation = as.vector(cor_mat))

ggplot(cor_df, aes(x = factor(i), y = factor(j), fill = correlation)) +
  geom_tile() +
  scale_fill_gradient2(low = "#3498db", mid = "white", high = "#e74c3c", midpoint = 0) +
  labs(title = "Cross-Sectional Correlation of Residuals",
       subtitle = "Common factor creates off-diagonal correlations",
       x = "Country i", y = "Country j") +
  theme_minimal() +
  theme(axis.text = element_text(size = 8))

Cross-sectional dependence via common factor

Bayesian Factor-Augmented Panel VAR

Combine hierarchical panel VAR with factor structure:

\[ y_{it} = c_i + A_i y_{i,t-1} + \Gamma_i f_t + u_{it} \]

where \(f_t\) is a vector of global factors (extracted or observed).

Application in macro: Include Global Financial Cycle (GFC) factor from Miranda-Agrippino & Rey.

Python Implementation

For larger panels or more complex hierarchical structures, Python with NumPy/SciPy provides flexibility.

# Hierarchical Panel Model in Python
import numpy as np
from scipy import stats
from scipy.linalg import solve, cholesky

def hierarchical_panel_gibbs(y_list, X_list, n_draw=5000, n_burn=1000):
    """
    Gibbs sampler for hierarchical panel regression.

    y_list: list of (T_i,) arrays - outcomes by country
    X_list: list of (T_i, K) arrays - regressors by country
    """
    N = len(y_list)
    K = X_list[0].shape[1]

    # Priors
    mu0 = np.zeros(K)
    V0 = np.eye(K) * 10
    V0_inv = np.linalg.inv(V0)
    Omega0 = np.eye(K)
    nu0 = K + 2
    a0, b0 = 0.01, 0.01  # for sigma^2

    # Storage
    beta_draws = np.zeros((n_draw, N, K))
    mu_draws = np.zeros((n_draw, K))
    Omega_draws = np.zeros((n_draw, K, K))

    # Initialize
    beta_curr = np.zeros((N, K))
    for i in range(N):
        beta_curr[i] = np.linalg.lstsq(X_list[i], y_list[i], rcond=None)[0]
    sigma2_curr = np.ones(N)
    mu_curr = beta_curr.mean(axis=0)
    Omega_curr = np.cov(beta_curr.T) + 0.01 * np.eye(K)

    for g in range(n_draw + n_burn):
        Omega_inv = np.linalg.inv(Omega_curr)

        # Block 1: Draw beta_i | rest
        for i in range(N):
            X_i, y_i = X_list[i], y_list[i]
            XtX = X_i.T @ X_i
            Xty = X_i.T @ y_i

            V_post_inv = XtX / sigma2_curr[i] + Omega_inv
            V_post = np.linalg.inv(V_post_inv)
            m_post = V_post @ (Xty / sigma2_curr[i] + Omega_inv @ mu_curr)

            beta_curr[i] = np.random.multivariate_normal(m_post, V_post)

        # Block 2: Draw sigma2_i | rest
        for i in range(N):
            resid = y_list[i] - X_list[i] @ beta_curr[i]
            T_i = len(y_list[i])
            a_post = a0 + T_i / 2
            b_post = b0 + np.sum(resid**2) / 2
            sigma2_curr[i] = 1 / np.random.gamma(a_post, 1/b_post)

        # Block 3: Draw mu | rest
        V_mu_post = np.linalg.inv(N * Omega_inv + V0_inv)
        m_mu_post = V_mu_post @ (N * Omega_inv @ beta_curr.mean(axis=0) +
                                  V0_inv @ mu0)
        mu_curr = np.random.multivariate_normal(m_mu_post, V_mu_post)

        # Block 4: Draw Omega | rest (Inverse Wishart)
        S_post = Omega0.copy()
        for i in range(N):
            diff = beta_curr[i] - mu_curr
            S_post += np.outer(diff, diff)

        # Draw from Inverse Wishart
        df_post = nu0 + N
        Omega_curr = stats.invwishart.rvs(df=df_post, scale=S_post)

        # Store
        if g >= n_burn:
            idx = g - n_burn
            beta_draws[idx] = beta_curr
            mu_draws[idx] = mu_curr
            Omega_draws[idx] = Omega_curr

    return {
        'beta': beta_draws,
        'mu': mu_draws,
        'Omega': Omega_draws
    }

# Example usage:
# result = hierarchical_panel_gibbs(y_list, X_list)
# beta_posterior_mean = result['beta'].mean(axis=0)

Model Comparison for Panel Models

Marginal Likelihood

For hierarchical models, the marginal likelihood integrates over all parameters:

\[ p(Y | M) = \int p(Y | \theta) p(\theta | M) d\theta \]

For linear hierarchical models: Often tractable analytically.

For complex models: Use bridge sampling or the Chib (1995) method.

DIC and WAIC

Criterion Formula Use
DIC \(\bar{D} + p_D\) Effective parameters penalty
WAIC Pointwise predictive density Better for hierarchical models
LOO-CV Leave-one-out cross-validation Gold standard but expensive
Code
# DIC calculation for hierarchical model
calculate_dic <- function(hier_result, data_hier) {

  countries <- unique(data_hier$country)
  N <- length(countries)
  n_draw <- dim(hier_result$beta)[3]

  # Compute log-likelihood at each draw
  log_lik_draws <- numeric(n_draw)

  for (g in 1:n_draw) {
    ll <- 0
    for (i in 1:N) {
      d <- filter(data_hier, country == i)
      beta_g <- hier_result$beta[i, 1, g]
      sigma2_g <- hier_result$sigma2[g, i]

      resid <- d$y - d$x * beta_g
      ll <- ll + sum(dnorm(resid, 0, sqrt(sigma2_g), log = TRUE))
    }
    log_lik_draws[g] <- ll
  }

  # DIC components
  D_bar <- -2 * mean(log_lik_draws)  # posterior mean deviance

  # Deviance at posterior mean
  ll_at_mean <- 0
  for (i in 1:N) {
    d <- filter(data_hier, country == i)
    beta_mean <- mean(hier_result$beta[i, 1, ])
    sigma2_mean <- mean(hier_result$sigma2[, i])
    resid <- d$y - d$x * beta_mean
    ll_at_mean <- ll_at_mean + sum(dnorm(resid, 0, sqrt(sigma2_mean), log = TRUE))
  }
  D_theta_bar <- -2 * ll_at_mean

  # Effective parameters
  p_D <- D_bar - D_theta_bar

  # DIC
  DIC <- D_bar + p_D

  list(DIC = DIC, D_bar = D_bar, p_D = p_D)
}

dic_result <- calculate_dic(hier_result, data_hier)
cat("DIC:", round(dic_result$DIC, 1), "\n")
DIC: 2873.5 
Code
cat("Effective parameters (p_D):", round(dic_result$p_D, 1), "\n")
Effective parameters (p_D): 36.9 

Practical Guidance

When to Use Hierarchical vs. Standard Panel

Situation Recommendation
Small T per country (< 30) Hierarchical (shrinkage helps)
Large N, moderate T Hierarchical (efficient)
Focus on average effect Either works
Focus on heterogeneity Hierarchical (full posterior)
Strong prior beliefs Bayesian hierarchical

Implementation Checklist

  1. Check MCMC convergence: Trace plots, R-hat, effective sample size
  2. Assess shrinkage: Compare hierarchical to OLS estimates
  3. Test homogeneity: Is \(\Omega\) small relative to coefficient magnitudes?
  4. Cross-sectional dependence: Test residuals, consider factors
  5. Model comparison: DIC/WAIC across specifications

Common Pitfalls

WarningWatch Out For
  1. Over-shrinkage: Very tight hyperpriors force all countries to look identical
  2. Under-shrinkage: Vague hyperpriors fail to borrow strength
  3. Ignoring cross-sectional dependence: Biases standard errors downward
  4. Too many parameters: K-variable panel VAR with p lags has \(N \times K \times (Kp + 1)\) parameters
  5. Improper mixing: Hierarchical models can have slow mixing — increase draws

Key References

Hierarchical Panel Models

  • Gelman & Hill (2007) Data Analysis Using Regression and Multilevel/Hierarchical Models — Accessible introduction
  • Swamy (1970) “Efficient Inference in a Random Coefficient Regression Model” Econometrica — Classic random coefficients
  • Hsiao (2014) Analysis of Panel Data — Comprehensive textbook

Bayesian Panel VAR

  • Canova & Ciccarelli (2009) “Estimating Multicountry VAR Models” IER — Foundational paper
  • Canova & Ciccarelli (2013) “Panel Vector Autoregressive Models: A Survey” — Survey chapter
  • Jarocinski (2010) “Responses to Monetary Policy Shocks in the East and the West of Europe” JAE — Application

Cross-Sectional Dependence

  • Bai (2009) “Panel Data Models with Interactive Fixed Effects” Econometrica — Factor structure
  • Pesaran (2006) “Estimation and Inference in Large Heterogeneous Panels” Econometrica — CCE estimator
  • Chudik & Pesaran (2015) “Common Correlated Effects Estimation of Heterogeneous Dynamic Panel Data Models” JoE

Software

  • R: brms (Bürkner), lme4 (Bates et al.), custom Gibbs samplers
  • Python: PyMC, NumPy/SciPy for manual MCMC
  • Stan: rstan/pystan for complex hierarchical models

Summary

Concept Key Insight
Hierarchical models Pool information while allowing heterogeneity
Shrinkage Data-poor countries borrow from data-rich
Panel VAR Country-specific dynamics with common structure
Cross-sectional dependence Global factors create correlations
Gibbs sampling Natural framework for hierarchical models
TipFor Your Research

With N = 46 countries and T = 84 quarters: - Hierarchical shrinkage is valuable for EM countries with noisier data - Panel VAR can reveal heterogeneity in policy transmission - Factor structures capture global shocks (GFC, COVID) - Always check cross-sectional correlation in residuals