class: center, middle, title-slide # FLAME: Interpretable Matching for Causal Inference ### Vittorio Orlandi ### The AME Lab @ Duke University ### 07/09/2021 --- exclude: true class: center, middle ## FLAME: Fast, interpretable, and accurate methods for estimating causal effects from observational data --- class: inverse, middle, center, hide-logo # The Methodology --- # Causal Inference Goal: quantify effect of treatment `\(T \in \{0, 1\}\)` on outcome `\(Y\)` -- Two potential outcomes for each unit: `\(\{Y_i(1), Y_i(0)\}\)`, denoting response under treatment, control -- - Only one -- denoted `\(Y_i\)` -- is actually observed -- - The other, the *counterfactual*, must be estimated from the data -- In observational data, covariates `\(\mathbf{X} \in \mathbb{R}^p\)` might be confounders ??? Can't naively compare the average outcome of control units to estimate a control counterfactual --- # (Exact) Matching One approach to estimating counterfactuals under confounding: *matching* -- If, for treated unit `\(i\)`, there existed control unit `\(k\)` such that `\(\mathbf{x}_k = \mathbf{x}_i\)`, then: -- - `\(Y_k\)` (observed) is a good estimate of `\(Y_i(0)\)` (unobserved) -- But exact matches are unlikely in high dimensional settings -- - Settle for `\(\mathbf{x}_k \approx \mathbf{x}_i\)` --- # Almost Matching Exactly Given a unit `\(i\)`, covariate weights `\(\mathbf{w}\)`, and a covariate selection vector `\(\boldsymbol{\theta}\)`, define the AME problem: `$$\overbrace{\text{argmax}_{\boldsymbol{\theta} \in \{0, 1\}^p}\;\boldsymbol{\theta}^T\mathbf{w}}^{\text{most important covariate set}}\quad\text{s.t.}\\\quad \exists k\;\:\text{with}\;\: \color{blue}{\underbrace{\mathbf{x}_{k} \circ \boldsymbol{\theta} = \mathbf{x}_{i} \circ \boldsymbol{\theta}}_{\text{exact matching on }\boldsymbol{\theta}}} \;\:\text{and}\;\: \underbrace{T_{k} = 1 -T_i}_{\text{opposite treatment}}$$` --- # Almost Matching Exactly Given a unit `\(i\)`, covariate weights `\(\mathbf{w}\)`, and a covariate selection vector `\(\boldsymbol{\theta}\)`, define the AME problem: `$$\overbrace{\text{argmax}_{\boldsymbol{\theta} \in \{0, 1\}^p}\;\boldsymbol{\theta}^T\mathbf{w}}^{\text{most important covariate set}}\quad\text{s.t.}\\\quad \exists k\;\:\text{with}\;\: \underbrace{\mathbf{x}_{k} \circ \boldsymbol{\theta} = \mathbf{x}_{i} \circ \boldsymbol{\theta}}_{\text{exact matching on }\boldsymbol{\theta}} \;\:\text{and}\;\: \color{blue}{\underbrace{T_{k} = 1 -T_i}_{\text{opposite treatment}}}$$` --- # Almost Matching Exactly Given a unit `\(i\)`, covariate weights `\(\mathbf{w}\)`, and a covariate selection vector `\(\boldsymbol{\theta}\)`, define the AME problem: `$$\color{blue}{\overbrace{\text{argmax}_{\boldsymbol{\theta} \in \{0, 1\}^p}\;\boldsymbol{\theta}^T\mathbf{w}}^{\text{most important covariate set}}}\quad\text{s.t.}\\\quad \exists k\;\:\text{with}\;\: \underbrace{\mathbf{x}_{k} \circ \boldsymbol{\theta} = \mathbf{x}_{i} \circ \boldsymbol{\theta}}_{\text{exact matching on }\boldsymbol{\theta}} \;\:\text{and}\;\: \underbrace{T_{k} = 1 -T_i}_{\text{opposite treatment}}$$` -- Implicitly defines a distance metric that: 1. Prioritizes matches on relevant covariates -- 2. Matches exactly when possible -- Iterate over covariate sets, starting with more important ones -- exclude: true In practice, don't have `\(\mathbf{w}\)`; run ML algorithm on _separate_ **holdout** set - Compute _Predictive Error_ ( `\(\mathtt{PE}\)` ): error in using a covariate set to predict the outcome - Determines next covariate set to match on - Learning a distance metric - test ??? Going to try and solve the AME problem for units. Way this is going to work in practice is that we're going to pick a theta, starting with a theta of all 1s, which corresponds to exact matching -- the best possible thing we can do -- and match all possible units. Then we're going to choose another theta, and match those units. Bc in practice we don't have fixed covariate weights, for each of these thetas, .. --- # Almost Matching Exactly: The Algorithms -- ## DAME (Dynamic Almost Matching Exactly) -- Solves the AME problem *exactly* for each unit -- Efficient solution via _downward closure_ property -- ## FLAME (Fast, Large-scale Almost Matching Exactly) -- Approximates the exact solution via backwards stepwise selection. -- At each iteration, eliminate an entire covariate ??? Given all this background, it's now very natural and easy to explain two of our methods --- # Almost Matching Exactly: Dynamic Weights In practice, don't have `\(\mathbf{w}\)`; run ML algorithm on _separate_ **holdout** set -- Compute _Predictive Error_ ( `\(\mathtt{PE}\)` ): error using covariate set to predict outcome -- Determines next covariate set to match on --- exclude: true # Almost Matching Exactly: Dynamic Weights Oftentimes don't have a priori measures of covariate importance -- exclude: true At every iteration, run ML algorithm on _separate_ **holdout** set to model how well a covariate set predicts the outcome -- exclude: true The _Predictive Error_ ( `\(\mathtt{PE}\)` ) measures the error in doing so and determines what covariate set next to match on. --- exclude: true # Other Distance Metrics - Propensity score matching: match on estimates of `\(\mathrm{P}(T_i = 1 | \mathbf{X} = \mathbf{x}_i)\)` - Prognostic score matching: match on estimates of `\(Y_i(0)\)` - Coarsened exact matching: Coarsen covariates and do exact matching --- class: inverse, middle, center, hide-logo # The Package --- # Overview of `FLAME` `FLAME` and `DAME` are the workhorses of the package Match input data under a wide variety of specifications Efficient bit-vectors routine for making matches Return S3 objects of class `ame` with `print`, `plot`, and `summary` methods --- # Installation CRAN ```r install.packages('FLAME') ``` GitHub ```r library(devtools) install_github('https://github.com/vittorioorlandi/FLAME') # Or (mirror of the above) install_github('https://github.com/almost-matching-exactly/R-FLAME') ``` --- # Natality Data US 2010 Natality Data (National Center for Health Statistics, NCHS, 2010). Data on neonatal health outcomes in Neonatal Intensive Care Unit (NICU) Effect of "extreme smoking" ( `\(\geq 10\)` cigarettes a day during pregnancy) on birth weight (Kondracki, 2020). Subset of ~500k observations with 16 covariates including sex of infant, races of parents, previous Cesarean deliveries, and others. --- # Missing Data `missing_data`: how missing values in `data` to be matched are handled -- - _drop_: effectively drop units with missingness from the data -- - _impute_: impute missing values and match on complete dataset -- - _keep_: keep missing values but do not match on them -- `missing_holdout` is analogous, with _impute_ and _keep_ options --- # Computing Predictive Error Two implemented options for computing `\(\mathtt{PE}\)` - `glmnet::cv.glmnet` with 5-fold cross-validation (default) - `xgboost::xgb.cv` with 5-fold cross-validation Supply your own function: ```r my_PE_lm <- function(X, Y) { df <- as.data.frame(cbind(X, Y = Y)) return(lm(Y ~ ., df)$fitted.values) } ``` --- # Calling FLAME and DAME Full call to use FLAME to match natality data: ```r my_PE_lm <- function(X, Y) { df <- as.data.frame(cbind(X, Y = Y)) return(lm(Y ~ ., df)$fitted.values) } natality_out <- FLAME(data = natality, holdout = 0.25, replace = FALSE, treated_column_name = 'smokes10', outcome_column_name = 'dbwt' missing_data = 'drop', missing_holdout = 'drop', PE_method = my_PE_lm, estimate_CATEs = TRUE) ``` --- # Calling FLAME and DAME Full call to use FLAME to match natality data: ```r my_PE_lm <- function(X, Y) { df <- as.data.frame(cbind(X, Y = Y)) return(lm(Y ~ ., df)$fitted.values) } natality_out <- * FLAME(data = natality, holdout = 0.25, replace = FALSE, treated_column_name = 'smokes10', outcome_column_name = 'dbwt' missing_data = 'drop', missing_holdout = 'drop', PE_method = my_PE_lm, estimate_CATEs = TRUE) ``` --- # Calling FLAME and DAME Full call to use FLAME to match natality data: ```r my_PE_lm <- function(X, Y) { df <- as.data.frame(cbind(X, Y = Y)) return(lm(Y ~ ., df)$fitted.values) } natality_out <- FLAME(data = natality, holdout = 0.25, replace = FALSE, * treated_column_name = 'smokes10', * outcome_column_name = 'dbwt' missing_data = 'drop', missing_holdout = 'drop', PE_method = my_PE_lm, estimate_CATEs = TRUE) ``` --- # Calling FLAME and DAME Full call to use FLAME to match natality data: ```r my_PE_lm <- function(X, Y) { df <- as.data.frame(cbind(X, Y = Y)) return(lm(Y ~ ., df)$fitted.values) } natality_out <- FLAME(data = natality, holdout = 0.25, replace = FALSE, treated_column_name = 'smokes10', outcome_column_name = 'dbwt' * missing_data = 'drop', missing_holdout = 'drop', PE_method = my_PE_lm, estimate_CATEs = TRUE) ``` --- # Calling FLAME and DAME Full call to use FLAME to match natality data: ```r my_PE_lm <- function(X, Y) { df <- as.data.frame(cbind(X, Y = Y)) return(lm(Y ~ ., df)$fitted.values) } natality_out <- FLAME(data = natality, holdout = 0.25, replace = FALSE, treated_column_name = 'smokes10', outcome_column_name = 'dbwt' missing_data = 'drop', missing_holdout = 'drop', * PE_method = my_PE_lm estimate_CATEs = TRUE) ``` --- # Calling FLAME and DAME Full call to use FLAME to match natality data: ```r my_PE_lm <- function(X, Y) { df <- as.data.frame(cbind(X, Y = Y)) return(lm(Y ~ ., df)$fitted.values) } natality_out <- FLAME(data = natality, holdout = 0.25, replace = FALSE, treated_column_name = 'smokes10', outcome_column_name = 'dbwt' missing_data = 'drop', missing_holdout = 'drop', PE_method = my_PE_lm * estimate_CATEs = TRUE) ``` ??? Estimate of the treatment effect for units that share certain covariate values --- # Methods: Print ```r print(natality_out, linewidth = 60, digits = 1) ``` ``` ## An object of class `ame`: ## FLAME ran for 16 iterations, matching 33504 out of 357535 ## units without replacement. ## The average treatment effect of treatment `smokes10` on ## outcome `dbwt` is estimated to be -175.7. ## Units with missingness in the matching data were not ## matched. ## Units with missingness in the holdout data were not used ## to compute PE. ``` --- # Methods: Plot ```r plot(natality_out, which_plots = 1) ``` <!-- --> --- # Methods: Plot ```r plot(natality_out, which_plots = 2) ``` <!-- --> --- # Methods: Plot ```r plot(natality_out, which_plots = 3) ``` <!-- --> --- # Methods: Plot ```r plot(natality_out, which_plots = 4) ``` <!-- --> --- # Methods: Summary ```r (natality_summ <- summary(natality_out)) ``` ``` ## Number of Units: ## Control Treated ## All 355373 2162 ## Matched 31885 1619 ## Unmatched 323488 543 ## ## Average Treatment Effects: ## Mean Variance ## All -176 671 ## Treated -233 178 ## Control -173 733 ## ## Matched Groups: ## Number 1545 ## Median size 5 ## Highest quality: 880 and 3287 ``` --- # Examining Matched Groups <style type="text/css"> pre { background: #FFFFFF; max-width: 100%; overflow-x: scroll; } </style> ```r high_quality_MG <- MG(natality_summ$MG$highest_quality[1], natality_out)[[1]] head(high_quality_MG, n = 14) ``` ``` ## fagerec11 meduc dob_wk mager9 fbrace mar rf_diab rf_cesar tbo_rec rf_phyp urf_chyper lbo_rec rf_ppterm sex mbrace bfacil3 smokes10 dbwt ## 880 2 2 3 2 1 2 N N 1 N 2 1 N M 1 1 FALSE 2945 ## 3946 2 2 3 2 1 2 N N 1 N 2 1 N M 1 1 FALSE 3215 ## 3954 2 2 3 2 1 2 N N 1 N 2 1 N M 1 1 FALSE 2890 ## 6054 2 2 3 2 1 2 N N 1 N 2 1 N M 1 1 FALSE 3623 ## 6979 2 2 3 2 1 2 N N 1 N 2 1 N M 1 1 FALSE 3912 ## 9748 2 2 3 2 1 2 N N 1 N 2 1 N M 1 1 FALSE 3040 ## 10961 2 2 3 2 1 2 N N 1 N 2 1 N M 1 1 FALSE 3617 ## 14499 2 2 3 2 1 2 N N 1 N 2 1 N M 1 1 FALSE 2866 ## 16632 2 2 3 2 1 2 N N 1 N 2 1 N M 1 1 FALSE 3390 ## 22234 2 2 3 2 1 2 N N 1 N 2 1 N M 1 1 FALSE 3345 ## 23142 2 2 3 2 1 2 N N 1 N 2 1 N M 1 1 FALSE 3190 ## 24825 2 2 3 2 1 2 N N 1 N 2 1 N M 1 1 FALSE 3730 ## 28503 2 2 3 2 1 2 N N 1 N 2 1 N M 1 1 FALSE 3410 ## 31610 2 2 3 2 1 2 N N 1 N 2 1 N M 1 1 FALSE 2760 ``` --- # Conclusion FLAME and DAME are scalable algorithms for observational causal inference Use ML on a holdout set to learn a distance metric that prioritizes matches on more important covariates Resulting matched groups are interpretable and high quality Future work: - Database implementation - Algorithms for mixed data --- class: middle, center <h1 style="font-size:8vw">Thank You!</h1> .center[<img src="AME-QR.png" style="width:250px;height:250px;">] [almost-matching-exactly.github.io](https://almost-matching-exactly.github.io) ??? Documentation, links to papers, Python package --- # References Dieng, A., Y. Liu, S. Roy, et al. (2019). "Interpretable Almost-Exact Matching for Causal Inference". In: _The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019, 16-18 April 2019, Naha, Okinawa, Japan_. , pp. 2445-2453. Kondracki, A. J. (2020). "Low birthweight in term singletons mediates the association between maternal smoking intensity exposure status and immediate neonatal intensive care unit admission: the E-value assessment". In: _BMC Pregnancy and Childbirth_ 20, pp. 1-9. National Center for Health Statistics, NCHS (2010). _User Guide to the 2010 Natality Public Use File_. Centers for Disease Control and Prevention (CDC). Wang, T., M. Morucci, M. U. Awan, et al. (2020). "FLAME: A Fast Large-scale Almost Matching Exactly Approach to Causal Inference". In: _Journal of Machine Learning Research_ 21, pp. 1-41. URL: [https://arxiv.org/abs/1707.06315](https://arxiv.org/abs/1707.06315).