Quantifying Statistical and Causal Importance with Shapley Values
Dr Thomas Keevers
Machine learning can help us to diagnose diseases, recommend a movie to watch, or find cliques on social media. However, the statistical inference that underlies these technologies is often opaque: a collection of weights, decision paths, or support vectors. Sometimes we may be content with predictive accuracy, so this opaqueness is not a problem. On other occasions, we may need some insight into how a model works to confirm it will be robust in new environments, to ensure social justice, or to understand cause-and-effect.
Shapley values can bridge this gap. They can quantify how much each feature contributes to a predictive model, using a framework grounded in game theory. Current Shapley value formulations tend to focus on probabilistic inference: understanding how observing a feature changes our expectations about a subsequent label. They do not identify or use any structural relationships between the feature and label.
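As a concrete illustration of the game-theoretic framework, the following sketch computes exact Shapley values for a toy linear model by enumerating feature coalitions. The model, the baseline-imputation value function, and all names here are illustrative assumptions, not the formulation presented in the talk.

```python
# A minimal sketch of exact Shapley values for a toy model (illustrative
# assumptions throughout; not the talk's formulation). A coalition's
# "value" is the model's prediction with absent features set to a baseline.
from itertools import combinations
from math import factorial

def model(x):
    # Toy linear model: f(x) = 2*x0 + 3*x1 - x2
    return 2 * x[0] + 3 * x[1] - x[2]

def shapley_values(x, baseline):
    n = len(x)

    def value(coalition):
        # Features outside the coalition are replaced by their baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            for S in combinations(others, size):
                # Standard Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi += weight * (value(set(S) | {i}) - value(set(S)))
        phis.append(phi)
    return phis

# For a linear model with a zero baseline, each feature's Shapley value
# is approximately its coefficient times its value: here about [2, 3, -1].
print(shapley_values([1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))
```

Because the toy model is linear, the attributions recover the coefficients directly; for nonlinear models the same enumeration applies but the attributions depend on feature interactions.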
In this talk, we will show how Shapley values can be extended from probabilistic inference to causal inference by integrating them with structural causal models. Our formulation can be used to trace how causes propagate through graphical structures, and has a pleasing sum-over-paths rule for linear effects. Notably, our framework can be applied to the observed joint probability density function, and avoids reference to potentially nonsensical counterfactuals. This is useful for areas such as medical treatment, where we wish to trace the effect of interventions to clinically relevant outcomes.
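The sum-over-paths rule for linear effects can be illustrated on a small, hypothetical structural causal model. The graph and edge coefficients below are assumptions chosen for illustration only: X influences Y directly and via a mediator M.

```python
# Hypothetical linear SCM for illustrating the sum-over-paths rule
# (the graph and coefficients are assumptions, not results from the talk):
#   X -> M with coefficient a, M -> Y with coefficient b, X -> Y with c.
a, b, c = 0.5, 2.0, 1.0

# Sum-over-paths: each directed path from X to Y contributes the product
# of its edge coefficients. Path X->Y gives c; path X->M->Y gives a*b.
total_effect_paths = c + a * b

# Cross-check by simulating the intervention do(X = x) in the SCM.
def simulate(x):
    m = a * x          # M is fully determined by X
    y = c * x + b * m  # Y depends on X directly and through M
    return y

total_effect_sim = simulate(1.0) - simulate(0.0)
assert total_effect_paths == total_effect_sim  # both equal 2.0
```

The agreement between path enumeration and direct simulation of the intervention is what makes the rule useful: in linear systems, a cause's total effect decomposes additively over the paths it traverses.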
Bio to come.