Authors: Pranoy Panda*, Siddharth Tandon*, Vineeth N Balasubramanian (* denotes equal contribution)

Conference: ICASSP 2024

In machine learning, fairly assigning credit is a fundamental challenge. When a model makes a prediction, how much did each feature contribute? When training a model, how valuable was each data point to the final performance? Shapley values, a concept from cooperative game theory, have become a standard for tackling this “credit assignment” problem.

However, traditional Shapley values have a key limitation: they treat all group sizes the same. They assume that the contribution of a feature, when added to a small group of other features, is just as important as when it’s added to a large group. This isn’t always a realistic assumption. In applications like identifying the most important pixels in an image (feature attribution) or finding the most valuable data points in a training set (data valuation), this uniform weighting can lead to less-than-intuitive results.

To address this, Weighted Shapley values were introduced. They are a generalization that allows us to assign different weights to subsets of different sizes, offering more flexibility and better alignment with our intuition in many real-world tasks.

But this flexibility comes at a steep price. Like their traditional counterparts, Weighted Shapley values are plagued by exponential computational costs, making them impractical for the high-dimensional datasets common in modern machine learning.

Our paper, FW-Shapley: Real-time Estimation of Weighted Shapley Values, tackles this computational bottleneck head-on. We introduce two key contributions that make estimating these powerful, nuanced values practical for real-time applications. Below, we elaborate on these contributions

Contribution 1: A New Formulation through Weighted Least Squares

To set the context, we operate in a cooperative game setting where each “player” $z$ is either one input feature (for the feature attribution task) or one training example (for data valuation), and a coalition $s$ is any subset of these players. We represent $s$ by a binary vector in ${0,1}^n$ indicating which players are included.

Our first key insight is that Weighted Shapley values - which normally requires summing over exponentially many subsets - can be recovered by solving a single weighted least‑squares problem (Proposition 2.1 in the paper [1]). Instead of explicitly enumerating every coalition, we only need to sample a handful and fit a model.

Concretely, for each player $z$ (a feature or data point), we solve

\[\psi(z) = \underset{\psi_z}{\arg\min} \mathbb{E}_{s\sim p(s)} \Bigl[\bigl(v_z(s)-v_z(\mathbf0) - s^\top \psi_z\bigr)^2\Bigr].\]

where,

  • $v_z(s)$: the value of coalition $s$ when player $z$ is present.
  • $v_z(\mathbf0)$: the “baseline” value with no players.
  • $s\in{0,1}^n$: a binary mask indicating which players are in the coalition.
  • $p(s)$: a custom sampling distribution that encodes our choice of emphasizing smaller or larger subsets as desired based on the task at hand.

By minimizing this expected squared error, we recover exactly the same $\psi(z)$ that the original weighted‐Shapley formula would produce. This reframing:

  1. Sidesteps combinatorial blow‑up. You only draw as many samples as you like from (p(s)).
  2. Lays the groundwork for learning. Since it’s a quadratic loss, we can now train a neural network to predict $\psi(z)$ via standard regression techniques.

Contribution 2: FW-Shapley for Real-Time Estimation

Building on this new formulation, we developed Fast Weighted Shapley (FW-Shapley), an amortized framework for computing Weighted Shapley values in real-time.

The core idea is to train a learned estimator - a neural network - to predict the Weighted Shapley values directly. Instead of running the expensive computation for every new prediction, you train the estimator once. After training, it can provide near-instantaneous estimations. This “amortized” approach is what makes real-time applications feasible.

A crucial aspect of our framework is how we train this estimator. Calculating the true Weighted Shapley values to use as a training target would defeat the purpose, as it would require the very exponential computation we’re trying to avoid.

This is where our first contribution becomes critical. By using the weighted least squares objective, we can train our estimator without ever needing to see the ground-truth Weighted Shapley values.

The Theoretical Guarantee: Minimizing the Right Thing

To ensure our learned estimator indeed converges to the true weighted shapley values we validated it theoretically. We proved that minimizing our proposed objective function during training is mathematically equivalent to minimizing the estimation error of the Weighted Shapley values (Proposition 2.2 in our paper [1]).

This theoretical result is the key to FW-Shapley’s success. It guarantees that even though we don’t have the “answers” (the true values) for the model to learn from, the learning process is still driving the model’s predictions toward those correct answers.

Putting FW-Shapley to the Test

We validated FW-Shapley’s effectiveness on two key tasks using three popular image datasets (CIFAR10, SVHN, and FMNIST).

  • Feature Attribution: When compared to FastSHAP, a leading estimator for Shapley values, FW-Shapley demonstrated a 27% average improvement in terms of Inclusion AUC, a key metric for this task.
  • Data Valuation: In the task of identifying noisy labels, FW-Shapley was 14 times faster than the state-of-the-art KNN-Shapley method while achieving comparable accuracy.

These results show that FW-Shapley is not just faster, but also more effective, providing a practical and powerful tool for researchers and practitioners.

Why It Matters

By creating a theoretically sound, fast, and effective estimation framework, FW-Shapley makes the use of the more flexible and intuitive Weighted Shapley values practical. This opens the door for fairer and more insightful credit assignment in high-dimensional, real-world machine learning applications where it was previously computationally out of reach.

Citation

If you find this work interesting and useful, please do cite this work as follows:

@inproceedings{panda2024fw,
  title={Fw-shapley: Real-time estimation of weighted shapley values},
  author={Panda, Pranoy and Tandon, Siddharth and Balasubramanian, Vineeth N},
  booktitle={ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={6210--6214},
  year={2024},
  organization={IEEE}
}