What is a data counterfactual?

A data counterfactual is a scenario in which the data world around an AI system changes in some way. In the first version of this site, that mostly meant changes to training data. The next layer is broader: the evaluation set, permitted data uses, and the institutions that make a measurement trustworthy can also change. Often, we are interested in comparing two counterfactual scenarios to understand the impact of some change on AI capabilities, measurement, or confidence.

Consider this thought experiment: imagine you are going to train a machine learning model on a very small dataset with just four units of data (or, if training on four data points seems implausible, imagine a large dataset made of four distinct bundled subsets). Now imagine a grid where every possible combination of training objects appears as a row, every possible evaluation set appears as a column, and each cell records the performance for a given train/eval pairing. For our very small example, we can call the four data objects A, B, C, and D. (Again, these could literally be four single observations in a toy example, or four large datasets we are considering mixing.)

With this grid in mind, we can explore the most basic useful data counterfactual, "leave-one-out." By comparing a row that includes one point with the nearby row in which that point is missing, we can understand the impact (in a causal sense) of adding or removing that point. By computing the difference between the two cells in a given evaluation column, we can learn how much that data point helped or hurt our model. From there, the same logic extends to removing groups of points, reweighting data points, replacing data with synthetic data, corrupting certain examples, or coordinating withdrawal.
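
To make that contrast concrete, here is a minimal sketch, assuming a scikit-learn setup; the dataset (iris), the model, and the chosen index are illustrative placeholders, not anything specific to this site.

```python
# Minimal leave-one-out contrast: one "row" of the grid with a point,
# the neighboring row without it, same evaluation column for both.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_eval, y_train, y_eval = train_test_split(X, y, random_state=0)

def cell(train_idx):
    """One cell of the grid: train on the chosen rows, score on the fixed eval set."""
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train[train_idx], y_train[train_idx])
    return model.score(X_eval, y_eval)

all_idx = np.arange(len(X_train))
z = 0  # the training point whose counterfactual we probe (arbitrary choice)

with_z = cell(all_idx)
without_z = cell(np.delete(all_idx, z))

# Positive: point z helped on this evaluation set; negative: it hurt.
# (On an easy dataset the effect of a single point may well be zero.)
print(f"leave-one-out effect of point {z}: {with_z - without_z:+.4f}")
```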

Very simply, we can imagine training an LLM with a bunch of fiction books, science articles, and social media posts. If we train a second LLM without the science articles and compare the performance, we are exploring the "no science articles" data counterfactual. Researchers have indeed performed such experiments, for instance at non-profit institutions like AI2 and for-profit companies like Meta.

Leave-one-out toy example

| Training set | Eval slice A | Eval slice B | Eval slice C | Eval slice D |
|--------------|--------------|--------------|--------------|--------------|
| ABCD         | 0.92         | 0.88         | 0.85         | 0.82         |
| ACD          | 0.78         | 0.55         | 0.82         | 0.80         |

With four toy data objects A, B, C, and D, the lower row leaves out B. The sharp drop on evaluation slice B is the kind of local contrast many attribution methods try to summarize.
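
Reading the toy grid as data, the per-slice contrasts are just cell differences; a few lines of Python over the numbers in the table above make that explicit.

```python
# Per-slice difference between the full row (ABCD) and the leave-B-out row (ACD),
# using the scores from the toy table above.
full = {"A": 0.92, "B": 0.88, "C": 0.85, "D": 0.82}
no_b = {"A": 0.78, "B": 0.55, "C": 0.82, "D": 0.80}

deltas = {s: round(full[s] - no_b[s], 2) for s in full}
print(deltas)  # {'A': 0.14, 'B': 0.33, 'C': 0.03, 'D': 0.02}
```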

Training, evaluation, and trust counterfactuals

The grid also makes a second move visible. Most of the familiar examples are row moves:

$$ f(D_T, D_E) \rightarrow f(D_T \setminus z, D_E) $$

Here the evaluation target stays fixed while the training world changes. But we can also ask about column moves:

$$ f(D_T, D_E) \rightarrow f(D_T, D_E \cup z) $$

Here the trained model stays fixed while the evaluation world changes. This is an evaluation counterfactual: the data object changes what we measure, which claims we trust, or which deployment decision we make.
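
A small sketch can keep the two moves side by side. Here `f` is a stand-in for one grid cell ("train on $D_T$, measure on $D_E$"); its body and the toy scores inside it are arbitrary assumptions chosen only so the example runs.

```python
def f(train_set: frozenset, eval_set: frozenset) -> float:
    """Toy stand-in for one grid cell: average per-slice score, where a slice
    scores 0.9 if its matching data object was trained on and 0.5 otherwise.
    (The numbers are arbitrary; a real training + evaluation pipeline goes here.)"""
    return sum(0.9 if s in train_set else 0.5 for s in eval_set) / len(eval_set)

D_T = frozenset({"A", "B", "C", "D"})  # training data objects
D_E = frozenset({"A", "B", "C"})       # evaluation slices currently in use
z_train = "B"                          # object removed in the row move
z_eval = "E"                           # hypothetical new slice added in the column move

# Row move: evaluation target fixed, training world changes.
row_effect = f(D_T, D_E) - f(D_T - {z_train}, D_E)

# Column move: trained model fixed, evaluation world changes.
col_effect = f(D_T, D_E | {z_eval}) - f(D_T, D_E)

print(f"row move (drop {z_train} from training): {row_effect:+.3f}")
print(f"column move (add {z_eval} to evaluation): {col_effect:+.3f}")
```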

And some questions are really institution moves:

$$ f(D_T, D_E, G) \rightarrow f(D_T, D_E, G') $$

Here $G$ stands for a governance or trust state: provenance, licensing, evaluator independence, contamination controls, labeling process, secrecy, and other facts that decide whether a train/eval comparison should count. This does not replace the original training-data frame. It adds a second layer: the first version of the site focuses on changes to training data; the next layer asks what changes when the evaluation set, holdout institution, or permitted data use changes.
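
As a rough illustration (the field names and the all-or-nothing rule below are assumptions, not a proposed standard), an institution move can be pictured as changing a small record of trust-relevant facts while the train/eval pair stays put.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GovernanceState:
    """Illustrative stand-in for G: facts that decide whether a measurement counts."""
    provenance_documented: bool
    license_permits_use: bool
    evaluator_independent: bool
    contamination_checked: bool

def measurement_counts(g: GovernanceState) -> bool:
    """Toy trust rule: a train/eval comparison counts only if every box is checked."""
    return all([
        g.provenance_documented,
        g.license_permits_use,
        g.evaluator_independent,
        g.contamination_checked,
    ])

# Same training data, same evaluation set; only the governance state differs.
g = GovernanceState(True, True, False, True)        # evaluation run by the model's own developer
g_prime = GovernanceState(True, True, True, True)   # same data, independent evaluator

print(measurement_counts(g), measurement_counts(g_prime))  # False True
```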

Why data counterfactuals are relevant to data leverage

This frame helps us connect topics that might seem distinct, for instance linking influence estimation and Shapley values with data strikes and data contribution campaigns. In ML, we often want to ask what happens when we remove a point, reweight a group, fit a scaling curve, and so on, with the goal of understanding our data and model. But counterfactuals can also be induced by strategic actors. Strikes, boycotts, contribution campaigns, and bargaining efforts all try to impact AI through data.
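
The Shapley-value end of that connection can be written out directly for a four-object toy setting: with only a handful of data objects, exact data Shapley values simply enumerate every subset. The `train_and_eval` value function below is a placeholder (an arbitrary additive toy score) standing in for a grid lookup or a retraining run.

```python
from itertools import combinations
from math import factorial

OBJECTS = ["A", "B", "C", "D"]

# Placeholder value function: in the grid picture this is a lookup of one cell.
# Toy per-object contributions are arbitrary, chosen only so the example runs.
TOY_CONTRIBUTION = {"A": 0.10, "B": 0.30, "C": 0.05, "D": 0.02}

def train_and_eval(subset: frozenset) -> float:
    """Stand-in for training on `subset` and scoring on a fixed evaluation set."""
    return 0.5 + sum(TOY_CONTRIBUTION[o] for o in subset)

def exact_shapley(objects, value):
    """Exact data Shapley values by enumerating every subset; feasible only for
    a handful of data objects, which is exactly the toy-grid setting."""
    n = len(objects)
    shapley = {o: 0.0 for o in objects}
    for o in objects:
        others = [x for x in objects if x != o]
        for k in range(n):
            for combo in combinations(others, k):
                s = frozenset(combo)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                shapley[o] += weight * (value(s | {o}) - value(s))
    return shapley

# With an additive toy value function, each Shapley value recovers (up to
# floating point) that object's assumed contribution; with a real retraining
# hook, the same loop reads off each object's average marginal effect.
print(exact_shapley(OBJECTS, train_and_eval))
```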

When people can withhold, redirect, or condition the supply of data, data counterfactual measurement directly maps to governance power! In other words, the kinds of experiments we'd want to run as ML researchers trying to make our model better (via data selection or other data-centric approaches) are the same experiments we'd want to run if we're trying to organize data-related collective action, design data dividend schemes, or set up an efficient data market. If we had a shared bank of results from such experiments, those results would be useful to actors with a wide variety of interests and goals! Furthermore, this frame makes it clear how questions about provenance, licensing, contribution governance, and evaluation use rights directly determine which training rows and evaluation columns are legally, socially, or politically available in the first place.