These are working definitions for this project. They are meant to make the site easier to navigate, not to settle contested terminology across the entire literature.
Use this page when you want a quick definition without leaving the thread of the argument. The entries are short on purpose: enough to keep the site legible without turning the glossary into a second, heavier memo. A few entries include a small sketch or formula where a definition alone would be too thin.
Terms
Active learning
Choosing which data points to label next under a limited labeling budget. In grid language, it is a strategy for deciding which rows become available for training. Example link: Settles (2009).
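A minimal sketch of one common selection rule, uncertainty sampling, assuming a scikit-learn-style classifier; the function name, pool split, and batch size are illustrative rather than anything prescribed by Settles (2009).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def uncertainty_sampling(X_labeled, y_labeled, X_pool, batch_size=10):
    """Pick the pool points the current model is least confident about."""
    model = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)
    probs = model.predict_proba(X_pool)          # (n_pool, n_classes)
    confidence = probs.max(axis=1)               # confidence in the top class
    return np.argsort(confidence)[:batch_size]   # least-confident pool indices
```

The returned indices are the pool rows to label next; retraining on the enlarged labeled set and repeating gives the usual active-learning loop.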
Adversarial training
Training on adversarially perturbed examples so the model becomes more robust to those attacks at test time. In grid terms, it deliberately changes the training row to improve performance under hostile evaluation slices. Example link: Madry et al. (2018).
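A minimal sketch of one adversarial training step, assuming PyTorch and using single-step FGSM for brevity; Madry et al. (2018) use multi-step PGD, and the epsilon here is an illustrative perturbation budget.

```python
import torch
import torch.nn.functional as F

def fgsm_training_step(model, optimizer, x, y, epsilon=0.03):
    """One adversarial training step: perturb the batch, then train on it."""
    # Craft the perturbation: move each input in the direction
    # that increases the loss (the sign of the input gradient).
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    x_adv = (x_adv + epsilon * x_adv.grad.sign()).detach()

    # Standard training step, but on the perturbed batch.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```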
Backdoor attack
A type of data poisoning where a trigger pattern causes the model to misclassify inputs containing that trigger, while behaving normally otherwise. Example link: Gu et al. (2019).
Banzhaf value
A rule for valuing data points by their average marginal contribution across many possible subsets. Compared with Shapley-style methods, it weights every subset equally rather than weighting by subset size. Example link: Wang and Jia (2023).
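One standard way to write the rule, where N is the full training set of n points and v(S) is performance after training on subset S; this is the textbook form, not notation from Wang and Jia (2023) specifically:

$$
\phi_i^{\text{Banzhaf}} \;=\; \frac{1}{2^{\,n-1}} \sum_{S \subseteq N \setminus \{i\}} \bigl[\, v(S \cup \{i\}) - v(S) \,\bigr]
$$

Every subset gets the same weight $2^{-(n-1)}$, whereas the Shapley value weights subsets by their size.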
Bargaining
Negotiation over data supply, access, compensation, or terms of use between data creators and AI operators. In this project, bargaining matters because it can change which training rows are feasible, affordable, or politically acceptable. Example link: Vincent et al. (2021).
Beta Shapley
A version of Data Shapley that reweights marginal contributions by subset size, using a Beta distribution to put more or less emphasis on small or large subsets. In grid terms, it changes which kinds of row comparisons matter most when you assign value to data. Example link: Kwon and Zou (2022).
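Both Shapley-style values fit one semivalue template; writing it this way is a common convention, and the exact Beta-derived weights are in Kwon and Zou (2022):

$$
\phi_i \;=\; \sum_{j=1}^{n} w_j \cdot \binom{n-1}{j-1}^{-1} \!\!\sum_{\substack{S \subseteq N \setminus \{i\} \\ |S| = j-1}} \bigl[\, v(S \cup \{i\}) - v(S) \,\bigr], \qquad \sum_{j=1}^{n} w_j = 1
$$

The inner sum is the mean marginal contribution at subset size $j-1$. Data Shapley sets $w_j = 1/n$ for every size; Beta Shapley shapes the $w_j$ with a Beta($\alpha$, $\beta$) distribution to emphasize small or large subsets.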
Contribution campaign
Coordinated effort to add, label, or redirect data so model behavior shifts in a desired direction. It is the constructive counterpart to a data strike: a move toward more favorable rows rather than less favorable ones. Example link: Vincent et al. (2019).
Coreset
A small subset of training data that approximates training on the full dataset. The goal is to find a much smaller row in the grid that lands in roughly the same performance region. Example link: Sener and Savarese (2018).
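A minimal sketch of the greedy k-center heuristic used by Sener and Savarese (2018); running it directly on raw features, as here, is a simplification, since the paper applies it to learned embeddings.

```python
import numpy as np

def greedy_k_center(X, k, seed=0):
    """Repeatedly add the point farthest from the current selection,
    so the chosen k points cover the dataset as evenly as possible."""
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(X)))]
    # Distance from every point to its nearest selected point.
    dists = np.linalg.norm(X - X[selected[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(dists.argmax())          # farthest point so far
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(selected)
```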
Curriculum learning
Training on examples in a meaningful order, often from easy to hard. It changes how you move through the grid over time, not just which row you end on. Example link: Bengio et al. (2009).
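A minimal sketch of an easy-to-hard schedule; how you score difficulty (loss under a smaller model, human ratings, heuristics) is the real design decision and is left as an input here.

```python
import numpy as np

def curriculum_stages(difficulty_scores, n_stages=3):
    """Yield growing sets of training indices, from the easiest examples
    up to the full dataset, one stage at a time."""
    order = np.argsort(difficulty_scores)      # easy -> hard
    for stage in range(1, n_stages + 1):
        cutoff = int(len(order) * stage / n_stages)
        yield order[:cutoff]                   # indices to train on this stage
```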
Data augmentation
Creating synthetic variations of training data such as crops, rotations, or noise. It effectively adds nearby rows to the grid. Example link: Zhang et al. (2018).
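A minimal sketch of the method in the linked paper, mixup, which augments by blending random pairs of examples and their labels; the array shapes and alpha default are illustrative.

```python
import numpy as np

def mixup(x, y_onehot, alpha=0.2, rng=None):
    """mixup: train on convex combinations of example pairs and their labels."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)               # mixing coefficient in (0, 1)
    perm = rng.permutation(len(x))             # random partner for each example
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix
```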
Data cartography
Mapping examples by training dynamics such as confidence and variability. The resulting plots are often called data maps. They help surface easy, ambiguous, or potentially mislabeled regions of a dataset. Example link: Swayamdipta et al. (2020).
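A minimal sketch of the two data-map axes, assuming you have already logged the model's probability on each example's true label at every epoch; the array layout is an assumption of this sketch.

```python
import numpy as np

def data_map_coordinates(true_class_probs):
    """true_class_probs: array (n_epochs, n_examples) of the probability
    the model assigned to each example's gold label during training."""
    confidence = true_class_probs.mean(axis=0)   # high = easy to learn
    variability = true_class_probs.std(axis=0)   # high = ambiguous
    return confidence, variability
```

Low confidence with low variability marks the hard-to-learn region that Swayamdipta et al. (2020) suggest auditing for label errors.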
Data maps
Plots from data cartography that place examples according to training dynamics like confidence and variability. Example link: Swayamdipta et al. (2020).
Data counterfactual
A project-level umbrella term for “what if” questions about training data. What would happen to model performance if we trained on different data? The grid visualization is a teaching model for organizing those questions, not something we literally enumerate in real systems. This usage is inspired by the broader counterfactual tradition, but it is not meant as a claim that this phrase is already a standard textbook term in ML. Example links: the main memo and Pearl (2009).
Data dividends
Proposals to share some of the economic value created by AI or other data-driven systems back to the people whose data made those systems possible. In this project’s framing, data dividend claims usually depend on counterfactual questions about how much model behavior, performance, or revenue would change if certain people or groups contributed, withheld, or relicensed their data. Example link: Vincent et al. (2021).
Data leverage
The power that data creators have over AI systems by virtue of controlling training data. Performance drops from withholding or degrading data are one source of leverage, but coordination, substitutability, and bargaining position also matter. Example link: Vincent et al. (2021).
Data poisoning
Deliberately corrupting training data to cause targeted model failures. It expands the grid dramatically: every possible perturbation creates new rows with potentially different outcomes. Example link: Biggio et al. (2012).
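A minimal sketch of the simplest poison, random label flipping; Biggio et al. (2012) study stronger attacks that optimize the poison points, so treat this as illustration only.

```python
import numpy as np

def flip_labels(y, fraction, n_classes, seed=0):
    """Reassign a random fraction of labels to a different random class."""
    rng = np.random.default_rng(seed)
    y = y.copy()
    idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    shift = rng.integers(1, n_classes, size=len(idx))  # never zero, so the class changes
    y[idx] = (y[idx] + shift) % n_classes
    return y
```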
Data Shapley
A method for assigning value to each training point based on its average marginal contribution across all possible subsets. The rule is borrowed from cooperative game theory; it is computationally expensive but theoretically principled. Example link: Ghorbani and Zou (2019).
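A minimal sketch of the Monte Carlo estimator from Ghorbani and Zou (2019): sample random permutations and accumulate each point's marginal contribution as the training set grows. The train_fn interface is an assumption of this sketch, and real implementations add truncation to skip late, near-zero contributions.

```python
import numpy as np

def monte_carlo_shapley(train_fn, n_points, n_permutations=100, seed=0):
    """train_fn(indices) must train on that subset (possibly empty)
    and return a performance score on a fixed evaluation set."""
    rng = np.random.default_rng(seed)
    values = np.zeros(n_points)
    for _ in range(n_permutations):
        perm = rng.permutation(n_points)
        prev = train_fn(np.array([], dtype=int))     # empty-set baseline
        for k in range(n_points):
            score = train_fn(perm[: k + 1])
            values[perm[k]] += score - prev          # marginal contribution
            prev = score
    return values / n_permutations
```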
Data strike
Coordinated withholding of data by creators to reduce model performance and exert leverage over AI operators. A strategic move to a less favorable row in the grid. Example link: Vincent et al. (2019).
Dataset distillation
Learning a tiny synthetic training set that produces similar downstream behavior to a much larger real dataset. In grid terms, it tries to replace a large row neighborhood with a compact synthetic stand-in. Example link: Wang et al. (2018).
Dataset condensation
Often used nearly interchangeably with dataset distillation: compressing a large dataset into a much smaller synthetic or carefully selected one that preserves downstream behavior as much as possible. Example link: Wang et al. (2018).
Differential privacy
A mathematical privacy guarantee saying that an algorithm’s output should not change much when any one data point is added or removed. In this project’s terms, it limits how distinguishable neighboring training rows are from the outside; it does not directly guarantee good utility or broad robustness. Example link: Dwork (2006).
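The definition in symbols, in its common (ε, δ) form; Dwork (2006) introduces the pure ε version, which is the δ = 0 case:

$$
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon}\, \Pr[\,M(D') \in S\,] + \delta
$$

for every set of outputs $S$ and every pair of datasets $D, D'$ that differ in a single data point. Smaller $\varepsilon$ means the two neighboring rows are harder to tell apart from the algorithm's output.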
Evaluation set
The data used to measure model performance. In the grid, each column represents one possible evaluation point or slice. Example link: the grid.
Experimental design
Choosing which data to collect, label, or test so the resulting evidence is maximally informative under a budget. It overlaps with active learning, but the emphasis is often on information gain, uncertainty reduction, or causal identification rather than only downstream accuracy. Example link: Settles (2009).
Forgetting event
A moment during training when an example flips from correctly classified to incorrectly classified. Many forgetting events can signal hard, noisy, or atypical data points. Example link: Toneva et al. (2019).
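A minimal sketch of counting forgetting events from a logged correctness history; the boolean array layout is an assumption of this sketch rather than anything fixed by Toneva et al. (2019).

```python
import numpy as np

def count_forgetting_events(correct):
    """correct: boolean array (n_epochs, n_examples), True where the
    example was classified correctly at that epoch."""
    forgotten = correct[:-1] & ~correct[1:]   # correct -> incorrect flips
    return forgotten.sum(axis=0)              # forgetting events per example
```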
Influence function
A technique for estimating how much a single training example affects a model's predictions, without retraining. It approximates what would happen if you removed or upweighted that point. Example link: Koh and Liang (2017).
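The core estimate from Koh and Liang (2017) for the effect of upweighting a training point $z$ on the loss at a test point $z_{\text{test}}$:

$$
\mathcal{I}(z, z_{\text{test}}) \;=\; -\,\nabla_\theta L(z_{\text{test}}, \hat\theta)^{\top} H_{\hat\theta}^{-1}\, \nabla_\theta L(z, \hat\theta)
$$

where $\hat\theta$ is the trained model and $H_{\hat\theta}$ is the Hessian of the average training loss at $\hat\theta$. Removing $z$ corresponds to downweighting it, so the same quantity scaled by $-1/n$ approximates the leave-one-out effect.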
Leave-one-out
The simplest data counterfactual: compare performance with and without a single data point. In the grid, this means comparing two rows that differ by exactly one point. Example link: Wang and Jia (2023).
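A minimal sketch of the exact computation, assuming a small scikit-learn model where retraining is cheap; the model and metric choices are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def leave_one_out_value(X, y, X_test, y_test, i):
    """Value of point i = test accuracy with it minus test accuracy without it."""
    with_i = LogisticRegression(max_iter=1000).fit(X, y)
    keep = np.arange(len(X)) != i
    without_i = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
    return with_i.score(X_test, y_test) - without_i.score(X_test, y_test)
```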
Machine unlearning
Efficiently updating a model as if a data point had never been in the training set, moving from one row to another without full retraining. Example link: Bourtoule et al. (2021).
Membership inference
Trying to determine whether a specific example was in a model’s training set by exploiting differences between seen and unseen data. Example link: Carlini et al. (2022).
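A minimal sketch of the classic loss-threshold baseline; Carlini et al. (2022) develop a much stronger likelihood-ratio attack, so this is only the core intuition, and the quantile calibration is an assumption of this sketch.

```python
import numpy as np

def loss_threshold_attack(target_losses, nonmember_losses, quantile=0.05):
    """Guess 'member' when the model's loss on an example is unusually low
    relative to losses on data the model definitely never saw."""
    threshold = np.quantile(nonmember_losses, quantile)
    return target_losses < threshold          # True = predicted member
```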
Memorization
When a model retains unusually specific information about parts of its training data, sometimes including rare or sensitive details. Memorization can coexist with genuine generalization, but it raises the risk of extraction or privacy leakage. Example link: Carlini et al. (2021).
Representer point
A training example flagged as especially helpful for explaining a prediction. It plays a role similar to influence functions, but comes from a different mathematical route. Example link: Yeh et al. (2018).
Reweighing
A fairness-oriented preprocessing method that changes how much different examples or groups count during training. In grid language, it changes the row before learning starts. Example link: Kamiran and Calders (2012).
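The weight from Kamiran and Calders (2012), computed from dataset counts: each example with protected group $g$ and class $c$ is weighted by how much rarer or more common that combination is than independence would predict:

$$
w(g, c) \;=\; \frac{P(G = g)\; P(C = c)}{P(G = g,\, C = c)}
$$

Underrepresented combinations get weights above 1 and overrepresented ones below 1, so the reweighted row behaves as if group and class were independent.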
Scaling law
An empirical relationship describing how model performance changes with data size, model size, or compute. In the grid metaphor, data scaling laws look like summaries over rows grouped by size, but the broader literature is not limited to data alone. Example link: Kaplan et al. (2020).
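The data-size form from Kaplan et al. (2020), quoting the functional shape only, since the constants are fit empirically per setting:

$$
L(D) \;=\; \left(\frac{D_c}{D}\right)^{\alpha_D}
$$

where $L$ is test loss, $D$ is dataset size, and $D_c$, $\alpha_D$ are fitted constants. On a log-log plot this is a straight line, which is why row-size summaries of the grid tend to look linear.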
TracIn
A training-data attribution method that estimates which examples were influential by tracking gradient similarity across training checkpoints. It is another way of asking which parts of the row most affected a prediction or outcome, but with a different approximation strategy from classical influence functions. Example link: Pruthi et al. (2020).
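The checkpoint form from Pruthi et al. (2020), usually written TracInCP: the influence of training point $z$ on test point $z'$ is accumulated over saved checkpoints $\theta_t$ with their learning rates $\eta_t$:

$$
\operatorname{TracInCP}(z, z') \;=\; \sum_{t} \eta_t \,\nabla_\theta L(\theta_t, z) \cdot \nabla_\theta L(\theta_t, z')
$$

Points whose gradients repeatedly align with the test point's gradient across training get high scores.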
Training set
The data used to train a model. In the grid, each row represents one possible training set. Example link: the grid.