Start from one concrete question

What changes if we remove one example, swap in a different subset, or let a group of contributors walk away?

Use the same visual language throughout

The memo, glossary, paper lists, and explorer all point back to the same grid metaphor instead of inventing fresh jargon on each page.

Move from intuition to literature

Once the idea clicks, you can jump directly into curated references for the subfields that matter most to you.

The basic move

Compare nearby worlds, not vague abstractions

The easiest counterfactual is leave-one-out: compare a training set that includes one point with the same training set after that point is removed.

That sounds almost too simple, but it gets you surprisingly far. From there you can ask the same question about groups, subsets of a fixed size, corrupted examples, or withheld data.

  • Rows tell you what changed in training. Move between rows to simulate different data choices.
  • Columns tell you where the effect lands. A point may matter a lot for one evaluation slice and barely at all for another.
  • Differences are the real story. A single score is rarely interesting on its own; the comparison is what matters.
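The leave-one-out move above fits in a few lines of code. This is only a sketch: the 1-nearest-neighbour model, the training and evaluation points, and the 0/1 scoring are all invented for illustration.

```python
# Toy leave-one-out comparison: score a 1-nearest-neighbour classifier
# with and without a single training point, then look at the difference.
# All data here is made up for illustration.

def predict(train, x):
    """1-NN prediction: return the label of the closest training point."""
    nearest = min(train, key=lambda pt: abs(pt[0] - x))
    return nearest[1]

def scores(train, evals):
    """Per-evaluation-point score: 1 if the predicted label matches, else 0."""
    return [1 if predict(train, x) == y else 0 for x, y in evals]

train = [(0.0, "a"), (1.0, "b"), (2.0, "a"), (3.0, "b")]  # (feature, label)
evals = [(0.1, "a"), (1.1, "b"), (2.1, "a"), (3.1, "b")]

baseline = scores(train, evals)                    # world that includes point B
without_b = scores(train[:1] + train[2:], evals)   # nearby world without point B

# Differences are the story: where does removing one point change the outcome?
deltas = [b - w for b, w in zip(baseline, without_b)]
print(deltas)  # → [0, 1, 0, 0]: only the evaluation point nearest B is hurt
```

Everything interesting lives in `deltas`, the row-by-row comparison, not in either score list on its own.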

Leave-one-out example

ABCD versus ACD

Here the gold row trains with B and the teal row leaves B out. The biggest drop lands on evaluation point B, which is exactly the plain-language intuition people usually ask for.

        A     B     C     D
ABCD   0.92  0.88  0.85  0.82
ACD    0.78  0.55  0.82  0.80
This is the same intuition behind influence estimates and many forms of data attribution: compare a baseline training world with a nearby altered one.
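The comparison the table encodes is just a per-column subtraction. The snippet below recomputes the drops from the two rows above; the numbers are copied directly from the example.

```python
# Per-evaluation-point drop between the two training worlds in the table.
abcd = [0.92, 0.88, 0.85, 0.82]  # row trained with B
acd  = [0.78, 0.55, 0.82, 0.80]  # row trained without B

drops = [round(with_b - without_b, 2) for with_b, without_b in zip(abcd, acd)]
print(drops)  # → [0.14, 0.33, 0.03, 0.02]: the largest drop lands on B
```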

What this framing connects

Different literatures, same underlying move

The point of the project is not to pretend every subfield is identical. It is to show that many of them start from a shared operation: change the data, then compare the outcome.

Influence and attribution

Compare nearby rows to see which examples, or which groups of examples, are doing the work.

Scaling and selection

Scan the grid by subset size to ask when more data helps and which subsets punch above their weight.

Poisoning and robustness

Treat corrupted or withheld data as another counterfactual and inspect the damage directly.

Collective action

Ask what happens when many contributors walk away together, instead of pretending each point acts alone.
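One way to see why the group view differs from per-point accounting is a toy utility with diminishing returns. Everything in this sketch, including the `utility` function, is invented for illustration; it only stands in for "train on this set, then evaluate".

```python
# Group counterfactual sketch: compare a baseline training set with the
# same set after a whole coalition of contributors leaves at once.

def utility(train):
    # Made-up stand-in for "train, then evaluate": diminishing returns in size.
    return 1 - 0.5 ** len(train)

train = {"a1", "a2", "b1", "b2"}   # hypothetical contributors
coalition = {"a1", "a2"}           # contributors who walk away together

# Effect of the whole group leaving at once...
joint_effect = utility(train) - utility(train - coalition)
# ...versus summing each member's solo leave-one-out effect.
solo_effects = sum(utility(train) - utility(train - {m}) for m in coalition)

print(joint_effect, solo_effects)  # the group effect exceeds the sum of solo effects
```

With diminishing returns, each member looks nearly dispensable alone, but the coalition's joint exit costs more than the solo effects suggest, which is why the group counterfactual has to be asked directly.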

Why people care

Useful for debugging, bargaining, and model hygiene

Debug a model failure

When a prediction looks wrong, the framework gives you a concrete way to ask which training examples changed the outcome.

Value a dataset

Instead of talking about data value in the abstract, you can compare outcomes with and without a slice of the data.

Reason about leverage

People who contribute data are not just passive inputs. This framing makes their bargaining power legible.

Where to go next

Choose the depth you want

Open the explorer

Manipulate a toy dataset, trigger guided examples, and see counterfactual effects in real time.

Launch explorer

Read the memo

For the longer argument and the research framing behind the site, go straight to the core memo.

Read memo

Browse the literature

Jump into curated paper collections once you know which corner of the problem you want to dig into.

Open reading lists