Start from one concrete question
What changes if we remove one example, swap in a different subset, or let a group of contributors walk away?
Most machine-learning discussions treat training data like a hidden blob in the background. This site does the opposite. It turns training data into something you can inspect, compare, remove, corrupt, and reason about directly.
The core move is simple: ask a concrete what-if question about the data, then trace how model behavior changes. That one move links influence functions, scaling laws, data valuation, poisoning, and collective action without flattening them into buzzwords.
The memo, glossary, paper lists, and explorer all point back to the same grid metaphor instead of inventing fresh jargon on each page.
Once the idea clicks, you can jump directly into curated references for the subfields that matter most to you.
The basic move
The easiest counterfactual is leave-one-out: compare a model trained on a set that includes one point with the same model trained after that point is removed.
That sounds almost too simple, but it gets you surprisingly far. From there you can ask the same question about groups, subsets of a fixed size, corrupted examples, or withheld data.
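The leave-one-out move can be sketched in a few lines. This is a minimal illustration, not the site's explorer: the toy data, the injected outlier, and the least-squares "model" are all stand-ins chosen to make the effect visible.

```python
import numpy as np

# Toy data: y is roughly 3x, plus one deliberate outlier at x = 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=20)
X = np.vstack([X, [[2.0]]])   # the point we will leave out
y = np.append(y, -10.0)

def fit(X, y):
    # Least squares with an intercept column.
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

full = fit(X, y)              # trains with the outlier
loo = fit(X[:-1], y[:-1])     # leaves the outlier out

x_eval = np.array([2.0, 1.0])  # evaluation input near the outlier
print(full @ x_eval, loo @ x_eval)  # predictions shift once the point is gone
```

The gap between the two predictions is exactly the kind of quantity the rest of the site reasons about: the same pipeline, run twice, differing only in which data it saw.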
Leave-one-out example
Here the gold row trains with example B and the teal row leaves B out. The biggest drop lands on evaluation point B, which matches the plain-language intuition: removing a training example hurts most where that example mattered.
What this framing connects
The point of the project is not to pretend every subfield is identical. It is to show that many of them start from a shared operation: change the data, then compare the outcome.
Compare nearby rows to see which examples, or which groups of examples, are doing the work. This is the territory of influence functions.
Scan the grid by subset size to ask when more data helps and which subsets punch above their weight. This is the question behind scaling laws and data valuation.
Treat corrupted or withheld data as another counterfactual and inspect the damage directly, as the poisoning literature does.
Ask what happens when many contributors walk away together, instead of pretending each point acts alone. This is the premise of collective action.
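The last of these, a group of contributors leaving together, is the same two-runs-of-one-pipeline comparison at a coarser grain. A minimal sketch, with an illustrative group split and a trivial predict-the-mean "model" standing in for real training:

```python
import numpy as np

# Hypothetical setup: contributor group 1 supplies 20 of 100 labels,
# and their data shifts the label distribution upward.
rng = np.random.default_rng(1)
labels = rng.normal(loc=1.0, size=100)
groups = np.array([0] * 80 + [1] * 20)
labels[groups == 1] += 4.0

def train(y):
    return y.mean()  # trivial "model": predict the mean label

with_group = train(labels)                 # everyone contributes
without_group = train(labels[groups == 0])  # group 1 walks away
print(with_group - without_group)           # the group's collective effect
```

The point of the sketch is that the group's effect is measured jointly, not by summing twenty separate leave-one-out runs. Each member alone moves the mean only slightly; together they move it by a visible margin.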
Why people care
When a prediction looks wrong, the framework gives you a concrete way to ask which training examples changed the outcome.
Instead of talking about data value in the abstract, you can compare outcomes with and without a slice of the data.
People who contribute data are not just passive inputs. This framing makes their bargaining power legible.
Where to go next