Live Counterfactual Board
Advanced prototype: every tile now runs a regularized logistic-regression classifier on standardized synthetic data. Pick a training roster (row) and an evaluation world (column), then inspect the resulting scatter plots, confusion matrix, and a 3D accuracy board powered by Three.js.
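To make "regularized logistic regression on standardized data" concrete, here is a minimal standard-library sketch. The page does not state the optimizer, regularization strength, or data generator, so batch gradient descent, `l2=0.1`, and the two Gaussian clouds below are all assumptions. The returned vector is laid out as bias plus two feature weights, which is one plausible reading of the three-entry weight vector the board reports.

```python
import math
import random


def fit_scaler(rows):
    """Per-feature mean and standard deviation, computed on the training roster only."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
        for j in range(dims)
    ]
    return means, stds


def standardize(rows, means, stds):
    """Apply the training scaler; eval worlds would reuse the same means and stds."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def sigmoid(z):
    # Split by sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(rows, labels, l2=0.1, lr=0.5, epochs=300):
    """L2-regularized logistic regression; returns [bias, w_1, ..., w_d]."""
    n, dims = len(rows), len(rows[0])
    w = [0.0] * (dims + 1)
    for _ in range(epochs):
        grad = [0.0] * (dims + 1)
        for x, y in zip(rows, labels):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w[0] -= lr * grad[0] / n  # the bias is left unpenalized (an assumption)
        for j in range(1, dims + 1):
            w[j] -= lr * (grad[j] / n + l2 * w[j])
    return w


def predict(w, x):
    return 1 if w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)) >= 0 else 0


# Hypothetical synthetic roster: two well-separated Gaussian clouds, 21 points each.
rng = random.Random(0)
raw = [[rng.gauss(-3, 1), rng.gauss(0, 1)] for _ in range(21)]
raw += [[rng.gauss(3, 1), rng.gauss(0, 1)] for _ in range(21)]
labels = [0] * 21 + [1] * 21

means, stds = fit_scaler(raw)
train = standardize(raw, means, stds)
weights = train_logreg(train, labels)
accuracy = sum(predict(weights, x) == y for x, y in zip(train, labels)) / len(labels)
```

Standardizing evaluation points with the training roster's statistics, rather than their own, is what lets a shifted eval world show up as a drop in the metric.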
Board setup
Swap the toy explanations for A/B/C/D, regenerate synthetic data, and remind yourself how the 3D bars are scaled.
Bar height = 0.12 + (metric value × 1.4). In the operator view, unrevealed rows are flattened to height 0.06; affordable but unrevealed rows sit at 0.12 and glow amber. Revealed tiles match the real grid.
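The scaling rule can be written as a small function. This is a sketch of the stated formula only; the function name and the `revealed` / `affordable` flags are hypothetical, not taken from the board's source.

```python
BASE_HEIGHT = 0.12    # floor for any revealed bar, and the height of affordable rows
METRIC_SCALE = 1.4    # multiplier applied to the metric value (0.0 to 1.0)
HIDDEN_HEIGHT = 0.06  # unrevealed rows the operator cannot currently afford


def bar_height(metric_value, revealed=True, affordable=False):
    """Height of one bar on the 3D board."""
    if revealed:
        return BASE_HEIGHT + metric_value * METRIC_SCALE
    # Unrevealed rows ignore the metric entirely, so the operator learns nothing.
    return BASE_HEIGHT if affordable else HIDDEN_HEIGHT


full_height = bar_height(1.0)  # a perfect score gives the tallest possible bar
hidden = bar_height(0.9, revealed=False)
amber = bar_height(0.9, revealed=False, affordable=True)
```

Because a revealed bar with a metric of 0 has the same height as an affordable unrevealed one, the amber glow is what tells the two apart.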
Metric focus
Switch the board to emphasize different classifier stats. Selection also updates the glowing tile.
Selected tile reports 48.8% accuracy.
Scenario puzzles
These presets live in a tiny markdown manifest. Loading one sets the metric, focus tile, budget, and which rows are already revealed.
Focus: metric recall, row AB, column beta. Budget = 48; pre-revealed: A, B.
Sensor drift threatens precision; find a robust coalition and keep recall above 0.8 without overspending.
Classifier stats
A vs Eval alpha
| | Pred 0 | Pred 1 |
|---|---|---|
| True 0 | 82 | 0 |
| True 1 | 86 | 0 |
Weights ≈ [-4.38, 0.00, -0.00], standardized on the training roster. 42 training points / 168 evaluation points.
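The headline metrics follow directly from the confusion matrix. The sketch below recomputes them for the A vs Eval alpha tile. Returning 0.0 when a denominator is empty is a convention chosen here; the page does not say how the board handles that case.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Accuracy, precision, and recall from the four confusion-matrix cells."""
    total = tn + fp + fn + tp
    predicted_pos = tp + fp
    actual_pos = tp + fn
    return {
        "accuracy": (tn + tp) / total if total else 0.0,
        # Precision is undefined when nothing is predicted positive; 0.0 is an assumption.
        "precision": tp / predicted_pos if predicted_pos else 0.0,
        "recall": tp / actual_pos if actual_pos else 0.0,
    }


# Cells read off the table: the classifier predicts class 0 for all 168 eval points.
stats = confusion_metrics(tn=82, fp=0, fn=86, tp=0)
print(f"{stats['accuracy']:.1%}")  # 48.8%
```

The 48.8% accuracy is simply the share of true class 0 points (82 of 168): the classifier never predicts class 1, so recall is 0.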
Data views
Roster A: training clouds
Eval alpha: evaluation scatter
Real world grid
True classifier stats across every training coalition (rows) and eval world (columns). Click any cube to focus the detailed stats and scatter plots.
Axis labels render directly in the scene; the taller cyan highlight marks the currently inspected tile.
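The operator panel counts 15 rosters, which matches the number of non-empty subsets of the four groups A/B/C/D (2^4 - 1 = 15). Assuming that is how the rows are built, they can be enumerated as follows. The row ordering and the concatenated naming (for example `AB`) are guesses based on the preset text.

```python
from itertools import combinations

GROUPS = ["A", "B", "C", "D"]


def all_rosters(groups):
    """Every non-empty coalition, smallest first, named by concatenating its members."""
    return [
        "".join(combo)
        for size in range(1, len(groups) + 1)
        for combo in combinations(groups, size)
    ]


rosters = all_rosters(GROUPS)
print(len(rosters))  # 15
print(rosters[:5])   # ['A', 'B', 'C', 'D', 'AB']
```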
Operator knowledge
Operators only know the rows they've audited. Amber cubes show affordable reveals (8 credits each); click or use the buttons below.
0/15 rosters known. Budget: 60 credits. Hidden rows stay at height 0.06; affordable rows rise to 0.12 until you reveal them.
Operator actions
Spend credits to value additional rosters and sync the operator board with reality. Costs mirror the simple mini-game: reveal = 8.
- Operator budget initialized at 60 credits.
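The reveal economy is simple enough to sketch. The class and method names below are hypothetical; only the 60-credit budget and the 8-credit reveal cost come from the page.

```python
REVEAL_COST = 8  # credits per reveal, as stated on the board


class OperatorBoard:
    """Tracks the remaining budget and which rosters the operator has audited."""

    def __init__(self, budget=60, revealed=()):
        self.budget = budget
        self.revealed = set(revealed)

    def can_reveal(self, roster):
        return roster not in self.revealed and self.budget >= REVEAL_COST

    def reveal(self, roster):
        """Spend credits to reveal a roster; returns False if it is not allowed."""
        if not self.can_reveal(roster):
            return False
        self.budget -= REVEAL_COST
        self.revealed.add(roster)
        return True


board = OperatorBoard()
attempts = ["A", "B", "C", "D", "AB", "AC", "AD", "BC"]
results = [board.reveal(r) for r in attempts]
print(sum(results), board.budget)  # 7 4
```

With 60 credits and 8 per reveal, at most 7 of the 15 rosters can be audited, which leaves 4 credits unspent. Choosing which rows to reveal is therefore the real decision.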
World briefings
Eval alpha: Baseline mixture - same distribution as training.
Eval beta: Sensor drift stretches X and introduces mild label noise.
Eval gamma: Northern skies: mostly C & D with a +Y shift.
Eval delta: Adversarial sweep bending trajectories and flipping more labels.