Live Counterfactual Board

A working board for inspecting how training rosters and evaluation worlds change classifier behavior. Each tile runs a regularized logistic-regression classifier on standardized synthetic data, with scatter plots, a confusion matrix, and a 3D accuracy board.
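As a rough illustration of what each tile computes, the following is a minimal sketch of L2-regularized logistic regression fitted by batch gradient descent on features standardized with training-set statistics. The function names, the learning rate, the stopping tolerance, and the two-blob toy data are all assumptions for illustration; only the L2 strength (0.02) and the iteration cap (520) echo values shown on the board, and the board's actual optimizer is not documented here.

```python
import math
import random


def standardize(rows, mean=None, std=None):
    """Scale features; statistics come from the training roster unless supplied."""
    n_features = len(rows[0])
    if mean is None:
        mean = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
        std = [
            math.sqrt(sum((r[j] - mean[j]) ** 2 for r in rows) / len(rows)) or 1.0
            for j in range(n_features)
        ]
    scaled = [[(r[j] - mean[j]) / std[j] for j in range(n_features)] for r in rows]
    return scaled, mean, std


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, l2=0.02, lr=0.1, max_iter=520, tol=1e-6):
    """Gradient descent on mean log loss plus an L2 penalty on the weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for iteration in range(1, max_iter + 1):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        # The bias term is left unpenalized (an assumption of this sketch).
        grad_w = [g + l2 * wj for g, wj in zip(grad_w, w)]
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
        if max(abs(g) for g in grad_w + [grad_b]) < tol:
            return w, b, iteration, True
    return w, b, max_iter, False


# Hypothetical toy data: two Gaussian blobs separated along the first feature.
rng = random.Random(0)
train_X = [[rng.gauss(-1, 1), rng.gauss(0, 1)] for _ in range(21)]
train_X += [[rng.gauss(1, 1), rng.gauss(0, 1)] for _ in range(21)]
train_y = [0] * 21 + [1] * 21

X_std, mean, std = standardize(train_X)
w, b, iterations, converged = fit_logistic(X_std, train_y)
print(w, b, iterations, converged)
```

Evaluation points would be passed through `standardize` with the training `mean` and `std`, which is one way a shifted eval world can end up far from the region the boundary was fitted on.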

15 train combos x 4 eval worlds

Data seed: 0

Board setup

Swap the toy explanations for A/B/C/D, regenerate synthetic data, and remind yourself how the 3D bars are scaled.

Bar height = 0.12 + (metric value x 1.4). In the operator view, unrevealed rows are flattened to height 0.06; affordable but unrevealed rows sit at 0.12 and glow amber. Revealed tiles match the real grid.
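The scaling rule can be written as a small helper. This is a sketch only: the function name and its `revealed` / `affordable` flags are hypothetical, while the constants 0.12, 1.4, and 0.06 are the ones quoted on the board.

```python
BASE_HEIGHT = 0.12        # offset added to every revealed bar
METRIC_SCALE = 1.4        # multiplier applied to the metric value
HIDDEN_HEIGHT = 0.06      # unrevealed row the operator cannot afford
AFFORDABLE_HEIGHT = 0.12  # unrevealed row the operator can afford (glows amber)


def bar_height(metric_value, revealed=True, affordable=False):
    """Return the 3D bar height for one tile.

    metric_value is a fraction in [0, 1], e.g. 0.488 for 48.8% accuracy.
    """
    if revealed:
        return BASE_HEIGHT + metric_value * METRIC_SCALE
    return AFFORDABLE_HEIGHT if affordable else HIDDEN_HEIGHT


# The selected tile at 48.8% accuracy lands at roughly 0.80.
print(round(bar_height(0.488), 4))
print(bar_height(0.0, revealed=False, affordable=True))
print(bar_height(0.0, revealed=False, affordable=False))
```

A revealed tile with a metric of 0 has the same height (0.12) as an affordable unrevealed row, so the amber glow is what tells the two apart.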

Metric focus

Switch the board to emphasize different classifier stats. Selection also updates the glowing tile.

Selected tile reports 48.8% accuracy.

Selection

Training roster / Evaluation world

Scenario puzzles

These presets live in a tiny markdown manifest. Loading one sets the metric, focus tile, budget, and which rows are already revealed.

Focus: metric recall, row AB, column beta. Budget = 48; pre-revealed: A, B.
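The manifest format itself is not shown on the board, so the sketch below only parses the one-line summary as it is displayed. The `ScenarioPreset` class, the `parse_summary` function, and the regular expression are hypothetical and would need adjusting to the real manifest.

```python
import re
from dataclasses import dataclass


@dataclass
class ScenarioPreset:
    metric: str
    row: str
    column: str
    budget: int
    pre_revealed: list


# Pattern for the displayed summary line only (an assumption, not the manifest syntax).
SUMMARY_PATTERN = re.compile(
    r"Focus: metric (?P<metric>\w+), row (?P<row>\w+), column (?P<column>\w+)\. "
    r"Budget = (?P<budget>\d+); pre-revealed: (?P<revealed>.*)\."
)


def parse_summary(line):
    """Turn a preset summary line into a ScenarioPreset."""
    match = SUMMARY_PATTERN.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognized preset summary: {line!r}")
    revealed = [name.strip() for name in match["revealed"].split(",") if name.strip()]
    return ScenarioPreset(
        metric=match["metric"],
        row=match["row"],
        column=match["column"],
        budget=int(match["budget"]),
        pre_revealed=revealed,
    )


preset = parse_summary(
    "Focus: metric recall, row AB, column beta. Budget = 48; pre-revealed: A, B."
)
print(preset)
```

At 8 credits per reveal, a budget of 48 covers six further reveals on top of the two pre-revealed rows.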

Sensor drift threatens precision; find a robust coalition and keep recall above 0.8 without overspending.

Classifier stats

A vs Eval alpha

accuracy: 48.8%

precision: 0.0%

recall: 0.0%

f1: 0.0%

train loss: 0.0124

iterations: 520

converged: no

L2 reg: 0.020

	Pred 0	Pred 1
True 0	82	0
True 1	86	0
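The headline stats can be recomputed from the confusion matrix. The sketch below assumes the common convention of reporting 0 when a ratio's denominator is zero, which is consistent with the 0.0% precision shown for a classifier that never predicts class 1. The function name is hypothetical.

```python
def classifier_stats(tn, fp, fn, tp):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts.

    Ratios with a zero denominator are reported as 0.0 (an assumed convention).
    """
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts from the "A vs Eval alpha" tile: every point is predicted as class 0.
stats = classifier_stats(tn=82, fp=0, fn=86, tp=0)
for name, value in stats.items():
    print(f"{name}: {value:.1%}")
```

Accuracy is 82 / 168, about 48.8%, which is simply the share of class 0 points in the eval set: the classifier has collapsed to always predicting class 0.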

Weights ~ [-4.38, 0.00, -0.00], standardized on the training roster. 42 training pts / 168 eval pts.

Data views

A training clouds

Training distribution

Eval alpha evaluation scatter

Eval scatter & decision boundary

Legend: Class 0 / Class 1 / Misclassified outline

Real world grid

True classifier stats across every training coalition (rows) and eval world (columns). Click any cube to focus the detailed stats + scatterplots.
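The grid dimensions follow from taking every non-empty combination of the four sources as a training coalition. A short sketch, with hypothetical variable and function names:

```python
from itertools import combinations

SOURCES = ["A", "B", "C", "D"]
WORLDS = ["alpha", "beta", "gamma", "delta"]


def training_rosters(sources):
    """All non-empty subsets of the sources, ordered by size then position."""
    return [
        combo
        for size in range(1, len(sources) + 1)
        for combo in combinations(sources, size)
    ]


rosters = training_rosters(SOURCES)
tiles = [(roster, world) for roster in rosters for world in WORLDS]

print(len(rosters))  # 2**4 - 1 = 15 rows
print(len(tiles))    # 15 * 4 = 60 tiles
print([" + ".join(r) for r in rosters])
```

The enumeration order (singles, then pairs, triples, and the full set) matches the order in which rosters are listed under the operator actions.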

Axis labels render directly in scene; taller cyan highlights = the currently inspected tile.

Operator knowledge

Operators only know the rows they've audited. Amber cubes show affordable reveals (8 credits each); click or use the buttons below.

0/15 rosters known. Budget: 60 credits. Hidden rows stay at height 0.06; affordable rows rise to 0.12 until you reveal them.

Operator actions

Spend credits to value additional rosters and sync the operator board with reality. Costs mirror the simple mini-game: reveal = 8.
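The reveal bookkeeping can be sketched as a small state object. The class and method names are hypothetical; the cost of 8 credits, the starting budget of 60, and the 0.06 / 0.12 placeholder heights are the values quoted on the board.

```python
REVEAL_COST = 8
HIDDEN_HEIGHT = 0.06
AFFORDABLE_HEIGHT = 0.12


class OperatorBoard:
    """Tracks the operator's budget and which rosters have been revealed."""

    def __init__(self, rosters, budget=60):
        self.budget = budget
        self.revealed = {name: False for name in rosters}

    def can_afford_reveal(self):
        return self.budget >= REVEAL_COST

    def reveal(self, roster):
        """Spend credits to reveal one roster; return True on success."""
        if self.revealed[roster] or not self.can_afford_reveal():
            return False
        self.budget -= REVEAL_COST
        self.revealed[roster] = True
        return True

    def placeholder_height(self, roster):
        """Height of an unrevealed row; None once the row is revealed."""
        if self.revealed[roster]:
            return None
        return AFFORDABLE_HEIGHT if self.can_afford_reveal() else HIDDEN_HEIGHT


rosters = [
    "A", "B", "C", "D",
    "A + B", "A + C", "A + D", "B + C", "B + D", "C + D",
    "A + B + C", "A + B + D", "A + C + D", "B + C + D",
    "A + B + C + D",
]
board = OperatorBoard(rosters)
results = [board.reveal(name) for name in rosters[:8]]
print(results, board.budget)
```

With 60 credits and no top-up, seven reveals succeed and the eighth fails with 4 credits left, so fewer than half of the 15 rows can be audited.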

Budget: 60 credits

Add starting budget

A (Hidden): Wide, slow signals
B (Hidden): Fast telemetry
C (Hidden): Satellite uplinks
D (Hidden): Edge devices
A + B (Hidden): Wide, slow signals / Fast telemetry
A + C (Hidden): Wide, slow signals / Satellite uplinks
A + D (Hidden): Wide, slow signals / Edge devices
B + C (Hidden): Fast telemetry / Satellite uplinks
B + D (Hidden): Fast telemetry / Edge devices
C + D (Hidden): Satellite uplinks / Edge devices
A + B + C (Hidden): Wide, slow signals / Fast telemetry / Satellite uplinks
A + B + D (Hidden): Wide, slow signals / Fast telemetry / Edge devices
A + C + D (Hidden): Wide, slow signals / Satellite uplinks / Edge devices
B + C + D (Hidden): Fast telemetry / Satellite uplinks / Edge devices
A + B + C + D (Hidden): Wide, slow signals / Fast telemetry / Satellite uplinks / Edge devices

Event log

Operator budget initialized at 60 credits.

World briefings

Eval alpha: Baseline mixture - same distribution as training.

Eval beta: Sensor drift stretches X and introduces mild label noise.

Eval gamma: Northern skies: mostly C & D with a +Y shift.

Eval delta: Adversarial sweep bending trajectories and flipping more labels.