Live Counterfactual Board

Advanced prototype: every tile now runs a regularized logistic-regression classifier on standardized synthetic data. Pick a training roster (rows) against evaluation worlds (columns), then inspect the actual scatter plots, confusion matrix, and a 3D accuracy board powered by Three.js.

15 train combos x 4 eval worlds
Data seed: 0

Board setup

Swap the toy explanations for A/B/C/D, regenerate synthetic data, and remind yourself how the 3D bars are scaled.

Bar height = 0.12 + (metric value x 1.4). Operator view hides unrevealed rows at height 0.06; affordable but unrevealed rows sit at 0.12 and glow amber. Revealed tiles match the real grid.

Metric focus

Switch the board to emphasize different classifier stats. Selection also updates the glowing tile.

Selected tile reports 48.8% accuracy.

Selection

Scenario puzzles

These presets live in a tiny markdown manifest. Loading one sets the metric, focus tile, budget, and which rows are already revealed.

Focus: metric recall, row AB, column beta. Budget = 48; pre-revealed: A, B.

Sensor drift threatens precision; find a robust coalition and keep recall above 0.8 without overspending.

Classifier stats

A vs Eval alpha

accuracy48.8%
precision0.0%
recall0.0%
f10.0%
train loss0.0124
iterations520
convergedno
L2 reg0.020
Pred 0Pred 1
True 0820
True 1860

Weights ~ [-4.38, 0.00, -0.00], standardized on the training roster. 42 training pts / 168 eval pts.

Data views

A training clouds

Training distribution

Eval alpha evaluation scatter

Eval scatter & decision boundary
Class 0Class 1Misclassified outline

Real world grid

True classifier stats across every training coalition (rows) and eval world (columns). Click any cube to focus the detailed stats + scatterplots.

Axis labels render directly in scene; taller cyan highlights = the currently inspected tile.

Operator knowledge

Operators only know the rows they've audited. Amber cubes show affordable reveals (8 credits each); click or use the buttons below.

0/15 rosters known / Budget 60 credits. Hidden rows stay at height 0.06; affordable rows rise to 0.12 until you reveal them.

Operator actions

Spend credits to value additional rosters and sync the operator board with reality. Costs mirror the simple mini-game: reveal = 8.

Budget: 60 credits
AHidden
Wide, slow signals
BHidden
Fast telemetry
CHidden
Satellite uplinks
DHidden
Edge devices
A + BHidden
Wide, slow signals / Fast telemetry
A + CHidden
Wide, slow signals / Satellite uplinks
A + DHidden
Wide, slow signals / Edge devices
B + CHidden
Fast telemetry / Satellite uplinks
B + DHidden
Fast telemetry / Edge devices
C + DHidden
Satellite uplinks / Edge devices
A + B + CHidden
Wide, slow signals / Fast telemetry / Satellite uplinks
A + B + DHidden
Wide, slow signals / Fast telemetry / Edge devices
A + C + DHidden
Wide, slow signals / Satellite uplinks / Edge devices
B + C + DHidden
Fast telemetry / Satellite uplinks / Edge devices
A + B + C + DHidden
Wide, slow signals / Fast telemetry / Satellite uplinks / Edge devices
Event log
  • Operator budget initialized at 60 credits.

World briefings

Eval alpha: Baseline mixture - same distribution as training.

Eval beta: Sensor drift stretches X and introduces mild label noise.

Eval gamma: Northern skies: mostly C & D with a +Y shift.

Eval delta: Adversarial sweep bending trajectories and flipping more labels.