An AI board that pre-registers its bets – bet #1 just graded wrong

A board of expert personas whose every decision is a pre-registered, time-anchored, reality-graded bet. Not a chatbot that agrees with you -- a board that keeps score, before the fact.

Landing page & docs: https://danilushin.github.io/asktheboard/

Mechanism on sample data -- the 60-second, no-key walkthrough below reproduces it exactly.

```bash
pip install asktheboard
```

Anyone can clone a "panel of AI personas" in a weekend, and a dozen have. The debate mechanic is a commodity. What it leaves out is the thing that makes advice worth trusting: a record of having been right before the outcome was knowable. That record is hard to fake -- you can buy model outputs, but you can't back-date a timestamp. It only accrues the slow way: by calling decisions in advance and letting reality grade them, one resolution date at a time.

So ask-the-board records, for every decision:

- your stated prior (what you believed going in),
- the per-seat dissent vector -- each seat's stance + its own probability,
- a dated, falsifiable prediction, anchored before the outcome is knowable,
- on the resolution date, reality's realized outcome, auto-reconciled into a Brier/calibration score per seat.

The board-minute is a git-committable ADR. Your git history is the external attestation of the anchor timestamp. The accumulating, reality-graded record is the durable asset.

`create -> resolve -> score` is pure data -- no LLM, no key, no network. This is a worked example on sample data: you supply the outcome with `resolve`, and the engine computes each seat's Brier score (lower is better). It shows the mechanism, not a track record -- the integrity comes from the anchor timestamp your git history attests, which no demo can fabricate. The committed artifacts live in `examples/`.

```bash
# pip-installed (no repo)? paste the sample spec below. Cloned the repo?
# skip the heredoc and use --spec tests/sample_minute.json instead.
cat > sample_minute.json <<'JSON'
{
  "id": "2026-01-postgres-vs-vectordb",
  "question": "Adopt Postgres + pgvector, or a dedicated vector DB?",
  "prior": "Leaning toward a dedicated vector DB for the embeddings workload.",
  "decision": "Stay on Postgres + pgvector for now.",
  "prediction": {
    "statement": "We will NOT migrate off Postgres for vectors within 3 months.",
    "resolution_date": "2026-04-01",
    "board_probability": 0.75
  },
  "seats": [
    {"seat": "pragmatist", "stance": "affirm", "probability": 0.8,
     "rationale": "Boring tech; pgvector is enough at this scale."},
    {"seat": "skeptic", "stance": "dissent", "probability": 0.35,
     "rationale": "Recall/latency will bite once the corpus 10x's."}
  ],
  "created_at": "2026-01-05T10:30:00"
}
JSON

asktheboard create --spec sample_minute.json
asktheboard resolve --id 2026-01-postgres-vs-vectordb --outcome true
asktheboard score
```

```
seat             n  mean_brier   wins   losses
----------------------------------------------
pragmatist       1       0.040      0        0
skeptic          1       0.423      0        1
```

Full walkthrough + committed artifacts: `examples/README.md`.

And a real one, still open: this repo pre-registered a board-minute about its own launch -- `examples/open-minute.md`, anchored in git on 2026-06-26, resolving 2026-09-24. No score yet; that's the point. The board may turn out wrong, and the anchor means it can't pretend otherwise.

Live bet #1 (resolves in days): the board's call on the June 2026 US jobs report -- `examples/2026-06-jobs-report.md`, anchored 2026-06-27, resolving 2026-07-02 against the BLS Employment Situation release. The board says +150k or more at 56%; the skeptic dissents at 40%. Bet #1 of a public, recurring cadence -- come back on the date and watch it grade against a source nobody controls.

The engine ships no provider and makes no calls of its own. You supply your own LLM key; you pay your own inference. The open-source core therefore costs nothing to run at any scale -- the cost lives with the user, not a host.
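
The per-seat numbers in the sample scoreboard above can be checked by hand. The sketch below is not the engine's code: `brier` is a hypothetical helper, and it assumes each seat's `probability` is its probability that the prediction statement resolves true, regardless of stance, which is the reading consistent with the 0.040 and 0.423 shown. The board's own 0.75 is scored the same way for comparison.

```python
# Sketch only: recomputes the sample scoreboard by hand.
# `brier` is a hypothetical helper, not part of the asktheboard API.


def brier(probability: float, outcome: bool) -> float:
    """Squared error between a stated probability and the realized 0/1 outcome."""
    return (probability - (1.0 if outcome else 0.0)) ** 2


# Probabilities copied from the sample minute; each is read as
# P(statement resolves true), which is an assumption (see lead-in).
calls = {"board": 0.75, "pragmatist": 0.8, "skeptic": 0.35}
outcome = True  # mirrors `asktheboard resolve ... --outcome true`

scores = {name: brier(p, outcome) for name, p in calls.items()}

# Lower is better, so sort ascending to list the best-calibrated call first.
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:<12}{score:.3f}")
```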
(A managed, capped hosted tier -- for people who would rather not manage keys -- is the separate, paid product.)

The OSS engine is free forever and runs on your own key. If you'd rather not manage keys -- or you want the aged, reality-graded public scoreboard hosted for you -- a managed, capped paid tier is coming. Want early access? Email [email protected] with the subject `waitlist` (a one-liner on what you'd decide with it helps, but isn't required). No spam -- one note when it opens.

- A prediction cannot be pre-registered to resolve in the past (no backfilling an "old" call onto a known outcome).
- A minute cannot be graded before its resolution date -- the outcome must not be knowable yet. That is what makes it foresight.
- The anchor timestamp and the prediction are frozen once created; grading never moves them.

See `tests/test_model.py` -- these are the load-bearing tests.

```bash
pip install asktheboard

# pre-register a decision (board-minute spec is JSON -- see "See it keep score" above)
asktheboard create --spec sample_minute.json

# ... months later, on/after the resolution date, grade it against reality
asktheboard resolve --id 2026-01-postgres-vs-vectordb --outcome false

# per-seat calibration scoreboard, best-calibrated first
asktheboard score
```

`create` writes both `.json` (the record) and `.md` (the committable ADR) into `board-minutes/`.

`create` pre-registers a minute you wrote by hand. `convene` runs the live LLM fan-out: every seat answers through your key, and the board's consensus probability is the mean of the seats' calls. It ships no provider -- bring an OpenAI-compatible endpoint (`HTTPLLMClient` is stdlib-only, zero dependencies).
```python
from asktheboard import convene, Seat, HTTPLLMClient

minute = convene(
    id="pgvector-scale",
    question="Will pgvector hold our scale, or do we need a dedicated vector DB?",
    prior="leaning postgres to avoid a new service",
    decision="stay on postgres + pgvector",
    statement="pgvector serves p95<150ms at 50M embeddings without a dedicated DB",
    seats=[Seat("pragmatist", "ML researcher"), Seat("skeptic", "find the failure mode")],
    client=HTTPLLMClient(model="gpt-4o-mini"),  # reads OPENAI_API_KEY
    decision_type="library",  # -> 90-day resolution horizon
)
```

Or from the CLI (key in `OPENAI_API_KEY`):

```bash
asktheboard convene --spec convene.json --model gpt-4o-mini
```

Any OpenAI-compatible API works -- point `--base-url` (or `HTTPLLMClient(base_url=...)`) at OpenRouter, Together, or a local server. The engine still makes no calls of its own; it only ever speaks through the client you pass.

You can always hand-write `Seat(name, persona)`. But a sensible default board ships in the box: a curated set of role archetypes (the architect, the skeptic, the operator -- functions, not impersonations of real people) and a few named panels, so seating one is a single lookup.
```python
from asktheboard import convene, panel, seats, HTTPLLMClient

minute = convene(
    id="pgvector-scale",
    question="Will pgvector hold our scale, or do we need a dedicated vector DB?",
    prior="leaning postgres",
    decision="stay on postgres + pgvector",
    statement="pgvector serves p95<150ms at 50M embeddings without a dedicated DB",
    seats=panel("tech"),  # architect + skeptic + pragmatist
    # seats=seats(["architect", "operator", "skeptic"]),  # or pick your own
    client=HTTPLLMClient(model="gpt-4o-mini"),
    decision_type="library",
)
```

From the CLI, pass `--panel` or `--seats` instead of putting seats in the spec:

```bash
asktheboard roster  # list seats + panels
asktheboard convene --spec d.json --model gpt-4o-mini --panel tech
asktheboard convene --spec d.json --model gpt-4o-mini --seats architect,skeptic
```

| seat | voice |
|---|---|
| architect | shape, maintenance cost, what breaks at scale, build-vs-buy |
| skeptic | forced dissent -- the most likely failure first, then the deeper objection |
| pragmatist | simplest thing that ships; YAGNI; opportunity cost |
| researcher | what the data actually says; base rate before anecdote |
| operator | run cost, failure budget, who gets paged at 3am |
| strategist | base rates, second-order effects, one-way vs reversible doors |

Panels: `tech` (architect/skeptic/pragmatist), `decision` (strategist/skeptic/researcher), `ops` (operator/architect/skeptic), `default` (architect/skeptic/pragmatist/strategist). `skeptic` sits on every panel by design -- a board with no dissent keeps no honest score.

A minute is only foresight if it has a date by which reality can grade it.
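
The date rules that make a minute foresight (no resolving in the past, no grading before the resolution date, a frozen anchor) could be enforced roughly as follows. This is a minimal sketch under stated assumptions: `Anchor` and `grade` are hypothetical names chosen for illustration, and the real checks live in the project's own model and `tests/test_model.py`, which may differ. Grading on the resolution date itself is allowed here, matching the "on/after the resolution date" wording in the quickstart.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import date


# Hypothetical model for illustration; asktheboard's real classes may differ.
@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Anchor:
    statement: str
    created_on: date
    resolution_date: date

    def __post_init__(self) -> None:
        # Rule 1: a prediction cannot be registered to resolve in the past.
        if self.resolution_date < self.created_on:
            raise ValueError("resolution_date is earlier than the anchor date")


def grade(anchor: Anchor, outcome: bool, today: date) -> dict:
    # Rule 2: no grading before the resolution date.
    if today < anchor.resolution_date:
        raise ValueError("cannot grade before the resolution date")
    # Rule 3: grading builds a new record and never touches the anchor.
    return {"statement": anchor.statement, "outcome": outcome, "graded_on": today}


anchor = Anchor(
    statement="We will NOT migrate off Postgres for vectors within 3 months.",
    created_on=date(2026, 1, 5),
    resolution_date=date(2026, 4, 1),
)
result = grade(anchor, outcome=True, today=date(2026, 4, 1))

try:
    anchor.resolution_date = date(2030, 1, 1)  # attempt to move the goalposts
except FrozenInstanceError:
    print("anchor is frozen")
```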
`decision_type` picks a sensible default horizon so the common case is one lookup (and a 5-year horizon on a library swap stands out as dishonest):

| type | horizon | when |
|---|---|---|
| library | 90d | adopt/swap/drop a dependency |
| migration | 180d | move a datastore, platform, or pipeline |
| architecture | 365d | a structural design bet you live with |

Short-latency first on purpose: a fresh board earns a track record on fast `library` calls before anyone trusts its slow `architecture` bets. Pass an explicit `resolution_date=` to override.

When a seat dissents from the board and turns out more right than the consensus, that is a contrarian win -- the gold the public scoreboard is built from. The board changed (or should have changed) its mind, and reality later stamped the dissenter vindicated.

What's shipped: the foresight engine (data model + grading + committable ADR) and the BYOK LLM fan-out that produces a board-minute (`asktheboard.convene`, behind the `asktheboard.llm` Protocol). No provider is bundled -- you plug in your own key.

The public API is 0.x / unstable. The `LLMClient` / `HTTPLLMClient` surface and the board-minute JSON schema may change before 1.0 -- pin a version if you depend on them.

Built by Dan Ilushin with Claude (Anthropic) in the loop. Contributions welcome -- see CONTRIBUTING.md (DCO sign-off) and SECURITY.md.

MIT. (c) 2026 Dan Ilushin.
Originally published by Hacker News.