An AI board that pre-registers its bets – bet #1 just graded wrong
A board of expert personas whose every decision is a pre-registered, time-anchored, reality-graded bet. Not a chatbot that agrees with you -- a board that keeps score, before the fact.
Landing page & docs: https://danilushin.github.io/asktheboard/
Mechanism on sample data — the 60-second, no-key walkthrough below reproduces it exactly.
```shell
pip install asktheboard
```
Anyone can clone a "panel of AI personas" in a weekend, and a dozen have. The debate mechanic is a commodity. What it leaves out is the thing that makes advice worth trusting: a record of having been right before the outcome was knowable. That record is hard to fake -- you can buy model outputs, but you can't back-date a timestamp. It only accrues the slow way: by calling decisions in advance and letting reality grade them, one resolution date at a time.
So ask-the-board records, for every decision:
- your stated prior (what you believed going in),
- the per-seat dissent vector -- each seat's stance + its own probability,
- a dated, falsifiable prediction, anchored before the outcome is knowable,
- on the resolution date, reality's realized outcome, auto-reconciled into a Brier/calibration score per seat.
The board-minute is a git-committable ADR. Your git history is the external attestation of the anchor timestamp. The accumulating, reality-graded record is the durable asset.
`create -> resolve -> score` is pure data -- no LLM, no key, no network. This is a worked example on sample data: you supply the outcome with `resolve`, and the engine computes each seat's Brier score (lower is better). It shows the mechanism, not a track record -- the integrity comes from the anchor timestamp your git history attests, which no demo can fabricate. The committed artifacts live in `examples/`.
```shell
# pip-installed (no repo)? paste the sample spec below. Cloned the repo?
# skip the heredoc and use --spec tests/sample_minute.json instead.
cat > sample_minute.json <<'JSON'
{
  "id": "2026-01-postgres-vs-vectordb",
  "question": "Adopt Postgres + pgvector, or a dedicated vector DB?",
  "prior": "Leaning toward a dedicated vector DB for the embeddings workload.",
  "decision": "Stay on Postgres + pgvector for now.",
  "prediction": {
    "statement": "We will NOT migrate off Postgres for vectors within 3 months.",
    "resolution_date": "2026-04-01",
    "board_probability": 0.75
  },
  "seats": [
    {"seat": "pragmatist", "stance": "affirm", "probability": 0.8, "rationale": "Boring tech; pgvector is enough at this scale."},
    {"seat": "skeptic", "stance": "dissent", "probability": 0.35, "rationale": "Recall/latency will bite once the corpus 10x's."}
  ],
  "created_at": "2026-01-05T10:30:00"
}
JSON
asktheboard create --spec sample_minute.json
asktheboard resolve --id 2026-01-postgres-vs-vectordb --outcome true
asktheboard score
```
```
seat          n  mean_brier  wins  losses
----------------------------------------------
pragmatist    1       0.040     0       0
skeptic       1       0.423     0       1
```
Full walkthrough + committed artifacts: `examples/README.md`.
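The scoreboard numbers above can be reproduced by hand. The sketch below assumes the standard single-event Brier score, the squared difference between a stated probability and the 0/1 outcome; the function names `brier` and `scoreboard` are illustrative and are not taken from the package.

```python
from collections import defaultdict


def brier(probability: float, outcome: bool) -> float:
    """Squared error between a stated probability and the 0/1 outcome."""
    return (probability - (1.0 if outcome else 0.0)) ** 2


# (seat, probability, realized outcome), copied from the sample minute.
calls = [
    ("pragmatist", 0.80, True),
    ("skeptic", 0.35, True),
]


def scoreboard(calls):
    """Mean Brier score per seat, best-calibrated (lowest) first."""
    per_seat = defaultdict(list)
    for seat, probability, outcome in calls:
        per_seat[seat].append(brier(probability, outcome))
    rows = [(seat, len(s), sum(s) / len(s)) for seat, s in per_seat.items()]
    return sorted(rows, key=lambda row: row[2])


for seat, n, mean_brier in scoreboard(calls):
    print(f"{seat:<12}{n:>3}{mean_brier:>12.3f}")
```

With the outcome `true`, the pragmatist's 0.8 scores (0.8 - 1)^2 = 0.04 and the skeptic's 0.35 scores (0.35 - 1)^2 = 0.4225, which rounds to the 0.423 shown in the table.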
And a real one, still open: this repo pre-registered a board-minute about its own launch -- `examples/open-minute.md`, anchored in git on 2026-06-26, resolving 2026-09-24. No score yet; that's the point. The board may turn out wrong, and the anchor means it can't pretend otherwise.
Live bet #1 (resolves in days): the board's call on the June 2026 US jobs report -- `examples/2026-06-jobs-report.md`, anchored 2026-06-27, resolving 2026-07-02 against the BLS Employment Situation release. The board says +150k or more at 56%; the skeptic dissents at 40%. Bet #1 of a public, recurring cadence -- come back on the date and watch it grade against a source nobody controls.
The engine ships no provider and makes no calls of its own. You supply your own LLM key; you pay your own inference. The open-source core therefore costs nothing to run at any scale -- the cost lives with the user, not a host. (A managed, capped hosted tier -- for people who would rather not manage keys -- is the separate, paid product.)
The OSS engine is free forever and runs on your own key. If you'd rather not manage keys -- or you want the aged, reality-graded public scoreboard hosted for you -- a managed, capped paid tier is coming.
Want early access? Email [email protected] with the subject `waitlist` (a one-liner on what you'd decide with it helps, but isn't required). No spam -- one note when it opens.
- A prediction cannot be pre-registered to resolve in the past (no backfilling an "old" call onto a known outcome).
- A minute cannot be graded before its resolution date -- the outcome must not be knowable yet. That is what makes it foresight.
- The anchor timestamp and the prediction are frozen once created; grading never moves them.
See `tests/test_model.py` -- these are the load-bearing tests.
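As a rough illustration of the three rules above, here is a minimal sketch of how they could be enforced in plain Python. The `Anchor` class and `grade` function are hypothetical and do not mirror the package's actual data model; in particular, whether a same-day resolution date is accepted is an assumption.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)  # frozen: anchor fields cannot be reassigned later
class Anchor:
    created_at: datetime
    statement: str
    resolution_date: date

    def __post_init__(self):
        # Rule 1: a prediction cannot be registered to resolve in the past.
        if self.resolution_date < self.created_at.date():
            raise ValueError("resolution_date is before the anchor date")


def grade(anchor: Anchor, outcome: bool, today: date) -> dict:
    # Rule 2: no grading before the resolution date.
    if today < anchor.resolution_date:
        raise ValueError("outcome is not knowable yet; too early to grade")
    # Rule 3: grading builds a new record and leaves the anchor untouched.
    return {"anchor": anchor, "outcome": outcome, "graded_on": today}


anchor = Anchor(
    created_at=datetime(2026, 1, 5, 10, 30),
    statement="We will NOT migrate off Postgres for vectors within 3 months.",
    resolution_date=date(2026, 4, 1),
)
record = grade(anchor, outcome=True, today=date(2026, 4, 1))
print(record["graded_on"], record["anchor"].resolution_date)
```

A frozen dataclass only guards against edits inside the process; per the text above, the external attestation of the timestamp still comes from git history.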
```shell
pip install asktheboard

# pre-register a decision (board-minute spec is JSON -- see "See it keep score" above)
asktheboard create --spec sample_minute.json

# ... months later, on/after the resolution date, grade it against reality
asktheboard resolve --id 2026-01-postgres-vs-vectordb --outcome false

# per-seat calibration scoreboard, best-calibrated first
asktheboard score
```
`create` writes both `.json` (the record) and `.md` (the committable ADR) into `board-minutes/`.
`create` pre-registers a minute you wrote by hand. `convene` runs the live LLM fan-out: every seat answers through your key, and the board's consensus probability is the mean of the seats' calls. It ships no provider -- bring an OpenAI-compatible endpoint (`HTTPLLMClient` is stdlib-only, zero dependencies).
```python
from asktheboard import convene, Seat, HTTPLLMClient

minute = convene(
    id="pgvector-scale",
    question="Will pgvector hold our scale, or do we need a dedicated vector DB?",
    prior="leaning postgres to avoid a new service",
    decision="stay on postgres + pgvector",
    statement="pgvector serves p95<150ms at 50M embeddings without a dedicated DB",
    seats=[Seat("pragmatist", "ML researcher"), Seat("skeptic", "find the failure mode")],
    client=HTTPLLMClient(model="gpt-4o-mini"),  # reads OPENAI_API_KEY
    decision_type="library",                    # -> 90-day resolution horizon
)
```
Or from the CLI (key in `OPENAI_API_KEY`):

```shell
asktheboard convene --spec convene.json --model gpt-4o-mini
```

Any OpenAI-compatible API works -- point `--base-url` (or `HTTPLLMClient(base_url=...)`) at OpenRouter, Together, or a local server. The engine still makes no calls of its own; it only ever speaks through the client you pass.
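The consensus rule that `convene` is described as using (board probability = mean of the seats' calls) is simple enough to sketch offline. The helper name `consensus` is hypothetical, and the range check is an added assumption rather than documented behavior; the two probabilities are borrowed from the hand-written sample minute purely as inputs.

```python
from statistics import fmean


def consensus(seat_probabilities: dict) -> float:
    """Unweighted mean of the per-seat probabilities."""
    for seat, p in seat_probabilities.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{seat}: probability {p} is outside [0, 1]")
    return fmean(seat_probabilities.values())


calls = {"pragmatist": 0.80, "skeptic": 0.35}
print(f"board probability: {consensus(calls):.3f}")
```

An unweighted mean gives every seat the same vote, so a single dissenter moves the board's number in proportion to how far it stands from the rest.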
You can always hand-write `Seat(name, persona)`. But a sensible default board ships in the box: a curated set of role archetypes (the architect, the skeptic, the operator -- functions, not impersonations of real people) and a few named panels, so seating one is a single lookup.
```python
from asktheboard import convene, panel, seats, HTTPLLMClient

minute = convene(
    id="pgvector-scale",
    question="Will pgvector hold our scale, or do we need a dedicated vector DB?",
    prior="leaning postgres",
    decision="stay on postgres + pgvector",
    statement="pgvector serves p95<150ms at 50M embeddings without a dedicated DB",
    seats=panel("tech"),  # architect + skeptic + pragmatist
    # seats=seats(["architect", "operator", "skeptic"]),  # or pick your own
    client=HTTPLLMClient(model="gpt-4o-mini"),
    decision_type="library",
)
```
From the CLI, pass `--panel` or `--seats` instead of putting seats in the spec:

```shell
asktheboard roster    # list seats + panels
asktheboard convene --spec d.json --model gpt-4o-mini --panel tech
asktheboard convene --spec d.json --model gpt-4o-mini --seats architect,skeptic
```
| seat | voice |
|---|---|
| `architect` | shape, maintenance cost, what breaks at scale, build-vs-buy |
| `skeptic` | forced dissent -- the most likely failure first, then the deeper objection |
| `pragmatist` | simplest thing that ships; YAGNI; opportunity cost |
| `researcher` | what the data actually says; base rate before anecdote |
| `operator` | run cost, failure budget, who gets paged at 3am |
| `strategist` | base rates, second-order effects, one-way vs reversible doors |
Panels: `tech` (architect/skeptic/pragmatist), `decision` (strategist/skeptic/researcher), `ops` (operator/architect/skeptic), `default` (architect/skeptic/pragmatist/strategist). `skeptic` sits on every panel by design -- a board with no dissent keeps no honest score.
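The roster and panels listed above amount to a small lookup table. The sketch below restates them as plain data and checks the "skeptic on every panel" rule; `ROSTER`, `PANELS`, `panel_members` and `check_panels` are illustrative names, not the package's internals.

```python
ROSTER = {
    "architect", "skeptic", "pragmatist",
    "researcher", "operator", "strategist",
}

PANELS = {
    "tech": ["architect", "skeptic", "pragmatist"],
    "decision": ["strategist", "skeptic", "researcher"],
    "ops": ["operator", "architect", "skeptic"],
    "default": ["architect", "skeptic", "pragmatist", "strategist"],
}


def panel_members(name: str) -> list:
    """Return a copy of the seat names on a panel, or fail with the valid choices."""
    if name not in PANELS:
        raise ValueError(f"unknown panel {name!r}; choose from {sorted(PANELS)}")
    return list(PANELS[name])


def check_panels() -> None:
    """Every panel must draw from the roster and must seat the skeptic."""
    for name, members in PANELS.items():
        unknown = set(members) - ROSTER
        if unknown:
            raise ValueError(f"panel {name!r} seats unknown roles: {sorted(unknown)}")
        if "skeptic" not in members:
            raise ValueError(f"panel {name!r} has no skeptic")


check_panels()
print(panel_members("tech"))
```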
A minute is only foresight if it has a date by which reality can grade it. `decision_type` picks a sensible default horizon so the common case is one lookup (and a 5-year horizon on a library swap stands out as dishonest):
| type | horizon | when |
|---|---|---|
| `library` | 90d | adopt/swap/drop a dependency |
| `migration` | 180d | move a datastore, platform, or pipeline |
| `architecture` | 365d | a structural design bet you live with |
Short-latency first on purpose: a fresh board earns a track record on fast `library` calls before anyone trusts its slow `architecture` bets. Pass an explicit `resolution_date=` to override.
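The horizon table reduces to date arithmetic. The following sketch assumes the horizon is counted in calendar days from the anchor date, which is a guess about the engine's behavior; `HORIZON_DAYS` and `default_resolution_date` are hypothetical names.

```python
from datetime import date, timedelta
from typing import Optional

# Default horizons from the table above, in days.
HORIZON_DAYS = {"library": 90, "migration": 180, "architecture": 365}


def default_resolution_date(
    anchored_on: date,
    decision_type: str,
    resolution_date: Optional[date] = None,
) -> date:
    """An explicit resolution_date wins; otherwise add the type's horizon."""
    if resolution_date is not None:
        return resolution_date
    if decision_type not in HORIZON_DAYS:
        raise ValueError(f"unknown decision_type {decision_type!r}")
    return anchored_on + timedelta(days=HORIZON_DAYS[decision_type])


print(default_resolution_date(date(2026, 6, 26), "library"))
```

Under that assumption, an anchor of 2026-06-26 with the 90-day `library` horizon lands on 2026-09-24, the same pair of dates quoted for the repo's open minute.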
When a seat dissents from the board and turns out more right than the consensus, that is a contrarian win -- the gold the public scoreboard is built from. The board changed (or should have changed) its mind, and reality later stamped the dissenter vindicated.
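One way to make the contrarian-win rule concrete is to compare the dissenting seat's Brier score with the board's on the same outcome. The sketch below is an assumption-laden reading: the function `contrarian_result` is hypothetical, and treating a dissenter who scores worse than the board as a loss is inferred from the sample scoreboard (where the skeptic shows one loss), not from a documented rule.

```python
from typing import Optional


def brier(probability: float, outcome: bool) -> float:
    """Squared error between a stated probability and the 0/1 outcome."""
    return (probability - (1.0 if outcome else 0.0)) ** 2


def contrarian_result(
    stance: str, seat_p: float, board_p: float, outcome: bool
) -> Optional[str]:
    """Return 'win', 'loss', or None for one seat on one graded minute."""
    if stance != "dissent":
        return None  # only dissenters can score a contrarian win or loss
    seat_error = brier(seat_p, outcome)
    board_error = brier(board_p, outcome)
    if seat_error < board_error:
        return "win"
    if seat_error > board_error:
        return "loss"
    return None  # exact tie: neither


# Sample minute: board at 0.75, skeptic dissents at 0.35.
print(contrarian_result("dissent", 0.35, 0.75, outcome=True))
print(contrarian_result("dissent", 0.35, 0.75, outcome=False))
print(contrarian_result("affirm", 0.80, 0.75, outcome=True))
```

With the outcome `true`, the skeptic's error (0.4225) exceeds the board's (0.0625), so the call counts as a loss; had the outcome been `false`, the errors would be 0.1225 against 0.5625 and the same dissent would be a contrarian win.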
What's shipped: the foresight engine (data model + grading + committable ADR) and the BYOK LLM fan-out that produces a board-minute (`asktheboard.convene`, behind the `asktheboard.llm` Protocol). No provider is bundled -- you plug in your own key.
The public API is `0.x` / unstable. The `LLMClient` / `HTTPLLMClient` surface and the board-minute JSON schema may change before `1.0` -- pin a version if you depend on them.
Built by Dan Ilushin with Claude (Anthropic) in the loop. Contributions welcome -- see CONTRIBUTING.md (DCO sign-off) and SECURITY.md.
MIT. (c) 2026 Dan Ilushin.