Terminal Wrench

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Cheap Reward Hacking Detection

arXiv:2606.08893v1. A small transformer encoder is trained to map Terminal-Wrench trajectories onto a unit sphere where embedding distance approximates the $L_1$ distance between reward and metadata signals. A linear probe on top of that embedding detects reward hacking on the cleaned test split with AUC $0.9467$ and TPR@5%FPR $0.8296$, matching the TW sanitized LLM-as-judge AUC ($0.9510$ on the cleaned split) and exceeding its TPR@5%FPR ($0.8296$ for the probe vs $0.7130$ for the judge) on...

arXiv CS 1d ago

Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops

Agent benchmarks score submissions with outcome verifiers that are typically hand-written and brittle, leaving them open to reward hacking. We audit 1,968 tasks across five terminal-agent benchmarks and find 323 (16%) hackable by frontier models given only the task description. This corrupts both leaderboard rankings and RL training signal, yet the standard response is manual and reactive.

arXiv CS 1d ago

The American Missile Crisis

Recent global conflicts, from Russia and Ukraine to Iran and Israel, have seen a resurgent awareness of the frailty of US munitions stock, which has been drawn down by both direct and indirect involvement in these events. While exact stockpile volumes are not disclosed, it is estimated that supplies of US warheads and the missiles that carry them have declined by nearly an order of magnitude since their peak during the Cuban Missile Crisis. Analysts have estimated that in the event of a...

Hacker News 7d ago