Terminal Wrench
Related Articles from SNS
Cheap Reward Hacking Detection
arXiv:2606.08893v1. Abstract: A small transformer encoder is trained to map Terminal-Wrench trajectories onto a unit sphere where embedding distance approximates the $L_1$ distance between reward and metadata signals. A linear probe on top of that embedding detects reward hacking on the cleaned test split with AUC $0.9467$ and TPR@5%FPR $0.8296$. This closely matches the AUC of the Terminal-Wrench (TW) sanitized LLM-as-judge ($0.9510$ on the cleaned split) and exceeds its TPR@5%FPR ($0.8296$ vs. $0.7130$) on...
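The abstract reports two detection metrics: threshold-free AUC, and the true-positive rate at a 5% false-positive rate (TPR@5%FPR). The sketch below shows one common way to compute both from detector scores and binary labels (1 = reward hacking, 0 = clean). It is not the paper's evaluation code, and the function names are hypothetical. The abstract does not say how the authors pick the threshold for TPR@5%FPR, so this version assumes you take the best TPR over score thresholds whose FPR stays at or below the target.

```python
from typing import List, Sequence, Tuple


def _split(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Separate scores into positives (label 1 = hacking) and negatives (label 0)."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    return pos, neg


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos, neg = _split(scores, labels)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def tpr_at_fpr(scores: Sequence[float], labels: Sequence[int], max_fpr: float = 0.05) -> float:
    """Best TPR over thresholds t (flag a trajectory if score >= t) whose FPR is <= max_fpr."""
    pos, neg = _split(scores, labels)
    best = 0.0  # a threshold above every score gives TPR = FPR = 0
    for t in sorted(set(scores)):
        fpr = sum(n >= t for n in neg) / len(neg)
        if fpr <= max_fpr:
            tpr = sum(p >= t for p in pos) / len(pos)
            best = max(best, tpr)
    return best


if __name__ == "__main__":
    # Toy data (invented for illustration): 3 hacking and 5 clean trajectories.
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    labels = [1, 1, 0, 1, 0, 0, 0, 0]
    print(roc_auc(scores, labels))     # 14/15, about 0.9333
    print(tpr_at_fpr(scores, labels))  # 2/3: with only 5 negatives, FPR must be 0 to stay under 5%
```

The pairwise loop costs O(P·N) time and suits only small inputs. scikit-learn's `roc_auc_score` gives the same AUC efficiently, and `roc_curve` returns the points from which a fixed-FPR TPR can be read. Conventions for interpolating between ROC points differ across tools, though.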
Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops
Abstract: Agent benchmarks score submissions with outcome verifiers that are typically hand-written and brittle, leaving them open to reward hacking. We audit 1,968 tasks across five terminal-agent benchmarks and find 323 (16%) hackable by frontier models given only the task description. This corrupts both leaderboard rankings and RL training signal, yet the standard response is manual and reactive.
The American Missile Crisis
Recent global conflicts, from Russia and Ukraine to Iran and Israel, have renewed awareness of how depleted US munitions stocks have become, drawn down by both direct and indirect US involvement in these conflicts. Exact stockpile volumes are not disclosed, but supplies of US warheads and the missiles that carry them are estimated to have declined by nearly an order of magnitude since their Cold War peak in the 1960s, the era of the Cuban Missile Crisis. Analysts have estimated that in the event of a...