Home Knowledge Base ControlArena

ControlArena

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Constitutional Black-Box Monitoring for Scheming in LLM Agents

arXiv:2603.00829v2 Announce Type: replace Abstract: Safe deployment of Large Language Model (LLM) agents in autonomous settings requires reliable oversight mechanisms. A central challenge is detecting scheming, where agents covertly pursue misaligned goals. One approach to mitigating such risks is LLM-based monitoring: using language models to examine agent behaviors for suspicious actions.

arXiv CS 8d ago