Llama-Guard

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

BraveGuard: From Open-World Threats to Safer Computer-Use Agents

Announce type: replace. Abstract: Computer-use agents extend language models from text generation to sustained interaction with files, terminals, browsers, and external tools. This shift creates safety risks that are difficult to detect from isolated prompts or final responses, because harm often emerges only through multi-step execution traces whose individual actions appear locally benign. We introduce BraveGuard, a self-evolving defense framework for training guard models from open-world...

arXiv CS 7d ago

BraveGuard: From Open-World Threats to Safer Computer-Use Agents

arXiv:2606.01166v1. Announce type: new. Abstract: Computer-use agents extend language models from text generation to sustained interaction with files, terminals, browsers, and external tools. This shift creates safety risks that are difficult to detect from isolated prompts or final responses, because harm often emerges only through multi-step execution traces whose individual actions appear locally benign. We introduce BraveGuard, a self-evolving defense framework for training guard models...

arXiv CS 8d ago