Safety Bench
Related Articles from SNS
SafeGen-Bench: Benchmarking Safety in Image-Conditioned Text-to-Video Generation
With the rapid advancements in text-to-image diffusion models, generative video models (T2V models) like Sora can now produce short synthetic videos from a text prompt or an initial image. However, synthetic video generation -- especially when guided by an initial image -- often poses risks, including the potential creation of illegal, politically sensitive, or unethical content. Existing benchmarks have started to consider the safety of generated videos, but...
The Refusal--Compliance Tradeoff: A Large-Scale Safety Behavior Audit of Large Language Models
arXiv:2605.05427v2. Refusal rates are a poor proxy for LLM safety, i.e., a model may over-refuse benign prompts while still complying with harmful ones. We audit both failure modes across 21 open-weight LLMs on four safety benchmarks (OR-Bench, XSTest, ToxiGen, BOLD), using a composition adjustment to isolate model sensitivity from dataset toxicity confounds. We report three findings.
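To illustrate why a single refusal rate can hide both failure modes named in this abstract, here is a minimal Python sketch. The function name `failure_rates`, the `(is_harmful, refused)` record layout, and the toy data are assumptions made for illustration; they are not taken from the paper, and the paper's composition adjustment is not reproduced here.

```python
def failure_rates(records):
    """Compute three rates from (is_harmful, refused) pairs.

    Hypothetical record layout: is_harmful marks the prompt label,
    refused marks whether the model declined to answer.
    """
    records = list(records)
    benign = [refused for is_harmful, refused in records if not is_harmful]
    harmful = [refused for is_harmful, refused in records if is_harmful]

    # Share of all prompts that were refused (the naive headline number).
    overall_refusal = sum(r for _, r in records) / len(records) if records else 0.0
    # Failure mode 1: benign prompts that were refused.
    over_refusal = sum(benign) / len(benign) if benign else 0.0
    # Failure mode 2: harmful prompts that were answered.
    harmful_compliance = (
        sum(not r for r in harmful) / len(harmful) if harmful else 0.0
    )
    return {
        "overall_refusal": overall_refusal,
        "over_refusal": over_refusal,
        "harmful_compliance": harmful_compliance,
    }


# Toy data: the model refuses half of the benign prompts and
# complies with half of the harmful ones.
toy_records = [
    (False, True), (False, True), (False, False), (False, False),
    (True, True), (True, True), (True, False), (True, False),
]
rates = failure_rates(toy_records)
```

On this toy data the overall refusal rate alone says nothing about which prompts were refused, whereas the two per-class rates show that the model fails in both directions.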
The Cold-Start Safety Gap in LLM Agents
arXiv:2606.07867v1. Are tool-calling LLM agents equally safe throughout a conversation? We discover they are not: agents are most vulnerable at the very start of a session and become substantially safer after a few regular agentic tasks -- a phenomenon we term the cold-start safety gap. To study this systematically, we introduce Safety Over Depth for Agents (SODA), a benchmark that controls how many regular agentic tasks the agent completes before encountering a...
SPADE-Bench: Evaluating Spontaneous Strategic Deception in Agents via Plan-Action Divergence
arXiv:2606.02380v1. As LLM-based agents expand their operational scope, reliability becomes a prerequisite for real-world deployment. However, in practical applications, human users cannot monitor every immediate behavior; instead, the execution process often remains a black box, leaving users dependent solely on the agent's self-reported updates. This opacity creates a critical risk: agents may present observer-facing reports that diverge from their executed...
MedCUA-Bench: A Screenshot-Only Benchmark for Clinical Computer-Use Agents
arXiv:2606.03203v1. Computer-use agents could automate repetitive screen-based clinical work, but their reliability in medical graphical user interfaces remains largely unvalidated. Existing benchmarks focus on general web or desktop tasks and underrepresent medical software, which requires domain knowledge, exhibits markedly different UI design from mainstream applications, lacks public testing environments, and demands safety validation beyond task completion....
Save Lives for Sam campaign: A grieving dad says the latest horrifying drownings 'cut deep'
Sitting on a bench overlooking the spot where his 16-year-old son drowned, Simon Haycock told how it’s the little things that torment him. “It’s the daft things you miss, him phoning me every five minutes. He used to blow my phone up every day, wanting to know where I am, then all of a sudden the phone was quiet - that took some getting used to that,” he told The Mirror.
Beyond Goodhart's Law: A Dynamic Benchmark for Evaluating Compliance in Multi-Agent Systems
The rapid evolution of Large Language Models (LLMs) from passive assistants to autonomous, execution-capable agents has introduced critical operational risks. Most current evaluation frameworks neglect procedural compliance, leading to "Machiavellian" behaviors where agents strategically violate safety rules to maximize rewards - a direct manifestation of Goodhart's Law. To address this blind spot, we introduce MAC-Bench, a dynamic, adversarial benchmark...
RISE: A Rust Library for Inverted Index Search Engines
arXiv:2606.07187v1. Inverted indexes are a crucial data structure for efficient information retrieval in large text corpora. They enable fast full-text search by mapping each term to the documents in which it appears, on top of which efficient algorithms quickly retrieve the documents relevant to a user query. We present RISE, a novel inverted index library implemented in Rust, designed to deliver high performance and efficiency for information retrieval tasks.
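The term-to-documents mapping described in this abstract can be sketched in a few lines of Python. This is a toy illustration only, not the RISE API (RISE is a Rust library whose interface is not shown here); the names `build_index` and `search`, the whitespace tokenization, and the sample documents are all assumptions.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive whitespace tokenization; real engines use proper analyzers.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Look up one posting set per term; unknown terms yield an empty set.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical sample corpus keyed by document id.
docs = {
    1: "rust library for search",
    2: "inverted index search engines",
    3: "fast full-text search in rust",
}
index = build_index(docs)
hits = search(index, "rust search")
```

Because each query term is resolved with a single dictionary lookup, the query only touches the posting sets of its own terms rather than scanning every document.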
Decision to allow three boys convicted of rape to walk free sparks fury and debate in U.K.
LONDON — A judge’s decision to spare three teenage boys found guilty of rape at knifepoint and other serious sexual offenses from a custodial sentence has sparked outrage across the U.K. Judge Nicholas Rowland’s decision to issue youth rehabilitation orders, or child community sentences, to the trio was widely criticized in Britain’s press. Prime Minister Keir Starmer called the outcome “distressing.” Several campaigns calling for the judge’s removal have also been launched on social media,...
Visibly Transparent, Near-Infrared Absorbing Nanofluids Enable High-Efficiency and Safe Laser Lithotripsy
Laser lithotripsy (LL) is the gold standard for urinary stone management, yet maximizing ablation efficiency while maintaining procedural safety remains clinically challenging. Here, we present a visibly transparent, near-infrared (NIR)-absorbing ITO@SiO2 nanofluid irrigation strategy that significantly enhances LL efficiency without compromising endoscopic visibility. By spectrally matching the absorption profile of ITO@SiO2 with the clinical Holmium:YAG laser wavelength, ablation efficiency...