Safety Bench

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

SafeGen-Bench: Benchmarking Safety in Image-Conditioned Text-to-Video Generation

Announce Type: new Abstract: With the rapid advancements in text-to-image diffusion models, generative video models (T2V models) like Sora can now produce short synthetic videos from a text prompt or an initial image. However, synthetic video generation -- especially when guided by an initial image -- often poses risks, including the potential creation of illegal, politically sensitive, or unethical content. Existing benchmarks have started to consider the safety of generated videos, but...

arXiv CS 8d ago

The Refusal--Compliance Tradeoff: A Large-Scale Safety Behavior Audit of Large Language Models

arXiv:2605.05427v2 Announce Type: replace Abstract: Refusal rates are a poor proxy for LLM safety, i.e., a model may over-refuse benign prompts while still complying with harmful ones. We audit both failure modes across 21 open-weight LLMs on four safety benchmarks (OR-Bench, XSTest, ToxiGen, BOLD), using a composition adjustment to isolate model sensitivity from dataset toxicity confounds. We report three findings.

arXiv CS 8d ago

The Cold-Start Safety Gap in LLM Agents

arXiv:2606.07867v1 Announce Type: new Abstract: Are tool-calling LLM agents equally safe throughout a conversation? We discover they are not: agents are most vulnerable at the very start of a session and become substantially safer after a few regular agentic tasks -- a phenomenon we term the cold-start safety gap. To study this systematically, we introduce Safety Over Depth for Agents (SODA), a benchmark that controls how many regular agentic tasks the agent completes before encountering a...

arXiv CS 1d ago

SPADE-Bench: Evaluating Spontaneous Strategic Deception in Agents via Plan-Action Divergence

arXiv:2606.02380v1 Announce Type: new Abstract: As LLM-based agents expand their operational scope, reliability becomes a prerequisite for real-world deployment. However, in practical applications, human users cannot monitor every immediate behavior; instead, the execution process often remains a black box, leaving users dependent solely on the agent's self-reported updates. This opacity creates a critical risk: agents may present observer-facing reports that diverge from their executed...

arXiv CS 8d ago

MedCUA-Bench: A Screenshot-Only Benchmark for Clinical Computer-Use Agents

arXiv:2606.03203v1 Announce Type: new Abstract: Computer-use agents could automate repetitive screen-based clinical work, but their reliability in medical graphical user interfaces remains largely unvalidated. Existing benchmarks focus on general web or desktop tasks and underrepresent medical software, which requires domain knowledge, exhibits markedly different UI design from mainstream applications, lacks public testing environments, and demands safety validation beyond task completion....

arXiv CS 7d ago

Save Lives for Sam campaign: A grieving dad says the latest horrifying drownings 'cut deep'

Sitting on a bench overlooking the spot where his 16-year-old son drowned, Simon Haycock told how it’s the little things that torment him. “It’s the daft things you miss, him phoning me every five minutes. He used to blow my phone up every day, wanting to know where I am, then all of a sudden the phone was quiet - that took some getting used to that,” he told The Mirror.

Daily Mirror 3d ago

Beyond Goodhart's Law: A Dynamic Benchmark for Evaluating Compliance in Multi-Agent Systems

Announce Type: new Abstract: The rapid evolution of Large Language Models (LLMs) from passive assistants to autonomous, execution-capable agents has introduced critical operational risks. Most current evaluation frameworks neglect procedural compliance, leading to "Machiavellian" behaviors where agents strategically violate safety rules to maximize rewards - a direct manifestation of Goodhart's Law. To address this blind spot, we introduce MAC-Bench, a dynamic, adversarial benchmark...

arXiv CS 1d ago

RISE: A Rust Library for Inverted Index Search Engines

arXiv:2606.07187v1 Announce Type: new Abstract: Inverted indexes are a crucial data structure for efficient information retrieval in large text corpora. They enable fast full-text search by mapping each term to the documents in which it appears, on top of which efficient algorithms quickly retrieve the documents relevant to a user query. We present RISE, a novel inverted index library implemented in Rust, designed to deliver high performance and efficiency for information retrieval tasks.

arXiv CS 2d ago

Decision to allow three boys convicted of rape to walk free sparks fury and debate in U.K.

LONDON — A judge’s decision to spare three teenage boys found guilty of rape at knifepoint and other serious sexual offenses from a custodial sentence has sparked outrage across the U.K. Judge Nicholas Rowland’s decision to issue youth rehabilitation orders, or child community sentences, to the trio was widely criticized in Britain’s press. Prime Minister Keir Starmer called the outcome “distressing.” Several campaigns calling for the judge’s removal have also been launched on social media,...

NBC News 6d ago

Visibly Transparent, Near-Infrared Absorbing Nanofluids Enable High-Efficiency and Safe Laser Lithotripsy

Laser lithotripsy (LL) is the gold standard for urinary stone management yet maximizing ablation efficiency while maintaining procedural safety remains clinically challenging. Here, we present a visibly transparent, near-infrared (NIR)-absorbing ITO@SiO2 nanofluid irrigation strategy that significantly enhances LL efficiency without compromising endoscopic visibility. By spectrally matching the absorption profile of ITO@SiO2 with the clinical Holmium:YAG laser wavelength, ablation efficiency...

bioRxiv 7d ago