Safety Gate

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

The Governance of Human-LLM Interaction: Safety Gating, Civility Steering, and Affective Default Lock-In

Abstract: Large language models (LLMs) increasingly mediate high-stakes interactions in finance, medicine, and mental-health support, yet users have limited control over how these systems communicate. We frame interaction style as a governance object: provider-side alignment not only blocks harmful content, but also stabilizes communicative defaults that shape users' epistemic distance, relational expectations, and capacity to opt out of emotionalized or anthropomorphic...

arXiv CS 1d ago

A Method for Neutron-Gamma Pulse Shape Discrimination of CLYC Detector Based on a Gated Residual-Linear Attention Network

arXiv:2606.02613v1 Abstract: The discrimination of neutron and gamma pulse shapes is a key technology in fields such as nuclear safety monitoring and radiation assessment. An enhanced recursive gated cyclic residual-sparse linear attention network is developed on the CLYC detector experimental platform to overcome the weak noise resistance, limited feature extraction, and inferior real-time performance of conventional algorithms. The experimental dataset comprises 19,971...

arXiv Physics 7d ago

Oversight Has a Capacity: Calibrating Agent Guards to a Subjective, Fatiguing Human

Abstract: As LLM agents begin to take real, irreversible actions (shell commands, file edits, deploys), the standard safety pattern is a human-in-the-loop approval gate: risky actions pause and wait for a person. We argue the gate is the easy part; the hard part is the judgment (which actions to stop), which the field evaluates against two false assumptions: that there is a ground-truth notion of "risky," and that the human reviewer is a perfect, infinitely-available...

arXiv CS 1d ago

No way out: Sealed windows, sensor gates turned Delhi hotel into death trap during fire

Sealed glass windows, sensor-operated gates that reportedly failed during the blaze, overcrowded rooms, and multiple fire safety violations turned a hotel in Malviya Nagar into a death trap on Wednesday. The fire left 21 people dead and several others trapped as smoke and flames quickly engulfed the building, severely restricting occupants' chances of escape. Preliminary findings by the Delhi Fire Services suggest that the devastating fire may have originated near the staircase on the ground...

Times of India 7d ago

Apple rolls out new, AI-powered Siri at annual WWDC

CUPERTINO, California, June 8: Apple on Monday unveiled a new, AI-powered version of Siri that is capable of analyzing what is on the device screen and reaching out to the web for more information, rolling out a long-awaited overhaul of its popular voice assistant. Called "Siri AI," the software will also have its own dedicated app, Apple said at its annual Worldwide Developers Conference at its Cupertino, California, headquarters. Siri AI...

Channel News Asia 2d ago

Agentic Relationship Harm: Benchmarking and Gating Relational Manipulation in AI Agents

Abstract: AI agents built on large language models can assist not only legitimate tasks but also relational manipulation. AI agents can be used to help a user maintain a deceptive identity, intensify emotional dependency, isolate a target, or prepare for later extraction. We conceptualise this risk as agentic relationship harm: workflow-level assistance that can exploit recipient vulnerability, persuasive influence, and relational power asymmetry.

arXiv CS 7d ago

NormEval: A Unified Multi-Metric Framework for Evaluating Semantic Fidelity in Text Normalization

arXiv:2511.20409v2 Abstract: Text normalization methods such as stemming and lemmatization are fundamental components of NLP pipelines. As new normalization tools are developed for diverse languages, evaluation methodologies remain fragmented, relying on compression ratio, downstream accuracy, or sequence-to-sequence prediction scores in isolation and failing to distinguish between beneficial vocabulary reduction and harmful semantic distortion.

arXiv CS 8d ago

A Pathology Foundation Model for Gastric Cancer with Real-World Validation

arXiv:2606.04792v1 Abstract: Gastric cancer remains a major cause of cancer mortality, yet its histological and molecular heterogeneity complicates diagnosis and risk stratification. General-purpose pathology foundation models (PFMs) often plateau on fine-grained endpoints central to gastric cancer care, and few have undergone rigorous prospective validation or clinical reader studies. We present GRACE, a Gastric-specific foundation model for Real-world Assessment and...

arXiv CS 6d ago

When Should the Teacher Move? Temporal Coupling and Stability in Self On-Policy Distillation

Abstract: Self on-policy distillation trains a student policy against a teacher derived from its own parameter history, yet the teacher's update schedule, which governs the temporal coupling between teacher and student, has not been systematically studied as a stability variable. Through a controlled schedule sweep on Qwen3-8B, we establish that isolation periods, defined as complete teacher freezing between updates, are the key structural property...

arXiv CS 7d ago

Urgent recalls issued for Waitrose snack, popular dessert and baby item — 'do not use'

A number of popular food items and products have been recalled this week. Customers have been urged to check whether they have affected items, as a number of food and product recalls were issued this week. Popular snacks and desserts have been recalled by the Food Standards Agency (FSA).

Daily Mirror 4d ago