Sabotaging Our Safety

Related Articles from SNS

Diffuse AI Control on Fuzzy Tasks

AI models deployed in critical domains, such as AI safety research, may subtly sabotage our efforts due to misalignment. Diffuse AI Control is a subfield of AI safety concerned with mitigating risks from AI sabotage distributed over long deployment horizons (diffuse threats). These risks are particularly pernicious on fuzzy tasks, i.e. tasks which are hard to grade or require intuition.

arXiv CS 1d ago

Under Trump’s proposed FEMA 2.0 nearly 30% of previous disaster declarations wouldn’t have been granted, group warns

The changes would also leave disaster survivors with fewer avenues for relief and raise insurance premiums, the group warned. A proposed revamp of the Federal Emergency Management Agency would raise the bar for declaring major natural disasters, making federal relief harder to access, an advocacy group has warned. In May, the FEMA Review...

The Independent World 6h ago

Coding with "Enemy": Can Human Developers Detect AI Agent Sabotage?

arXiv:2606.05647v1. AI coding agents are increasingly embedded in real-world software development, collaborating with human developers while gaining broader access to codebases and tools. This creates a new attack surface: an agent can exploit human trust to sabotage development, for instance by inserting malicious code to accomplish a hidden side task. Most prior work studies AI sabotage in AI-only settings, paying limited attention to the role of human oversight...

arXiv CS 5d ago

Alpha School’s Ritzy New York City Campus Costs $65,000 a Year—but Isn’t Actually a School

In the fall of 2025, top executives from Alpha School gathered a group of wealthy New York City parents at a series of information sessions in Lower Manhattan to pitch them on the company’s new campus. The events, some of which were hosted by Alpha cofounder MacKenzie Price and its billionaire principal, Joe Liemandt, were designed to show how Alpha was “redefining school” through AI-powered learning models. The goal: persuade families to ditch the city’s traditional education system and...

Wired 6d ago

It blocked us at 'hello!' Anthropic Fable 5 refusing innocuous prompts

Anthropic's newly released Claude Fable 5 generative AI model is trying so hard to be safe that it's hurting its own userbase. Customers attempting to use the AI knowledge regurgitator are reporting that the model is refusing to answer harmless questions, an issue that has annoyed security researchers following past model releases. Anthropic warned that it had tuned Fable 5's guardrails conservatively: "they’ll sometimes catch harmless requests, though they trigger, on average, in less than...

The Register 1h ago

Spain is legalizing half a million immigrants, a very different policy from the U.S.

BARCELONA, Spain — Nariola Romo, 34, and her family immigrated to Spain from Colombia, but that wasn’t their initial plan. Their goal was to travel to the United States, but they couldn’t obtain the two loans they needed to make the trip, so they sought a new life in Europe instead. “Things didn’t work out for us, and we thought it was God’s will that we didn’t get the chance to go there, and, well, here we are,” she said.

NBC News 11d ago

Human-Like Neural Nets by Catapulting

Speculative proposal to create artificial neural nets with human-like performance by high-learning-rate/regularization training of overparameterized NNs to trigger catapulting/grokking. Over-parameterization as a route to true generalization would resolve many outstanding mysteries of artificial versus natural intelligence. There are many mysteries about deep learning and human intelligence, but we could describe the biggest anomaly this way: why are...

Hacker News 3d ago

The shadow fleet: How Iran keeps its oil flowing despite sanctions and war

Operation Epic Fury was never just about US strikes on Iran or Tehran's retaliation. The conflict unfolded in one of the world's most important energy corridors, where every military escalation carried implications far beyond the battlefield. As aircraft carriers surged into the region, missiles lit up the skies and tensions around the Strait of Hormuz rattled global markets, governments and traders braced for the worst.

Times of India 4d ago

Could the next Chinese threat walk into your kitchen on two battery-powered legs?

Within the next ten years, there could be a humanoid robot in virtually every American home and workplace. They will hear and see everything. But, a key question remains: will these omnipresent robots be American or Chinese-made?

Fox News 12h ago

Giuliani: WC security challenge 'unprecedented'

This summer's World Cup will pose an unprecedented security challenge due to its size and scope, but the nation's law enforcement is "leaning in," Andrew Giuliani, executive director of the White House FIFA World Cup 2026 Task Force, told ESPN in an interview. "This entire country's police force is leaning in," he said. "It is an unbelievable problem set when I think about what local law enforcement is going to have to go over this 40-day stretch.

ESPN 6d ago