HACK
Related Articles from SNS
HARVE: Hacking-Aware Reward-Head Vector Editing for Robust Reward Models
Reward models are central to large language model (LLM) alignment, but they remain vulnerable to reward hacking. To evaluate reward-model robustness, we introduce RewardHackBench containing 13 reward-hacking patterns covering real life high-stakes domains and general settings, and we find severe failures on specific subcategories across eight reward models. To mitigate these failures, we propose HARVE, a training-free reward-head editing method for scalar reward...
Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning
Rubric-based reinforcement learning (RL) uses an LLM-as-a-Judge (LaaJ) to score model outputs according to rubrics as rewards. However, policy models may exploit latent biases in the judge, leading to reward hacking and ineffective or unsafe training outcomes. In real-world rubric-based RL, such hacking behaviors are often subtle and entangled with multiple judge biases, making them difficult to analyze, detect, and mitigate.
From Reward-Hack Activations to Agentic Risk States: Context-Calibrated Mechanistic Monitoring in LLM Agents
arXiv:2606.06223v1. Language-model agents act through repeated cycles of observation, reasoning, and action selection, making safety monitoring depend on both internal model state and environment context. We study reward-hacking monitors in ReAct-style agents acting in Gameable ALFWorld and WebShop. Agents are instrumented with activation-based reward-hack scores, token-level entropy, and decision-context features.
Meta confirms 1000s of Instagram accounts were hacked by abusing its AI chatbot
Meta is notifying thousands of people whose Instagram accounts were hijacked during the months-long abuse of the company's AI chatbot, which hackers repeatedly tricked into taking control of a person's account. In a new data breach notification letter, seen by this week in security, Meta has revealed for the first time how many people had their accounts hijacked as part of the long-running hacking campaign,...
Reform MP refuses to say whether Farage should produce evidence for Russian hack claim
A senior Reform UK figure has declined to pressure Nigel Farage into providing evidence to security services regarding his claim of being hacked by Russian agents. This refusal comes as Farage faces increasing pressure to substantiate his assertion that a state-sponsored Russian hack was responsible for the Guardian's reporting on a £5 million gift he received. Both Labour and the Conservatives have highlighted the national security risks associated with Russian state activity.
Life-changing medicine or beauty hack? How Ozempic came to be seen as both, and why that's risky
The same drug that is helping patients manage diabetes and reduce their risk of serious complications from chronic conditions is also being discussed as a beauty hack by people hoping to lose a few kilograms. Experts say more education and awareness are needed.
Proxy Reward Internalization and Mechanistic Exploitation: A Learned Precursor to Reward Hacking and Its Generalization
arXiv:2606.09711v1. Reward hacking is usually studied after it becomes visible, once a model earns high proxy reward while failing the intended task. We instead study what proxy RL teaches before that failure appears. We introduce Proxy Reward Internalization and Mechanistic Exploitation (PRIME), a learned capability to assess task correctness, predict proxy acceptance, and reason about exploitable proxy-gold gaps.
Anthropic invites EU to access Mythos hacking tech
Anthropic has extended an invitation to the European Commission granting the EU’s cyber agency access to its powerful AI hacking tool Mythos, according to a Commission official familiar with the process. The AI firm made the formal invitation after a meeting with the Commission in San Francisco last Thursday, the official said, adding the EU now has to put in place a mechanism to access the model with proper security safeguards. Bloomberg reported on Monday that the EU’s Athens-based...
MP staffer’s account sent almost 2,000 phishing emails after suspected hack
LONDON — Nearly 2,000 people were targeted with a phishing email after the suspected hack of a staffer of senior Labour MP Florence Eshalomi. The email contained a malicious file — identified by the Parliamentary Digital Service as a phishing attack — that tried to secure the credentials of other accounts, according to an email seen by POLITICO, which was sent by Eshalomi to those targeted in the days following last week’s breach. Westminster journalists and public affairs...
How Turkey Hacked the Hair Transplant Industry
The astounding growth of the hair-transplant industry in Turkey is not just a medical tourism success story; it’s also a tale of “hacked” medical equipment and algorithmic craftsmanship. From a biological and evolutionary perspective, human hair is often viewed as an unremarkable mass of keratin that still plays some important functions—protecting our scalps from the sun’s harmful ultraviolet rays and regulating our body temperatures—but, for the most part, is no longer essential to our...