Robust In-Context Reinforcement Learning Under Reward Poisoning Attacks

arXiv CS Tuesday 09 June 2026, 04:00 UTC By Paulius Sasnauskas, Yiğit Yalın, Goran Radanović

Key Points

arXiv:2506.06891v3 Announce Type: replace

Abstract: We study the corruption-robustness of in-context reinforcement learning (ICRL), focusing on the Decision-Pretrained Transformer (DPT, Lee et al., 2023). To address the challenge of reward poisoning attacks targeting the DPT, we propose a novel adversarial training framework, called Adversarially Trained DPT (AT-DPT). Our method simultaneously trains a population of attackers to minimize the true reward of the DPT by poisoning environment rewards, and a DPT model to infer optimal actions from the poisoned data. We evaluate the effectiveness of our approach against standard bandit algorithms, including robust baselines designed to handle reward contamination. Our results show that AT-DPT significantly outperforms these baselines in bandit settings under a learned attacker, and that it generalizes to more complex settings such as adaptive attackers and MDPs. AT-DPT thus shows promise in ICRL as a meta-RL approach to learning effective corruption-robust algorithms.
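To make the threat model concrete, the following is a minimal sketch of reward poisoning in a multi-armed bandit. It is not the authors' AT-DPT method and involves no transformer or adversarial training. The attacker is fixed and hand-written rather than learned, and every name here (`collect_context`, `greedy_arm`, `make_attacker`, the arm means, the perturbation size `eps`) is a hypothetical choice made for illustration. It only shows how a learner that trusts the observed rewards can be steered toward a suboptimal arm.

```python
import random


def collect_context(true_means, n_steps, rng, attacker=None, noise_std=0.1):
    """Pull arms uniformly at random and record (arm, observed_reward) pairs.

    Gaussian reward noise and uniform exploration are assumptions of this sketch.
    """
    context = []
    for _ in range(n_steps):
        arm = rng.randrange(len(true_means))
        reward = true_means[arm] + rng.gauss(0.0, noise_std)
        if attacker is not None:
            # The learner only ever sees the poisoned reward.
            reward = attacker(arm, reward)
        context.append((arm, reward))
    return context


def greedy_arm(context, n_arms):
    """Naive learner: choose the arm with the highest empirical mean reward."""
    sums = [0.0] * n_arms
    counts = [0] * n_arms
    for arm, reward in context:
        sums[arm] += reward
        counts[arm] += 1
    means = [s / c if c else float("-inf") for s, c in zip(sums, counts)]
    return max(range(n_arms), key=means.__getitem__)


def make_attacker(target_arm, eps):
    """Hypothetical fixed attacker: lower every non-target arm's reward by eps."""
    def attacker(arm, reward):
        return reward if arm == target_arm else reward - eps
    return attacker


TRUE_MEANS = [0.2, 0.5, 0.8]  # arm 2 is truly optimal

clean = collect_context(TRUE_MEANS, 300, random.Random(0))
poisoned = collect_context(
    TRUE_MEANS, 300, random.Random(0), attacker=make_attacker(target_arm=0, eps=1.0)
)

clean_choice = greedy_arm(clean, len(TRUE_MEANS))
poisoned_choice = greedy_arm(poisoned, len(TRUE_MEANS))

print("clean choice:", clean_choice, "true mean:", TRUE_MEANS[clean_choice])
print("poisoned choice:", poisoned_choice, "true mean:", TRUE_MEANS[poisoned_choice])
```

With these assumed values the naive learner picks the optimal arm from clean data and the attacker's target arm from poisoned data, so its true reward drops even though its observed rewards look consistent. A corruption-robust learner, as studied in the abstract, would need to infer the optimal action despite such perturbations.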


Originally published by arXiv CS
