Can the Environment Speak for Itself? $T^{2}$-GRPO: A Turn-Trajectory Group Relative Policy Optimization for Caregiver Agents

arXiv CS, Tuesday 09 June 2026, 04:00 UTC. By Yutong Song, Jiang Wu, Pengfei Zhang, Wenjun Huang, Honghui Xu, Nikil Dutt, Amir M. Rahmani

Key Points

arXiv:2606.08875v1 (Announce Type: new). Abstract: Optimizing large language models (LLMs) for long-horizon caregiver agents requires balancing delayed task objectives with immediate environment dynamics, such as patient distress and resistance. In dementia care, this balance is especially difficult: trajectory-level rewards are too sparse for turn-level credit assignment, while external LLM-based evaluators are costly and can misread fragmented or indirect patient responses. To address this issue, we propose Turn-Trajectory Group Relative Policy Optimization ($T^{2}$-GRPO), a framework that decouples caregiver RL into two normalized reward horizons and enforces safety through a binary hard veto. $T^{2}$-GRPO derives dense turn-level rewards directly from environment state transitions by measuring changes in patient distress and resistance in a frozen dementia patient simulator. These environment-grounded rewards are combined with trajectory-level evaluations through independent centered-rank normalization, which preserves heterogeneous reward signals and mitigates reward collapse. Extensive experiments on dementia caregiving tasks show that $T^{2}$-GRPO outperforms competitive baselines. This indicates a substantial improvement in emotionally sensitive caregiving scenarios, where the method effectively handles immediate patient feedback, long-term care outcomes, and safety constraints together.
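The abstract gives no formulas, so the following is only a minimal Python sketch of how its pieces could fit together for one GRPO group of rollouts. It shows dense turn rewards computed from simulator state changes, centered-rank normalization applied to each horizon independently, and a binary hard veto for unsafe rollouts. Everything here is an assumption, not the authors' implementation. That includes the names `Turn`, `turn_reward`, `centered_rank` and `group_advantages`, the per-rollout averaging of turn rewards, the mixing weight `alpha`, and the veto value of -1.0.

```python
"""Hypothetical sketch of T^2-GRPO-style group advantages (not the authors' code)."""
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Turn:
    # Simulator state before/after one caregiver utterance (assumed fields).
    distress_before: float
    distress_after: float
    resistance_before: float
    resistance_after: float


def turn_reward(t: Turn, w_distress: float = 1.0, w_resistance: float = 1.0) -> float:
    """Dense turn reward: positive when distress and resistance go down."""
    return (w_distress * (t.distress_before - t.distress_after)
            + w_resistance * (t.resistance_before - t.resistance_after))


def centered_rank(values: Sequence[float]) -> List[float]:
    """Map values to ranks rescaled to [-0.5, 0.5]; tied values share their average rank."""
    n = len(values)
    if n <= 1:
        return [0.0] * n
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0
        i = j + 1
    return [r / (n - 1) - 0.5 for r in ranks]


def group_advantages(
    turn_rewards: Sequence[Sequence[float]],  # per rollout: list of dense turn rewards
    trajectory_scores: Sequence[float],       # per rollout: sparse end-of-episode score
    is_safe: Sequence[bool],                  # per rollout: passed the safety check?
    alpha: float = 0.5,                       # hypothetical mixing weight between horizons
    veto_value: float = -1.0,                 # below any combined rank, so vetoed rollouts rank last
) -> List[float]:
    if not (len(turn_rewards) == len(trajectory_scores) == len(is_safe)):
        raise ValueError("all per-rollout inputs must have the same length")
    # Aggregate each rollout's turn rewards by their mean before ranking (an assumption).
    turn_means = [sum(r) / len(r) if r else 0.0 for r in turn_rewards]
    # Normalize each horizon independently so neither dominates by raw scale.
    turn_ranks = centered_rank(turn_means)
    traj_ranks = centered_rank(trajectory_scores)
    # Binary hard veto: unsafe rollouts get veto_value regardless of their ranks.
    return [
        alpha * tr + (1.0 - alpha) * jr if safe else veto_value
        for tr, jr, safe in zip(turn_ranks, traj_ranks, is_safe)
    ]


if __name__ == "__main__":
    first = turn_reward(Turn(0.8, 0.5, 0.6, 0.6))
    print(group_advantages(
        turn_rewards=[[first, 0.1], [-0.2, 0.0], [0.5, 0.4], [0.1, 0.1]],
        trajectory_scores=[0.9, 0.4, 0.6, 0.7],
        is_safe=[True, True, False, True],
    ))
```

In this sketch, each horizon's centered ranks lie in [-0.5, 0.5], so a large-magnitude trajectory score cannot drown out small turn-level changes. That is one plausible reading of "preserves heterogeneous reward signals." The veto value sits outside that range, so an unsafe rollout always receives the lowest advantage. Unsafe rollouts still take part in ranking the others here. Excluding them before ranking is an equally reasonable variant.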


Originally published by arXiv CS.
