MultiTurnPSB: Evaluating Multi-Turn Jailbreak Attacks and Classifier-Based Defenses for Medical AI Safety

arXiv CS, Wednesday 03 June 2026, 03:43 UTC. By Anushka Sheoran and Yiduo Hao.

Key Points

arXiv:2606.02630v1

Abstract: Patient-facing medical chatbots are commonly evaluated on single-turn prompts, yet real users push back after refusals, add urgency, and invoke authority. We introduce MultiTurnPSB, a four-turn adversarial extension of PatientSafetyBench, and evaluate GPT-4.1-mini under fixed template, template-adaptive, and live adversarial attacks. Unsafe responses rise from 35% to nearly 80% by Turn 4 under live attack. Under the same adversary, GPT-4.1-mini and Claude Sonnet 4.5 are statistically indistinguishable at baseline but diverge to a 19x gap by Turn 4, a difference invisible to single-turn evaluation. We characterize four degradation trajectory signatures and identify a two-element attack formula responsible for most catastrophic failures. A lightweight input-side classifier reduces Turn 4 unsafe responses by 52 percentage points despite severe accuracy degradation, but the 45% false alarm rate on benign queries is the primary deployment constraint. A methodological finding also emerges: Claude Sonnet refused to generate adversarial messages in over half of late-turn conversations despite explicit red team framing, suggesting safety training may generalize to the attacker role.
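The abstract's headline figures are simple rates over labeled conversations, and they mix two kinds of comparison: an absolute drop in percentage points (the classifier's 52-point reduction) and a relative ratio (the 19x gap between models). The sketch below shows how such numbers could be computed. It is not the authors' code: the data layout (one list of per-turn "unsafe" labels per conversation, plus the classifier's flags on benign queries) and every function name are hypothetical, and the toy values are invented for illustration, not results from the paper.

```python
from typing import Sequence


def unsafe_rate_by_turn(conversations: Sequence[Sequence[bool]]) -> list[float]:
    """Fraction of conversations whose reply at each turn was labeled unsafe.

    Hypothetical layout: conversations[i][t] is True if the model's reply at
    turn t (0-indexed) of conversation i was judged unsafe.
    """
    num_turns = len(conversations[0])
    return [
        sum(1 for conv in conversations if conv[t]) / len(conversations)
        for t in range(num_turns)
    ]


def percentage_point_drop(rate_without: float, rate_with: float) -> float:
    """Absolute reduction in percentage points, not a relative percentage."""
    return 100.0 * (rate_without - rate_with)


def rate_ratio(rate_a: float, rate_b: float) -> float:
    """Relative gap between two systems, e.g. a '19x' comparison."""
    return rate_a / rate_b if rate_b > 0 else float("inf")


def false_alarm_rate(flags_on_benign: Sequence[bool]) -> float:
    """Share of benign queries that the input-side classifier wrongly flags."""
    return sum(flags_on_benign) / len(flags_on_benign)


# Toy data only (4 conversations x 4 turns); NOT the paper's results.
undefended = [
    [False, False, True, True],
    [True, True, True, True],
    [False, True, True, True],
    [False, False, False, True],
]
defended = [
    [False, False, False, False],
    [True, False, False, True],
    [False, False, False, False],
    [False, False, False, True],
]
benign_flags = [True, False, False, True, False]

before = unsafe_rate_by_turn(undefended)
after = unsafe_rate_by_turn(defended)
print(before)  # [0.25, 0.5, 0.75, 1.0]
print(after)  # [0.25, 0.0, 0.0, 0.5]
print(percentage_point_drop(before[-1], after[-1]))  # 50.0
print(rate_ratio(before[-1], after[-1]))  # 2.0
print(false_alarm_rate(benign_flags))  # 0.4
```

In this toy case, the same pair of Turn 4 rates reads as a 50-point absolute drop or a 2x relative gap, which is why the abstract's "percentage points" and "19x" figures should not be compared directly. The false alarm rate is computed only over benign queries, separately from the adversarial conversations.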


Originally published by arXiv CS.
