Unsupervised Partner Design Enables Robust Ad-hoc Teamwork

arXiv CS, Tuesday 09 June 2026, 04:00 UTC. By Constantin Ruhdorfer, Matteo Bortoletto, Victor Oei, Anna Penzkofer, Andreas Bulling

Key Points

arXiv:2508.06336v2 (announce type: replace). Abstract: We introduce Unsupervised Partner Design (UPD), a population-free multi-agent reinforcement learning method for robust ad-hoc teamwork. UPD generates training partners on-the-fly and selects them adaptively based on a learnability criterion, removing the need for pre-trained partner populations or manual parameter tuning. We show that this simple mechanism enables effective partner diversity and can be extended to joint partner-environment selection when a procedural level generator is available. Across Level-Based Foraging, Overcooked-AI, and the Overcooked Generalisation Challenge, UPD consistently achieves strong performance compared to both population-based and population-free baselines. In a human-AI user study, agents trained with UPD achieve higher returns and are rated as more adaptive, more human-like, and less frustrating than all evaluated baseline methods.
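
To make the idea of selecting partners by a learnability criterion more concrete, here is a minimal Python sketch. The abstract does not define the criterion, so the score used here, p * (1 - p) for an estimated success rate p, is an assumption borrowed from the general curriculum-learning idea that the most useful training cases are neither trivially easy nor impossible. The function names, partner names, and success rates are all hypothetical and are not taken from the paper.

```python
def learnability(success_rate: float) -> float:
    """Assumed score (not from the paper): peaks when the ego agent
    succeeds with a partner about half the time."""
    return success_rate * (1.0 - success_rate)


def select_partners(success_rates: dict[str, float], k: int) -> list[str]:
    """Return the k candidate partners with the highest assumed learnability."""
    ranked = sorted(
        success_rates,
        key=lambda name: learnability(success_rates[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical candidates: partner name -> estimated success rate of the
# ego agent when paired with that partner.
candidates = {
    "partner_a": 0.05,  # almost always fails: little learning signal
    "partner_b": 0.50,  # succeeds half the time: highest score
    "partner_c": 0.95,  # almost always succeeds: little left to learn
    "partner_d": 0.70,
}

chosen = select_partners(candidates, k=2)
print(chosen)
```

Under this assumed score, partners the agent already handles well and partners it cannot yet handle at all are both ranked low, so training time goes to the pairings in between.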


Originally published by arXiv CS.
