GradMem: Learning to Write Context into Memory with Test-Time Gradient Descent

arXiv CS, Monday 01 June 2026, 04:00 UTC
By Yuri Kuratov, Matvey Kairov, Aydar Bulatov, Ivan Rodkin, Mikhail Burtsev

Key Points

arXiv:2603.13875v2

Abstract: Many large language model applications require conditioning on long contexts. Transformers typically support this by storing a large per-layer KV-cache of past activations, which incurs substantial memory overhead. A desirable alternative is compressive memory: read a context once, store it in a compact state, and answer many queries from that state. We study this in a context removal setting, where the model must generate an answer without access to the original context at inference time. We introduce GradMem, which writes context into memory via per-sample test-time optimization. Given a context, GradMem performs a few steps of gradient descent on a small set of prefix memory tokens while keeping model weights frozen. GradMem explicitly optimizes a model-level self-supervised context reconstruction loss, resulting in a loss-driven write operation with iterative error correction, unlike forward-only methods. On associative key-value retrieval, GradMem outperforms forward-only memory writers with the same memory size, and additional gradient steps scale capacity much more effectively than repeated forward writes. We further show that GradMem transfers beyond synthetic benchmarks: with pretrained language models, it attains competitive results on natural language tasks including bAbI and SQuAD variants, relying only on information encoded in memory.
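The write operation described in the abstract (a few gradient steps on memory while the model stays frozen, driven by a context reconstruction loss) can be illustrated with a toy sketch. This is not the authors' implementation: the "model" here is a hypothetical fixed linear decoder `FROZEN_W`, the "memory" is a plain two-element vector rather than prefix tokens of a transformer, and the names `write_memory`, `reconstruction_loss`, `CONTEXT`, the step count, and the learning rate are all assumptions made for illustration.

```python
# Toy sketch only: a frozen linear decoder stands in for the language model,
# and a small vector stands in for the prefix memory tokens.

# Frozen "model weights" (hypothetical): maps a 2-d memory to a 4-d context.
FROZEN_W = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [1.0, -1.0],
]

# The "context" to be written into memory (hypothetical values).
CONTEXT = [2.0, -1.0, 1.0, 3.0]


def matvec(weights, memory):
    """Decode memory into a predicted context: weights @ memory."""
    return [sum(w * m for w, m in zip(row, memory)) for row in weights]


def reconstruction_loss(weights, memory, context):
    """Half squared error between the decoded memory and the context."""
    pred = matvec(weights, memory)
    return 0.5 * sum((p - c) ** 2 for p, c in zip(pred, context))


def write_memory(weights, context, steps=5, lr=0.1):
    """Run a few gradient steps on the memory only; weights are never updated."""
    memory = [0.0] * len(weights[0])
    losses = [reconstruction_loss(weights, memory, context)]
    for _ in range(steps):
        residual = [p - c for p, c in zip(matvec(weights, memory), context)]
        # Gradient of the loss w.r.t. memory: weights^T @ residual.
        grad = [
            sum(weights[i][j] * residual[i] for i in range(len(weights)))
            for j in range(len(memory))
        ]
        memory = [m - lr * g for m, g in zip(memory, grad)]
        losses.append(reconstruction_loss(weights, memory, context))
    return memory, losses


memory, losses = write_memory(FROZEN_W, CONTEXT)
```

Each step corrects the error left by the previous one, which is the sense in which a loss-driven write differs from a single forward pass. In this toy setting, raising `steps` drives the reconstruction loss lower without changing the memory size or touching `FROZEN_W`.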


Originally published by arXiv CS.
