Home Knowledge Base IBGDA

IBGDA

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Move the Query, Not the Cache: Characterizing Cross-Instance Latent Attention Redistribution Across GPU Fabrics

Announce Type: new Abstract: Frontier LLMs increasingly decide what a query attends to with a sparse-attention indexer that picks a few KV-cache blocks per query: attention's unit is now a small, reusable chunk. Agentic workloads hammer it: many sub-agents query one large codebase, reusing the same blocks. When that corpus outgrows one GPU it is partitioned across instances, so a query and the blocks it selects often sit on different GPUs: answering it means attention across instances.

arXiv CS 8d ago