Home Knowledge Base Cross-Layer Sparse Attention

Cross-Layer Sparse Attention

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

You Only Index Once: Cross-Layer Sparse Attention with Shared Routing

Announce Type: new Abstract: Long-context inference in modern LLMs is increasingly constrained by decoding efficiency, especially in reasoning-heavy settings where models generate long intermediate chains of thought. Existing sparse attention methods often face a practical efficiency-quality trade-off. Structured block sparse methods typically provide stronger acceleration but incur noticeable quality loss, while token sparse methods are usually more accurate yet deliver limited end-to-end...

arXiv CS 5d ago