Home Knowledge Base StiefAttention

StiefAttention

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Don't be so Stief! Learning KV Cache low-rank approximation over the Stiefel manifold

Announce Type: replace Abstract: Key-value (KV) caching enables fast autoregressive decoding but at long contexts becomes a dominant bottleneck in High Bandwidth Memory (HBM) capacity and bandwidth. A common mitigation is to compress cached keys and values by projecting per-head matrices to a lower rank, storing only the projections in the HBM. However, existing post-training approaches typically fit these projections using SVD-style proxy objectives, which may poorly reflect end-to-end...

arXiv CS 9d ago