STAR-KV: Low-Rank KV Cache Compression via Soft Thresholding for Adaptive Rank Control

arXiv CS, Tuesday 09 June 2026, 04:00 UTC. By Priyansh Bhatnagar, Ashkan Moradifirouzabadi, Se-Hyun Yang, SeungJae Lee, Jungwook Choi, Mingu Kang.

Key Points

arXiv:2606.08382v1 (announce type: new). Abstract: Low-rank projection has emerged as a promising approach for compressing the KV cache by exploiting hidden-dimension redundancy. However, prior methods rely on fixed or heuristic rank selection and struggle to achieve aggressive compression with minimal accuracy degradation. We propose STAR-KV, an adaptive low-rank KV cache compression framework with fine-grained rank control. STAR-KV comprises 1) a differentiable thresholding mechanism that enables optimal rank selection at both the attention-head and block levels, 2) a hybrid decomposition strategy that applies different low-rank factorizations according to the sensitivity of the key and value projections, and 3) a low-rank-aware mixed-precision quantization scheme that leverages data statistics for near-lossless low-bit quantization. Evaluated across multiple LLMs and benchmarks, STAR-KV achieves up to 75% KV cache compression and up to 20x overall KV cache reduction when combined with quantization. Enabled by custom Triton-based GPU kernels, STAR-KV delivers up to a 6.9x speedup for the attention module and up to a 3.1x improvement in end-to-end generation throughput. Our code is publicly available at: https://github.com/PriyanshBhatnagar/STAR-KV.
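To make the idea of threshold-based rank selection concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it only illustrates the generic soft-thresholding operator, max(s - tau, 0), applied to singular values, so that each head's retained rank follows from a single threshold. The singular values, the threshold `tau`, the head names, and the helper names are all hypothetical, and the reduction estimate assumes the cache stores `rank` values per token instead of `head_dim` and ignores the storage of the projection matrices.

```python
def soft_threshold(singular_values, tau):
    """Shrink each singular value by tau and clip at zero (generic soft thresholding)."""
    return [max(s - tau, 0.0) for s in singular_values]


def effective_rank(shrunk_values):
    """Count the singular values that survive the threshold."""
    return sum(1 for s in shrunk_values if s > 0.0)


def kv_reduction(rank, head_dim, bits=16, base_bits=16):
    """Estimated cache reduction factor from rank truncation and quantization.

    Assumption: the two factors multiply, and projection-matrix storage is ignored.
    """
    return (head_dim / rank) * (base_bits / bits)


# Hypothetical singular-value spectra for two attention heads (head_dim = 8).
heads = {
    "head_0": [9.0, 4.0, 1.5, 0.4, 0.2, 0.1, 0.05, 0.01],  # fast decay
    "head_1": [5.0, 4.5, 4.0, 3.0, 2.0, 0.3, 0.2, 0.1],    # slow decay
}

tau = 1.0  # hypothetical threshold; a trainable method would learn this value
ranks = {
    name: effective_rank(soft_threshold(values, tau))
    for name, values in heads.items()
}

for name, rank in ranks.items():
    print(name, "keeps rank", rank, "of", len(heads[name]))

# Keeping 2 of 8 dimensions is a 75% reduction (4x) before quantization.
print(kv_reduction(rank=2, head_dim=8))
# Hypothetical 4-bit storage on top of that, relative to a 16-bit baseline.
print(kv_reduction(rank=2, head_dim=8, bits=4))
```

With one shared threshold, the head whose spectrum decays quickly keeps fewer dimensions than the head whose spectrum decays slowly, which is the sense in which rank becomes adaptive per head. Because the operator is differentiable everywhere except at the kink s = tau, a threshold of this kind can in principle be tuned by gradient descent rather than fixed by hand.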


Originally published by arXiv CS.

