Activation-Informed Pareto-Guided Low-Rank Compression for Efficient LLM/VLM

arXiv CS, Friday 05 June 2026, 04:00 UTC. By Ryan Solgi, Parsa Madinei, Jiayi Tian, Rupak Swaminathan, Jing Liu, Nathan Susanj, Zheng Zhang

arXiv:2510.05544v2. Abstract: Large language models (LLMs) and vision-language models (VLMs) have achieved state-of-the-art performance, but they impose significant memory and compute costs in deployment. We present a novel low-rank compression framework to address this challenge. First, we upper-bound the change in network loss in terms of layer-wise activation-based compression errors, filling a theoretical gap in the literature. We then formulate low-rank model compression as a bi-objective optimization problem and prove that a single uniform tolerance yields surrogate Pareto-optimal heterogeneous ranks. Based on these theoretical insights, we propose Pareto-Guided Singular Value Decomposition (PGSVD), a zero-shot pipeline that improves activation-aware compression through Pareto-guided rank selection and an alternating least-squares implementation. We apply PGSVD to both LLMs and VLMs and show better accuracy at the same compression levels, together with inference speedup.
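To make the idea of "one uniform tolerance, heterogeneous ranks" concrete, here is a minimal NumPy sketch. It is not the authors' PGSVD pipeline: it uses a standard whitened truncated SVD as a stand-in for activation-aware compression, omits the alternating least-squares step entirely, and the function name `compress_layer`, the ridge constant, and the calibration matrix `X` are all assumptions made for illustration. For each layer it keeps the smallest rank whose relative activation error stays within the shared tolerance.

```python
import numpy as np

def compress_layer(W, X, tol):
    """Hypothetical helper: factor W (out x in) as A @ B so that the relative
    activation error ||W X - A B X||_F / ||W X||_F is at most `tol`.
    X (in x n_samples) holds calibration activations (an assumption here)."""
    n_in = W.shape[1]
    # Gram matrix of the activations, with a tiny ridge so Cholesky succeeds.
    G = X @ X.T
    G += 1e-8 * np.trace(G) / n_in * np.eye(n_in)
    S = np.linalg.cholesky(G)  # G = S @ S.T, S lower triangular

    # ||(W - W_hat) X||_F equals ||(W - W_hat) S||_F (up to the ridge),
    # so truncating the SVD of W @ S is optimal in the activation norm.
    U, s, Vt = np.linalg.svd(W @ S, full_matrices=False)

    energy = s ** 2
    total = energy.sum()
    # tail[r] = energy discarded when the top r singular values are kept.
    tail = total - np.concatenate(([0.0], np.cumsum(energy)))
    rel_err = np.sqrt(np.maximum(tail, 0.0) / total)

    # Smallest rank meeting the tolerance; keep at least rank 1.
    rank = max(int(np.argmax(rel_err <= tol)), 1)

    A = U[:, :rank] * s[:rank]               # shape (out, rank)
    B = np.linalg.solve(S.T, Vt[:rank].T).T  # Vt_r @ inv(S), shape (rank, in)
    return A, B, rank
```

Calling `compress_layer` on every layer with the same `tol` will generally return a different rank per layer, because layers whose whitened weights have fast-decaying singular values reach the tolerance with fewer components. That is only an illustration of the heterogeneous-rank behavior the abstract describes, not a reproduction of its Pareto-optimality result.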

Originally published by arXiv CS.
