W4A4KV4

Related Articles from SNS

OffQ: Taming Structured Outliers in LLM Quantization by Offsetting

Abstract: Low-bit quantization has been widely adopted to accelerate the inference of large language models (LLMs) by significantly reducing computational cost and memory usage. However, activation outliers pose a major challenge to effective quantization, often leading to notable performance degradation. In this paper, we introduce OffQ, a method designed to mitigate activation outliers in low-bit quantization through a novel offsetting mechanism.

arXiv CS 2d ago
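
The abstract's claim that activation outliers degrade low-bit quantization can be illustrated with a small sketch. This is a generic illustration of symmetric per-tensor round-to-nearest 4-bit quantization, not OffQ's offsetting mechanism, which the abstract does not describe. The helper names (`quantize_dequantize`, `mean_abs_error`) and the sample values are hypothetical, chosen only for this example.

```python
def quantize_dequantize(values, bits=4):
    """Symmetric per-tensor round-to-nearest quantization, then dequantization.

    Generic illustration only; this is NOT the OffQ method.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return [0.0 for _ in values]
    scale = max_abs / qmax  # one scale shared by the whole tensor
    # Round each value to an integer level, clamp to the range, map back to floats.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def mean_abs_error(values, bits=4):
    """Average absolute reconstruction error after quantize/dequantize."""
    restored = quantize_dequantize(values, bits)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)


# Hypothetical activations: 17 small values in [-0.8, 0.8].
typical = [0.1 * i for i in range(-8, 9)]
# The same activations plus a single large outlier.
with_outlier = typical + [50.0]

err_typical = mean_abs_error(typical)
err_outlier = mean_abs_error(with_outlier)

# With the outlier present, the shared scale grows so much that every
# small value rounds to the zero level.
restored_small = quantize_dequantize(with_outlier)[: len(typical)]

print(f"mean abs error without outlier: {err_typical:.4f}")
print(f"mean abs error with outlier:    {err_outlier:.4f}")
print(f"small values after quantization: {restored_small}")
```

In this sketch the single outlier stretches the shared scale, so the small activations lose all resolution. That is the kind of degradation the abstract attributes to activation outliers.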