GPTVQ
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
AAAC: Activation-Aware Adaptive Codebooks for 4-bit LLM Weight Quantization
arXiv:2605.08692v2 Announce Type: replace Abstract: Post-training weight-only quantization to 4 bits is widely used to reduce the memory and compute costs of large language model inference. Existing PTQ methods, such as AWQ and GPTQ, improve how weights are mapped onto a fixed 4-bit grid through scaling, clipping, or error compensation. To further improve accuracy, methods such as OmniQuant and QuIP\# uses gradient-assisted algorithms at the cost of hours of quantization time.