Home Knowledge Base IntAttention

IntAttention

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

IntAttention: A Fully Integer Attention Pipeline for Efficient Edge Inference

arXiv:2511.21513v2 Announce Type: replace Abstract: Deploying Transformer models on edge devices is limited by latency and energy budgets. While INT8 quantization effectively accelerates the primary matrix multiplications, it exposes the softmax-related path as the dominant bottleneck. This stage incurs a costly dequantize -> softmax -> requantize detour, which can account for up to 65% of total attention latency and disrupts the end-to-end integer dataflow critical for edge hardware efficiency.

arXiv CS 9d ago