intra-SM

Related Articles from SNS

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing

arXiv:2606.08761v1. Abstract: W4A4 quantization promises full utilization of INT4 Tensor Cores, yet group dequantization overhead on CUDA Cores has driven existing systems to mixed-precision fallbacks. We present the first systematic study of how intra-SM compute balance governs this bottleneck. Through controlled benchmarks across four GPUs from Ampere and Ada architectures, we identify the Tensor Cores to CUDA Cores throughput ratio ($\rho$) as the primary hardware...

arXiv CS 1d ago