Home Knowledge Base GEMV

GEMV

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

RH+: Row-Hit-Optimized Scheduling for PIM-based LLM Inference

arXiv:2606.05511v1 Announce Type: new Abstract: Large language model inference on processing-in-memory (PIM) architectures promises to break the memory wall by performing multiply-accumulate (MAC) operations directly within HBM3 DRAM banks. Prior work identifies the power constraint timing parameter nCCDAB as the primary performance bottleneck and optimizes scheduling accordingly. We demonstrate that for GEMV operations that dominate autoregressive decoding, the DRAM row cycle time (nRC) is...

arXiv CS 5d ago

FP8 is All You Need (Part 1): Debunking Hardware FP64 as the HPC Holy Grail

arXiv:2606.06510v1 Announce Type: new Abstract: Conventional HPC dogma holds that native hardware FP64 silicon is the irreducible foundation of scientific computing -- the "holy grail" of double-precision simulation. This paper argues the dogma is wrong: on AI-optimised GPUs of the B300 generation and beyond, abundant FP8 tensor throughput combined with the Chinese Remainder Theorem-based Ozaki Scheme II recovers memory-roof execution at full FP64 accuracy across the canonical HPC kernel...

arXiv CS 2d ago