
HBM


Related Articles from SNS

Don't be so Stief! Learning KV Cache low-rank approximation over the Stiefel manifold

Abstract (revised version): Key-value (KV) caching enables fast autoregressive decoding, but at long contexts it becomes a dominant bottleneck in High Bandwidth Memory (HBM) capacity and bandwidth. A common mitigation is to compress cached keys and values by projecting per-head matrices to a lower rank, storing only the projections in HBM. However, existing post-training approaches typically fit these projections using SVD-style proxy objectives, which may poorly reflect end-to-end...

arXiv CS 9d ago
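The abstract names SVD-style projections as the common post-training baseline. The sketch below illustrates only that baseline, not the paper's Stiefel-manifold learning method: for one attention head it fits a rank-r basis with orthonormal columns to a batch of calibration keys using NumPy's SVD, then keeps only the projected codes. The synthetic keys, the sizes, and the helper names (`fit_svd_projection`, `compress`, `reconstruct`) are hypothetical, chosen for illustration.

```python
import numpy as np


def fit_svd_projection(keys: np.ndarray, rank: int) -> np.ndarray:
    """Return a (head_dim, rank) basis with orthonormal columns.

    keys: (num_tokens, head_dim) calibration keys for a single attention head.
    """
    # The top right-singular vectors span the directions carrying the most key energy.
    _, _, vt = np.linalg.svd(keys, full_matrices=False)
    return vt[:rank].T


def compress(keys: np.ndarray, basis: np.ndarray) -> np.ndarray:
    # Only these (num_tokens, rank) codes would be kept in HBM.
    return keys @ basis


def reconstruct(codes: np.ndarray, basis: np.ndarray) -> np.ndarray:
    # Map codes back to head_dim when attention needs full-width keys.
    return codes @ basis.T


rng = np.random.default_rng(0)
n, head_dim, rank = 1024, 128, 32
# Synthetic keys (hypothetical): rank-32 structure plus small noise.
keys = (rng.standard_normal((n, rank)) @ rng.standard_normal((rank, head_dim))
        + 0.01 * rng.standard_normal((n, head_dim)))

basis = fit_svd_projection(keys, rank)
codes = compress(keys, basis)
rel_err = np.linalg.norm(keys - reconstruct(codes, basis)) / np.linalg.norm(keys)
print(f"codes {codes.shape}, stored fraction {codes.size / keys.size:.2f}, "
      f"relative error {rel_err:.4f}")
```

A (head_dim, rank) matrix with orthonormal columns is a point on the Stiefel manifold. The SVD basis is the point that minimizes key reconstruction error, which, per the abstract, may poorly reflect end-to-end model quality.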

SPHERICAL KV: Angle-Domain Attention and Rate-Distortion Retention for Efficient Long-Context Inference

arXiv:2605.18856v3 (revised version). Abstract: Long-context inference is increasingly constrained by the KV cache: resident memory grows with context length, and decoding becomes limited by repeated High Bandwidth Memory (HBM) streaming rather than arithmetic. Existing methods such as eviction, windowing, quantization, and offloading reduce footprint, but often leave the critical-path bottleneck only partially addressed, especially when compressed states must still be reconstructed into...

arXiv CS 1d ago
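To see why resident KV memory grows with context length, here is the standard size estimate for dense attention: one key vector and one value vector per token, per layer, per KV head. The model shape (32 layers, 8 KV heads of dimension 128, FP16) is a hypothetical example, not taken from the article, and the estimate ignores allocator overhead and paging.

```python
def kv_cache_bytes(seq_len: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2, batch: int = 1) -> int:
    # Factor 2: a key vector and a value vector per token, per layer, per KV head.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * seq_len * batch


# Hypothetical model shape (assumption, not from the article).
cfg = dict(num_layers=32, num_kv_heads=8, head_dim=128, bytes_per_elem=2)

for ctx in (4_096, 32_768, 131_072):
    gib = kv_cache_bytes(ctx, **cfg) / 2**30
    print(f"{ctx:>7} tokens -> {gib:.1f} GiB of KV cache")
```

Under these assumptions each token costs 128 KiB, so the cache grows linearly from 0.5 GiB at 4K tokens to 16 GiB at 128K tokens, and standard attention re-reads all of it from HBM at every decode step unless it is compressed or evicted.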

Memory-Bound but Not Bandwidth-Limited: The Physical AI Inference Gap in Batch-1 LLM Decode

arXiv:2605.30571v1 (new submission). Abstract: Physical AI systems, including robots, autonomous vehicles, embodied agents and edge copilots, often run a different inference workload from cloud LLM serving: single-stream, batch-1 autoregressive decode, where one robot, camera feed or user session waits on the next token. This workload is usually described as memory-bandwidth-bound. Each decode step streams model weights and the active KV cache, so latency should scale with peak HBM bandwidth.

arXiv CS 9d ago
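The usual model described in the abstract can be written as a bandwidth floor: if every batch-1 decode step streams all weights plus the active KV cache from HBM once, step time is at least the bytes moved divided by peak bandwidth. The sketch computes that floor and the fraction of peak bandwidth implied by a measured step time. The 8B-parameter FP16 model, 0.5 GiB KV cache, 2 TB/s peak bandwidth, and 12 ms measured step are hypothetical inputs, and the function names are illustrative. The paper's title suggests real batch-1 decode does not track this floor, so treat it as a bound, not a prediction.

```python
def decode_step_floor_s(weight_bytes: float, kv_bytes: float, peak_bw: float) -> float:
    """Lower bound on batch-1 decode step time, assuming each step streams all
    weights and the active KV cache from HBM exactly once at peak bandwidth."""
    return (weight_bytes + kv_bytes) / peak_bw


def achieved_bw_fraction(weight_bytes: float, kv_bytes: float,
                         measured_step_s: float, peak_bw: float) -> float:
    """Effective bandwidth implied by a measured step time, as a fraction of peak."""
    return (weight_bytes + kv_bytes) / measured_step_s / peak_bw


# Hypothetical inputs for illustration only (not from the paper).
weights = 8e9 * 2     # 8B parameters stored in FP16 (2 bytes each)
kv = 0.5 * 2**30      # 0.5 GiB of active KV cache
peak = 2e12           # 2 TB/s peak HBM bandwidth

floor = decode_step_floor_s(weights, kv, peak)
print(f"floor: {floor * 1e3:.2f} ms/token (at most {1 / floor:.0f} tokens/s)")

measured = 12e-3      # hypothetical measured step time of 12 ms
frac = achieved_bw_fraction(weights, kv, measured, peak)
print(f"achieved bandwidth: {frac:.0%} of peak")
```

A measured step well above the floor (an achieved fraction well below 100%) indicates that something other than raw HBM bandwidth is limiting decode.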

Intel's mysterious new datacenter GPU is what Nvidia's Rubin CPX nearly was

Intel offered new insights into its next-gen datacenter GPU codenamed Crescent Island. Alongside supporting enterprise AI deployments, the GPU could fill the void left by Nvidia's Rubin CPX GPUs, which were seemingly shelved late last year following Nvidia's acquisition of Groq. As datacenter GPUs go, Intel's Crescent Island is certainly an odd duck.

The Register 5d ago

Samsung, LG shares rally ahead of Nvidia CEO meetings with Korean executives

SEOUL, June 1: Shares in Samsung Electronics, LG Electronics and other South Korean tech firms rallied on Monday, as expected meetings between Nvidia CEO Jensen Huang and Korean executives boosted hopes of tie-ups in AI and robotics. Chipmaker Samsung Electronics was also buoyed by data that South Korea's semiconductor exports surged to a record high in May on...

Channel News Asia 9d ago

Analysis: How a nudge from Nvidia propelled frugal Micron into the AI boom and a $1 trillion market cap

SAN FRANCISCO, June 2: Micron Technology's march toward a $1 trillion valuation is nothing if not dramatic: a year ago it was a little over $100 billion. That surge, though, was not built on its famed frugality, but on a nearly too-late push from Nvidia that pulled the U.S. memory chipmaker into the center of the AI boom. For decades, the Idaho-based company survived by building factories...

Channel News Asia 7d ago

How the hell is Groq raising more money?

Somehow, Palpatine returned. Axios just dropped a bizarre scoop.

Hacker News 8d ago

Broadcom's custom ASIC biz adds South Korea's FuriosaAI to its empire

The Register 13d ago

Schedule-Level Shared-Prefix Reuse for LLM RL Training

Abstract (revised version): GRPO- and PPO-style LLM post-training commonly samples multiple trajectories from the same prompt and then trains on the resulting group. In long-context RL workloads, this shared prompt-side prefix can contain retrieved passages, visual tokens, tool schemas, system instructions, or task context, while the full rollout group is still too large to pack into one training microbatch. Standard dense trainers therefore recompute the same prefix forward and backward...

arXiv CS 7d ago
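The recomputation the abstract describes can be quantified with a simple token count: a dense trainer pushes the shared prefix through forward and backward once per rollout, whereas reusing it processes the prefix once per group. The group size and lengths below are hypothetical, the function name is illustrative, and the count is only a proxy for compute, since completion tokens still attend over the prefix. It does not model the paper's schedule-level method, which must also handle a group spread across several microbatches.

```python
def prefix_token_passes(group_size: int, prefix_len: int, completion_len: int):
    """Token positions pushed through forward/backward for one rollout group.

    Token-count proxy only: it ignores that completion tokens still attend to
    the prefix, so it is not a FLOP model.
    """
    dense = group_size * (prefix_len + completion_len)   # prefix recomputed per rollout
    shared = prefix_len + group_size * completion_len    # prefix processed once per group
    return dense, shared


# Hypothetical long-context workload: 8 rollouts, 16K-token shared prefix, 1K-token completions.
dense, shared = prefix_token_passes(8, 16_384, 1_024)
print(dense, shared, f"{dense / shared:.1f}x")
```

With these inputs the dense schedule processes 139,264 token positions against 24,576 with reuse, about 5.7x more, and the gap widens as the prefix grows relative to the completions.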

Schedule-Level Shared-Prefix Reuse for LLM RL Training

Abstract (revised version): GRPO-based LLM post-training commonly samples multiple trajectories from the same prompt and then trains on the resulting group. In long-context GRPO workloads, this shared prompt-side prefix can contain retrieved passages, visual tokens, tool schemas, system instructions, or task context, while the full rollout group is still too large to pack into one training microbatch. Standard dense trainers therefore recompute the same prefix forward and backward for...

arXiv CS 6d ago