HBM
Related Articles from SNS
Don't be so Stief! Learning KV Cache low-rank approximation over the Stiefel manifold
Abstract: Key-value (KV) caching enables fast autoregressive decoding, but at long contexts it becomes a dominant bottleneck in High Bandwidth Memory (HBM) capacity and bandwidth. A common mitigation is to compress cached keys and values by projecting per-head matrices to a lower rank and storing only the projections in HBM. However, existing post-training approaches typically fit these projections using SVD-style proxy objectives, which may poorly reflect end-to-end...
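As a rough illustration of the SVD-style baseline the abstract refers to (not the paper's Stiefel-manifold method), the sketch below fits a rank-r projection for one attention head and stores only the projected keys. The function names, shapes, and synthetic data are assumptions made for the example.

```python
import numpy as np

def fit_projection(keys: np.ndarray, rank: int) -> np.ndarray:
    """Return a (head_dim, rank) projection with orthonormal columns.

    Uses a plain truncated SVD of the cached keys, i.e. a proxy objective
    that minimizes reconstruction error rather than end-to-end model loss.
    """
    _, _, vt = np.linalg.svd(keys, full_matrices=False)
    return vt[:rank].T

def compress(keys: np.ndarray, proj: np.ndarray) -> np.ndarray:
    # Only this (seq_len, rank) matrix would need to live in HBM.
    return keys @ proj

def reconstruct(compressed: np.ndarray, proj: np.ndarray) -> np.ndarray:
    # Approximate the original keys from the stored projection.
    return compressed @ proj.T

# Hypothetical sizes: one head, 256 cached tokens, head_dim 64, rank 16.
rng = np.random.default_rng(0)
seq_len, head_dim, rank = 256, 64, 16

# Synthetic keys that are close to rank 16, plus a little noise.
keys = rng.standard_normal((seq_len, rank)) @ rng.standard_normal((rank, head_dim))
keys += 0.01 * rng.standard_normal((seq_len, head_dim))

proj = fit_projection(keys, rank)
compressed = compress(keys, proj)
rel_err = np.linalg.norm(keys - reconstruct(compressed, proj)) / np.linalg.norm(keys)
print(compressed.shape, f"relative error {rel_err:.4f}")
```

In this toy setup the stored matrix is four times smaller than the original keys. Real caches are not guaranteed to be this close to low rank, which is the gap the abstract points at.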
SPHERICAL KV: Angle-Domain Attention and Rate-Distortion Retention for Efficient Long-Context Inference
arXiv:2605.18856v3. Abstract: Long-context inference is increasingly constrained by the KV cache: resident memory grows with context length, and decoding becomes limited by repeated High Bandwidth Memory (HBM) streaming rather than arithmetic. Existing methods such as eviction, windowing, quantization, and offloading reduce footprint, but often leave the critical-path bottleneck only partially addressed, especially when compressed states must still be reconstructed into...
Memory-Bound but Not Bandwidth-Limited: The Physical AI Inference Gap in Batch-1 LLM Decode
arXiv:2605.30571v1. Abstract: Physical AI systems, including robots, autonomous vehicles, embodied agents and edge copilots, often run a different inference workload from cloud LLM serving: single-stream, batch-1 autoregressive decode, where one robot, camera feed or user session waits on the next token. This workload is usually described as memory-bandwidth-bound. Each decode step streams model weights and the active KV cache, so latency should scale with peak HBM bandwidth.
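The conventional bandwidth-bound model described in this abstract can be written as a one-line estimate: per-token latency is at least the bytes streamed divided by peak bandwidth. The sketch below is only that back-of-the-envelope model, with hypothetical model size, cache size, and bandwidth. The paper's title suggests measured latency may not follow it.

```python
def decode_latency_floor_ms(weight_bytes: float,
                            kv_cache_bytes: float,
                            hbm_bandwidth_bytes_per_s: float) -> float:
    """Lower bound on batch-1 decode latency per token, in milliseconds.

    Assumes every decode step reads all weights and the active KV cache
    exactly once from HBM at peak bandwidth, with no overlap or overhead.
    """
    bytes_per_step = weight_bytes + kv_cache_bytes
    return bytes_per_step / hbm_bandwidth_bytes_per_s * 1e3

# Hypothetical numbers, chosen only for illustration.
weight_bytes = 7e9 * 2        # 7B parameters stored in 16-bit precision
kv_cache_bytes = 1e9          # 1 GB of active KV cache
peak_bandwidth = 2e12         # 2 TB/s peak HBM bandwidth

floor_ms = decode_latency_floor_ms(weight_bytes, kv_cache_bytes, peak_bandwidth)
print(f"latency floor: {floor_ms:.2f} ms/token")  # about 7.5 ms
```

Under this model, doubling peak bandwidth halves the floor. Any gap between this floor and measured latency is time spent on something other than streaming bytes.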
Intel's mysterious new datacenter GPU is what Nvidia's Rubin CPX nearly was
Intel offered new insights into its next-gen datacenter GPU, codenamed Crescent Island. Alongside supporting enterprise AI deployments, the GPU could fill the void left by Nvidia's Rubin CPX GPUs, which were seemingly shelved late last year following Nvidia's acquisition of Groq. As datacenter GPUs go, Intel's Crescent Island is certainly an odd duck.
Samsung, LG shares rally ahead of Nvidia CEO meetings with Korean executives
(Corrects to Huang in second reference in paragraph 4) SEOUL, June 1: Shares in Samsung Electronics, LG Electronics and other South Korean tech firms rallied on Monday, as expected meetings between Nvidia CEO Jensen Huang and Korean executives boosted hopes of tie-ups in AI and robotics. Chipmaker Samsung Electronics was also buoyed by data that South Korea's semiconductor exports surged to a record high in May on...
Analysis: How a nudge from Nvidia propelled frugal Micron into the AI boom and a $1 trillion market cap
SAN FRANCISCO, June 2: Micron Technology's march toward a $1 trillion valuation is nothing if not dramatic: a year ago it was a little over $100 billion. That surge, though, was not built on its famed frugality, but on a nearly too-late push from Nvidia that pulled the U.S. memory chipmaker into the center of the AI boom. For decades, the Idaho-based company survived by building factories...
How the hell is Groq raising more money?
Somehow, Palpatine returned. Axios just dropped a bizarre scoop.
Schedule-Level Shared-Prefix Reuse for LLM RL Training
Abstract: GRPO- and PPO-style LLM post-training commonly sample multiple trajectories from the same prompt and then train on the resulting group. In long-context RL workloads, this shared prompt-side prefix can contain retrieved passages, visual tokens, tool schemas, system instructions, or task context, while the full rollout group is still too large to pack into one training microbatch. Standard dense trainers therefore recompute the same prefix forward and backward...
Schedule-Level Shared-Prefix Reuse for LLM RL Training
Abstract: GRPO-based LLM post-training commonly samples multiple trajectories from the same prompt and then trains on the resulting group. In long-context GRPO workloads, this shared prompt-side prefix can contain retrieved passages, visual tokens, tool schemas, system instructions, or task context, while the full rollout group is still too large to pack into one training microbatch. Standard dense trainers therefore recompute the same prefix forward and backward for...