
SM

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing

arXiv:2606.08761v1 (new). Abstract: W4A4 quantization promises full utilization of INT4 Tensor Cores, yet group dequantization overhead on CUDA Cores has driven existing systems to mixed-precision fallbacks. We present the first systematic study of how intra-SM compute balance governs this bottleneck. Through controlled benchmarks across four GPUs from the Ampere and Ada architectures, we identify the Tensor Core to CUDA Core throughput ratio (ρ) as the primary hardware...

arXiv CS 1d ago

An Efficient, Reliable and Observable Collective Communication Library in Large-scale GPU Training Clusters

(replace). Abstract: Large-scale LLM training requires collective communication libraries to exchange data among distributed GPUs. As a company dedicated to building and operating large-scale GPU training clusters, we encounter several practical limitations of NCCL in production, including 1) SM competition between computation and communication, 2) expensive restart costs under link failures, and 3) insufficient observability of transient collective communication anomalies. To...

arXiv CS 8d ago

Can You Stop a Hypersonic Missile?

The headlines say yes. Patriot crews shot down a Kinzhal over Kyiv on the night of May 4, 2023.

Hacker News 9d ago

A 10 year old Xeon is all you need (for 26B-A4B MTP Drafters without GPU)

The previous post covered getting Gemma 4’s MTP drafters quantized and paired with a verifier. This one is about running the result on a machine that has no business running it. I have a recycled server.

Hacker News 9d ago

DuetServe: Harmonizing Prefill and Decode for LLM Serving via Adaptive GPU Multiplexing

arXiv:2511.04791v2 (replace). Abstract: Modern LLM serving systems must sustain high throughput while meeting strict latency SLOs across two distinct inference phases: compute-intensive prefill and memory-bound decode. Existing approaches either (1) aggregate both phases on shared GPUs, leading to interference between prefill and decode that degrades Time-Between-Tokens (TBT); or (2) disaggregate the two phases across GPUs, improving latency but wasting resources...

arXiv CS 8d ago

Gooey: A GPU-accelerated UI framework for Zig

A GPU-accelerated UI framework for Zig, targeting macOS (Metal), Linux (Vulkan/Wayland), and the browser (WASM/WebGPU). Early development: the API is still evolving. Example app built with Gooey: chat-zig, an Anthropic Claude client built on the Zig 0.16 standard library (std).

Hacker News 7d ago

AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis

arXiv:2606.09682v1 (new). Abstract: AutoMegaKernel (AMK) compiles a HuggingFace Llama-family model into a single persistent cooperative CUDA kernel that runs the whole forward pass in one launch, with no per-model hand-written CUDA. The contribution is the system, not raw speed. A frozen schedule-IR validator statically certifies deadlock-freedom and race-freedom via static graph checks (not a mechanized proof), so an unsafe agent-proposed schedule is rejected before launch:...

arXiv CS 1d ago

AT&T Promo Codes: $50 Off This June 2026

Major wireless carriers: a necessary evil if you travel a lot, have a family, or just want coverage that is reliably consistent and widespread. AT&T is the third-largest provider in the US (first for 5G), with the largest coverage map. I’ve had various AT&T plans for more than a decade, first for just myself and now for my whole family, even though I only get one cell bar at my house and have to stand in one 5-square-foot patch of yard to make a phone call.

Wired 7d ago

For satellites as small as a briefcase, getting around in space just got a whole lot easier

Lisa Lock, Scientific Editor; Robert Egan, Associate Editor. MIT engineers are testing a new propulsion system that combines the power and speed of conventional chemical thrusters with the precision and fuel efficiency of electric thrusters. The system could enable the design of nimbler, more flexible small satellites, which could perform both fast, powerful maneuvers and slower, precise adjustments,...

Phys.org 8d ago

The American Missile Crisis

Recent global conflicts, from Russia and Ukraine to Iran and Israel, have brought renewed awareness of the fragility of US munitions stocks, which have been drawn down by both direct and indirect involvement in these events. While exact stockpile volumes are not disclosed, it is estimated that supplies of US warheads and the missiles that carry them have declined by nearly an order of magnitude since their peak during the Cuban Missile Crisis. Analysts have estimated that in the event of a...

Hacker News 7d ago