H20

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

FlashMLA-ETAP: Efficient Transpose Attention Pipeline for Accelerating MLA Inference on NVIDIA H20 GPUs

arXiv:2506.01969v3. Abstract: Efficient inference of Multi-Head Latent Attention (MLA) is challenged by deploying the DeepSeek-R1 671B model on a single multi-GPU server. This paper introduces FlashMLA-ETAP, a novel framework that enhances MLA inference for the single-instance deployment scenario on NVIDIA H20 GPUs. We propose the Efficient Transpose Attention Pipeline (ETAP), which reconfigures attention computation through transposition to align the KV context length...

arXiv CS 7d ago

US says ban on AI chip shipments applies to Chinese firms outside China

Department of Commerce issues guidance on chip restrictions amid concerns about loopholes in export control regime. The United States has issued a notice affirming its restrictions on shipments of semiconductors to subsidiaries of Chinese companies located outside China amid concerns about loopholes in Washington’s export control regime. The Department of Commerce said in the guidance issued on Sunday that its licensing...

Al Jazeera 9d ago

FalconGEMM: Surpassing Hardware Peaks with Lower-Complexity Matrix Multiplication

arXiv:2605.06057v3. Abstract: Peak-breaking matrix multiplication is a promising technique to improve the performance of DL, especially in LLM training and inference. We present FalconGEMM, a cross-platform framework that automates the deployment, optimization, and selection of Lower-Complexity Matrix Multiplication Algorithms (LCMAs) across diverse hardware. There are three key innovations: (1) a Deployment Module that enables portable execution across various hardware...

arXiv CS 1d ago