GPUS
Related Articles from SNS
South Korea's LG Group to adopt 10,000 Nvidia GPUs, Maeil says
SEOUL, June 4: South Korea's LG Group is adopting 10,000 GPUs from Nvidia, South Korea's Maeil Business Newspaper reported on Thursday, citing an unnamed industry source. The GPUs are expected to be used to train AI by LG's AI research centre and a humanoid robot being developed by LG Electronics, Maeil said. A spokesperson for LG did not have an immediate comment.
Evaluating CUDA Tile for AI Workloads on Hopper and Blackwell GPUs
Abstract: NVIDIA's CUDA Tile (CuTile) introduces a Python-based, tile-centric abstraction for GPU kernel development that aims to simplify programming while retaining Tensor Core and Tensor Memory Accelerator (TMA) efficiency on modern GPUs. We present the first independent, cross-architecture evaluation of CuTile against established approaches such as cuBLAS, Triton, WMMA, and raw SIMT on three NVIDIA GPUs spanning Hopper and Blackwell: H100 NVL, B200, and RTX PRO...
How Much Progress Has There Been in NVIDIA Datacenter GPUs?
Abstract: As the role of modern Graphics Processing Units (GPUs) becomes increasingly essential for several computing tasks, analyzing their past and current progress is paramount for determining future constraints on scientific research. This is particularly compelling in the Artificial Intelligence (AI) domain, where rapid technological advancements and fierce global competition have led the United States to recently implement export control regulations limiting...
Graph Traversal on Tensor Cores: A BFS Framework for Modern GPUs
arXiv:2606.05081v1. Abstract: Modern GPUs have Tensor Cores (TCs) capable of extremely high-throughput matrix operations, yet graph algorithms remain difficult to accelerate because of their irregular and data-dependent execution patterns. This work presents BLEST, a TC-accelerated framework that reformulates Breadth-First Search (BFS) as a bit-level sparse matrix-vector computation while addressing the load imbalance, memory inefficiency, and synchronization overheads that...
Accelerating Bidiagonalization of Banded Matrices through Memory-Aware Bulge-Chasing on GPUs
arXiv:2510.12705v3. Abstract: The reduction of a banded matrix to bidiagonal form is a critical step in the calculation of Singular Values, a cornerstone of scientific computing and AI. Although inherently parallel, this step has traditionally been considered unsuitable for GPUs due to its memory-bound nature. However, recent advances in GPU architectures, such as increased L1 memory per Streaming Multiprocessor or Compute Unit and larger L2 caches, have shifted this...
FlashMLA-ETAP: Efficient Transpose Attention Pipeline for Accelerating MLA Inference on NVIDIA H20 GPUs
arXiv:2506.01969v3. Abstract: Efficient inference of Multi-Head Latent Attention (MLA) is challenged by deploying the DeepSeek-R1 671B model on a single Multi-GPU server. This paper introduces FlashMLA-ETAP, a novel framework that enhances MLA inference for the single-instance deployment scenario on NVIDIA H20 GPUs. We propose the Efficient Transpose Attention Pipeline (ETAP), which reconfigures attention computation through transposition to align the KV context length...
South Korea to seek priority supply of Nvidia Vera Rubin GPUs, science minister says
SEOUL, June 8: South Korea will ask for priority supply of Nvidia's next-generation Vera Rubin graphics processing units, as deliveries are expected to be delayed, Science and ICT Minister Bae Kyung-hoon said on Monday. Bae said a government notice for South Korea's GPU project had been issued earlier in the day, adding that a supply of Nvidia's B300 chips was expected to arrive on time. "B300 supply looks...
Fast Entropy Decoding for Sparse MVM on GPUs
arXiv:2603.01915v2. Abstract: We present a novel, practical approach to speed up sparse matrix-vector multiplication (SpMVM) on GPUs. The novel key idea is to apply lossless entropy coding to further compress the sparse matrix when stored in one of the commonly supported formats. Our method is based on dtANS, our new lossless compression method that improves the entropy coding technique of asymmetric numeral systems (ANS) specifically for fast parallel GPU decoding when...
Magnum.np.distributed: Accelerating Finite Difference Micromagnetic Simulations with Multiple GPUs
Abstract: Micromagnetic simulations are essential tools in nanomagnetism and spintronics research. Although widely adopted solvers like Mumax3 and the Python-native magnum.np use GPU acceleration to improve performance, these tools are limited to single-device computation. In this work, we present the first Python-native multi-GPU micromagnetic framework by extending magnum.np with PyTorch Distributed.
Parallelizing Large-Scale Tensor Network Contraction on Multiple GPUs
arXiv:2606.01852v1. Abstract: Exact tensor network contraction underpins quantum circuit simulation, quantum error correction, combinatorial optimization, and many-body dynamics. The dominant parallelization strategy, slicing, scales exponentially and incurs redundant computation. We present a multi-GPU framework that instead distributes intermediate tensors across devices with explicit communication, converting a fixed contraction path into a communication-efficient...