CUDA Accelerator

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Caspar: CUDA Accelerator for Symbolic Programming with Adaptive Reordering

arXiv:2605.30583v1. We present Caspar, a library that makes the power of modern GPUs more accessible in robotics and provides a state-of-the-art nonlinear GPU solver that can be applied to a wide range of different optimization problems. Caspar bridges the gap between expressive symbolic programming in Python and high-performance GPU runtimes in C++ by automatically generating optimized CUDA kernels from symbolic expressions. Building on the SymForce library,...

arXiv CS 9d ago

Nvidia RTX Spark

RTX Spark Superchip: Blackwell RTX GPU, Ultra-Efficient CPU, FP4 AI Performance, Unified Memory. CUDA, the software that accelerates the world’s AI, runs natively on RTX Spark.

Hacker News 9d ago

CodegenBench: Can LLMs Write Efficient Code Across Architectures?

arXiv:2606.04023v1. While large language models (LLMs) have been extensively evaluated on code generation tasks for general-purpose programming and GPU-accelerated environments (e.g., PyTorch, CUDA), their capabilities in CPU-oriented high-performance computing (HPC) across diverse architectures remain underexplored. To bridge this gap, we introduce CodegenBench, a comprehensive benchmark suite designed to evaluate the generation of efficient parallel code across...

arXiv CS 6d ago

HASTE: Hardware-Aware Dynamic Sparse Training for Large Output Spaces

Extreme multi-label classification (XMC) involves learning models over large output spaces with millions of labels, making the output layer a memory-compute bottleneck. While sparsity-based methods reduce arithmetic complexity, they often fail to yield proportional speedups due to irregular memory access, poor hardware utilization, or reliance on auxiliary architectural components in long-tailed regimes.

arXiv CS 8d ago

UniLab: A Heterogeneous Architecture for Robot RL Beyond GPU-Dominant Paradigms

arXiv:2605.30313v2. Simulation-based RL for contemporary robot control is increasingly organized around GPU-resident simulation: physics, rollout collection, and learning are placed on a single GPU-centric execution path. This paradigm has greatly improved training speed, but it has also encouraged a default assumption that efficient training requires physics to reside on the GPU. We revisit this assumption.

arXiv CS 9d ago

UniLab: A Heterogeneous Architecture for Robot RL Beyond GPU-Dominant Paradigms

arXiv:2605.30313v3. Simulation-based RL for contemporary robot control is increasingly organized around GPU-resident simulation: physics, rollout collection, and learning are placed on a single GPU-centric execution path. This paradigm has greatly improved training speed, but it has also encouraged a default assumption that efficient training requires physics to reside on the GPU.

arXiv CS 7d ago

Evaluating CUDA Tile for AI Workloads on Hopper and Blackwell GPUs

NVIDIA's CUDA Tile (CuTile) introduces a Python-based, tile-centric abstraction for GPU kernel development that aims to simplify programming while retaining Tensor Core and Tensor Memory Accelerator (TMA) efficiency on modern GPUs. We present the first independent, cross-architecture evaluation of CuTile against established approaches such as cuBLAS, Triton, WMMA, and raw SIMT on three NVIDIA GPUs spanning Hopper and Blackwell: H100 NVL, B200, and RTX PRO...

arXiv CS 5d ago

CNBC's The China Connection newsletter: China learns to build without Nvidia

Hi, this is Evelyn, writing to you from Beijing. Welcome to the latest edition of The China Connection — a succinct snapshot of what I'm seeing and hearing from local businesses. China's tech self-sufficiency push is rapidly becoming a reality as companies focus on business questions that run deeper than geopolitics.

CNBC 8d ago

RT-Lynx: Putting the GEMM Sparsity In a Right Way for Diffusion Models

Diffusion Transformers (DiT) achieve strong performance in image generation but incur substantial inference costs. While prior work has reduced this cost via quantization and distillation, semi-structured sparsity, which can nearly halve FLOPs, remains underexplored. A key reason is that most existing approaches focus on weight sparsification, and pruning 50% of the weights can remove critical model capacity and degrade generation quality.

arXiv CS 8d ago

AtlasRAN: Timing-Aware Evaluation of Open-source 5G Platforms for Integrated Wireless Testbeds

Open-source 5G and O-RAN experimentation now spans discrete-event simulators, host-OS emulators, SDR hardware-in-the-loop testbeds, O-RU/Open Fronthaul deployments, wireless digital twins, and accelerator-backed RAN runtimes. These environments may expose similar protocol interfaces while preserving very different timing, I/O, synchronization, buffering, transport, and observability behavior. Thus, studies that appear to measure the same network property may...

arXiv CS 9d ago