HPC

Related Articles from SNS

FP8 is All You Need (Part 1): Debunking Hardware FP64 as the HPC Holy Grail

arXiv:2606.06510v1 Announce Type: new Abstract: Conventional HPC dogma holds that native hardware FP64 silicon is the irreducible foundation of scientific computing -- the "holy grail" of double-precision simulation. This paper argues the dogma is wrong: on AI-optimised GPUs of the B300 generation and beyond, abundant FP8 tensor throughput combined with the Chinese Remainder Theorem-based Ozaki Scheme II recovers memory-roof execution at full FP64 accuracy across the canonical HPC kernel...

arXiv CS 2d ago

When More Cores Hurts: The Vector Database Scaling Paradox in HPC

Announce Type: new Abstract: Vector databases have been designed and optimized for cloud environments; however, emerging scientific AI workloads (e.g., molecular search, meteorological trajectory detection, and literature-driven hypothesis generation) demand efficient, scalable execution on HPC systems. We present a large-scale evaluation of three state-of-the-art vector databases -- Qdrant, Milvus, and Weaviate -- on two production supercomputers, scaling to 256 distributed workers across...

arXiv CS 1d ago

Twelve quick tips for designing AI-driven HPC workflows

Announce Type: new Abstract: High-performance computing (HPC) clusters remain the backbone of large-scale scientific computation, traditionally executing deterministic, linear pipelines optimised for predictable performance. However, the pervasive integration of artificial intelligence (AI) and foundation models into scientific research has introduced a fundamentally new computational paradigm. AI-driven workflows are characteristically iterative, data-driven, and probabilistic, introducing...

arXiv CS 2d ago

FedQueue: Queue-Aware Federated Learning for Cross-Facility HPC Training

arXiv:2605.02125v3 Announce Type: replace Abstract: Federated learning (FL) across multiple HPC facilities faces stochastic admission delays from batch schedulers that dominate wall-clock time. Synchronous FL suffers from severe stragglers, while asynchronous FL accumulates stale updates when queues spike. We propose FedQueue, a queue-aware FL protocol that incorporates scheduler delays directly into training and aggregation, which (i) predicts per-facility queue delays online to budget...

arXiv CS 9d ago

Unifying von-Neumann HPC and Neuromorphic Acceleration via the EBRAINS Research Infrastructure: A Framework for High-Performance Workflows

Announce Type: new Abstract: Modern scientific workflows increasingly span diverse computing architectures, yet executing a single computational model across disparate systems often forces researchers to maintain fragmented, site-specific pipelines. In this paper, we address this challenge within the domain of computational neuroscience by presenting a unified, cloud-based workflow orchestrated via EBRAINS JupyterLab. This workflow enables users to transparently execute spiking neural...

arXiv CS 1d ago

Concepts in Practice: C++ MPI Bindings for the HPC Ecosystem. From a Standardizable Core to a Composable Interface

arXiv:2606.09102v1 Announce Type: new Abstract: The official C++ MPI bindings were removed from the standard in 2008, leaving a gap that numerous third-party libraries have attempted to fill. However, existing wrappers typically cover only a limited subset of MPI or target specific use cases, falling short of a general-purpose solution.

arXiv CS 1d ago

Structuring agentic AI for HPC code modernization

arXiv:2606.08710v1 Announce Type: new Abstract: Modernization of legacy scientific codes is often necessary to keep up with the ever-evolving changes in the compute resource ecosystem. Parallelization and migration from poorly supported software ecosystems are two of the most time-consuming activities in the research software engineering field. This paper presents our experience in the successful, two-phase AI-assisted modernization of NMAP-RKPM, a roughly 60,000-line, 3D explicit solid...

arXiv CS 1d ago

LLM-Based Porting of Optimized C++ to CUDA Through Deoptimization and Reoptimization

arXiv:2606.06063v1 Announce Type: new Abstract: When porting high-performance computing (HPC) code from CPU to GPU, CPU-oriented optimizations may obstruct LLM-based CUDA translation. We design and evaluate a Deopt-Reopt workflow that first simplifies the input C++ code and then retranslates and reoptimizes it for CUDA, comparing it against direct translation (Direct) on twelve HPC kernels with two LLMs (gpt-oss-120b (O120) and qwen-3-235b-a22b-instruct-2507 (Q235)) in Single-shot (one pass)...

arXiv CS 5d ago

Strategies for Molecular Dynamics using Hybrid Systems: LAMMPS Use Case

arXiv:2606.02319v1 Announce Type: new Abstract: The complexity of biomolecular simulations has substantially increased the demand for High-Performance Computing (HPC) infrastructures, particularly in molecular dynamics and coarse-grained modeling. This work presents a systematic performance and scalability analysis of the LAMMPS simulator for coarse-grained biomolecular simulations, using the antimicrobial peptide Tritrpticin (PDB ID: 1D6X) as the experimental workload. Pure MPI and hybrid...

arXiv CS 8d ago

CodegenBench: Can LLMs Write Efficient Code Across Architectures?

arXiv:2606.04023v1 Announce Type: new Abstract: While large language models (LLMs) have been extensively evaluated on code generation tasks for general-purpose programming and GPU-accelerated environments (e.g., PyTorch, CUDA), their capabilities in CPU-oriented high-performance computing (HPC) across diverse architectures remain underexplored. To bridge this gap, we introduce CodegenBench, a comprehensive benchmark suite designed to evaluate the generation of efficient parallel code across...

arXiv CS 6d ago