Computational Memory Systems

Related Articles from SNS

AXLE: Coordinated Offloading with Asynchronous Back-Streaming in Computational Memory Systems

arXiv:2512.04449v2 Announce Type: replace Abstract: CXL-based Computational Memory (CCM) enables near-memory processing within expanded remote memory, offering opportunities to address data movement costs in disaggregated memory systems and to accelerate overall performance. However, existing offloading mechanisms do not fully leverage the trade-offs of different offload models based on different CXL protocols. This work first examines these tradeoffs and their impact on end-to-end...

arXiv CS 8d ago

LIMCA: LLM for Automating Analog In-Memory Computing Architecture Design Exploration

arXiv:2503.13301v2 Announce Type: replace Abstract: Resistive crossbars enabling analog In-Memory Computing (IMC) have emerged as a promising architecture for Deep Neural Network (DNN) acceleration, offering high memory bandwidth and in-situ computation. However, the manual, knowledge-intensive design process and the lack of high-quality circuit netlists have significantly constrained design space exploration and optimization to behavioral system-level tools. In this work, we introduce...

arXiv CS 9d ago

BigMac: Breaking the Pareto Frontier of Compute and Memory in Multimodal LLM Training

Announce Type: replace Abstract: Training multimodal large language models (MLLMs) is challenged by both model and data heterogeneity. Existing systems redesign the training pipeline to address these challenges, but remain bound by a Pareto frontier between compute and memory efficiency, improving one only at the expense of the other. We present BigMac, a new training pipeline for multimodal LLMs.

arXiv CS 2d ago

In-Memory Computing Enabled Deep MIMO Detection to Support Ultra-Low-Latency Communications

arXiv:2508.17820v2 Announce Type: replace Abstract: The development of sixth-generation (6G) mobile networks imposes unprecedented latency and reliability demands on multiple-input multiple-output (MIMO) communication systems, a key enabler of high-speed radio access. Recently, deep unfolding-based detectors, which map iterative algorithms onto neural network architectures, have emerged as a promising approach, combining the strengths of model-driven and data-driven methods to achieve high...

arXiv CS 8d ago

Accuracy-Configurable Floating-Point Multiplier Design for SRAM-Based Compute-in-Memory

arXiv:2606.08430v1 Announce Type: new Abstract: Digital Compute-in-Memory (DCiM) reduces data movement and has become a promising solution for energy-efficient edge AI. However, most existing DCiM frameworks still primarily target integer or fixed-point arithmetic, and provide limited support for compiler-integrated and accuracy-configurable floating-point computation. Directly integrating conventional IEEE 754 floating-point units into dense SRAM-based DCiM arrays, however, incurs high area...

arXiv CS 1d ago

Multi-GPU Hybrid Particle-in-Cell Monte Carlo Simulations for Exascale Computing Systems

arXiv:2603.24508v3 Announce Type: replace Abstract: Particle-in-Cell (PIC) Monte Carlo (MC) simulations are central to plasma physics but face increasing challenges on heterogeneous HPC systems due to excessive data movement, synchronization overheads, and inefficient utilization of multiple accelerators. In this work, we present a portable, multi-GPU hybrid MPI+OpenMP implementation of BIT1 that enables scalable execution on both Nvidia and AMD accelerators through OpenMP target tasks with...

arXiv Physics 1d ago

Space-CIM: Enabling Compute-In-Memory Accelerators for Thermally-Constrained Space Platforms

Announce Type: new Abstract: The rapid growth in compute demand from artificial intelligence (AI) has driven a massive surge in data center construction, precipitating an energy and sustainability crisis. Motivated by the abundant solar energy in outer space and the recent sharp reduction in space launch costs, orbital data centers are emerging as a potential pathway for the future scaling of AI compute infrastructure. While the cold background in vacuum seems appealing for cooling,...

arXiv CS 5d ago

Optimal transition in underdamped systems with memory

arXiv:2605.30897v1 Announce Type: new Abstract: Optimal finite-time control is essential for energy-efficient operation of nanoscale devices. While existing work has largely focused on transitions between equilibrium states in overdamped systems, many settings of practical interest -- including nanomechanical resonators, biomolecular conformational dynamics, and quantum Brownian motion -- are governed by underdamped dynamics where both particle inertia and frequency-dependent friction...

arXiv Physics 9d ago

Distributed Persistence Domain for Persistent Memory Pooling

arXiv:2606.07159v1 Announce Type: new Abstract: Compute Express Link (CXL) enables memory pooling over disaggregated memory, offering the potential to improve resource utilization in persistent memory systems. However, integrating persistence semantics into CXL-based memory pooling introduces substantial latency, which limits system scalability. This overhead arises because persist operations must traverse the entire CXL fabric, including switches, links, and protocol layers, before reaching...

arXiv CS 2d ago