Dataflow

Related Articles from SNS

Teaching Synchronous Dataflow Modelling with Learn-Heptagon

arXiv:2606.01928v1 Announce Type: new Abstract: Lustre is a synchronous dataflow language designed to implement safety-critical embedded software. In addition to writing executable programs, the language doubles as a program logic, used for writing specifications as synchronous observers or assume-guarantee contracts that specify properties of these programs.

arXiv CS 8d ago

DOPPLER: Dual-Policy Learning for Device Assignment in Asynchronous Dataflow Graphs

Announce Type: replace Abstract: We study the problem of assigning operations in a dataflow graph to devices to minimize execution time in a work-conserving system, with emphasis on complex machine learning workloads. Prior learning-based methods often struggle due to three key limitations: (1) reliance on bulk-synchronous systems like TensorFlow, which under-utilize devices due to barrier synchronization; (2) lack of awareness of the scheduling mechanism of underlying systems when designing...

arXiv CS 2d ago

DxPTA: An Architecture Design Space Exploration with Optical Dataflow-guided Strategy for HW/SW Co-Design of Photonic Transformer Accelerators

arXiv:2606.06515v1 Announce Type: new Abstract: Transformer-based networks have emerged as prominent AI models with state-of-the-art performance, which potentially pave the way toward artificial general intelligence (AGI). However, their large sizes still hinder their efficient implementation, thus highlighting the need for alternate solutions to enable their energy-efficient acceleration. Recently, state-of-the-art works propose photonic transformer accelerators (PTAs) with significant...

arXiv CS 2d ago

Dependencies and Dataflow in Seed-Filter-Extend Pipelines

Announce Type: new Abstract: Comparing genomes is critical for discovering mutations, tracking evolutionary lineages, and advancing cross-species genomics. Fundamentally, this reduces to an O(n^2) string-matching dynamic programming (DP) problem, a challenge that has driven decades of performance research. However, executing a strict O(n^2) DP algorithm is computationally intractable for genomes spanning millions to billions of base pairs.

arXiv CS 2d ago
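
To make the O(n^2) dynamic program mentioned in the Seed-Filter-Extend abstract concrete, here is a minimal sketch of the classic edit-distance table. It is an illustration of the quadratic cost only, not the paper's pipeline; the function name `edit_distance` and the unit costs are assumptions chosen for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic O(len(a) * len(b)) DP table.

    Hypothetical helper for illustration; unit cost for insert, delete,
    and substitute is an assumption, not taken from the paper.
    """
    # At the start of row i, prev[j] is the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]
```

Every cell of the table is visited once, so both time and the number of cell updates grow with the product of the two sequence lengths, which is why the abstract calls the strict DP intractable for genome-scale inputs.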

LEAP: Learnable End-to-End Adaptive Pruning of Large Language Models

Announce Type: replace Abstract: Unstructured sparsity is now natively accelerated by recent GPU kernels and dataflow hardware, shifting the bottleneck from inference execution to the pruning algorithm. State-of-the-art methods for unstructured LLM pruning are layer-wise surrogates derived from the Optimal Brain Surgeon principle, and they sacrifice end-to-end accuracy, especially under aggressive sparsity.

arXiv CS 1d ago

IntAttention: A Fully Integer Attention Pipeline for Efficient Edge Inference

arXiv:2511.21513v2 Announce Type: replace Abstract: Deploying Transformer models on edge devices is limited by latency and energy budgets. While INT8 quantization effectively accelerates the primary matrix multiplications, it exposes the softmax-related path as the dominant bottleneck. This stage incurs a costly dequantize -> softmax -> requantize detour, which can account for up to 65% of total attention latency and disrupts the end-to-end integer dataflow critical for edge hardware efficiency.

arXiv CS 9d ago
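
The dequantize -> softmax -> requantize detour described in the IntAttention abstract can be sketched as follows. This shows the baseline path the abstract criticizes, not the paper's integer pipeline; the function name `softmax_detour`, the single per-tensor `scale`, and the unsigned 8-bit output with step 1/255 are all assumptions made for the example.

```python
import math


def softmax_detour(q_logits: list[int], scale: float) -> list[int]:
    """Baseline softmax over quantized logits, with a float detour.

    Hypothetical sketch: `scale` is an assumed per-tensor dequantization
    factor, and the output format (0..255, step 1/255) is also assumed.
    """
    # 1. Dequantize: integer logits back to floating point.
    logits = [q * scale for q in q_logits]

    # 2. Softmax in floating point (max-subtracted for numerical stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # 3. Requantize: probabilities back to 8-bit integers.
    return [min(255, max(0, round(p * 255))) for p in probs]
```

Steps 1 and 3 exist only to cross between integer and floating-point domains, which is the extra work the abstract says breaks an otherwise end-to-end integer dataflow.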

Design Space Exploration of DMA based Finer-Grain Compute Communication Overlap

arXiv:2512.10236v2 Announce Type: replace Abstract: Modern ML workloads demand distributing training and inference across multiple GPUs. However, these parallelization techniques often suffer from exposed critical-path communication, leaving a potential 1.7x speedup on the table through compute-communication overlap. Prior overlapping methods harness the fact that ML model state and inputs are already sharded into the number of GPUs, and overlap the compute and communication at shard...

arXiv CS 8d ago

HE^2: A Communication-Light Heterogeneous Architecture for Efficient Fully Homomorphic Encryption

arXiv:2605.31004v1 Announce Type: new Abstract: CKKS, an emerging fully homomorphic encryption (FHE) scheme, has been promising in privacy-preserving applications by enabling SIMD fixed-point computations on ciphertexts. Despite its strong security guarantees, CKKS involves both compute-intensive operators (ComOps) with high computational cost and memory-intensive operators (MemOps) with large memory footprints, making existing ASIC-based or NMP-based acceleration approaches suffer from high...

arXiv CS 9d ago

MOSAIC: A Workload-Driven Simulation and Design-Space Exploration Framework for Heterogeneous NPUs

arXiv:2606.05362v2 Announce Type: replace Abstract: AI model architectures are diversifying rapidly. Although dense matrix multiplication underlies today's CNNs and transformers, emerging architectures (state-space models, long convolutions via the fast Fourier transform (FFT), Kolmogorov-Arnold networks, and spiking networks) are not multiply-accumulate (MAC) dominated; they spend much of their computation on vector and non-MAC primitives that homogeneous, MAC-centric neural processing...

arXiv CS 1d ago