Sparsity
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
Activation Concentration: Characterizing Column-Level Output Sparsity Across Diffusion Model Architectures
Announce Type: replace Abstract: Recent diffusion accelerators exploit activation sparsity by skipping near-zero GELU outputs, reporting 52--85% element-level sparsity. However, systolic-array hardware processes activations at column granularity, where a single non-zero element forces the entire column to be computed. We present the first systematic column-level sparsity characterization across seven diffusion workloads spanning three workload groups and four modalities.
Mechanistic Insights into Functional Sparsity in Multimodal LLMs via CoRe Heads
arXiv:2606.05843v1 Announce Type: new Abstract: While Multimodal Large Language Models (MLLMs) demonstrate remarkable proficiency on complex vision-language tasks, the mechanisms by which they extract query-relevant visual features from complex, noisy contexts remain opaque. In this paper, we present an in-depth interpretability study that uncovers a profound structural property within MLLMs: functional sparsity in cross-modal retrieval. Leveraging a token-level metric termed Retrieval...
DTop-p MoE: Sparsity-Controlled Dynamic Top-p MoE for Foundation Model Pre-training
arXiv:2512.13996v2 Announce Type: replace Abstract: Sparse Mixture-of-Experts architectures are essential for scaling model capacity efficiently, yet the standard Top-$k$ routing imposes a rigid sparsity pattern that ignores the intrinsic variance in token difficulty and layer-specific computational needs. Top-$p$ routing is more adaptive because it selects experts until their cumulative routing probability reaches a threshold, allowing confident tokens to use fewer experts and ambiguous...
DTop-p MoE: Sparsity-Controlled Dynamic Top-p MoE for Foundation Model Pre-training
arXiv:2512.13996v3 Announce Type: replace Abstract: Sparse Mixture-of-Experts architectures are essential for scaling model capacity efficiently, yet the standard Top-$k$ routing imposes a rigid sparsity pattern that ignores the intrinsic variance in token difficulty and layer-specific computational needs. Top-$p$ routing is more adaptive because it selects experts until their cumulative routing probability reaches a threshold, allowing confident tokens to use fewer experts and ambiguous...
Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution
arXiv:2505.24037v3 Announce Type: replace Abstract: Sparse large language models (LLMs) offer an attractive direction toward efficient deployment, but adapting them to downstream tasks remains challenging. The central difficulty is to enable effective task adaptation without sacrificing the efficiency advantages of sparsity. Existing fine-tuning methods are not well-suited to this setting, as they either introduce additional dense parameters or assume a fixed sparse topology, limiting their...
The Effect of Mobility Trajectory Sparsity on Epidemic Modeling Outcomes
Announce Type: new Abstract: GPS mobility data are increasingly used in epidemic modeling, allowing the construction of co-location networks or population flows. These trajectories typically exhibit high temporal sparsity because data collection is opportunistic and tied to phone use. Despite growing awareness of this limitation, the analysis and treatment of biases derived from it have been largely overlooked in existing epidemic modeling studies, raising concerns about the robustness of...
Calibration Data Trade-offs Across Capability Dimensions: Why Multi-Source Mixing Matters for High-Sparsity LLM Pruning
arXiv:2606.03328v1 Announce Type: new Abstract: Post-training pruning compresses large language models to high sparsity using a small unlabelled calibration set, and recent work has concluded that the choice of calibration source has only modest impact on averaged post-pruning accuracy. We ask whether this conclusion survives once calibration impact is evaluated separately across distinct capability dimensions rather than aggregated. Decomposing post-pruning capability into General,...
Calibration Data Trade-offs Across Capability Dimensions: Why Multi-Source Mixing Matters for High-Sparsity LLM Pruning
arXiv:2606.03328v2 Announce Type: replace Abstract: Post-training pruning compresses large language models to high sparsity using a small unlabelled calibration set, and recent work has concluded that the choice of calibration source has only modest impact on averaged post-pruning accuracy. We ask whether this conclusion survives once calibration impact is evaluated separately across distinct capability dimensions rather than aggregated. Decomposing post-pruning capability into General,...
RT-Lynx: Putting the GEMM Sparsity In a Right Way for Diffusion Models
Announce Type: replace Abstract: Diffusion Transformers (DiT) achieve strong performance in image generation but incur substantial inference costs. While prior work has reduced this cost via quantization and distillation, semi-structured sparsity, which can nearly halve FLOPs, remains underexplored. A key reason is that most existing approaches focus on weight sparsification, and pruning 50% of the weights can remove critical model capacity and degrade generation quality.
SABLE: GPU-Based Power Flow Accelerator for Sparsity-Aware Batched Learning
arXiv:2606.07099v1 Announce Type: new Abstract: Recent studies have developed GPU-based approaches for solving AC power flow and successfully applied them to standalone power flow problems. However, integrating these approaches into modern differentiable learning frameworks while preserving sparsity remains challenging. To this end, we present SABLE, a GPU-based sparse batched power flow accelerator for differentiable learning via an implicit power flow layer.