Adaptive Batch Scaling

Related Articles from SNS

Scalable Reinforcement Learning via Adaptive Batch Scaling

arXiv:2605.21557v2. Abstract: Conventional wisdom holds that large-batch training is fundamentally incompatible with Reinforcement Learning (RL): beyond a modest threshold, increasing batch sizes typically yields diminishing returns or performance degradation due to the inherent non-stationarity of the data distribution. We challenge this view by observing that non-stationarity is not a fixed property of RL, but evolves throughout training: early stages exhibit...

arXiv CS 5d ago

Efficient Scaling of LLM Training with Flexible Context Parallelism

arXiv:2602.21788v2. Abstract: Scaling long-context capabilities is crucial for Large Language Models (LLMs). However, real-world data contain a large number of sequences with heterogeneous lengths. Existing training libraries for LLMs rely on static parallelism strategies, which suffer from severe load imbalance, redundant communication, and suboptimal hardware utilization under data heterogeneity.

arXiv CS 1d ago

When Detectors Forget Forensics: Blocking Semantic Shortcuts for Generalizable AI-Generated Image Detection

arXiv:2603.09242v2. Abstract: The growing realism of generative models has blurred the boundary between real and synthetic content, posing significant challenges to reliable AI-generated image detection. Although large-scale pre-trained Vision Foundation Models have advanced detection capability, their generalization to images from unseen generation pipelines remains inadequate. In this paper, we identify, for the first time, a key failure mechanism, termed...

arXiv CS 6d ago

SALT: When More Rollouts Don't Help in Group-Based Policy Optimization and How to Make Them Matter

Abstract: Reinforcement learning with verifiable rewards (RLVR) often adopts GRPO-style group-relative updates, sampling multiple rollouts per prompt to construct normalized learning signals. However, merely increasing the number of rollouts does not reliably strengthen learning: under GRPO-style group normalization, per-rollout policy-gradient features can concentrate into a low-rank, signed geometry, causing substantial cancellation during aggregation and weakening the...

arXiv CS 5d ago

Sequential Subspace Mode Adaptation for the Reduced-Order Homogenization of Dissipative Microstructures using E3C Hyper-Reduction

arXiv:2606.02089v1. Abstract: Three-dimensional inelastic computational homogenization of complex engineering components requires a multitude of nonlinear microstructural simulations, making it computationally expensive. This work investigates a projection-based model order reduction (pMOR) method with 'Sequential Subspace Mode Adaptation', which can be easily integrated into existing codes using linear subspaces. Starting with a 'conventional' linear subspace strain...

arXiv Physics 8d ago

The American Missile Crisis

Recent global conflicts, from Russia and Ukraine to Iran and Israel, have seen a resurgent awareness of the frailty of US munitions stock, which has been drawn down by both direct and indirect involvement in these events. While exact stockpile volumes are not disclosed, it is estimated that supplies of US warheads and the missiles that carry them have declined by nearly an order of magnitude since their peak during the Cuban Missile Crisis. Analysts have estimated that in the event of a...

Hacker News 7d ago

Advantage Collapse in Group Relative Policy Optimization: Diagnosis and Mitigation

arXiv:2605.21125v2. Abstract: Group Relative Policy Optimization (GRPO), a prominent algorithm within the Reinforcement Learning from Verifiable Rewards (RLVR) framework, has achieved strong results in improving the reasoning capabilities of large language models (LLMs). However, GRPO is prone to advantage collapse, a failure mode where homogeneous rewards within a group (e.g., all correct or all incorrect answers) yield near-zero advantages and vanishing gradients. To...

arXiv CS 8d ago

Deep learning four decades of human migration

Abstract: Human migration is a fundamental driver of global demographic change, shaping population structure, labour markets and social policy across countries [1,2,3]. Although long-term migration patterns are often linked to economic development [4], they can shift rapidly in response to shocks such as conflict, environmental crises and political change [5]. Despite its importance, migration remains difficult to measure consistently: existing data are sparse, concentrated in high-income settings and...

Nature 22h ago

GeoLibre 1.0

A lightweight, cloud-native GIS platform for visualizing, exploring, and analyzing geospatial data. GeoLibre is built with Tauri, React, TypeScript, MapLibre GL JS, DuckDB-WASM Spatial, and deck.gl.

Hacker News 4h ago

AURA: Action-Gated Memory for Robot Policies at Constant VRAM

Abstract: The KV-cache is the right memory for datacenters but the wrong memory for robots. Datacenter inference batches many short requests and resets them, amortizing an attention cache across a crowd. Embodied agents instead run one long, non-resetting episode on bandwidth-limited edge hardware, where high-bandwidth memory and flash are scarce, flash has finite write endurance, and memory writes rather than compute can become the binding constraint.

arXiv CS 7d ago