Multi-Stage Training

Related Articles from SNS

$M^3$ Scaling Law: Optimizing Multi-Epoch, Multi-Lingual, and Multi-Stage Training for Low-Resource Language Models

arXiv:2410.12325v2 Announce Type: replace Abstract: In this paper, we study a fundamental design problem in pretraining Large Language Models (LLMs) for low-resource language regimes. Existing works adopt multi-epoch, multi-lingual, and multi-stage training to utilize the limited target-language corpus efficiently, but no prior scaling law can compare recipes spanning these approaches under the same compute budget $C$ and target-language corpus size $D_T$, leaving the optimal training setup...

arXiv CS 8d ago

HMPO: Hybrid Median-length Policy Optimization for Chain-of-Thought Compression

arXiv:2606.01934v1 Announce Type: new Abstract: Large language models achieve remarkable performance via extended chain-of-thought (CoT) reasoning, yet this lengthy process incurs substantial inference overhead. Existing CoT compression methods struggle with inflexible manual length budgets, computationally expensive multi-stage training pipelines, and fragile scalability restricted to small models. We propose HMPO (Hybrid Median-length Policy Optimization), a cost-effective, single-stage...

arXiv CS 8d ago

Physics-Informed Modeling and Control of Emergent Behaviors in Robot Swarms

arXiv:2606.01597v1 Announce Type: new Abstract: Robot swarms can exhibit coherent collective behaviors through local perception, limited communication and decentralized decision-making, yet modeling and controlling such emergence remains challenging when behaviors unfold over multiple phases. Here we introduce PhySwarm, a physics-informed micro-macro framework that represents multi-stage swarm emergence as physically constrained density-field evolution coupled to executable robot motion. At...

arXiv CS 8d ago

SaliMory: Orchestrating Cognitive Memory for Conversational Agents

arXiv:2606.04120v1 Announce Type: new Abstract: Conversational agents that serve as lifelong companions must maintain persistent memory across all interactions. However, simply expanding context windows with raw retrieval degrades reasoning quality, while training memory agents via standard reinforcement learning creates a severe credit assignment bottleneck in a multi-stage pipeline. To solve this, we introduce SALIMORY, a framework that trains a single language model to manage a...

arXiv CS 6d ago

URDF-Anything+: End-to-End Generation for Simulation-Ready Articulated Assets

arXiv:2603.14010v2 Announce Type: replace Abstract: Articulated objects are fundamental for robotics, simulation of physics, and interactive virtual environments. However, recovering them from visual observations is inherently challenging, as images provide only partial and ambiguous cues about both part geometry and their underlying kinematic structure. Existing approaches typically rely on multi-stage pipelines, retrieval from asset libraries, or explicit part segmentation.

arXiv CS 8d ago

EvoDS: Self-Evolving Autonomous Data Science Agent with Skill Learning and Context Management

arXiv:2606.03841v1 Announce Type: new Abstract: Recent progress in Large Language Model (LLM) agents has enabled promising advances in automated data science. However, existing approaches remain fundamentally limited by their static action sets and lack of principled long-horizon context management, hindering their ability to accumulate reusable experience across tasks and operate reliably in multi-stage, iterative data science pipelines.

arXiv CS 7d ago

You Only Train Once: Differentiable Subset Selection for Omics Data

arXiv:2512.17678v2 Announce Type: replace Abstract: Selecting compact and informative gene subsets from single-cell transcriptomic data is essential for biomarker discovery, improving interpretability, and cost-effective profiling. However, most existing feature selection approaches either operate as multi-stage pipelines or rely on post hoc feature attribution, making selection and prediction weakly coupled. In this work, we present YOTO (you only train once), an end-to-end framework that...

arXiv CS 6d ago

RescueBench: Can Embodied Agents Save Lives in the Wild?

Announce Type: new Abstract: Search-and-rescue (SAR) requires embodied agents to explore unfamiliar environments under multimodal uncertainty, perform multi-stage interactions, and retrieve spatial memory over long horizons. Existing benchmarks typically evaluate these capabilities in isolation, leaving unclear how failures compound when they must be composed in realistic workflows. We introduce RescueBench, a photo-realistic diagnostic benchmark that instantiates SAR as a four-stage pipeline: multimodal...

arXiv CS 8d ago

Segment-level Tree Search for Long Meeting Document Summarization

Announce Type: new Abstract: Meeting documents are challenging to summarize due to their length and complex conversational structure. Existing approaches typically adopt multi-stage pipelines that extract information prior to summarization; however, these approaches often suffer from cumulative error propagation without intermediate validation, a limitation further amplified by short and low-quality reference summaries. We propose segment-level summarization via Monte Carlo Tree Search (S3),...

arXiv CS 1d ago

LocalSUG: City-Preference-Enhanced LLM for Query Suggestion in Local-Life Services

arXiv:2603.04946v2 Announce Type: replace Abstract: In local-life service platforms, query suggestion reduces user effort by generating candidate queries from input prefixes. Traditional multi-stage systems rely heavily on historical popular queries, limiting their ability to capture long-tail and emerging demand. Although LLMs provide strong semantic generalization, their deployment in local-life services faces three challenges: insufficient city-preference awareness, exposure bias in...

arXiv CS 9d ago