Adaptive MoE
Related Articles from SNS
CRAM: Centroid-Routing and Adaptive MoE for Multimodal Continual Instruction Tuning
arXiv:2606.02502v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) unify heterogeneous vision-language tasks under a shared generative framework via instruction tuning, yet real-world deployment demands continuous capability expansion, making Multimodal Continual Instruction Tuning (MCIT) essential. Existing methods either update all tasks with a shared parameter set or allocate dedicated modules for each new task.
Post-Trained MoE Can Skip Half Experts via Self-Distillation
arXiv:2605.18643v2 Announce Type: replace Abstract: Mixture-of-Experts (MoE) scales language models efficiently through sparse expert activation, and its dynamic variant further reduces computation by adjusting the activated experts in an input-dependent manner. Existing dynamic MoE methods usually rely on pre-training from scratch or task-specific adaptation, leaving the practical conversion of fully trained MoE underexplored. Enabling such adaptation would directly alleviate the inference...
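The contrast drawn in this abstract, between fixed sparse expert activation and a dynamic variant that picks experts per input, can be sketched in a few lines. This is a minimal illustration only and not the method of the paper: the function names `top_k_route` and `dynamic_route`, the cumulative-probability threshold `mass`, and the toy router logits are all assumptions made for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(logits, k):
    """Fixed sparse routing: always keep the k most probable experts.

    Returns {expert_index: renormalized_weight}.
    """
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


def dynamic_route(logits, mass=0.8, max_k=None):
    """Input-dependent routing (hypothetical rule, assumed for illustration).

    Experts are added in order of probability until their cumulative
    probability reaches `mass`, so a confident router activates few
    experts and an uncertain one activates many.
    """
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    limit = max_k if max_k is not None else len(ranked)
    chosen, covered = [], 0.0
    for i in ranked[:limit]:
        chosen.append(i)
        covered += probs[i]
        if covered >= mass:
            break
    return {i: probs[i] / covered for i in chosen}


if __name__ == "__main__":
    confident = [5.0, 0.0, 0.0, 0.0]   # router strongly prefers expert 0
    uncertain = [1.0, 1.0, 1.0, 1.0]   # router has no preference
    print(len(top_k_route(confident, 2)), len(dynamic_route(confident)))
    print(len(top_k_route(uncertain, 2)), len(dynamic_route(uncertain)))
```

With a fixed k, both toy inputs pay for the same number of experts. Under the assumed threshold rule, the confident input activates fewer experts than the uncertain one, which is the kind of compute saving the abstract attributes to dynamic MoE.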
MosaicIMU: Composing Carrier Experts for Generalizable Neural Inertial Odometry
Announce Type: new Abstract: Robust inertial odometry is essential for various carriers when external sensing is unreliable. Learning-based methods reduce integration drift by capturing local motion priors, but these methods often remain tied to a particular carrier, limiting generalization across heterogeneous platforms. We present MosaicIMU, a carrier-conditioned Mixture-of-Experts (MoE) pretraining-and-adaptation framework for generalizable neural inertial odometry.
CoMoL: Efficient Mixture of LoRA Experts via Dynamic Core Space Merging
arXiv:2603.00573v2 Announce Type: replace Abstract: Large language models (LLMs) achieve remarkable performance on diverse downstream and domain-specific tasks via parameter-efficient fine-tuning (PEFT). However, existing PEFT methods, particularly MoE-LoRA architectures, suffer from limited parameter efficiency and coarse-grained adaptation due to the proliferation of LoRA experts and instance-level routing. To address these issues, we propose Core Space Mixture of LoRA (CoMoL), a...
CoRe-MoE: Contrastive Reweighted Mixture of Experts for Multi-Terrain Humanoid Locomotion with Gait Adaptation
arXiv:2606.04718v1 Announce Type: new Abstract: Humans primarily rely on walking and running to traverse complex terrains, without resorting to unnecessarily complex motion patterns. Similarly, humanoid robots should achieve smooth transitions between walking and running while maintaining natural and stable locomotion. However, unifying gait transition and multi-terrain adaptation within a single policy remains challenging due to gradient interference and the distribution shift induced by...
SPAMoE: Spectrum-Aware Hybrid Operator Framework for Full-Waveform Inversion
arXiv:2604.07421v3 Announce Type: replace Abstract: Full-waveform inversion (FWI) is pivotal for reconstructing high-resolution subsurface velocity models but remains computationally intensive and ill-posed. While deep learning approaches promise efficiency, existing Convolutional Neural Networks (CNNs) and single-paradigm Neural Operators (NOs) struggle with one fundamental issue: frequency entanglement of multi-scale geological features. To address this challenge, we propose...
Sparse Mixture-of-Experts Reward Models Learn Interpretable and Specialized Experts for Personalized Preference Modeling
Announce Type: new Abstract: Preference modeling plays a central role in reinforcement learning from human feedback (RLHF), enabling large language models (LLMs) to align with human values. However, most existing approaches assume a universal reward function, neglecting the diversity and heterogeneity of human preferences. To address this limitation without additional annotation costs, recent work has proposed learning multiple preference components from binary data and combining them to...
Graph-Conditioned Mixture of Graph Neural Network Experts for Traffic Forecasting
arXiv:2605.30486v1 Announce Type: new Abstract: Spatio-temporal forecasting on sensor graphs is commonly tackled with a single backbone architecture applied uniformly across all nodes, although graph regions can exhibit different dynamics. Road segments differ in functional class, structure, and traffic behavior, suggesting that node-wise expert specialization can be useful. We propose GC-MoE, a graph-conditioned mixture of experts framework that assigns each node a personalized combination...
Sakana AI's Recursive Self-Improvement (RSI) Lab
The Next Paradigm of Artificial Intelligence As the world enters the era of artificial intelligence, Japan has a unique opportunity to reclaim its position at the frontier of global innovation. However, to achieve global leadership in AI and scientific discovery, we cannot simply stick to the conventional approach of brute-forcing monolithic models. We must leapfrog the current paradigm.
Personalization Meets Safety: Mechanisms, Risks, and Mitigations in Personalized LLMs
arXiv:2606.09038v1 Announce Type: new Abstract: Large Language Models (LLMs) have enabled increasingly personalized interactions by adapting to users' preferences, contexts, and long-term histories. However, the mechanisms that enable personalization also expand the safety landscape in ways not systematically addressed by existing literature. Existing reviews typically focus either on personalization or safety, leaving their intersection largely unexplored.