Adaptive Layer
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
KnapSpec: Self-Speculative Decoding via Adaptive Layer Selection as a Knapsack Problem
arXiv:2602.20217v2 Announce Type: replace Abstract: Self-speculative decoding (SSD) accelerates LLM inference by skipping layers to create an efficient draft model, yet existing methods often rely on static heuristics that ignore the dynamic computational overhead of attention in long-context scenarios. We propose KnapSpec, a training-free framework that reformulates draft model selection as a knapsack problem to maximize tokens-per-time throughput. By decoupling Attention and MLP layers and...
LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models
arXiv:2606.01838v1 Announce Type: new Abstract: Agentic language model systems alternate between two structurally distinct step types: structured tool calls (short, deterministic, low perplexity) and open-ended planning/reasoning steps (long, complex, high perplexity). Despite this heterogeneity, current inference systems apply identical compute to every step. We introduce LayerRoute, a lightweight adapter that learns to selectively skip transformer blocks on a per-input basis.
FoRA: Fisher-orthogonal Rank Adaptation for Parameter-Efficient Fine-Tuning
arXiv:2605.29317v2 Announce Type: replace Abstract: Parameter-efficient fine-tuning(PEFT) has largely focused on LoRA and its accuracy-oriented variants, leaving the original goal of reducing trainable parameters has receivedcomparatively little attention. We introduce FoRA, which revisits this goal by reducing the number of adapted layers rather than adapter rank. FoRA selects task-informative layers via a single-pass diagonal Fisher score (under 1% of training cost) and trains the LoRA...
TIMEGATE: Sustainable Time-Boxed Promotion Gates for Continual ML Adaptation Under Resource Constraints
arXiv:2605.29183v2 Announce Type: replace Abstract: As machine learning(ML) systems evolve to continual adaptation, each re-training cycle uses compute, annotation, and energy. We introduce TIMEGATE, a policy layer managing adaptation by budgeting time, labeling, training, and evaluation. TIMEGATE emits a metric-availability signal M for partial vs. full-evaluation decisions.
SaluNet: Enabling Total Plasticity in Normalization-Free Deep Networks
Announce Type: new Abstract: Normalization layers such as BatchNorm and LayerNorm have long been considered essential for stable training in deep networks. This work demonstrates that they can be fully replaced by a single learnable activation mechanism. We identify a plasticity suppression effect induced by standard normalization: learnable activation parameters rapidly lose adaptability when paired with normalization layers.
Dominant-Layer ZO: A Single Layer Dominates Zeroth-Order Fine-Tuning of LLMs
arXiv:2606.05516v1 Announce Type: new Abstract: Zeroth-order (ZO) optimization enables memory-efficient fine-tuning of large language models (LLMs) using only forward passes, but it remains unclear how useful adaptation is distributed across layers. In this work, we reveal a surprising phenomenon: ZO fine-tuning is sharply dominated by a single decoding layer. Across multiple LLM families and downstream tasks, fine-tuning this dominant layer alone consistently matches or even exceeds...
RAPID: Layer-Wise Redundancy-Aware Pruning and Importance-Driven Token Merging for Efficient ViT
arXiv:2606.08156v1 Announce Type: new Abstract: Vision Transformers (ViTs) achieve strong performance but suffer from high computational costs due to quadratic self-attention complexity. Although token reduction techniques such as pruning and merging mitigate this, they typically overlook how representations evolve across network depth. We propose RAPID, a depth-aware token reduction framework that adapts reduction strategies to the layer-wise characteristics of token representations.
TaDA: Calibrated Probe Gating for Task-Domain LoRA Merging
arXiv:2606.05016v1 Announce Type: new Abstract: Combining a task LoRA adapter with a domain LoRA adapter into a single unified model is a practical yet largely unexplored challenge. Existing methods treat both adapters as symmetric peers, applying uniform weights across all layers.
FiLM-Based Speaker Conditioning of a SpeechLLM for Pathological Speech Recognition
arXiv:2606.06211v1 Announce Type: new Abstract: Automatic speech recognition (ASR) has advanced remarkably for standard speech; however, pathological speech from neurological conditions remains a significant challenge. We investigate speaker conditioning via Feature-wise Linear Modulation (FiLM), injecting x-vector-derived information into each transformer layer of a frozen ASR encoder to adapt internal representations to individual pathological speakers without modifying base model weights....
DTop-p MoE: Sparsity-Controlled Dynamic Top-p MoE for Foundation Model Pre-training
arXiv:2512.13996v2 Announce Type: replace Abstract: Sparse Mixture-of-Experts architectures are essential for scaling model capacity efficiently, yet the standard Top-$k$ routing imposes a rigid sparsity pattern that ignores the intrinsic variance in token difficulty and layer-specific computational needs. Top-$p$ routing is more adaptive because it selects experts until their cumulative routing probability reaches a threshold, allowing confident tokens to use fewer experts and ambiguous...