Home Knowledge Base Adaptive Token Allocation

Adaptive Token Allocation

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

SuperVoxelGPT: Adaptive and Ordered 3D Tokenization for Autoregressive Shape Generation

arXiv:2605.29655v2 Announce Type: replace Abstract: Autoregressive multimodal large language models (MLLMs) enable 3D generation but struggle to scale to high-resolution shapes due to inadequate 3D tokenizations. Compact set-based representations discard deterministic spatial ordering, leading to ambiguous sequence prediction, while uniform or octree-based voxel grids preserve ordering at the cost of severe redundancy and excessively long sequences. This structural trade-off limits stable...

arXiv CS 9d ago

Adaptive Tokenisation Via Temporal Redundancy Masking And Latent Inpainting

arXiv:2606.06158v1 Announce Type: new Abstract: Adaptive video tokenisation seeks to dynamically allocate token budgets based on the underlying visual complexity of a sequence. Current continuous-regime approaches achieve this via iterative binarised searches or trained neural regressors, while discrete methods often require a full-rate decoder pass to estimate information content. We demonstrate that such computational overheads are not strictly necessary.

arXiv CS 5d ago

AdaTok: Self-Budgeting Image Tokenization with Quality-Preserving Dynamic Tokens

Announce Type: new Abstract: Image tokenizers, from 2D grids to recent 1D sequences, typically encode every image with the same fixed number of tokens. Yet visual complexity is highly heterogeneous, so a uniform budget overspends on simple inputs and underserves complex ones. Existing elastic tokenizers expose variable-length reconstructions, but often leave token length as a deployment-time operating point, a search target, or an external prediction rather than an output of the tokenizer...

arXiv CS 2d ago

The Easy, the Hard, and the Learnable: Confidence and Difficulty-Adaptive Policy Optimization for LLM Reasoning

arXiv:2606.07950v1 Announce Type: new Abstract: RL with verifiable rewards can substantially improve LLM reasoning, yet standard GRPO-style training often treats easy, hard, and learnable questions alike through uniform sampling and weighting, leading to inefficient compute allocation. We study GRPO by tracking token log-probabilities, group-normalized advantages, and the induced token-level update weights. This reveals three recurring dynamics as training proceeds: (1) confidence inflation,...

arXiv CS 1d ago

MeshTok: Efficient Multi-Scale Tokenization for Scalable PDE Transformers

arXiv:2606.04366v1 Announce Type: new Abstract: Conventional patchified Transformers operate on uniform spatial partitions, distributing computational effort evenly across the domain irrespective of local features. This inflexible tokenization scheme is inherently limited in its ability to efficiently represent and process solutions to complex PDEs.

arXiv CS 6d ago

ProbMoE: Differentiable Probabilistic Routing for Mixture-of-Experts

arXiv:2606.01509v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) models scale by activating only a small subset of experts per token. However, training such models remains challenging because top-$k$ routing is discrete and non-differentiable, requiring gradient estimators for expert selection whose design remains a central open problem.

arXiv CS 8d ago

SpectrumKV: Per-Token Mixed-Precision KV Cache Transfer for Prefill-Decode Disaggregated LLM Serving

Announce Type: new Abstract: Prefill-decode (PD) disaggregation decouples prompt processing from token generation, but it also turns the key-value (KV) cache into a network payload. Existing PD-side KV reduction methods are mostly binary: selected tokens are transmitted at full precision and the rest are not transmitted. This paper argues that binary selection leaves a useful design space unused.

arXiv CS 1d ago

MACS: Modality-Aware Capacity Scaling for Efficient Multimodal MoE Inference

arXiv:2605.05225v3 Announce Type: replace Abstract: Mixture-of-Experts Multimodal Large Language Models (MoE MLLMs) suffer from a significant efficiency bottleneck during Expert Parallelism (EP) inference due to the straggler effect. This issue is worsened in the multimodal context, as existing token-count-based load balancing methods fail to address two unique challenges: (1) Information Heterogeneity, where numerous redundant visual tokens are treated equally to semantically critical ones,...

arXiv CS 2d ago

ElasticMem: Latent Memory as a Learnable Resource for LLM Agents

arXiv:2605.30690v1 Announce Type: new Abstract: Long-term memory is essential for LLM agents to reason coherently across extended interactions, personalize responses, and reuse past experience. However, existing memory-augmented methods typically treat memory as a fixed resource: text-space approaches concatenate retrieved memories into the context window, causing substantial token overhead and sensitivity to noisy evidence, while latent-space approaches reduce textual cost but still rely on...

arXiv CS 9d ago

EASE-TTT: Evidence-Aligned Selective Test-Time Training for Long-Context Question Answering

arXiv:2606.06906v1 Announce Type: new Abstract: Long-context question answering (QA) remains challenging for smaller language models even when answer-bearing evidence is already present in the input. Existing within-context retrieval methods localize and expose candidate evidence chunks for the question, but they stop at input-level evidence exposure rather than adapting the query-side attention parameters that control how the model allocates attention over full-context positions. In...

arXiv CS 2d ago