Next-Token Prediction

Related Articles from SNS

Next-Token Prediction Learns Generalisable Representations of Sleep Physiology

arXiv:2606.09605v1 Announce Type: new Abstract: Foundation models offer a promising route to compress multi-modal physiological signals into compact representations of human health, with broad applications across sleep medicine, cardiology, neurology and other healthcare domains. Existing models have typically been trained with masked-reconstruction or contrastive objectives. However, masked reconstruction may be poorly suited to the stochastic nature of these signals, while contrastive...

arXiv CS 1d ago

When Is Next-Token Prediction Useful? Marginalization, Ergodicity, Mixture Identifiability, Local Sufficiency, RAG, Tools, and Programming

arXiv:2605.23278v2 Announce Type: replace Abstract: Language models trained on observed sequences are often described as learning the conditional distribution of the next token given previous tokens. This description is only conditionally correct. A model trained on realized token trajectories does not observe full conditional laws; it receives sampled continuations.

arXiv CS 9d ago

Token Predictors Are Not Planners: Building Physically Grounded Causal Reasoners

Announce Type: new Abstract: Current benchmarks for embodied vision-language planning often favor linguistic next-token prediction over physically grounded next-state reasoning. This rewards models that mimic statistical language priors rather than track causal dependencies, reducing physical planning to shallow sequence modeling. We argue that reliable physical autonomy requires a shift from linguistically grounded token prediction toward physically grounded causal reasoning.

arXiv CS 8d ago

Learning Concepts, Not Tokens: Self-Supervised Semantic Alignment for Language Models

arXiv:2603.29123v3 Announce Type: replace Abstract: The next-token prediction (NTP) objective trains language models to predict a single token at each step, even though many continuations can express the same meaning. For example, in the sentence "this sticker can be placed here", "positioned", "attached", or "put" are all plausible alternatives. While standard NTP training treats these alternatives as mutually exclusive targets, we explore a self-supervised framework that encourages models to...

arXiv CS 7d ago

TBD-VLA: Temporal Block Diffusion Vision Language Action Model

arXiv:2606.07895v1 Announce Type: new Abstract: Discrete Vision-Language-Action (VLA) models typically formulate action generation as next-token prediction over discretized action spaces, conditioning each token autoregressively on prior context. While effective, this paradigm incurs high inference latency and largely ignores the temporal structure inherent in action trajectories. Recent efforts introduce parallel decoding to improve efficiency, enabling faster inference, but lack explicit...

arXiv CS 1d ago

One Loss to Rule Them All: Marked Time-to-Event for Structured EHR Foundation Models

Announce Type: replace Abstract: Clinical events captured in Electronic Health Records (EHR) are irregularly sampled and may consist of a mixture of discrete events and numerical measurements, such as laboratory values or treatment dosages. The sequential nature of EHR, analogous to natural language, has motivated the use of next-token prediction to train prior EHR Foundation Models (FMs) over events. However, this training fails to capture the full structure of EHR.

arXiv CS 2d ago

Inferring the Size of Large Language Models From Popular Text Memorization

arXiv:2605.29223v2 Announce Type: replace Abstract: The parameter counts of the most widely used large language models (LLMs) are often withheld by their developers, leaving model size -- a primary reference point for interpreting capabilities and costs -- largely undisclosed. We propose a black-box method to infer conservative lower bounds on LLM size from generated text outputs alone, requiring nothing beyond the ability to submit text fragments and observe next-token predictions. Our...

arXiv CS 9d ago

When Autoregressive Consistency Hurts Safety Alignment

Announce Type: new Abstract: Safety alignment in large language models (LLMs) is fragile in part because it is often shallow: fine-tuning mainly reshapes the model's behavior near the first few output tokens. We argue that this phenomenon can be understood through autoregressive consistency, the tendency of next-token prediction to preserve and extend the current response trajectory consistently. By analyzing the learning dynamics of safety alignment, we show that autoregressive consistency...

arXiv CS 6d ago