entropy

Related Articles from SNS

Stage-1 Controls the Entropy Regime, Not the Outcome

arXiv:2606.09059v1 Announce Type: new Abstract: Two-stage post-training -- a Stage-1 warm-start (supervised fine-tuning, SFT, or on-policy distillation, OPD) followed by Stage-2 reinforcement learning (RL) -- is increasingly used for vision-language models (VLMs). We ask what Stage-1 actually controls in a small-data study using Qwen2.5-VL-7B with a same-modality 72B VLM teacher for OPD. First, the three warm-starts reach a narrow $53$--$54\%$ band on Geometry3K internal validation,...

arXiv CS 1d ago

Spectral Anatomy of Quantum Gaussian Process Kernels

Announce Type: new Abstract: Two recent results have reshaped quantum Gaussian processes (QGPs). On the one hand, \citet{lowe2025assessing} rule out the exponential speedups claimed by HHL-based QGP regression in the typical, well-conditioned regime; on the other, an independent line of work shows that highly expressive quantum kernels suffer posterior pathologies that break Bayesian optimization.

arXiv CS 9d ago

Spectral Anatomy of Quantum Gaussian Process Kernels

Announce Type: replace Abstract: Two recent results have reshaped quantum Gaussian processes (QGPs). On the one hand, \citet{lowe2025assessing} rule out the exponential speedups claimed by HHL-based QGP regression in the typical, well-conditioned regime; on the other, an independent line of work shows that highly expressive quantum kernels suffer posterior pathologies that break Bayesian optimization. We show that these seemingly unrelated phenomena are governed by the same quantity: the...

arXiv CS 7d ago

Reinforcement Learning for Flow-Matching Policies with Density Transport

Announce Type: new Abstract: We present an online reinforcement learning (RL) algorithm for fine-tuning flow-matching policies in continuous-control problems. Our key insight is to view RL-based policy improvement as a transport of action densities towards regions of high reward, which naturally aligns with the transport formulation of flow matching models. Prior methods either approximate the current or optimal policy distribution or resort to distillation, which introduces biased gradients...

arXiv CS 1d ago

Before and After Temperature: A Distributional View of Creative LLM Generation

arXiv:2606.01451v1 Announce Type: new Abstract: Reference-free evaluation of large language model (LLM) creativity relies on perplexity, entropy, and top-1 margin. We show that a much stronger signal lives one step earlier in the pipeline: in how sampling temperature \emph{reshapes} the model's token distribution before the next token is drawn. On Llama-3.1-8B-Instruct generations of 500 open-ended creative prompts at $T \in \{0.3, 0.8, 1.5\}$, a single per-token feature derived from this...

arXiv CS 8d ago
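The abstract is truncated before it defines its per-token feature, so the sketch below does not reproduce it. It only illustrates the setting, assuming a toy next-token distribution: it computes two of the reference-free signals the abstract names (entropy and top-1 margin) before temperature (plain softmax, T = 1) and after temperature scaling softmax(z / T), at the three temperatures used in the study. The logits and the helper `temperature_shift` are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def top1_margin(probs):
    """Gap between the largest and second-largest probabilities."""
    first, second = sorted(probs, reverse=True)[:2]
    return first - second

def temperature_shift(logits, temperature):
    """Hypothetical helper: the same signals before (T = 1) and after temperature scaling."""
    before = softmax_with_temperature(logits, 1.0)
    after = softmax_with_temperature(logits, temperature)
    return {
        "entropy_before": entropy(before),
        "entropy_after": entropy(after),
        "margin_before": top1_margin(before),
        "margin_after": top1_margin(after),
    }

logits = [2.0, 1.0, 0.5, -1.0]  # toy next-token logits (made up for illustration)
for t in (0.3, 0.8, 1.5):
    print(t, temperature_shift(logits, t))
```

Low temperature sharpens the distribution (lower entropy, larger top-1 margin) and high temperature flattens it; the abstract's claim is that a feature of this reshaping, rather than the post-sampling statistics alone, carries the stronger signal.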

Finite-Temperature de Bruijn Identities: Fisher Information as the Spectral Gap of Blahut--Arimoto Dynamics

arXiv:2606.03813v1 Announce Type: new Abstract: We uncover a finite-temperature extension of de Bruijn's identity -- the classical relation $\frac{d}{dt}h(X+\sqrt{t}Z)=\frac{1}{2}J(X)$ connecting differential entropy and Fisher information. Our framework is the spectral theory of Blahut--Arimoto (BA) dynamics, recently developed by Wang~\cite{Wang2026} for the analysis of rate-distortion optimization. The central observation is elementary yet profound: for Gaussian sources, the spectral gap...

arXiv CS 7d ago
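The classical identity quoted above can be checked in closed form for a Gaussian source. In general the right-hand side is Fisher information evaluated at the smoothed variable, $\frac{d}{dt}h(X+\sqrt{t}Z)=\frac{1}{2}J(X+\sqrt{t}Z)$, which reduces to $\frac{1}{2}J(X)$ at $t=0$. The sketch below assumes $X\sim\mathcal{N}(0,\sigma^2)$ and independent $Z\sim\mathcal{N}(0,1)$, so $X+\sqrt{t}Z\sim\mathcal{N}(0,\sigma^2+t)$, with $h=\frac{1}{2}\log(2\pi e v)$ and $J=1/v$ for variance $v$. It does not reproduce the paper's Blahut--Arimoto spectral framework; the function names are illustrative.

```python
import math

def gaussian_entropy(variance):
    """Differential entropy (nats) of N(0, variance): 0.5 * log(2*pi*e*variance)."""
    return 0.5 * math.log(2 * math.pi * math.e * variance)

def gaussian_fisher_information(variance):
    """Fisher information (w.r.t. location) of N(0, variance): 1 / variance."""
    return 1.0 / variance

def de_bruijn_check(sigma2, t, dt=1e-6):
    """Compare d/dt h(X + sqrt(t) Z) with 0.5 * J(X + sqrt(t) Z) for Gaussian X.

    h depends on t only through the total variance sigma2 + t, so a central
    finite difference in that variance approximates the derivative.
    """
    v = sigma2 + t
    lhs = (gaussian_entropy(v + dt) - gaussian_entropy(v - dt)) / (2 * dt)
    rhs = 0.5 * gaussian_fisher_information(v)
    return lhs, rhs

for t in (0.0, 0.5, 1.0):
    print(t, de_bruijn_check(2.0, t))
```

With $\sigma^2=2$, both values at $t=0$ come out near $0.25=\frac{1}{2\sigma^2}=\frac{1}{2}J(X)$, and both shrink as $t$ grows because the added noise lowers the Fisher information.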