\sqrt{d}$
Related Articles from SNS
The Unreasonable Redundancy of Nature's Protein Folds
Over the last few years, deep neural networks have made generative language modeling dramatically more powerful, giving us large language models. A similar leap happened for continuous modalities like images and videos.
An Improved Algorithm for Adversarial Linear Contextual Bandits via Reduction
arXiv:2508.11931v3 Announce Type: replace Abstract: We present an oracle-efficient, near-optimal algorithm for linear contextual bandits with adversarial losses and stochastic action sets, only requiring a linear optimization oracle for the action sets in each round. Our approach reduces this setting to misspecification-robust adversarial linear bandits with fixed action sets. Without knowledge of the context distribution or access to a context simulator, the algorithm achieves...
When and why randomised exploration works (in linear bandits)
arXiv:2502.08870v2 Announce Type: replace Abstract: We provide an approach for the analysis of randomised exploration algorithms like Thompson sampling that does not rely on forced optimism or posterior inflation. With this, we demonstrate that in the $d$-dimensional linear bandit setting, when the action space is smooth and strongly convex, randomised exploration algorithms enjoy an $n$-step regret bound of the order $O(d\sqrt{n} \log(n))$. Notably, this shows for the first time that there...
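As a rough illustration of the kind of algorithm this abstract analyses, the sketch below runs standard linear Thompson sampling on the unit ball in $\mathbb{R}^d$, which is one example of a smooth, strongly convex action space. It is not the authors' analysis or code: the ridge parameter `lam`, the fixed (deliberately not inflated) posterior scale `scale`, the Gaussian noise model and all function names are assumptions made for this example. On the unit ball the optimization step has a closed form: the action maximizing $\langle x, \tilde\theta \rangle$ is $\tilde\theta / \|\tilde\theta\|$.

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L with a = L L^T (a symmetric positive definite)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def solve_lower(l, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i])
    return y


def solve_upper_t(l, y):
    """Solve L^T x = y by back substitution, where L is lower-triangular."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


def linear_ts(theta_star, n_rounds, noise=0.1, lam=1.0, scale=0.1, seed=0):
    """Linear Thompson sampling on the unit ball; returns cumulative regret.

    Hypothetical setup: reward = <x, theta_star> + N(0, noise^2).
    """
    rng = random.Random(seed)
    d = len(theta_star)
    gram = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]  # V = lam * I
    b = [0.0] * d                                      # running sum of reward * action
    best = math.sqrt(sum(t * t for t in theta_star))   # max of <x, theta*> over ||x|| <= 1
    regret = 0.0
    for _ in range(n_rounds):
        l = cholesky(gram)
        theta_hat = solve_upper_t(l, solve_lower(l, b))  # ridge estimate V^{-1} b
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        pert = solve_upper_t(l, z)                       # L^{-T} z ~ N(0, V^{-1})
        theta_tilde = [h + scale * p for h, p in zip(theta_hat, pert)]
        norm = math.sqrt(sum(t * t for t in theta_tilde)) or 1.0
        x = [t / norm for t in theta_tilde]              # argmax over the unit ball
        mean = sum(xi * ti for xi, ti in zip(x, theta_star))
        reward = mean + rng.gauss(0.0, noise)
        regret += best - mean
        for i in range(d):
            b[i] += reward * x[i]
            for j in range(d):
                gram[i][j] += x[i] * x[j]
    return regret


print(linear_ts([1.0, 0.5, -0.3], n_rounds=2000))
```

The posterior sample is drawn as $\hat\theta + \text{scale}\cdot L^{-T}z$ with $V = LL^\top$, since $L^{-T}z$ has covariance $V^{-1}$; this avoids forming the inverse explicitly.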
Minimax Optimal Strategy for Delayed Observations in Online Reinforcement Learning
arXiv:2603.03480v2 Announce Type: replace Abstract: We study reinforcement learning with delayed state observation, where the agent observes the current state after some random number of time steps. We propose an algorithm that combines the augmentation method and the upper confidence bound approach. For tabular Markov decision processes (MDPs), we derive a regret bound of $\tilde{\mathcal{O}}(H \sqrt{D_{\max} SAK})$, where $S$ and $A$ are the cardinalities of the state and action spaces,...
Rectangular Matrix Multiplication in the Low-Bandwidth Model
arXiv:2606.04652v1 Announce Type: new Abstract: We study rectangular matrix multiplication in the low-bandwidth model of distributed computing. There are $n$ computers; initially the input matrices are distributed evenly between computers, and in each communication round every computer can send and receive an $O(\log n)$-bit message. Eventually each computer must output its designated part of the product matrix.
Finite-Temperature de Bruijn Identities: Fisher Information as the Spectral Gap of Blahut--Arimoto Dynamics
arXiv:2606.03813v1 Announce Type: new Abstract: We uncover a finite-temperature extension of de Bruijn's identity -- the classical relation $\frac{d}{dt}h(X+\sqrt{t}Z)=\frac{1}{2}J(X)$ connecting differential entropy and Fisher information. Our framework is the spectral theory of Blahut--Arimoto (BA) dynamics, recently developed by Wang~\cite{Wang2026} for the analysis of rate-distortion optimization. The central observation is elementary yet profound: for Gaussian sources, the spectral gap...
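For a Gaussian source the classical identity can be checked in closed form. Assuming $X \sim \mathcal{N}(0, \sigma^2)$ and an independent standard normal $Z$, the sum $X+\sqrt{t}Z \sim \mathcal{N}(0, \sigma^2+t)$ has entropy $\frac{1}{2}\log(2\pi e(\sigma^2+t))$ nats and Fisher information $1/(\sigma^2+t)$. For general $t$ the right-hand side of de Bruijn's identity is $\frac{1}{2}J(X+\sqrt{t}Z)$; the form quoted above is its value at $t=0$. The sketch below compares a finite-difference derivative of the entropy with half the Fisher information. It covers only the classical identity, not the paper's finite-temperature extension, and the function names are illustrative.

```python
import math


def gaussian_entropy(var):
    """Differential entropy in nats of N(0, var)."""
    return 0.5 * math.log(2 * math.pi * math.e * var)


def gaussian_fisher(var):
    """Fisher information (with respect to location) of N(0, var)."""
    return 1.0 / var


def de_bruijn_gap(sigma2, t, dt=1e-6):
    """d/dt h(X + sqrt(t) Z) minus (1/2) J(X + sqrt(t) Z) for X ~ N(0, sigma2).

    The derivative is a central finite difference in t; X + sqrt(t) Z has
    variance sigma2 + t because X and Z are independent.
    """
    var = sigma2 + t
    dh_dt = (gaussian_entropy(var + dt) - gaussian_entropy(var - dt)) / (2 * dt)
    return dh_dt - 0.5 * gaussian_fisher(var)


for sigma2 in (0.5, 1.0, 2.0):
    # gaps are close to zero, limited only by finite-difference error
    print(sigma2, de_bruijn_gap(sigma2, t=0.0), de_bruijn_gap(sigma2, t=1.0))
```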
Batched Stochastic Linear Bandits with 1-Bit Communication Constraints
Announce Type: cross Abstract: We study stochastic linear bandits under a natural combination of batching and communication constraints: the time horizon is partitioned into batches of equal size $B$, and during each batch the learner sends $B$ requested arm pulls to an agent, who then observes the corresponding $B$ rewards and responds with a single bit of feedback to the learner. For each batch, the learner specifies the 1-bit quantization rule the agent uses, which may depend on all...
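To make the one-bit constraint concrete, here is a hypothetical simulation of one possible quantization rule; the abstract does not say which rules the paper uses. The learner draws a threshold uniformly from $[-c, c]$ and the agent replies with 1 exactly when the batch-average reward exceeds it. Assuming every batch average $\bar r$ lies in $[-c, c]$, $P(\text{bit}=1 \mid \bar r) = (\bar r + c)/(2c)$, so $2c\cdot\text{bit} - c$ is an unbiased estimate of the batch mean. The names `agent_bit`, `estimate_mean_from_bits`, the bound `c`, and the arm and parameter values are assumptions for this example.

```python
import random


def agent_bit(rewards, threshold):
    """Agent side: compress a batch of B rewards into one bit.

    Hypothetical rule chosen by the learner: report whether the batch-average
    reward exceeds the learner-supplied threshold.
    """
    return 1 if sum(rewards) / len(rewards) > threshold else 0


def estimate_mean_from_bits(sample_batch, n_batches, c, rng):
    """Learner side: unbiased estimate of the mean batch reward from 1-bit replies.

    Assumes every batch average lies in [-c, c]. With a threshold drawn
    uniformly from [-c, c], P(bit = 1 | avg) = (avg + c) / (2c), so
    2c * bit - c is an unbiased estimate of avg.
    """
    total = 0.0
    for _ in range(n_batches):
        threshold = rng.uniform(-c, c)
        bit = agent_bit(sample_batch(), threshold)
        total += 2 * c * bit - c
    return total / n_batches


rng = random.Random(1)
arm, theta = [0.6, 0.4], [0.5, -0.2]           # hypothetical arm and parameter
mean = sum(a * t for a, t in zip(arm, theta))  # expected reward of the arm


def sample_batch(batch_size=8):
    # B pulls of the same arm with bounded noise, so batch averages lie in [-1, 1]
    return [mean + rng.uniform(-0.5, 0.5) for _ in range(batch_size)]


print(mean, estimate_mean_from_bits(sample_batch, 20000, 1.0, rng))
```

Randomizing the threshold (dithering) is what makes a single bit unbiased; with a fixed threshold the bit would only reveal which side of that threshold the batch average falls on.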
Robust Linear Dueling Bandits with Post-serving Context under Unknown Delays and Adversarial Corruptions
arXiv:2605.01752v4 Announce Type: replace Abstract: We study linear dueling bandits in volatile environments characterized by the simultaneous presence of post-serving contexts, delayed feedback, and adversarial corruption. Feedback is subject to unknown stochastic or adversarial delays and a cumulative corruption budget $\mathcal{C}$. To address these challenges, we propose RCDP-UCB, which integrates a learned approximator that predicts post-serving contexts from pre-serving information....
Coherent Swap Regret and Channel-Proof Learning
arXiv:2606.02655v1 Announce Type: cross Abstract: External regret certifies stability only against replacing one's behavior by a fixed alternative. In a quantum game, this misses a natural physical move: a player can apply a local completely positive trace-preserving (CPTP) map to the state it actually received or prepared.
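The classical, finite-action counterpart of this distinction is external versus swap regret. External regret compares the realized loss with the best single fixed action; swap regret compares it with the best map $\phi$ that rewrites each played action $i$ as $\phi(i)$, and the optimal $\phi$ decomposes into one independent minimization per played action. Since constant maps are swap maps, swap regret is always at least external regret. The sketch below computes both for a sequence of pure actions; it does not model the quantum setting or CPTP maps, and the function names are illustrative.

```python
def external_regret(actions, losses):
    """Realized loss minus the loss of the best fixed action in hindsight."""
    n_actions = len(losses[0])
    incurred = sum(loss[a] for a, loss in zip(actions, losses))
    best_fixed = min(sum(loss[j] for loss in losses) for j in range(n_actions))
    return incurred - best_fixed


def swap_regret(actions, losses):
    """Realized loss minus the loss of the best swap map phi: action -> action.

    The optimal phi decomposes: for each played action i, choose the j that
    minimizes the total loss over the rounds in which i was played.
    """
    n_actions = len(losses[0])
    incurred = sum(loss[a] for a, loss in zip(actions, losses))
    best_swapped = 0
    for i in range(n_actions):
        rounds_i = [loss for a, loss in zip(actions, losses) if a == i]
        if rounds_i:
            best_swapped += min(sum(loss[j] for loss in rounds_i) for j in range(n_actions))
    return incurred - best_swapped


# The player picks the worse of two actions in every round.
actions = [0, 1, 0, 1]
losses = [[1, 0], [0, 1], [1, 0], [0, 1]]
print(external_regret(actions, losses), swap_regret(actions, losses))  # 2 4
```

Here no single fixed action does better than 2 in total, but the swap map "play 1 whenever you played 0, and 0 whenever you played 1" brings the loss to 0.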
Epidemiology of Model Collapse: Modeling Synthetic Data Contamination via Bilayer SIR Dynamics
arXiv:2606.05168v1 Announce Type: new Abstract: Training on synthetic data causes model collapse, but existing analyses treat this as single-chain degradation. In reality, the AI ecosystem involves cross-contamination: models ingest synthetic data from other models, produce new synthetic text, and contaminate shared corpora. We propose a bilayer coupled SIR/SIRS framework -- a phenomenological mean-field model treating data corpora and AI models as two interacting populations, each with...
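A minimal forward-Euler sketch of how two coupled SIR/SIRS layers might interact is below. The abstract is truncated, so the paper's exact equations are not available here: the cross-infection terms (corpora contaminated by infected models at rate `beta_mc`, models degraded by training on contaminated corpora at rate `beta_cm`), the recovery rates, the optional waning rate that turns SIR into SIRS, and all names are hypothetical choices for illustration. Within each layer the flows cancel, so $S+I+R$ stays equal to 1.

```python
def simulate_bilayer_sir(beta_cm, beta_mc, gamma_c, gamma_m, waning=0.0,
                         i_c0=0.01, i_m0=0.0, dt=0.01, steps=5000):
    """Forward-Euler integration of a hypothetical two-layer SIR/SIRS model.

    Layer c: data corpora; layer m: AI models. Each layer's fractions sum to 1.
    Corpora are contaminated by infected models (rate beta_mc); models are
    contaminated by training on infected corpora (rate beta_cm).
    waning > 0 moves R back to S (SIRS); waning = 0 gives plain SIR.
    """
    s_c, i_c, r_c = 1.0 - i_c0, i_c0, 0.0
    s_m, i_m, r_m = 1.0 - i_m0, i_m0, 0.0
    history = []
    for _ in range(steps):
        new_c = beta_mc * s_c * i_m   # corpora newly contaminated by model output
        new_m = beta_cm * s_m * i_c   # models newly degraded by contaminated corpora
        rec_c, rec_m = gamma_c * i_c, gamma_m * i_m  # curation / retraining
        wan_c, wan_m = waning * r_c, waning * r_m    # loss of immunity (SIRS)
        s_c += (-new_c + wan_c) * dt
        i_c += (new_c - rec_c) * dt
        r_c += (rec_c - wan_c) * dt
        s_m += (-new_m + wan_m) * dt
        i_m += (new_m - rec_m) * dt
        r_m += (rec_m - wan_m) * dt
        history.append((s_c, i_c, r_c, s_m, i_m, r_m))
    return history


final = simulate_bilayer_sir(beta_cm=0.8, beta_mc=0.6, gamma_c=0.1, gamma_m=0.1)[-1]
print("infected corpora %.3f, infected models %.3f" % (final[1], final[4]))
```

With both cross-coupling rates set to zero the layers decouple, and any initial contamination simply decays at the recovery rate.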