$\sqrt{d}$

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

The Unreasonable Redundancy of Nature's Protein Folds

Over the last few years, deep neural networks have made generative language modeling dramatically more powerful, giving us large language models. A similar leap happened for continuous modalities like images and videos.

Hacker News 7d ago

An Improved Algorithm for Adversarial Linear Contextual Bandits via Reduction

arXiv:2508.11931v3 Announce Type: replace Abstract: We present an oracle-efficient, near-optimal algorithm for linear contextual bandits with adversarial losses and stochastic action sets, only requiring a linear optimization oracle for the action sets in each round. Our approach reduces this setting to misspecification-robust adversarial linear bandits with fixed action sets. Without knowledge of the context distribution or access to a context simulator, the algorithm achieves...

arXiv CS 8d ago

When and why randomised exploration works (in linear bandits)

arXiv:2502.08870v2 Announce Type: replace Abstract: We provide an approach for the analysis of randomised exploration algorithms like Thompson sampling that does not rely on forced optimism or posterior inflation. With this, we demonstrate that in the $d$-dimensional linear bandit setting, when the action space is smooth and strongly convex, randomised exploration algorithms enjoy an $n$-step regret bound of the order $O(d\sqrt{n} \log(n))$. Notably, this shows for the first time that there...

arXiv CS 6d ago

Minimax Optimal Strategy for Delayed Observations in Online Reinforcement Learning

arXiv:2603.03480v2 Announce Type: replace Abstract: We study reinforcement learning with delayed state observation, where the agent observes the current state after some random number of time steps. We propose an algorithm that combines the augmentation method and the upper confidence bound approach. For tabular Markov decision processes (MDPs), we derive a regret bound of $\tilde{\mathcal{O}}(H \sqrt{D_{\max} SAK})$, where $S$ and $A$ are the cardinalities of the state and action spaces,...

arXiv CS 7d ago

Rectangular Matrix Multiplication in the Low-Bandwidth Model

arXiv:2606.04652v1 Announce Type: new Abstract: We study rectangular matrix multiplication in the low-bandwidth model of distributed computing. There are $n$ computers; initially the input matrices are distributed evenly between computers, and in each communication round every computer can send and receive an $O(\log n)$-bit message. Eventually each computer must output its designated part of the product matrix.

arXiv CS 6d ago

Finite-Temperature de Bruijn Identities: Fisher Information as the Spectral Gap of Blahut–Arimoto Dynamics

arXiv:2606.03813v1 Announce Type: new Abstract: We uncover a finite-temperature extension of de Bruijn's identity, the classical relation $\frac{d}{dt}h(X+\sqrt{t}Z)=\frac{1}{2}J(X)$ connecting differential entropy and Fisher information. Our framework is the spectral theory of Blahut–Arimoto (BA) dynamics, recently developed by Wang \cite{Wang2026} for the analysis of rate-distortion optimization. The central observation is elementary yet profound: for Gaussian sources, the spectral gap...

arXiv CS 7d ago

Batched Stochastic Linear Bandits with 1-Bit Communication Constraints

Announce Type: cross Abstract: We study stochastic linear bandits under a natural combination of batching and communication constraints: the time horizon is partitioned into batches of equal size $B$, and during each batch the learner sends $B$ requested arm pulls to an agent, who then observes the corresponding $B$ rewards and responds with a single bit of feedback to the learner. For each batch, the learner specifies the 1-bit quantization rule the agent uses, which may depend on all...

arXiv CS 9d ago

Robust Linear Dueling Bandits with Post-serving Context under Unknown Delays and Adversarial Corruptions

arXiv:2605.01752v4 Announce Type: replace Abstract: We study linear dueling bandits in volatile environments characterized by the simultaneous presence of post-serving contexts, delayed feedback, and adversarial corruption. Feedback is subject to unknown stochastic or adversarial delays and a cumulative corruption budget $\mathcal{C}$. To address these challenges, we propose RCDP-UCB, which integrates a learned approximator that predicts post-serving contexts from pre-serving information....

arXiv CS 8d ago

Coherent Swap Regret and Channel-Proof Learning

arXiv:2606.02655v1 Announce Type: cross Abstract: External regret certifies stability only against replacing one's behavior by a fixed alternative. In a quantum game, this misses a natural physical move: a player can apply a local completely positive trace-preserving (CPTP) map to the state it actually received or prepared.

arXiv CS 7d ago

Epidemiology of Model Collapse: Modeling Synthetic Data Contamination via Bilayer SIR Dynamics

arXiv:2606.05168v1 Announce Type: new Abstract: Training on synthetic data causes model collapse, but existing analyses treat this as single-chain degradation. In reality, the AI ecosystem involves cross-contamination: models ingest synthetic data from other models, produce new synthetic text, and contaminate shared corpora. We propose a bilayer coupled SIR/SIRS framework -- a phenomenological mean-field model treating data corpora and AI models as two interacting populations, each with...

arXiv CS 5d ago