Top-$p$
Related Articles from SNS
DTop-p MoE: Sparsity-Controlled Dynamic Top-p MoE for Foundation Model Pre-training
arXiv:2512.13996v3 Announce Type: replace Abstract: Sparse Mixture-of-Experts architectures are essential for scaling model capacity efficiently, yet the standard Top-$k$ routing imposes a rigid sparsity pattern that ignores the intrinsic variance in token difficulty and layer-specific computational needs. Top-$p$ routing is more adaptive because it selects experts until their cumulative routing probability reaches a threshold, allowing confident tokens to use fewer experts and ambiguous...
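The selection rule described in the abstract above (take experts in order of routing probability until their cumulative probability reaches a threshold) can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: the function names `softmax` and `top_p_route` are hypothetical, the softmax over router logits is an assumed way of producing routing probabilities, and renormalizing the selected weights to sum to 1 is also an assumption that the abstract does not state.

```python
import math


def softmax(logits):
    """Convert router logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_route(logits, p):
    """Pick experts for one token until cumulative probability reaches p.

    Returns (expert_indices, weights). Renormalizing the weights over the
    selected experts is an assumption made for this sketch.
    """
    probs = softmax(logits)
    # Visit experts from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    mass = sum(probs[i] for i in chosen)
    weights = [probs[i] / mass for i in chosen]
    return chosen, weights


# A confident token versus an ambiguous one, both with threshold p = 0.9.
confident_experts, _ = top_p_route([10.0, 0.0, 0.0, 0.0], 0.9)
ambiguous_experts, _ = top_p_route([0.0, 0.0, 0.0, 0.0], 0.9)
print(len(confident_experts), len(ambiguous_experts))
```

With a threshold of 0.9, the sharply peaked logits select a single expert while the uniform logits select all four, which is the adaptivity the abstract contrasts with fixed Top-$k$ routing.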
Who are your club's future stars? Updated top 10 p...
As we enter June, it's time for our next team-by-team MLB prospect rankings big board update. We've revised the top 10 prospects for all 30 teams. What has changed since May?
DynMuon: A Dynamic Spectral Shaping View of Muon
arXiv:2605.17109v3 Announce Type: replace Abstract: In recent years, Muon has emerged as the dominant method for training large language models, and transformers more broadly. The essential difference, when compared to standard gradient descent methods, is to replace the usual update matrix $M=U\Sigma V^\top$ with its polar factor $UV^\top$. In this work, we consider a class of Muon-like updates, where we replace the update $M$ with $U\Sigma^p V^\top$ for some parameter $p$. We call this a...
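Two special cases of the update $U\Sigma^p V^\top$ follow directly from the definitions quoted above, assuming $M$ has full rank so that $\Sigma^0 = I$ (the rank assumption is added here for illustration and is not stated in the abstract):

$$
U\Sigma^{p}V^\top =
\begin{cases}
U\Sigma V^\top = M, & p = 1 \quad \text{(the unmodified update)},\\
U V^\top, & p = 0 \quad \text{(the polar factor used by Muon)}.
\end{cases}
$$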
The stock market just did something eerily similar to the dotcom bubble top in 2000
The S&P 500 closed at a record on the last trading day of May, but only a handful of stocks, mostly in the AI area, hit their own all-time highs. This strange occurrence echoes what happened at the top of the dotcom bubble 26 years ago. On Friday, just 20 of the index members hit a record.
Double Electron Attachment and Double Ionization Potential Equation-of-Motion Coupled-Cluster Approaches with Full and Active-Space Treatments of 4-Particle-2-Hole and 4-Hole-2-Particle Excitations and Three-Body Clusters
arXiv:2605.20556v2 Announce Type: replace Abstract: The double electron attachment (DEA) and double ionization potential (DIP) equation-of-motion coupled-cluster (EOMCC) methods including up to 4-particle-2-hole (4$p$-2$h$) and 4-hole-2-particle (4$h$-2$p$) excitations on top of coupled-cluster singles, doubles, and triples (CCSDT), denoted DEA-EOMCCSDT(4$p$-2$h$) and DIP-EOMCCSDT(4$h$-2$p$), have been efficiently implemented in full and active-space forms. The resulting methods are applied...
On Sketching Trimmed Statistics
Announce Type: replace Abstract: We study sketching trimmed statistics of a frequency vector, including the $F_p$ moment of the top-$k$ coordinates and of the trimmed-$k$ vector. Despite their natural role in robust analytics, this is the first time these problems have been studied in any sublinear space setting. For $p \in [0,2]$, we obtain $poly(\log n/\varepsilon)$-space algorithms for both tasks when $k$ is moderately large, and for general $k$ we identify a sharp structural threshold...
Fairness in two-player zero-sum games with bandit feedback
Announce Type: new Abstract: We study two-player zero-sum games (TPZSGs) with bandit feedback under fairness constraints requiring every action to be played with probability at least $\alpha/m$. Existing instance-dependent results target $\textit{pure}$ Nash equilibria, while fairness generically produces $\textit{mixed}$ equilibria, a harder learning target. Our key technical tool is a reparametrization: every fair strategy decomposes as $p = (\alpha/m)\mathbf{1} + (1-\alpha)\widetilde{p}$...
PAEC: Position-Aware Entropy Calibration for LLM Reasoning in RLVR
arXiv:2606.08543v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) improves large language model reasoning but often suffers from rapid policy-entropy collapse, where the policy prematurely concentrates on narrow high-probability reasoning paths. While global entropy regularization can encourage exploration, uniformly increasing entropy across all token positions is inefficient for long reasoning trajectories, where many tokens are not decision-relevant. We...