Top-$p$

Related Articles from SNS

DTop-p MoE: Sparsity-Controlled Dynamic Top-p MoE for Foundation Model Pre-training

arXiv:2512.13996v3 Announce Type: replace Abstract: Sparse Mixture-of-Experts architectures are essential for scaling model capacity efficiently, yet the standard Top-$k$ routing imposes a rigid sparsity pattern that ignores the intrinsic variance in token difficulty and layer-specific computational needs. Top-$p$ routing is more adaptive because it selects experts until their cumulative routing probability reaches a threshold, allowing confident tokens to use fewer experts and ambiguous...

arXiv CS 7d ago
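
The Top-$p$ routing rule described in the abstract above (select experts until their cumulative routing probability reaches a threshold) can be illustrated with a minimal sketch. This is not the DTop-p method from the paper; the function names `softmax` and `top_p_route` and the example probabilities are hypothetical, chosen only to show the basic selection rule.

```python
import math

def softmax(logits):
    """Turn raw router logits into routing probabilities."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def top_p_route(probs, p):
    """Return expert indices, most probable first, whose cumulative
    routing probability first reaches the threshold p (illustrative only)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    selected, cumulative = [], 0.0
    for i in order:
        selected.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return selected

# A confident token concentrates its mass on one expert and uses fewer experts.
confident = top_p_route([0.875, 0.0625, 0.03125, 0.03125], p=0.7)
# An ambiguous token spreads its mass and needs more experts to reach p.
ambiguous = top_p_route([0.5, 0.25, 0.125, 0.125], p=0.7)
```

Under this rule the number of active experts varies per token, in contrast to Top-$k$ routing, which always selects exactly $k$.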

Who are your club's future stars? Updated top 10 p...

As we enter June, it's time for our next team-by-team MLB prospect rankings big board update. We've revised the top 10 prospects for all 30 teams. What has changed since May?

ESPN 9d ago

S&P 500 Tops 7,600 as AI Fuels Nine-Day Win Streak

Bloomberg Markets 7d ago

DynMuon: A Dynamic Spectral Shaping View of Muon

arXiv:2605.17109v3 Announce Type: replace Abstract: In recent years, Muon has emerged as the dominant method for training large language models, and transformers more broadly. The essential difference, when compared to standard gradient descent methods, is to replace the usual update matrix $M=U\Sigma V^\top$ with its polar factor $UV^\top$. In this work, we consider a class of Muon-like updates, where we replace the update $M$ with $U\Sigma^p V^\top$ for some parameter $p$. We call this a...

arXiv CS 8d ago

The stock market just did something eerily similar to the dotcom bubble top in 2000

The S&P 500 closed at a record on the last trading day of May, but only a handful of stocks, focused mostly in the AI area, hit their own all-time highs. This strange occurrence echoes what happened at the top of the dotcom bubble 26 years ago. On Friday, just 20 of the index members hit a record.

CNBC 9d ago

Double Electron Attachment and Double Ionization Potential Equation-of-Motion Coupled-Cluster Approaches with Full and Active-Space Treatments of 4-Particle-2-Hole and 4-Hole-2-Particle Excitations and Three-Body Clusters

arXiv:2605.20556v2 Announce Type: replace Abstract: The double electron attachment (DEA) and double ionization potential (DIP) equation-of-motion coupled-cluster (EOMCC) methods including up to 4-particle-2-hole (4$p$-2$h$) and 4-hole-2-particle (4$h$-2$p$) excitations on top of coupled-cluster singles, doubles, and triples (CCSDT), denoted DEA-EOMCCSDT(4$p$-2$h$) and DIP-EOMCCSDT(4$h$-2$p$), have been efficiently implemented in full and active-space forms. The resulting methods are applied...

arXiv Physics 8d ago

On Sketching Trimmed Statistics

Announce Type: replace Abstract: We study sketching trimmed statistics of a frequency vector, including the $F_p$ moment of the top-$k$ coordinates and of the trimmed-$k$ vector. Despite their natural role in robust analytics, this is the first time these problems have been studied in any sublinear space setting. For $p \in [0,2]$, we obtain $poly(\log n/\varepsilon)$-space algorithms for both tasks when $k$ is moderately large, and for general $k$ we identify a sharp structural threshold...

arXiv CS 8d ago

Fairness in two-player zero-sum games with bandit feedback

Announce Type: new Abstract: We study two-player zero-sum games (TPZSGs) with bandit feedback under fairness constraints requiring every action to be played with probability at least $\alpha/m$. Existing instance-dependent results target $\textit{pure}$ Nash equilibria, while fairness generically produces $\textit{mixed}$ equilibria, a harder learning target. Our key technical tool is a reparametrization: every fair strategy decomposes as $p = (\alpha/m)\mathbf{1} + (1-\alpha)\widetilde{p}$...

arXiv CS 8d ago
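
The reparametrization quoted in the abstract above, $p = (\alpha/m)\mathbf{1} + (1-\alpha)\widetilde{p}$, can be checked numerically with a small sketch. The abstract is truncated before it says what $\widetilde{p}$ ranges over, so this example assumes $\widetilde{p}$ is an arbitrary probability vector over the $m$ actions; the function name `fair_strategy` is hypothetical.

```python
def fair_strategy(p_tilde, alpha):
    """Map a probability vector p_tilde (assumed to lie on the simplex)
    to p = (alpha/m) * 1 + (1 - alpha) * p_tilde."""
    m = len(p_tilde)
    return [alpha / m + (1 - alpha) * q for q in p_tilde]

# Even a pure strategy for p_tilde yields a strategy that plays every
# action with probability at least alpha/m.
alpha = 0.2
p = fair_strategy([1.0, 0.0, 0.0, 0.0], alpha)
floor = alpha / len(p)
```

Because the entries of $\widetilde{p}$ are nonnegative and sum to 1, the entries of $p$ sum to $\alpha + (1-\alpha) = 1$ and each is at least $\alpha/m$, which is the fairness constraint stated in the abstract.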

PAEC: Position-Aware Entropy Calibration for LLM Reasoning in RLVR

arXiv:2606.08543v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) improves large language model reasoning but often suffers from rapid policy-entropy collapse, where the policy prematurely concentrates on narrow high-probability reasoning paths. While global entropy regularization can encourage exploration, uniformly increasing entropy across all token positions is inefficient for long reasoning trajectories, where many tokens are not decision-relevant. We...

arXiv CS 1d ago