PPL
Related Articles from SNS
Hacking Generative Perplexity: Why Unconditional Text Evaluation Needs Distributional Metrics
arXiv:2606.08417v1 Announce Type: new Abstract: Diffusion and continuous flow-based language models have emerged as the leading non-autoregressive alternatives to language modeling. Progress in both paradigms is overwhelmingly tracked by generative perplexity (gen-PPL): the per-token negative log-likelihood of samples under a frozen autoregressive (AR) scorer such as gpt2-large, typically paired with an empirical-entropy guardrail to rule out low-entropy collapse. We argue that this metric...
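A minimal sketch of the two quantities named in the abstract above, assuming the per-token log-probabilities from the frozen scorer have already been computed. The function names, the use of natural logs, and the unigram form of the entropy guardrail are illustrative assumptions, not the paper's code; gen-PPL is shown both as mean negative log-likelihood and as its exponential, since both conventions are in use.

```python
import math
from collections import Counter

def mean_nll(token_logprobs):
    """Average negative log-likelihood per token, in nats.

    token_logprobs: list of sequences, each a list of log p(token | prefix)
    as assigned by a frozen autoregressive scorer (assumed precomputed).
    """
    total = sum(-lp for seq in token_logprobs for lp in seq)
    count = sum(len(seq) for seq in token_logprobs)
    return total / count

def gen_ppl(token_logprobs):
    """Generative perplexity as exp of the mean per-token NLL."""
    return math.exp(mean_nll(token_logprobs))

def empirical_entropy(tokens):
    """Unigram entropy (nats) of the token frequencies in one sample."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Hypothetical scorer output: every token assigned probability 0.5.
logprobs = [[math.log(0.5)] * 4, [math.log(0.5)] * 2]
ppl = gen_ppl(logprobs)

# A degenerate sample that repeats one token has zero entropy,
# which is the collapse the guardrail is meant to catch.
collapsed = empirical_entropy([7, 7, 7, 7])
varied = empirical_entropy([0, 1])
```

A sample can score a low gen-PPL simply by being repetitive, which is why the abstract describes the metric as paired with an entropy check.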
CRePE: Convolution-aware Relative Importance in Post-training Pruning with Efficient Search
Announce Type: new Abstract: Deploying Large Language Models (LLMs) in practice incurs substantial memory and computational costs. Post-training pruning (PTP) is an effective approach to reducing these costs by removing weights without additional training. Among existing methods, RIA introduces relative importance scores normalized by row and column sums, achieving state-of-the-art accuracy.
Chiaroscuro Attention: Spending Compute in the Dark
arXiv:2606.08327v1 Announce Type: new Abstract: Standard transformers apply self-attention uniformly at every layer and token, regardless of whether the input requires dynamic cross-token interaction. We propose CHIAR-Former (Chiaroscuro Attention), a 4-layer hybrid transformer that routes each token to one of three operators - DCT spectral mixing, RBF kernel mixing, or full self-attention - based on per-token spectral entropy, a theoretically justified complexity signal. Through systematic...
Thinking Economically: A Hierarchical Framework for Adaptive-Complexity Reasoning in LLMs
Announce Type: new Abstract: Chain-of-Thought (CoT) has significantly enhanced LLM reasoning, yet often incurs substantial computational overhead due to "overthinking": generating excessively long rationales without commensurate accuracy gains. Existing efficiency methods typically apply uniform compression, which overlooks a critical observation: reasoning complexity is heterogeneous at two distinct granularities, across different problems and within individual reasoning steps. This...
Secret Signal chats reveal how anti-ICE agitators coordinated Newark riots
At 11:30 a.m. on June 3, an activation signal went out on social media calling protesters and agitators to swarm Delaney Hall, the Newark, N.J. ICE detention facility that has become one of the nation's most contentious immigration battlegrounds. "BACK TO DELANEY," read an Instagram post, promoted by a fiery collection of anti-Israel, Marxist and Democratic organizations — from "Palestine Solidarity Working Group" and Al-Awda to Indivisible and 50501 — that have joined tumultuous against the...
Language Modeling with Hyperspherical Flows
arXiv:2605.11125v3 Announce Type: replace Abstract: Discrete Diffusion Language Models progressed rapidly as an alternative to autoregressive (AR) models, motivated by their parallel generation abilities. However, for tractability, discrete diffusion models sample from a factorized distribution, which is less expressive than AR. Recent Flow Language Models (FLMs) apply continuous flows to language, transporting noise to data with a deterministic ODE that avoids factorized sampling.