
PPL

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Hacking Generative Perplexity: Why Unconditional Text Evaluation Needs Distributional Metrics

arXiv:2606.08417v1 Announce Type: new Abstract: Diffusion and continuous flow-based language models have emerged as the leading non-autoregressive alternatives to language modeling. Progress in both paradigms is overwhelmingly tracked by generative perplexity (gen-PPL): the per-token negative log-likelihood of samples under a frozen autoregressive (AR) scorer such as gpt2-large, typically paired with an empirical-entropy guardrail to rule out low-entropy collapse. We argue that this metric...

arXiv CS 1d ago
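The gen-PPL metric described above comes down to a short computation. This is a minimal sketch under stated assumptions. It assumes the per-token natural-log probabilities have already been obtained from a frozen AR scorer; scoring with gpt2-large itself is not shown. The function names `gen_ppl` and `empirical_entropy` are illustrative. "Empirical entropy" is taken here to mean the unigram entropy of a sample's own tokens, which may differ from the exact guardrail the paper uses.

```python
import math
from collections import Counter
from typing import Sequence


def gen_ppl(token_logprobs: Sequence[float]) -> float:
    """Generative perplexity: exp of the mean per-token negative log-likelihood.

    Assumes `token_logprobs` are natural-log probabilities that a frozen AR
    scorer assigned to each token of the generated samples (hypothetical input).
    """
    if not token_logprobs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def empirical_entropy(token_ids: Sequence[int]) -> float:
    """Unigram entropy (in nats) of a sample's tokens; near zero signals repetitive collapse."""
    if not token_ids:
        raise ValueError("need at least one token")
    counts = Counter(token_ids)
    n = len(token_ids)
    # (c/n) * log(n/c) equals -(c/n) * log(c/n) but avoids printing -0.0
    return sum((c / n) * math.log(n / c) for c in counts.values())


repetitive = [7] * 8        # one token repeated: degenerate sample
diverse = list(range(8))    # eight distinct tokens
print(empirical_entropy(repetitive))  # 0.0
print(empirical_entropy(diverse))     # log(8), about 2.079
```

Reporting the two numbers together reflects the pairing the abstract describes. A repetitive sample can score a low gen-PPL under the scorer, and only the entropy check exposes the collapse.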

CRePE: Convolution-aware Relative Importance in Post-training Pruning with Efficient Search

Announce Type: new Abstract: Deploying Large Language Models (LLMs) in practice incurs substantial memory and computational costs. Post-training pruning (PTP) is an effective approach to reducing these costs by removing weights without additional training. Among existing methods, RIA introduces relative importance scores normalized by row and column sums, achieving state-of-the-art accuracy.

arXiv CS 8d ago
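The relative-importance idea attributed to RIA above can be sketched as follows. This is a simplified illustration and not the RIA implementation. Each score is a weight's magnitude divided by the sum of magnitudes in its row, plus the same magnitude divided by the sum of its column. The per-row pruning policy (`prune_per_row` and its `sparsity` argument) is a hypothetical choice. Any activation-dependent terms a real method may use are omitted.

```python
from typing import List

Matrix = List[List[float]]


def relative_importance(w: Matrix) -> Matrix:
    """Score_ij = |W_ij| / sum_k |W_ik| + |W_ij| / sum_k |W_kj| (row- plus column-normalized)."""
    rows, cols = len(w), len(w[0])
    row_sums = [sum(abs(x) for x in row) for row in w]
    col_sums = [sum(abs(w[i][j]) for i in range(rows)) for j in range(cols)]
    return [
        [
            (abs(w[i][j]) / row_sums[i] if row_sums[i] else 0.0)
            + (abs(w[i][j]) / col_sums[j] if col_sums[j] else 0.0)
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def prune_per_row(w: Matrix, sparsity: float) -> Matrix:
    """Zero the lowest-scoring `sparsity` fraction of weights in each row (hypothetical policy)."""
    scores = relative_importance(w)
    cols = len(w[0])
    k = int(sparsity * cols)  # number of weights to drop per row
    pruned = []
    for row, s in zip(w, scores):
        drop = set(sorted(range(cols), key=lambda j: s[j])[:k])
        pruned.append([0.0 if j in drop else x for j, x in enumerate(row)])
    return pruned


w = [[1.0, -2.0], [3.0, 0.5]]
print(prune_per_row(w, 0.5))  # [[0.0, -2.0], [3.0, 0.0]]
```

Because each score is normalized by both its row and its column, a weight that is large in absolute terms can still rank low if its whole column is dense with large values. This is the property that distinguishes relative importance from plain magnitude pruning.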

Chiaroscuro Attention: Spending Compute in the Dark

arXiv:2606.08327v1 Announce Type: new Abstract: Standard transformers apply self-attention uniformly at every layer and token, regardless of whether the input requires dynamic cross-token interaction. We propose CHIAR-Former (Chiaroscuro Attention), a 4-layer hybrid transformer that routes each token to one of three operators (DCT spectral mixing, RBF kernel mixing, or full self-attention) based on per-token spectral entropy, a theoretically justified complexity signal. Through systematic...

arXiv CS 1d ago
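The abstract does not say how per-token spectral entropy is computed or how its values map to operators, so the following is a hypothetical sketch. It takes a naive DFT of a token's hidden vector and normalizes the one-sided power spectrum into a probability distribution. It then uses the Shannon entropy of that distribution, scaled to [0, 1], to pick one of the three operators named above. The thresholds `low` and `high` and the mapping from low entropy to the cheapest operator are invented for illustration.

```python
import cmath
import math
from typing import Sequence


def spectral_entropy(x: Sequence[float]) -> float:
    """Normalized Shannon entropy (0..1) of the one-sided power spectrum of x (naive DFT)."""
    n = len(x)
    if n < 2:
        raise ValueError("need a vector of length >= 2")
    bins = n // 2 + 1  # one-sided spectrum for real-valued input
    power = []
    for k in range(bins):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0.0:
        return 0.0  # all-zero vector: treat as minimally complex
    probs = [p / total for p in power if p > 0.0]
    h = sum(p * math.log(1.0 / p) for p in probs)
    return h / math.log(bins)  # scale so a flat spectrum gives 1.0


def route(x: Sequence[float], low: float = 0.4, high: float = 0.8) -> str:
    """Hypothetical router: cheap mixing for low-entropy tokens, attention for high-entropy ones."""
    h = spectral_entropy(x)
    if h < low:
        return "dct_mixing"
    if h < high:
        return "rbf_mixing"
    return "self_attention"


print(route([1.0] * 8))           # constant vector: energy only at DC -> "dct_mixing"
print(route([1.0] + [0.0] * 7))   # impulse: flat spectrum -> "self_attention"
```

A real model would presumably use an FFT over batched tensors. The naive DFT here only keeps the sketch dependency-free.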

Thinking Economically: A Hierarchical Framework for Adaptive-Complexity Reasoning in LLMs

Announce Type: new Abstract: Chain-of-Thought (CoT) has significantly enhanced LLM reasoning, yet often incurs substantial computational overhead due to "overthinking": generating excessively long rationales without commensurate accuracy gains. Existing efficiency methods typically apply uniform compression, which overlooks the critical observation that reasoning complexity is heterogeneous at two distinct granularities: across different problems and within individual reasoning steps. This...

arXiv CS 8d ago

Secret Signal chats reveal how anti-ICE agitators coordinated Newark riots

At 11:30 a.m. on June 3, an activation signal went out on social media calling protesters and agitators to swarm Delaney Hall, the Newark, N.J. ICE detention facility that has become one of the nation's most contentious immigration battlegrounds. "BACK TO DELANEY," read an Instagram post, promoted by a fiery collection of anti-Israel, Marxist and Democratic organizations (from "Palestine Solidarity Working Group" and Al-Awda to Indivisible and 50501) that have joined tumultuous against the...

Fox News Politics 6d ago

Language Modeling with Hyperspherical Flows

arXiv:2605.11125v3 Announce Type: replace Abstract: Discrete Diffusion Language Models have progressed rapidly as an alternative to autoregressive (AR) models, motivated by their parallel generation abilities. However, for tractability, discrete diffusion models sample from a factorized distribution, which is less expressive than AR. Recent Flow Language Models (FLMs) apply continuous flows to language, transporting noise to data with a deterministic ODE that avoids factorized sampling.

arXiv CS 8d ago
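A two-token toy example shows why a factorized distribution is less expressive. Suppose, hypothetically, that the data distribution puts half its mass on "New York" and half on "San Francisco". A factorized sampler draws both positions independently from their marginals in one parallel step, so it also assigns probability to "New Francisco" and "San York". The product of marginals used below is the factorized distribution closest to the joint in KL(p || q). Its leakage onto invalid pairs therefore cannot be removed by any better choice of per-position distributions.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Tuple

Joint = Dict[Tuple[str, str], float]


def marginals(joint: Joint) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Per-position marginal distributions of a two-token joint distribution."""
    first: Dict[str, float] = defaultdict(float)
    second: Dict[str, float] = defaultdict(float)
    for (a, b), p in joint.items():
        first[a] += p
        second[b] += p
    return dict(first), dict(second)


def factorized(joint: Joint) -> Joint:
    """Fully factorized approximation: product of the per-position marginals."""
    first, second = marginals(joint)
    return {(a, b): first[a] * second[b] for a, b in product(first, second)}


joint = {("New", "York"): 0.5, ("San", "Francisco"): 0.5}
approx = factorized(joint)
invalid_mass = sum(p for pair, p in approx.items() if pair not in joint)
print(invalid_mass)  # 0.5
```

Half of the factorized sampler's mass lands on sequences the data never contains. An AR model, which conditions the second token on the first, can represent this joint exactly.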
