THE HIDDEN REASON

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Right Makes Might: Aligning Verified Hidden States Empowers RL Reasoning

Announce Type: new Abstract: Reinforcement Learning from Verifiable Rewards (RLVR) has become the dominant approach for improving mathematical reasoning in large language models, yet current methods reduce each correct rollout to a single reward bit, ignoring the geometric structure shared among their hidden states. Investigating this structure, we find that at the anchor token (the position immediately before the answer marker), correct rollouts converge naturally because they must produce...

arXiv CS 7d ago

Linear Probes Detect Task Format, Not Reasoning Mode in Language Model Hidden States

Announce Type: replace Abstract: Linear probing of large language model (LLM) hidden states is widely used to claim that models learn distinct representations for different reasoning types. We test this by probing Qwen3-14B on three benchmarks spanning the classical trichotomy: LogiQA 2.0 (deductive), ARC-Challenge (inductive), and αNLI (abductive). At layer 32 of 40, linear probes achieve 100% cross-validated accuracy with well-separated geometry (intrinsic dimensionalities: 20.6,...

arXiv CS 5d ago

Linear Probes Detect Task Format, Not Reasoning Mode in Language Model Hidden States

arXiv:2606.02907v1 Announce Type: new Abstract: Linear probing of large language model (LLM) hidden states is widely used to claim that models learn distinct representations for different reasoning types. We test this by probing Qwen3-14B on three benchmarks spanning the classical trichotomy: LogiQA 2.0 (deductive), ARC-Challenge (inductive), and αNLI (abductive). At layer 32 of 40, linear probes achieve 100% cross-validated accuracy with well-separated geometry (intrinsic...

arXiv CS 7d ago

LoRi: Low-Rank Distillation for Implicit Reasoning

Announce Type: new Abstract: Implicit chain-of-thought (iCoT) methods aim to internalize reasoning in large language models, but often underperform explicit CoT prompting. We empirically find that hidden-state reasoning trajectories exhibit low-rank structure. Motivated by this observation, we propose a low-rank distillation framework that transfers reasoning by aligning teacher and student trajectories in a shared low-rank tensor subspace using first- and second-order statistics.

arXiv CS 5d ago

Adaptive Generate-Rank-Verify: Inference-Time Search with Costly Verification

Announce Type: replace Abstract: Many inference-time language-model pipelines combine a cheap reward signal with an expensive verifier, such as exact answer checking in mathematical reasoning or hidden-test execution in code generation. We formalize this setting using a learning-theoretic lens as generative active search: a cost-sensitive first-positive search problem in which a policy adaptively samples candidates from an unknown distribution, observes cheap scores, and pays for verifier...

arXiv CS 1d ago

Anthropic/OpenAI may be spending more than $1000 for every $100 you pay them

For reasons that will remain hidden, we resume writing about Generative AI/LLM after a hiatus of 15 months (that one from October 2025, and the one from June 2025, don’t really count as serious pieces). Today, the first of two articles about “coding with Large ‘Language’ Models”, as coding with LLMs is positioned as the ‘killer app’ for LLMs. We interrupt this program for a short digression on Anthropic’s recently released blog post “When AI builds itself”.

Hacker News 2d ago

The Hidden Bias of Process Reward Models: PRISM for Rewarding the Right Reasoning

arXiv:2606.09078v1 Announce Type: new Abstract: Process Reward Models (PRMs) improve credit assignment for reasoning by providing step-level feedback. However, we identify a hidden bias in PRMs caused by severe imbalance in step-level training data.

arXiv CS 1d ago

When to Re-Plan: Subgoal Persistence in Hierarchical Latent Reasoning

arXiv:2606.03741v1 Announce Type: new Abstract: Long-horizon reasoning requires a system to commit to medium-horizon intent without becoming rigid: re-plan too often and computation never coheres into multi-step structure; commit too long and the plan goes stale. We study this stability-adaptivity tradeoff in the latent reasoning setting, where multi-step computation occurs inside hidden state rather than externalized token traces. We extend the Hierarchical Reasoning Model (HRM) with a...

arXiv CS 7d ago

MIRAGE: Mobile Agents with Implicit Reasoning and Generative World Models

arXiv:2606.04627v1 Announce Type: new Abstract: Mobile agents are increasingly expected to operate everyday applications from screenshots and language goals, where reliable control requires reasoning over screen affordances, multi-step navigation, and future state changes. However, many agents externalize this computation as long textual chains of thought, which slows interaction, increases supervision cost, and complicates deployment. We introduce MIRAGE, a framework that learns continuous...

arXiv CS 6d ago

MIRAGE: Mobile Agents with Implicit Reasoning and Generative World Models

arXiv:2606.04627v2 Announce Type: replace Abstract: Mobile agents are increasingly expected to operate everyday applications from screenshots and language goals, where reliable control requires reasoning over screen affordances, multi-step navigation, and future state changes. However, many agents externalize this computation as long textual chains of thought, which slows interaction, increases supervision cost, and complicates deployment.

arXiv CS 1d ago