Home Knowledge Base ARC-Challenge

ARC-Challenge

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Linear Probes Detect Task Format, Not Reasoning Mode in Language Model Hidden States

arXiv:2606.02907v1 Announce Type: new Abstract: Linear probing of large language model (LLM) hidden states is widely used to claim that models learn distinct representations for different reasoning types. We test this by probing Qwen3-14B on three benchmarks spanning the classical trichotomy: LogiQA 2.0 (deductive), ARC-Challenge (inductive), and $\alpha$NLI (abductive). At layer 32 of 40, linear probes achieve 100\% cross-validated accuracy with well-separated geometry (intrinsic...

arXiv CS 7d ago

Linear Probes Detect Task Format, Not Reasoning Mode in Language Model Hidden States

Announce Type: replace Abstract: Linear probing of large language model (LLM) hidden states is widely used to claim that models learn distinct representations for different reasoning types. We test this by probing Qwen3-14B on three benchmarks spanning the classical trichotomy: LogiQA 2.0 (deductive), ARC-Challenge (inductive), and $\alpha$NLI (abductive). At layer 32 of 40, linear probes achieve 100\% cross-validated accuracy with well-separated geometry (intrinsic dimensionalities: 20.6,...

arXiv CS 5d ago

Empirical Characterization of Inference-Time Elicited Probability Transformations in Large Language Models

Announce Type: replace Abstract: Large language models increasingly rely on inference-time procedures such as chain-of-thought reasoning, self-refinement, retrieval augmentation, and verifier-guided revision, yet the structure of elicited probability transformations under these procedures remains poorly understood. We study externally elicited probability assignments over candidate answers and observe recurring approximate log-ratio relationships: \[ \log \tilde q_t(i) = \alpha_t \left( \log...

arXiv CS 9d ago

PACE: Anytime-Valid Acceptance Tests for Self-Evolving Agents

arXiv:2606.08106v1 Announce Type: new Abstract: Self-evolving agents improve by repeatedly proposing changes to their own prompts, skills, or workflows and keeping those that score higher on a small held-out set. Almost all effort has gone into the proposer that generates candidates; we argue the weak point is the acceptor, the rule that decides whether to commit a change. Applied hundreds of times against the same noisy dev estimate, the ubiquitous "keep it if the score went up" rule is...

arXiv CS 1d ago