GPQA Diamond

Related Articles from SNS

ATLAS: Agentic Test-time Learning-to-Allocate Scaling

arXiv:2606.01667v1 Abstract: Test-time scaling has become a major way to improve large language model reasoning, but its orchestration has remained designer-engineered: a fixed sample budget, a fixed refinement loop, a fixed scoring rule, or a fixed search policy decides how compute is spent, leaving the model in charge of solving but not of orchestration. We introduce ATLAS, an agentic test-time scaling framework in which an LLM orchestrator owns the control loop...

arXiv CS 8d ago

Empirical Characterization of Inference-Time Elicited Probability Transformations in Large Language Models

Abstract: Large language models increasingly rely on inference-time procedures such as chain-of-thought reasoning, self-refinement, retrieval augmentation, and verifier-guided revision, yet the structure of elicited probability transformations under these procedures remains poorly understood. We study externally elicited probability assignments over candidate answers and observe recurring approximate log-ratio relationships: \[ \log \tilde q_t(i) = \alpha_t \left( \log...

arXiv CS 9d ago

Pair-In, Pair-Out: Latent Multi-Token Prediction for Efficient LLMs

arXiv:2605.27255v2 Abstract: Long chain-of-thought reasoning has made autoregressive decoding the dominant inference cost of modern large language models. Existing methods target either the input side (latent compression) or the output side (speculative decoding and multi-token prediction, MTP), but the two lines of work have been pursued independently. Moreover, output-side methods must incur an expensive verifier pass to validate the unreliable draft tokens predicted...

arXiv CS 9d ago

Interfaze: The Future of AI is built on Task-Specific Small Models

arXiv:2602.04101v2 Abstract: We present Interfaze, a native hybrid model that fuses task-specific deep neural networks (CNNs and DNNs) directly into a transformer decoder through a shared embedding space. Specialized perceptual encoders handle optical character recognition (OCR) over complex multilingual PDFs, open-vocabulary object and graphical user interface (GUI) detection, and multilingual speech recognition with diarization. Each is exposed through a...

arXiv CS 6d ago