Home › Knowledge Base › oracle feedback

oracle feedback

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

AgentPLM: Agentic Protein Language Models with Reasoning-Augmented Decoding for Protein Sequence Design

arXiv:2606.02386v1 Announce Type: new Abstract: Protein language models (PLMs) are passive oracles: they generate sequences in a single forward pass with no mechanism to consult external biophysical feedback or redirect generation when a candidate violates thermodynamic or structural constraints. We introduce AgentPLM, which addresses this by equipping a pre-trained PLM with i) Reasoning-Augmented Decoding (RAD), which interleaves autoregressive generation with tool calls (ESMFold, FoldX,...

arXiv CS 8d ago

WildRoadBench: A Wild Aerial Road-Damage Grounding Benchmark for Vision-Language Models and Autonomous Agents

arXiv:2605.20306v2 Announce Type: replace Abstract: We introduce WildRoadBench, a wild aerial road-damage grounding benchmark that couples direct visual grounding by vision-language models with autonomous research-and-engineering by LLM-driven agents on a single professionally annotated UAV corpus. The same image set and the same per-class AP_50 metric are evaluated under two protocols. The VLM Track measures whether a fixed VLM can localise domain-specific damage from one image and one...

arXiv CS 7d ago

MOOSE-Copilot: A Web-Based Interactive Assistant for Unified Exploratory and Fine-Grained Scientific Hypothesis Discovery

Announce Type: replace Abstract: Large language models (LLMs) show remarkable potential in scientific hypothesis discovery. However, existing approaches face two critical limitations: they treat divergent exploratory search and convergent fine-grained refinement as isolated tasks, and they operate autonomously with little to no human guidance. We present MOOSE-Copilot, the first unified framework to bridge this abstraction gap through a formalized human-AI interaction (HAII) protocol.

arXiv CS 1d ago

Auto-Discovery-Bench: Diagnosing Structured State Tracking in Oracle-Guided Discovery

arXiv:2502.15224v2 Announce Type: replace Abstract: Interactive discovery requires agents to maintain and update structured beliefs over many rounds of feedback. Before evaluating agents in noisy, open-ended scientific environments, it is useful to isolate this prerequisite capability under controlled conditions. We introduce Auto-Discovery-Bench, a deterministic oracle-guided diagnostic benchmark in which agents recover hidden structures through repeated hypothesis--intervention--feedback...

arXiv CS 9d ago

Progress-SQL: Improving Reinforcement Learning for Text-to-SQL via Progressive Rewards

Announce Type: new Abstract: Reinforcement learning has recently shown promise in improving large language models for Text-to-SQL generation, yet existing methods typically optimize one-shot rewards defined over a single SQL state. Such rewards provide limited guidance for iterative SQL correction and are insufficient to capture the improvement of multi-turn SQL refinement. In this paper, we propose Progress-SQL, a multi-turn reinforcement learning framework with progressive rewards for...

arXiv CS 2d ago

Tensor Algebraic Property Skeletons: Amplifying Property-Based Testing for AI Compilers

Announce Type: new Abstract: Deep learning (DL) compilers such as TVM and ONNX-MLIR lower tensor computation graphs into optimized executables for target backends. Testing these AI compilers has made substantial progress in generating well-formed inputs in the context of fuzzing; however, such generation alone does not catch semantic drifts from algebraic invariants that graph transformations and optimizations are expected to preserve. While tensor algebra has been studied for decades, it...

arXiv CS 2d ago

On the Generalization Gap in Self-Evolving Language Model Reasoning

new Abstract: Recent work suggests that large language models (LLMs) can improve through self-evolution (SE), using supervision signals generated by the model itself. In this work, we ask: under a strict closed-loop setup, where the self-evolution algorithm has access only to an unlabeled prompt set and a base model, how close can internally generated supervision come to oracle-supervised training? We analyze four representative strategies in a unified offline self-evolution framework:...

arXiv CS 8d ago

On the Generalization Gap in Self-Evolving Language Model Reasoning

arXiv:2606.01075v2 Announce Type: new Abstract: Recent work suggests that large language models (LLMs) can improve through self-evolution (SE), using supervision signals generated by the model itself. In this work, we ask: under a strict closed-loop setup, where the self-evolution algorithm has access only to an unlabeled prompt set and a base model, how close can internally generated supervision come to oracle-supervised training?

arXiv CS 7d ago

MySQL faithful launch OurSQL Foundation to keep Oracle honest

The Register 15d ago

Efficient Exploration for Iterative Nash Preference Optimization

arXiv:2606.01382v1 Announce Type: new Abstract: Preference alignment is central to improving large language models, but standard reward-based formulations can be restrictive when human preferences are cyclic, non-transitive, or otherwise not representable by a scalar reward. Nash Learning from Human Feedback (NLHF) addresses this limitation by modeling alignment as a preference game and targeting a Nash equilibrium rather than a reward maximizer. However, the learning-theoretic foundations...

arXiv CS 8d ago