Interactive Relative Policy Optimization

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

MC-CPO: Mastery-Conditioned Constrained Policy Optimization for Pedagogically Safe Intelligent Tutoring Systems

arXiv:2604.04251v2 Abstract: Intelligent tutoring systems increasingly rely on reinforcement learning to personalise instruction, yet optimising for observable engagement signals can systematically decouple learner activity from genuine knowledge acquisition. Analysing over 21 million student interactions across two deployed platforms, we find engagement events without corresponding mastery gains occur in 26.5% of interactions on Junyi Academy (72,758 students) and...

arXiv CS 1d ago

Training One Model to Master Cross-Level Agentic Actions via Reinforcement Learning

Abstract: The paradigm of agentic AI is shifting from engineered complex workflows to post-training native models. However, existing agents are typically confined to static, predefined action spaces, such as exclusively using APIs, GUI events, or robotic commands. This rigidity limits their adaptability in dynamic environments where the optimal granularity of interaction varies contextually.

arXiv CS 5d ago

PyraMathBench: Evaluating and Improving Mathematical Capability in Large Language Models

Abstract: Despite the pivotal role of numerical reasoning as the cornerstone of mathematical capabilities in large language models (LLMs) across applications, few benchmarks evaluate LLMs by integrating numerical processing and mathematical reasoning, hindering the interpretability of failures in math tasks. We introduce PyraMathBench, a comprehensive hierarchical benchmark with 32,505 questions derived from 7,404 math word problems, spanning 4 key cognitive aspects, 14...

arXiv CS 7d ago

Learning in Stackelberg Markov Games

arXiv:2509.16296v2 Abstract: Designing socially optimal policies in multi-agent environments is a fundamental challenge in both economics and artificial intelligence. This paper studies a general framework for learning Stackelberg equilibria in dynamic and uncertain environments, where a single leader interacts with a population of adaptive followers. Motivated by pressing real-world challenges such as equitable electricity tariff design for consumers with distributed...

arXiv CS 8d ago

ElasticMem: Latent Memory as a Learnable Resource for LLM Agents

arXiv:2605.30690v1 Abstract: Long-term memory is essential for LLM agents to reason coherently across extended interactions, personalize responses, and reuse past experience. However, existing memory-augmented methods typically treat memory as a fixed resource: text-space approaches concatenate retrieved memories into the context window, causing substantial token overhead and sensitivity to noisy evidence, while latent-space approaches reduce textual cost but still rely on...

arXiv CS 9d ago

KBQA-R1: Reinforcing Large Language Models for Knowledge Base Question Answering

arXiv:2512.10999v3 Abstract: Knowledge Base Question Answering (KBQA) challenges models to bridge the gap between natural language and strict knowledge graph schemas by generating executable logical forms. While Large Language Models (LLMs) have advanced this field, current approaches often struggle with a dichotomy of failure: they either generate hallucinated queries without verifying schema existence or exhibit rigid, template-based reasoning that mimics synthesized...

arXiv CS 7d ago

Ask HN: What are tools you have made for yourself since the advent of AI?

I've made a number of ceramic molds for slumping fused glass into bowls. As well as wooden templates for ceramic mugs. I've devised a few carrying tools to move glass frit paintings from my studio down to my barn where the kilns sit without spilling the glass.

Hacker News 2d ago

AI is blowing up music. How should the Grammys handle it?

Today I’m talking with Harvey Mason Jr., who is CEO of the Recording Academy — that’s the outfit that puts on the Grammy Awards. I last talked to Harvey in 2024, when it was obvious that generative AI would upend the music industry, but still not exactly clear how that would happen. Well, it’s been 18 months since that conversation, and you’re going to hear Harvey say that AI is now “omnipresent” in music production. And Harvey knows what he’s talking about — he is himself a legendary...

The Verge 9d ago