Reinforcement-Learning
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
Drag reduction or reward hacking? Recurrent multi-agent reinforcement learning that earns its reward
Announce Type: cross Abstract: A reinforcement-learning agent maximises its reward, which can diverge from the outcome its designer intended. In physical control the reward rarely closes that gap, and drag reduction in wall turbulence makes it concrete. A mass-conservation projection couples agents' outputs and erases the per-agent credit the policy gradient needs
Drag reduction or reward hacking? Recurrent multi-agent reinforcement learning that earns its reward
Announce Type: new Abstract: A reinforcement-learning agent maximises its reward, which can diverge from the outcome its designer intended. In physical control the reward rarely closes that gap, and drag reduction in wall turbulence makes it concrete. A mass-conservation projection couples agents' outputs and erases the per-agent credit the policy gradient needs
A Lecture Note on Offline RL and IRL, Part II: Foundations of Inverse Reinforcement Learning and Dynamic Discrete Choice Models
arXiv:2605.30843v1 Announce Type: new Abstract: In the forward reinforcement-learning problem, the reward is fixed and known; the learner is asked to find a good policy or value function. Here we turn the question around. Given offline data generated by an expert, can we recover the reward the expert was optimizing?
Selective-Advantage Entropy-Adaptive Horizon GRPO: Asymmetric Token-Level Discounting for Efficient Reinforcement Learning of Language Models
Announce Type: new Abstract: Group Relative Policy Optimisation (GRPO) has emerged as an effective reinforcement-learning algorithm for aligning language models on reasoning tasks, but it treats every token position and every sampled rollout symmetrically. We introduce two complementary extensions: (i) Adaptive-Horizon GRPO (AH-GRPO), which weights each token's policy gradient using a cumulative entropy-based discount that reduces the effective horizon when the model is uncertain, and (ii)...
Self-Optimizing Control of Continuous Processes Based on Reinforcement Learning
new Abstract: This paper addresses the Self-Optimizing Control (SOC) problem in industrial continuous processes and proposes a Reinforcement-Learning (RL)-based SOC approach to improve dynamic performance under high-frequency disturbances. In the proposed framework, the SOC controlled variable structure is embedded in the Actor network, and reward functions are designed based on economic indicators. Through interaction with the environment, the RL agent optimizes controlled variables while...
GARL: Game-Theoretic Reinforcement Learning for Multi-Agent Strategic Prioritisation
Announce Type: new Abstract: LLM-based multi-agent systems are increasingly used for strategic decision-making tasks. In such settings, performance depends not only on individual model capabilities, but also on the policies by which agents interact and adapt. Multi-agent reinforcement learning can optimise these interaction policies, but its reward design often remains task-specific and weakly grounded in interaction structure.
Agent-R1: A Unified and Modular Framework for Agentic Reinforcement Learning
arXiv:2511.14460v2 Announce Type: replace Abstract: Large language models (LLMs) have rapidly evolved from single-turn text generators into the foundation of increasingly capable agents. As these agents take on more complex reasoning, decision making, tool use, and long-horizon tasks, reinforcement learning (RL) is becoming increasingly important for shaping their behavior. This shift is especially visible in agentic RL, where models must interact with tools and environments across multiple...
Aryabhata 2: Scaling Reinforcement Learning for Advanced STEM Reasoning
arXiv:2605.28829v2 Announce Type: replace Abstract: Competitive STEM examinations such as JEE and NEET require multi-step symbolic reasoning, precise numerical computation, and deep conceptual understanding across physics, chemistry, and mathematics. Recent large language models perform strongly on common reasoning benchmarks, yet they remain difficult to deploy at scale, where millions of student doubts demand domain-specific, consistently structured problem solving. We introduce Aryabhata...
Meta-Adaptive Beam Search Planning for Transformer-Based Reinforcement Learning Control of UAVs with Overhead Manipulators under Flight Disturbances
arXiv:2603.26612v2 Announce Type: replace Abstract: Drones equipped with overhead manipulators offer unique capabilities for inspection, maintenance, and contact-based interaction. However, the motion of the drone and its manipulator is tightly linked, and even small attitude changes caused by wind or control imperfections shift the end-effector away from its intended path. This coupling makes reliable tracking difficult and also limits the direct use of learning-based arm controllers that...
Reinforcement Learning-Enabled Agent for Transmitter Optimization in Digital-Analog Radio-over-Fiber Fronthaul
arXiv:2606.04840v1 Announce Type: new Abstract: Digital-analog radio-over-fiber (DA-RoF) has emerged as a promising fronthaul solution that combines the high spectral efficiency of analog transmission with the robustness of digital transmission. However, the performance of DA-RoF critically depends on several tightly coupled parameters, including the rounding factor (RF), scaling factor (SF), geometric shaping (GS) factor, and pre-equalization taps coefficients, which jointly affect...