Home Knowledge Base Q-Learning

Q-Learning

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Target Updates May Stabilize Linear Q-Learning: Periodic and Soft Dynamics

arXiv:2606.02645v1 Announce Type: cross Abstract: Periodic target updates in Q-learning and soft target updates in actor-critic methods are empirically well established stabilization mechanisms, but their precise theoretical explanation is still incomplete. This paper gives a rigorous and exact analysis of these mechanisms for Q-learning with linear function approximation (linear Q-learning) using the exact switched linear system (SLS) dynamics induced by the Bellman maximum and the joint...

arXiv CS 7d ago

Autopilot-Preserving Residual Q-Learning with HJB-Inspired Finite-Action Risk Filtering for Fixed-Wing UAV Command Supervision

Announce Type: new Abstract: A fixed-wing UAV must hold airspeed, altitude, and heading references under wind, gusts, and turbulence, channels coupled so that correcting one can degrade another. Classical autopilots stabilize the airframe well but adapt poorly when a hard crosswind meets an aggressive turn, while reinforcement-learning (RL) policies acting directly on the surfaces concentrate exploration risk at the actuator interface. We place a learned supervisor above an unchanged...

arXiv CS 8d ago

SDM-Q: Cost-Aware Staged Decision-Making for Multi-Omics Classification with Deep Q-Learning

Announce Type: replace Abstract: Multi-omics data provide complementary molecular characterizations of disease phenotypes and play an important role in disease diagnosis and subtype classification in precision medicine. However, acquiring complete multi-omics profiles is expensive and time-consuming, while most existing deep learning methods assume full modality availability during inference, resulting in substantial redundancy and limited practicality in clinical settings. To address this...

arXiv CS 1d ago

SDM-Q: Cost-Aware Staged Decision-Making for Multi-Omics Classification with Deep Q-Learning

arXiv:2605.31014v1 Announce Type: new Abstract: Multi-omics data provide complementary molecular characterizations of disease phenotypes and play an important role in disease diagnosis and subtype classification in precision medicine. However, acquiring complete multi-omics profiles is expensive and time-consuming, while most existing deep learning methods assume full modality availability during inference, resulting in substantial redundancy and limited practicality in clinical settings. To...

arXiv CS 9d ago

COP-Q: Safety-First Reinforcement Learning for Robot Control via Cholesky-Ordered Projection

arXiv:2606.04749v1 Announce Type: new Abstract: Safe robot control requires maximizing return while satisfying safety constraints. In off-policy safe reinforcement learning, reward and safety Q-values are commonly learned by separate critic ensembles, with uncertainty handled independently for each objective. This objective-wise treatment neglects inter-objective correlation and can lead to overly conservative value estimates, thereby reducing sample efficiency.

arXiv CS 6d ago

Domination-Avoiding Learning Agents Cannot Collude

Announce Type: new Abstract: An influential paper of Calvano et al. empirically demonstrated that Q-learning agents spontaneously collude when placed as sellers that compete on prices in a natural market model. More recent results of Fish et al.

arXiv CS 8d ago

Delayed Repression and Emergent Instability in Adaptive Multi-Agent Systems

arXiv:2605.30392v2 Announce Type: replace Abstract: Regulatory institutions (from content moderation platforms to financial supervisors) observe, deliberate, and intervene only after a characteristic delay. We ask whether this processing lag alone can destabilize a multi-agent system that would otherwise remain stable, without exogenous shocks, coordination among agents, or malicious actors. We study this in two stages.

arXiv CS 7d ago

Delayed Repression and Emergent Instability in Adaptive Multi-Agent Systems

Announce Type: new Abstract: Regulatory institutions (from content moderation platforms to financial supervisors) observe, deliberate, and intervene only after a characteristic delay. We ask whether this processing lag alone can destabilize a multi-agent system that would otherwise remain stable, without exogenous shocks, coordination among agents, or malicious actors. We study this question in two stages.

arXiv CS 9d ago

Model-Based Learning of Whittle indices

Announce Type: replace Abstract: We present BLINQ, a new model-based algorithm that learns the Whittle indices of an indexable, communicating and unichain Markov Decision Process (MDP). Our approach relies on building an empirical estimate of the MDP and then computing its Whittle indices using an extended version of a state-of-the-art existing algorithm. We provide a proof of convergence to the Whittle indices we want to learn as well as a bound on the time needed to learn them with...

arXiv CS 1d ago

A Human-Sensitive Controller: Adapting to Human Musculoskeletal Disorder-Related Constraints via Reinforcement Learning

Announce Type: replace Abstract: Work-Related Musculoskeletal Disorders continue to be a major challenge in industrial environments, leading to reduced workforce participation, increased healthcare costs, and long-term disability. This study introduces a human-sensitive robotic system aimed at reintegrating individuals with a history of musculoskeletal disorders into standard job roles, while simultaneously optimizing ergonomic conditions for the broader workforce. This research leverages...

arXiv CS 2d ago