Home › Knowledge Base › TD(0

TD(0

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Fast and Robust Convergence Rate for TD(0) with Linear Function Approximation, Universal Learning Steps and I.I.D. Samples

arXiv:2606.05967v1 Announce Type: cross Abstract: In this paper, we study the finite-time behavior of the TD(0) temporal-difference method with linear function approximation (LFA). We consider on-policy independent and identically distributed (i.i.d.) samples, a constant learning step, and the Polyak-Juditsky averaging method.

arXiv CS 5d ago

Fast and Robust Convergence Rate for TD(0) with Linear Function Approximation, Universal Learning Steps and I.I.D. Samples

arXiv:2606.05967v2 Announce Type: replace-cross Abstract: In this paper, we study the finite-time behavior of the TD(0) temporal-difference method with linear function approximation (LFA). We consider on-policy independent and identically distributed (i.i.d.) samples, a constant learning step, and the Polyak-Juditsky averaging method.

arXiv CS 2d ago

A Robust $\widetilde{\mathcal{O}}(1/\sqrt{T})$ Rate for Unprojected TD Learning with Linear Function Approximation

Announce Type: replace Abstract: We investigate the finite-time convergence properties of Temporal Difference (TD) learning with linear function approximation, a cornerstone of reinforcement learning. We are interested in the so-called ``robust'' setting, where the convergence guarantee does not depend on the potential function's minimal curvature. While prior work has established convergence guarantees in this setting, these results typically rely on the artificial assumption that each...

arXiv CS 1d ago

Model-free LQG Control with Chance Constraints

arXiv:2605.31310v1 Announce Type: new Abstract: This paper studies model-free optimal control design and its convergence properties for linear time-invariant systems subject to probabilistic risk or chance constraints. In particular, we study a natural policy gradient (NPG)-based actor-critic (AC) algorithm with two timescales, using a Lagrangian primal-dual framework to enforce the constraint.

arXiv CS 9d ago

PLL/WLL Week 4 preview: Key players, stats and sto...

Week 4 of the 2026 Premier Lacrosse League season is June 5-6, as the league heads to American Legion Memorial Stadium in Charlotte. Four PLL games are on the schedule on Friday and Saturday, and the Carolina Chaos have one game each day. The 2026 WLL season also continues, as the Boston Guard take on the California Palms.

ESPN 5d ago