Delayed Feedback
Related Articles from SNS
Parameter-free Dynamic Regret: Time-varying Movement Costs, Delayed Feedback, and Memory
arXiv:2602.06902v2 Announce Type: replace Abstract: In this paper, we study dynamic regret in unconstrained online convex optimization (OCO) with movement costs. Specifically, we generalize the standard setting by allowing the movement cost coefficients $\lambda_t$ to vary arbitrarily over time. Our main contribution is a novel algorithm that establishes the first comparator-adaptive dynamic regret bound for this setting, guaranteeing $\widetilde{\mathcal{O}}(\sqrt{(M^2+MP_T)(T+\sum_t...
Robust Linear Dueling Bandits with Post-serving Context under Unknown Delays and Adversarial Corruptions
arXiv:2605.01752v4 Announce Type: replace Abstract: We study linear dueling bandits in volatile environments characterized by the simultaneous presence of post-serving contexts, delayed feedback, and adversarial corruption. Feedback is subject to unknown stochastic or adversarial delays and a cumulative corruption budget $\mathcal{C}$. To address these challenges, we propose RCDP-UCB, which integrates a learned approximator that predicts post-serving contexts from pre-serving information....
Observer-Based Control of Linear Systems with Mismatched Input and Output Delays
arXiv:2606.03081v1 Announce Type: new Abstract: This paper investigates the stabilization of linear systems subject to simultaneous, mismatched time delays in both the control input and system output vectors. The proposed control framework is developed in two primary stages. First, an asymptotically stabilizing delayed state-feedback controller is synthesized by leveraging recent advancements in Linear Matrix Inequality (LMI) techniques.
ViVa: A Video-Generative Value Model for Robot Reinforcement Learning
arXiv:2604.08168v2 Announce Type: replace Abstract: Vision-language-action (VLA) models have advanced robot manipulation through large-scale pretraining, but real-world deployment remains challenging due to partial observability and delayed feedback. Reinforcement learning addresses this via value functions, which assess task progress and guide policy improvement. However, existing value models built on vision-language models (VLMs) struggle to capture temporal dynamics and physical...
Tree-of-Experience: A Structured Experience-Management Solution for Self-Evolving Agents under Low-Repetition and Implicit-Reward Environments
arXiv:2606.06960v1 Announce Type: new Abstract: Experience-based self-evolution is crucial for LLM agents, but existing benchmarks often assume explicit goals, stable task patterns, and clear feedback. We study a more challenging setting: low-repetition tasks with implicit rewards, where past experience is difficult to reuse and feedback is delayed, noisy, and outcome-level. We introduce \textsc{FinEvolveBench}, a temporally controlled benchmark for financial sentiment prediction that links...
Beyond Tokens: Enhancing RTL Quality Estimation via Structural Graph Learning
arXiv:2508.18730v2 Announce Type: replace Abstract: Estimating the quality of register transfer level (RTL) designs is crucial in the electronic design automation (EDA) workflow, as it enables instant feedback on key performance metrics like area and delay without the need for time-consuming logic synthesis. While recent approaches have leveraged large language models (LLMs) to derive embeddings from RTL code and achieved promising results, they overlook the structural semantics essential...
Engagement Process: Rethinking the Temporal Interface of Action and Observation
Announce Type: replace Abstract: Task completion in digital and physical environments increasingly involves complex temporal interaction, where actions and observations unfold over different time scales rather than align with fixed observation--action steps. To model such interactions, we propose \emph{Engagement Process} (EP), an interaction formalism that inherits the decision-theoretic structure of POMDPs while making time explicit in the action--observation interface. EP represents...
Co-Evolving Skill Generation and Policy Optimization
Announce Type: new Abstract: Skill-augmented reinforcement learning improves language agents by storing reusable procedural knowledge acquired from past experience. Existing methods typically use strong language models to analyze trajectories, generate skills, and update a retrievable skill bank during online training. However, they rarely assess whether a newly generated skill is useful before it is stored and reused.
Symphony-Coord: Adaptive Routing for Multi-Agent LLM Systems
arXiv:2602.00966v2 Announce Type: replace Abstract: Multi-agent large language model systems can tackle complex multi-step tasks by decomposing work and coordinating specialized behaviors. However, current coordination mechanisms typically rely on statically assigned roles and centralized controllers. As agent pools and task distributions evolve, these design choices can lead to inefficient routing, poor adaptability, and fragile fault recovery.
Popularity Feedback Constrains Innovation in Cultural Markets
arXiv:2602.09997v2 Announce Type: replace Abstract: Real-world creative processes ranging from art to science rely on social feedback-loops between selection and creation. Yet, the effects of popularity feedback on collective creativity remain poorly understood. We investigate how popularity ratings influence cultural dynamics in a large-scale online experiment where participants ($N = 1\,008$) iteratively \textit{select} images from evolving markets and \textit{produce} their own modifications.