Delayed Feedback

Related Articles from SNS

Parameter-free Dynamic Regret: Time-varying Movement Costs, Delayed Feedback, and Memory

arXiv:2602.06902v2 Announce Type: replace Abstract: In this paper, we study dynamic regret in unconstrained online convex optimization (OCO) with movement costs. Specifically, we generalize the standard setting by allowing the movement cost coefficients $\lambda_t$ to vary arbitrarily over time. Our main contribution is a novel algorithm that establishes the first comparator-adaptive dynamic regret bound for this setting, guaranteeing $\widetilde{\mathcal{O}}(\sqrt{(M^2+MP_T)(T+\sum_t...

arXiv CS 9d ago

Robust Linear Dueling Bandits with Post-serving Context under Unknown Delays and Adversarial Corruptions

arXiv:2605.01752v4 Announce Type: replace Abstract: We study linear dueling bandits in volatile environments characterized by the simultaneous presence of post-serving contexts, delayed feedback, and adversarial corruption. Feedback is subject to unknown stochastic or adversarial delays and a cumulative corruption budget $\mathcal{C}$. To address these challenges, we propose RCDP-UCB, which integrates a learned approximator that predicts post-serving contexts from pre-serving information....

arXiv CS 8d ago

Observer-Based Control of Linear Systems with Mismatched Input and Output Delays

arXiv:2606.03081v1 Announce Type: new Abstract: This paper investigates the stabilization of linear systems subject to simultaneous, mismatched time delays in both the control input and system output vectors. The proposed control framework is developed in two primary stages. First, an asymptotically stabilizing delayed state-feedback controller is synthesized by leveraging recent advancements in Linear Matrix Inequality (LMI) techniques.

arXiv CS 7d ago

ViVa: A Video-Generative Value Model for Robot Reinforcement Learning

arXiv:2604.08168v2 Announce Type: replace Abstract: Vision-language-action (VLA) models have advanced robot manipulation through large-scale pretraining, but real-world deployment remains challenging due to partial observability and delayed feedback. Reinforcement learning addresses this via value functions, which assess task progress and guide policy improvement. However, existing value models built on vision-language models (VLMs) struggle to capture temporal dynamics and physical...

arXiv CS 2d ago

Tree-of-Experience: A Structured Experience-Management Solution for Self-Evolving Agents under Low-Repetition and Implicit-Reward Environments

arXiv:2606.06960v1 Announce Type: new Abstract: Experience-based self-evolution is crucial for LLM agents, but existing benchmarks often assume explicit goals, stable task patterns, and clear feedback. We study a more challenging setting: low-repetition tasks with implicit rewards, where past experience is difficult to reuse and feedback is delayed, noisy, and outcome-level. We introduce \textsc{FinEvolveBench}, a temporally controlled benchmark for financial sentiment prediction that links...

arXiv CS 2d ago

Beyond Tokens: Enhancing RTL Quality Estimation via Structural Graph Learning

arXiv:2508.18730v2 Announce Type: replace Abstract: Estimating the quality of register transfer level (RTL) designs is crucial in the electronic design automation (EDA) workflow, as it enables instant feedback on key performance metrics like area and delay without the need for time-consuming logic synthesis. While recent approaches have leveraged large language models (LLMs) to derive embeddings from RTL code and achieved promising results, they overlook the structural semantics essential...

arXiv CS 9d ago

Engagement Process: Rethinking the Temporal Interface of Action and Observation

Announce Type: replace Abstract: Task completion in digital and physical environments increasingly involves complex temporal interaction, where actions and observations unfold over different time scales rather than align with fixed observation--action steps. To model such interactions, we propose \emph{Engagement Process} (EP), an interaction formalism that inherits the decision-theoretic structure of POMDPs while making time explicit in the action--observation interface. EP represents...

arXiv CS 1d ago

Co-Evolving Skill Generation and Policy Optimization

Announce Type: new Abstract: Skill-augmented reinforcement learning improves language agents by storing reusable procedural knowledge acquired from past experience. Existing methods typically use strong language models to analyze trajectories, generate skills, and update a retrievable skill bank during online training. However, they rarely assess whether a newly generated skill is useful before it is stored and reused.

arXiv CS 1d ago

Symphony-Coord: Adaptive Routing for Multi-Agent LLM Systems

arXiv:2602.00966v2 Announce Type: replace Abstract: Multi-agent large language model systems can tackle complex multi-step tasks by decomposing work and coordinating specialized behaviors. However, current coordination mechanisms typically rely on statically assigned roles and centralized controllers. As agent pools and task distributions evolve, these design choices can lead to inefficient routing, poor adaptability, and fragile fault recovery.

arXiv CS 8d ago

Popularity Feedback Constrains Innovation in Cultural Markets

arXiv:2602.09997v2 Announce Type: replace Abstract: Real-world creative processes ranging from art to science rely on social feedback-loops between selection and creation. Yet, the effects of popularity feedback on collective creativity remain poorly understood. We investigate how popularity ratings influence cultural dynamics in a large-scale online experiment where participants ($N = 1\,008$) iteratively \textit{select} images from evolving markets and \textit{produce} their own modifications.

arXiv CS 2d ago