\emph{object

Related Articles from SNS

DisFlow: Scene Flow from Distance Field for Object Pose, Velocity Tracking, and Dynamic Object Reconstruction

Announce Type: new Abstract: We present DisFlow, a novel framework for online scene flow estimation from a distance field that enables 6DoF dynamic object pose estimation, motion tracking, and surface reconstruction. The scene is represented by Gaussian Process Implicit Surfaces (GPIS), with surface normals serving as derivative constraints, enabling accurate signed distance computations near the surface and gradient queries with uncertainty. With this...

arXiv CS 8d ago

Structural Decoupling: A Scaffold-Flow Theory of Generalization and Alignment

arXiv:2506.20699v2 Announce Type: replace Abstract: Learning in non-stationary and multi-context environments requires more than ordinary within-task generalization. A system must also discover which contexts exist, route inputs to the correct context, preserve old contexts, and revise the context library when the environment changes. This paper presents Structural Learning Theory (StrLT) as a framework for filling this structural gap.

arXiv CS 1d ago

Unlocking Proactivity in Task-Oriented Dialogue

arXiv:2605.22240v2 Announce Type: replace Abstract: Proactive task-oriented dialogue (TOD), such as outbound sales, demands a persuasive agent that actively probes the user's concerns and steers the conversation toward acceptance within a bounded number of turns. Yet post-trained LLMs are inherently conservative, and reward-shaping RL (e.g., GRPO) struggles since it only re-weights what an already passive policy samples. We show that conditioning on the user's latent concerns unlocks...

arXiv CS 6d ago

A Unifying Lens on Reward Uncertainty in RLHF

arXiv:2606.09073v1 Announce Type: new Abstract: Reinforcement learning from human feedback (RLHF) is bottlenecked by reward hacking, where the policy exploits errors in a proxy reward model (RM) and produces high RM scores without genuine quality gains. A natural mitigation is pessimism: penalizing rewards in regions where the RM is uncertain. However, standard scalar RMs provide no principled notion of uncertainty.

arXiv CS 1d ago

Curvature-Guided LoRA: Matching Full Fine-Tuning in Function Space

arXiv:2603.29824v2 Announce Type: replace Abstract: Parameter-efficient fine-tuning methods such as LoRA enable efficient adaptation of large pretrained models, but often lag behind full fine-tuning in both convergence speed and final performance. Recent approaches aim to reduce this gap by aligning LoRA parameter updates with those of full fine-tuning, but such parameter-space alignment only indirectly controls model predictions. Instead, we adopt a function-space perspective and formulate...

arXiv CS 1d ago

Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models

arXiv:2510.11683v3 Announce Type: replace Abstract: A key challenge in applying reinforcement learning (RL) to diffusion large language models (dLLMs) is the intractability of their likelihood functions, which are essential for the RL objective, necessitating corresponding approximation during training. While existing methods approximate the log-likelihoods by their evidence lower bounds (ELBOs) via customized Monte Carlo (MC) sampling, they incur significant memory overhead due to the need...

arXiv CS 9d ago

Active Exploring like a Pigeon: Reinforcing Spatial Reasoning via Agentic Vision-Language Models

Announce Type: new Abstract: Enabling Vision-Language Models (VLMs) to perform spatial reasoning remains challenging. Existing approaches treat VLMs as passive observers, which is difficult for real-world applications. Moreover, reinforcement learning methods rely on sparse rewards, limiting their effectiveness for complex reasoning tasks.

arXiv CS 8d ago

Outsmarting the Chameleon: Counterfactual Decoupling for Tactical OOD Shifts in Live Streaming Risk Assessment

arXiv:2606.02946v1 Announce Type: new Abstract: Live streaming has emerged as a primary medium for social interaction and digital commerce, yet it is increasingly plagued by sophisticated risks. A fundamental challenge in this domain is tactical out-of-distribution (OOD) shift: while malicious actors maintain stable underlying objectives, they continuously redesign narrative packaging to evade detection. Such adversarial shifts expose critical limitations of existing OOD...

arXiv CS 7d ago

Reinforcement Learning for Flow-Matching Policies with Density Transport

Announce Type: new Abstract: We present an online reinforcement learning (RL) algorithm for fine-tuning flow-matching policies in continuous-control problems. Our key insight is to view RL-based policy improvement as a transport of action densities towards regions of high reward, which naturally aligns with the transport formulation of flow matching models. Prior methods either approximate the current or optimal policy distribution or resort to distillation, which introduces biased gradients...

arXiv CS 1d ago

jXBW: A Compressed Index for Structure-Aware JSONL Retrieval in Structured RAG

arXiv:2508.12536v3 Announce Type: replace Abstract: Providing structured information to large language models (LLMs) improves multi-step reasoning and factual grounding, and recent retrieval-augmented generation (RAG) systems therefore reconstruct structure from retrieved text on every query. When the corpus is already structured -- as in JSON Lines (JSONL), a popular format for LLM prompts, chemical compounds, and geospatial records -- this per-query rebuilding can be...

arXiv CS 1d ago