Disentangled ENvironments
Related Articles from SNS
GARDEN: Gravity-Aligned Reconstruction of Disentangled ENvironments from RGB images
arXiv:2606.03921v1 Announce Type: new Abstract: Converting multi-view RGB observations into simulation-ready 3D environments remains challenging because current reconstruction pipelines produce monolithic scene representations without explicit physical structure. They are typically defined up to an arbitrary global rotation and entangle rigid foreground objects with background geometry, which hinders stable physical interaction. Existing solutions often recover interactivity by replacing...
TinyGiantALM: A Compact Audio-Language Model for Intent-Aware Reasoning under Resource Constraints
Announce Type: new Abstract: Current advancements in Audio Reasoning rely on massive Large Audio-Language Models (LALMs), hindering deployment in resource-constrained environments. We introduce TinyGiantALM, a compact 1.5B efficiency-oriented alternative. Instead of brute-force scaling, we propose an Instruction-Aware Feature Refinement framework using a Query-guided Projector and Semantic Gating to filter acoustic signals based on user intent.
Disentangling conviction and conformity: a Bayesian ideal point model of voting behaviour in online debates
arXiv:2606.03786v1 Announce Type: new Abstract: Online debate platforms offer a unique window into the mechanisms driving opinion formation: they capture both explicit political preferences and the peer environment in which those preferences are expressed. In this work, I develop a Bayesian logistic regression model, inspired by ideal point models from political science, to disentangle two competing mechanisms of voting behaviour in online debates: conviction, driven by prior ideological...
SUSD: Structured Unsupervised Skill Discovery through State Factorization
Announce Type: replace Abstract: Unsupervised Skill Discovery (USD) aims to autonomously learn a diverse set of skills without relying on extrinsic rewards. One of the most common USD approaches is to maximize the Mutual Information (MI) between skill latent variables and states. However, MI-based methods tend to favor simple, static skills due to their invariance properties, limiting the discovery of dynamic, task-relevant behaviors.
What Makes Interaction Trajectories Effective for Training Terminal Agents?
arXiv:2606.03461v1 Announce Type: new Abstract: Stronger code agents are commonly assumed to be superior teachers for post-training, yet this assumption remains poorly disentangled from task difficulty, harness design, and student capacity. We investigate this pedagogical link using Terminal-Lego, a scalable pipeline that transforms multi-domain real-world issues into environment-verified agentic tasks. Surprisingly, standalone performance does not dictate teaching efficacy: while Claude...
Illumination-Invariant Anomaly Detection for Sub-Canopy UAV Multispectral Point Clouds
Announce Type: new Abstract: Unmanned Aerial Vehicle (UAV) multispectral point clouds (MPC) provide high-dimensional spatial-spectral data for sub-canopy target detection; however, their efficacy is significantly compromised by severe illumination heterogeneity caused by vegetation shadows. To address this, we propose a prior-free anomaly detection framework capable of robustly handling lighting variations. First, we formulate solar angle estimation as an inverse optimization problem.
DECKER: Domain-invariant Embedding for Cross-Keyboard Extraction and Recognition
Announce Type: replace Abstract: Acoustic side-channel attacks (ASCA) on keyboards pose a significant security risk, as keystrokes can be inferred from typing acoustics, revealing sensitive information. Prior ASCA studies are limited by small-scale datasets with restricted diversity in users, keyboards, and environments, constraining analysis across devices, microphones, and noise conditions. We introduce HEAR, a dataset designed to study ASCA along three axes: keyboard generalization, noise...
Autonomous FPV Flight with Translational Optical Flow and Uncertainty Mask
arXiv:2606.09088v1 Announce Type: new Abstract: Autonomous FPV quadrotor flight in complex environments using a monocular RGB camera as the sole exteroceptive sensor remains a fundamental challenge. Recent research has shown that using optical flow as the input of a neural network can achieve end-to-end autonomous flight in cluttered scenes. However, extracting the most relevant information from the flow estimation is the key bottleneck limiting agility and robustness.
MedCUA-Bench: A Screenshot-Only Benchmark for Clinical Computer-Use Agents
arXiv:2606.03203v1 Announce Type: new Abstract: Computer-use agents could automate repetitive screen-based clinical work, but their reliability in medical graphical user interfaces remains largely unvalidated. Existing benchmarks focus on general web or desktop tasks and underrepresent medical software, which requires domain knowledge, exhibits markedly different UI design from mainstream applications, lacks public testing environments, and demands safety validation beyond task completion....
CityRAG: Stepping Into a City via Spatially-Grounded Video Generation
Announce Type: replace Abstract: We address the problem of generating a 3D-consistent, navigable environment that is spatially grounded: a simulation of a real location. Existing video generative models can produce a plausible sequence that is consistent with a text (T2V) or image (I2V) prompt. However, the capability to reconstruct the real world under arbitrary weather conditions and dynamic object configurations is essential for downstream applications including autonomous driving and...