
Spatiotemporal Agent-


Related Articles from SNS

Preparing for the Next Carrington: Spatiotemporal Agent-Based Modeling for Safeguarding Satellite Infrastructure Under Extreme Space Weather Disturbances

Abstract: Extreme space weather poses an existential threat to modern satellite infrastructure, with a Carrington-class solar storm projected to cause economic losses of billions of dollars per day. Due to the rapid proliferation of satellites (with over 70,000 expected to be deployed in the next 5 years), understanding extreme space weather impacts has become essential for global economic stability and national security, and consequently, the lives of millions. However,...

arXiv Physics 5d ago

Embody4D: A Generalist Data Engine for Embodied 4D World Modeling

arXiv:2605.01799v2 Abstract: Embodied agents require robust and comprehensive 3D spatiotemporal representations to support spatial reasoning, manipulation understanding, and downstream decision making. However, existing robot data are typically captured from fixed or sparse viewpoints, providing only partial and view-dependent observations, which limits multi-view perception and generalization across viewpoints. Given the difficulty of collecting additional viewpoints...

arXiv CS 1d ago

MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding via Hierarchical Graph Memory and Agentic Retrieval Mechanism

arXiv:2606.07512v1 Abstract: Current Vision-Language Models struggle with hours-long videos because processing full-length visual sequences induces prohibitive token explosion and attention dilution. To overcome this, we introduce MemDreamer to decouple perception and reasoning, shifting long-video understanding into an agentic exploration process. As a plug-and-play framework, it incrementally streams videos to construct a Hierarchical Graph Memory, a top-down three-tier...

arXiv CS 2d ago

TrafficClaw: A Generalizable LLM Agent in the Unified Physical Environment for Urban Traffic Control

Abstract: Large language model (LLM) agents have shown strong capabilities in long-horizon reasoning, tool use, and decision-making in digital environments, yet extending them to physically grounded systems remains challenging. Unlike web, code, or game environments, where objectives are often weakly coupled, physical systems evolve through tightly coupled dynamics in which local interventions propagate across interacting subsystems over time.

arXiv CS 8d ago

S3Mem: Structured Spatiotemporal Scene-Event Memory for Long-Horizon Interactive Question Answering

arXiv:2605.28831v2 Abstract: Long-horizon memory question answering often requires sparse evidence from heterogeneous histories, including events, object states, visual observations, temporal relations, and causal steps. Existing memory interfaces expand reader context, retrieve semantically related chunks, or expose graph neighborhoods, but they are not explicitly designed to select compact evidence for a fixed reader. We propose Structured Spatiotemporal Scene-Event...

arXiv CS 1d ago

DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories

arXiv:2602.10809v2 Abstract: Existing multimodal retrieval systems excel at semantic matching but implicitly assume that query-image relevance can be measured in isolation. This paradigm overlooks the rich dependencies inherent in realistic visual streams, where information is distributed across temporal sequences rather than confined to single snapshots. To bridge this gap, we introduce DeepImageSearch, a novel agentic paradigm that reformulates image retrieval as an...

arXiv CS 9d ago

3rd Place at CVPR 2026 CASTLE Challenge: Agentic Multi-View Long-Context Video Understanding via Hierarchical Knowledge Graph Retrieval

arXiv:2606.01933v1 Abstract: This paper presents our winning methodology for the CASTLE 2026 Challenge at the CVPR 2026 EgoVis Workshop, where our team secured third place globally. The challenge tasks participants with answering highly complex visual, spatiotemporal, and verbal questions, including visual counting, action localization, multi-view tracking and speaker temporal reasoning, within massive, multimodal video streams. The underlying dataset consists of over 600...

arXiv CS 8d ago

Scalable Ride-Sourcing Vehicle Rebalancing with Service Accessibility Guarantee: A Constrained Mean-Field Reinforcement Learning Approach

arXiv:2503.24183v3 Abstract: The expansion of ride-sourcing services such as Uber and Lyft has reshaped urban transportation by offering flexible, on-demand mobility via mobile applications. Despite their convenience, these platforms confront significant operational challenges, particularly vehicle rebalancing, the strategic repositioning of a fleet of vehicles to address spatiotemporal mismatches in supply and demand. Inadequate rebalancing results in prolonged rider waiting...

arXiv CS 8d ago

PnP-Corrector: A Universal Correction Framework for Coupled Spatiotemporal Forecasting

Abstract: Coupled spatiotemporal forecasting is important for predicting the future evolution of multiple interacting dynamical systems, such as in climate models. However, existing methods are severely constrained by the persistent bottleneck of compounding errors. In coupled systems, errors from each subsystem simulator propagate and amplify one another, a phenomenon we term Reciprocal Error Amplification, leading to a rapid collapse of long-range predictions.

arXiv CS 7d ago

OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs

Abstract: Multimodal agents in robotics, AR, and autonomous driving must reason about places and layouts from continuous egocentric streams, often using evidence outside the current view. Existing benchmarks either evaluate offline over full videos or target events rather than spatial structure. We introduce OVO-S-Bench, a fully human-annotated benchmark for streaming spatial intelligence, comprising 1,680 questions over 348 source videos.

arXiv CS 7d ago