Unified Temporal Dynamics Modeling
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
TiWeaver: Unified Temporal Dynamics Modeling via Contextual Patching
arXiv:2606.03121v1 Announce Type: new Abstract: Multivariate time series forecasting plays a critical role in real-world applications, including weather prediction, stock analysis, and health monitoring. Due to the diversity of data sources, time series exhibit diverse temporal dynamics, often accompanied by various irregularities such as missing values and non-uniform sampling frequencies. Such irregularities lead to complex and asynchronous temporal dependencies across channels.
Unifying Model-Free Efficiency and Model-Based Representations via Latent Dynamics
Announce Type: replace Abstract: We present Unified Latent Dynamics (ULD), a novel reinforcement learning algorithm that unifies the efficiency of model-free methods with the representational strengths of model-based approaches, without incurring planning overhead. By embedding state-action pairs into a latent space in which the true value function is approximately linear, our method supports a single set of hyperparameters across diverse domains -- from continuous control with...
Attend to Anything: Foundation Model for Unified Human Attention Modeling
arXiv:2606.03540v1 Announce Type: new Abstract: Existing human attention (saliency) modeling methods persist as highly fragmented across modalities, scenes, and task formulations. Consequently, even with increasing model capacity and data scale, current models predominantly remain scene-dependent and task-specific, failing to practically generalize in real-world applications. To address the fundamental limitations, we present the Attend to Anything Model (AAM), a multi-modal foundation model...
IMUG-Bench: Benchmarking Unified Multimodal Models on Interleaved Understanding and Generation
arXiv:2606.09169v1 Announce Type: new Abstract: In recent years, unified multimodal models (UMMs) have emerged to support both understanding and generation within a single framework. Mastering dynamic, multi-turn interleaved image-text dialogues is a crucial task for UMMs in real-world applications. However, existing benchmarks fail to evaluate this important task, as they are often limited to single-turn or static settings, and typically overlook exposure bias in multi-turn interactions.
EEGDancer: Dynamic Emotion Latent Space Masked Modeling with Reinforcement Learning for EEG Continuous Emotion Prediction
arXiv:2606.05855v1 Announce Type: new Abstract: Continuous electroencephalography (EEG) emotion prediction aims to model the temporal evolution of human emotional states from EEG signals. Unlike conventional discrete emotion recognition, continuous prediction requires capturing long-range temporal dependencies and coherent emotional dynamics.
Towards Consistent Video Geometry Estimation
arXiv:2605.30060v2 Announce Type: replace Abstract: This work presents ViGeo, a feed-forward foundation model for recovering spatially dense and temporally consistent geometry from video sequences. Built upon a plain transformer architecture without task-specific architectural modifications, ViGeo supports streaming, full-sequence, and long-video inference within a unified model. The key design is dynamic chunking attention, which exposes the model to both bidirectional and causal temporal...
Local Intrinsic Dimensionality of Ground Motion Data for Early Detection of Catastrophic Slope Failure
arXiv:2601.03569v3 Announce Type: replace Abstract: Local Intrinsic Dimensionality (LID) has shown strong potential for anomaly detection in high-dimensional data, including landslide failure detection in granular media, where early and accurate identification of failure zones is crucial for effective geohazard mitigation. However, this task is still challenging due to the spatial correlations and temporal dynamics that are inherently present in surface displacement data. To address this...
MORPHOS: Autoregressive 4D Generation with Temporal Structured Latents
arXiv:2606.02491v1 Announce Type: new Abstract: We present MORPHOS, a novel autoregressive framework that generates dynamic 3D assets from videos across diverse representations, including meshes, 3D Gaussians, and radiance fields. Existing methods are typically limited to a single representation, struggle to model topological changes, or fail to maintain temporal consistency over long videos. To address these limitations, we introduce the Temporal Structured Latents (T-SLAT), a unified 4D...
Audio-Visual World Models: Grounding Multisensory Imagination for Embodied Agents
arXiv:2512.00883v3 Announce Type: replace Abstract: World models simulate environmental dynamics to enable agents to plan and reason about future states. While existing approaches have primarily focused on visual observations, real-world perception inherently involves multiple sensory modalities.
EgoAction: Egocentric Action Composition with Reliability-Aware Temporal Fusion for the EPIC-KITCHENS Action Detection Challenge at CVPR 2026
Announce Type: replace Abstract: The EPIC-KITCHENS-100 Action Detection challenge evaluates whether a model can localize the start and end of each action in long untrimmed egocentric videos and assign the corresponding verb--noun action label. In this report, we formulate our submission as EgoAction (Egocentric Action Composition with Reliability-Aware Temporal Fusion), a unified decoupled detection and fusion pipeline. The pipeline uses EPIC-finetuned VideoMAE-L features, trains separate...