Home › Knowledge Base › Unified

Unified

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

A Unified Heterogeneous Implementation of Numerical Atomic Orbitals-Based Real-Time TDDFT within the ABACUS Package

Announce Type: replace-cross Abstract: We present a unified heterogeneous computing framework for real-time time-dependent density functional theory (RT-TDDFT) based on numerical atomic orbitals (NAOs), implemented in the ABACUS package. We introduce three co-designed abstraction layers, including unified data containers, unified linear algebra operators, and unified grid integration interfaces. These layers collectively accelerate the two most demanding parts of NAO-based RT-TDDFT: explicit...

arXiv Physics 6d ago

Lumos-Nexus: Efficient Frequency Bridging with Homogeneous Latent Space for Video Unified Models

arXiv:2605.31603v1 Announce Type: new Abstract: Connector-based video unified models have demonstrated strong capability in instruction-grounded video synthesis, but integrating a large high-fidelity generator into the unified training loop is computationally prohibitive, limiting achievable visual quality. We therefore propose Lumos-Nexus, a training-efficient unified video generation framework that facilitates the development of strong reasoning-driven generation capabilities while...

arXiv CS 9d ago

UniCanvas: A Diffusion-base Unified Model for Text-in-Image Joint Generation

Announce Type: new Abstract: Recent years have seen remarkable progress in unified vision-language models handling both multimodal understanding and generation within a single architecture. While autoregressive VLMs can reason across modalities, they fail to generate high-quality images. In contrast, diffusion models produce photorealistic visuals yet struggle to generate coherent text, making it challenging to develop a single unified model that can seamlessly handle both visual and text...

arXiv CS 6d ago

On the Limits of Token Reduction for Efficient Unified Vision Language Training

arXiv:2606.01503v1 Announce Type: new Abstract: Unified vision-language models (VLMs) integrate visual understanding and visual generation within a single autoregressive backbone, but their joint training is computationally expensive and largely overlooked from an efficiency perspective. In this work, we study the feasibility and limits of token-reduction-based acceleration for unified VLM training. Through a systematic analysis of layerwise attention allocation, we uncover a fundamental...

arXiv CS 8d ago

LoomVideo: Unifying Multimodal Inputs into Video Generation and Editing

new Abstract: Developing unified video generation and editing models capable of interpreting interleaved multimodal inputs is a promising yet challenging frontier field. Existing unified frameworks predominantly rely on massive models (typically 13B parameters or more) and incorporate source video conditions for editing by concatenating sequence tokens. This concatenation inevitably doubles the sequence length, quadrupling the computational complexity of the self-attention mechanism and...

arXiv CS 5d ago

LoomVideo: Unifying Multimodal Inputs into Video Generation and Editing

Announce Type: replace Abstract: Developing unified video generation and editing models capable of interpreting interleaved multimodal inputs is a promising yet challenging frontier field. Existing unified frameworks predominantly rely on massive models (typically 13B parameters or more) and incorporate source video conditions for editing by concatenating sequence tokens. This concatenation inevitably doubles the sequence length, quadrupling the computational complexity of the self-attention...

arXiv CS 2d ago

What Makes a Strong Model? A Unified Spectral Analysis of Knowledge Transfer over High-dimensional Linear Regression

arXiv:2606.01292v1 Announce Type: new Abstract: Teacher-Student Knowledge Transfer (KT) is ubiquitous in modern machine learning, ranging from classical model compression via Knowledge Distillation (KD) to the emergent phenomenon of Weak-to-Strong (W2S) generalization. While existing studies offer isolated insights, a unified theoretical framework explaining the efficacy of KT across these disparate regimes remains lacking. In this work, we establish a unified spectral analysis of SGD...

arXiv CS 8d ago

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

arXiv:2605.30280v2 Announce Type: replace Abstract: Embodied intelligence is often studied through specialized models for individual tasks such as manipulation or navigation, resulting in fragmented capabilities and limited generalization across tasks, environments, and robot embodiments. In this work, we study whether heterogeneous embodied decision-making problems can be unified within a single vision-language-action model. We present Qwen-VLA, a unified embodied foundation model that...

arXiv CS 8d ago

UniMedVL: Unifying Medical Multimodal Understanding and Generation through Observation-Knowledge-Analysis

Announce Type: replace Abstract: Medical workflows routinely combine reading images with producing visual and textual outputs, making both image understanding and generation central to medical AI. Most existing systems, however, address these abilities in isolated models, losing the shared knowledge that a unified architecture could exploit. To bridge this gap, we present UniMedVL, the first unified medical model that seamlessly integrates multimodal understanding and generation capabilities...

arXiv CS 9d ago

TIDE: Task-Isolated Diffusion for Unified Video Editing and Generation

new Abstract: Recent advances in Diffusion Transformers have driven rapid progress in video generation and editing, yet these capabilities are still handled by separate, task-specific models. Building a unified framework that supports diverse video tasks remains an open challenge: existing unified attempts either require dedicated auxiliary encoders or lack explicit mechanisms to distinguish heterogeneous conditioning tokens, struggling when the number and type of visual conditions vary...

arXiv CS 1d ago