Insertion Model

Related Articles from SNS

Insertion Based Sequence Generation with Learnable Order Dynamics

arXiv:2602.18695v2 Announce Type: replace Abstract: Existing insertion-based masked diffusion models that generate sequences by interleaving token insertion with unmasking use fixed schedules that are not dependent on the data. For structured sequences like graphs and molecules, learning data-dependent generation orders can improve generation quality by reducing uncertainty over the action space. We propose LoFlexMDM, an insertion-based masked diffusion model with learnable order dynamics...

arXiv CS 1d ago

Variational Learning for Insertion-based Generation

arXiv:2606.02133v1 Announce Type: new Abstract: Non-monotonic sequence generation methods, such as masked diffusion models, provide a flexible alternative to left-to-right autoregressive modeling by allowing tokens to be generated in non-fixed and prescribed orders. Despite their practical advantages, most existing non-monotonic models are order-agnostic and rely on a fixed-length grid, limiting their ability to support variable-length generation and adaptive insertion order. In this work,...

arXiv CS 8d ago

Beyond Visual Memory: Mechanistic Diagnostics of Latent Visual Reasoning

Announce Type: new Abstract: Recent latent visual reasoning methods achieve substantial gains by inserting continuous latent tokens into multimodal language models. These gains are commonly attributed to the tokens encoding visual evidence; recent analyses, however, reveal a paradox: the tokens are loosely tied to the image and contribute little to the answer. Critically, these analyses treat latent tokens as a single unit, obscuring the true source of the gains.

arXiv CS 8d ago

CoVEBench: Can Video Editing Models Handle Complex Instructions?

Announce Type: new Abstract: While recent text-guided video editing models excel at elementary tasks (e.g., style transfer, object insertion), real-world user requests are highly compositional. A single prompt often demands multiple coupled edits, such as modifying subjects, actions, and camera views, while strictly preserving unrelated spatiotemporal content. Existing benchmarks, heavily constrained by isolated edits and coarse global metrics, fail to diagnose how models handle such complex...

arXiv CS 1d ago

Detecting Temporally Localized Manipulations in Authentic Video Streams

arXiv:2606.07090v1 Announce Type: new Abstract: The rapid advancement of video editing and generative artificial intelligence technologies has made realistic video manipulation increasingly accessible. Although existing datasets have significantly advanced research in deepfake detection, object removal, and video inpainting, they do not adequately model scenarios in which a short manipulated segment is inserted into an otherwise authentic video and the original video continues afterward. In...

arXiv CS 2d ago

Cosine Misleads: Auxiliary Losses Reshape Vision Language Models, Not Their Latents

arXiv:2606.05753v1 Announce Type: new Abstract: Latent visual reasoning (LVR) inserts supervised latent tokens between perception and answer generation in vision-language models (VLMs). The field uses alignment between these latents and their visual targets, i.e., cosine similarity or mean squared error (MSE), as both the training loss and the quality metric, assuming that better alignment yields a better answer. We test this with a designed matrix of five LVR variants and find the...

arXiv CS 5d ago

FLAGG: Flexible Autoregressive Graph Generation

arXiv:2606.05067v1 Announce Type: new Abstract: The Deep Graph Generation's panorama spans two extremes: one-shot and sequential models. The former generates nodes and edges jointly, while the latter samples them autoregressively. Each method performs better in different graph domains depending on size and topology, but neither is applicable to all graph categories.

arXiv CS 6d ago

Chameleon: Style-Content Disentangled Framework for Cross-Domain Object Compositing

arXiv:2606.01079v1 Announce Type: new Abstract: Image compositing aims to seamlessly insert a foreground object into a background image, and recent advances in diffusion models have significantly enhanced the quality, especially when the foreground and background images come from the same domain (e.g., natural images). However, cross-domain compositing, where the foreground and background come from different domains, is relatively underexplored and remains challenging because the model must...

arXiv CS 8d ago

Why Thinking Hurts: Diagnosing and Rectifying Linguistic Inertia in Large Language Models for Recommendation

Announce Type: replace Abstract: Chain-of-Thought (CoT) reasoning is widely used to improve LLM performance, and recent foundation recommender models adopt it by generating textual reasoning before predicting target items represented by Semantic IDs (SIDs). However, we observe that enabling thinking mode in models such as OpenOneRec can degrade recommendation quality by up to 25%. We investigate this failure and identify Linguistic Inertia: when a textual CoT segment is inserted before SID...

arXiv CS 8d ago

Explicit Turn Resolution with Anisotropic Homogenisation for Efficient 3D Magneto-Thermal Finite-Element Simulation of Large-Scale No-Insulation HTS Magnets

Announce Type: cross Abstract: No-insulation (NI) and metal-insulation (MI) high-temperature superconducting (HTS) magnets require three-dimensional (3D) models to describe the current distribution around critical current defects. In this work, we design and validate the EXTRA homogenisation method, standing for explicit turn resolution with anisotropic homogenisation method. It allows 3D magneto-thermal finite-element (FE) simulations of large-scale magnets to be performed with high...

arXiv CS 9d ago