Faithful Text
Related Articles from SNS
Unified Controllable and Faithful Text-to-CAD Generation with LLMs
[Submitted on 27 Mar 2026] Title: PR-CAD: Progressive Refinement for Unified Controllable and Faithful Text-to-CAD Generation with Large Language Models. Abstract: The construction of CAD models has traditionally relied on labor-intensive manual operations and specialized expertise. Recent advances in large language models (LLMs) have inspired research into text-to-CAD generation.
TextAlign: Preference Alignment for Text Rendering with Hierarchical Rewards
arXiv:2605.19320v2 Announce Type: replace Abstract: Faithful text rendering remains a persistent weakness of large text-to-image generative models, as it requires both semantic instruction following and fine-grained glyph-level structure. Prior methods often improve this ability through architecture-specific modules or encoder modifications, which complicate deployment across foundation models. We study text rendering as a post-training preference-alignment problem and propose TextAlign, a...
Spatial-Temporal Decoupled Reference Conditioning for Identity-Preserving Text-to-Video Generation
Announce Type: new Abstract: Identity-preserving video generation (IPVG) aims to synthesize high-fidelity videos that follow text prompts while faithfully preserving a reference identity. Despite recent progress, existing IPVG methods still struggle to balance high-level semantic control and low-level identity fidelity. To bridge this gap, we propose ST-DRC, an effective Spatial-Temporal Decoupled Reference Conditioning framework for identity-preserving text-to-video generation.
DeliCIR: Deliberative Test-Time Evolutionary Hierarchical Multi-Agents for Composed Image Retrieval
arXiv:2605.22478v3 Announce Type: replace Abstract: Composed Image Retrieval (CIR) requires both preserving the visual continuity of the reference image and faithfully executing the semantic variables specified in the modification text, which constitute the core challenge of the task. Existing methods often suffer from Perception Myopia in a single space, or fall into Logic Drift in iterative collaboration due to the perception ceiling of the underlying retriever. To address this issue, we...
Reward-Decomposed Reinforcement Learning for Immersive Video Role-Playing
arXiv:2605.04733v2 Announce Type: replace Abstract: Text-based role-playing models can imitate character styles, but often fail to capture scene atmosphere and evolving tension, which are crucial for immersive applications such as VR games and interactive narratives. We study video-grounded role-playing dialogue and introduce EBM-RL (Eye-Brain-Mouth Reinforcement Learning), a decoupled GRPO-based framework that separates observation, reasoning, and utterance generation. This...
Evaluating Reasoning Fidelity in Visual Text Generation
Announce Type: new Abstract: Recent text-to-image (T2I) models can render highly legible and well-structured text within images, enabling applications including document generation and slide generation. However, it remains unclear whether such systems faithfully preserve reasoning ability when complex solutions must be expressed directly through rendered text, or whether they merely imitate surface-level patterns.
Dennis Quaid ditched LA for Nashville after the once 'fantastic' city went 'downhill'
Dennis Quaid is opening up about why he left Los Angeles, saying the once "fantastic" city has been on a downward slide for years. The 72-year-old actor joined the growing exodus of residents fleeing LA when he moved to Nashville, Tennessee with his wife Lauren Savoie, 33, in 2020. During an interview with Fox News Digital, "The Parent Trap" star, who lived in LA for decades, explained how he became frustrated and disillusioned with the management of the city, a sentiment that he believes is...
MAVEN: A Multi-Agent Framework for Multicultural Text-to-Video Generation
arXiv:2605.16716v3 Announce Type: replace Abstract: Text-to-video (T2V) generation has rapidly progressed in visual fidelity, yet its ability to faithfully represent multiple cultures within a single prompt remains underexplored. We introduce MAVEN, a multi-agent prompt refinement framework designed to improve cultural fidelity in both mono-cultural and cross-cultural T2V generation. MAVEN decomposes prompts into person, action, and location dimensions, handled by specialized agents...
MAVEN: A Multi-Agent Framework for Multicultural Text-to-Video Generation
arXiv:2605.16716v4 Announce Type: replace Abstract: Text-to-video (T2V) generation has rapidly progressed in visual fidelity, yet its ability to faithfully represent multiple cultures within a single prompt remains underexplored. We introduce MAVEN, a multi-agent prompt refinement framework designed to improve cultural fidelity in both mono-cultural and cross-cultural T2V generation. MAVEN decomposes prompts into person, action, and location dimensions, handled by specialized agents...
Autonomous heterogeneous catalyst discovery with a self-evolving multi-agent digital twin
arXiv:2606.05050v1 Announce Type: cross Abstract: Theoretical heterogeneous catalysis promises rapid catalyst discovery, yet computational and machine-learning predictions often deviate from experiment and stay confined to narrow material families, for want of a faithful, condition-aware catalytic simulator. We present CatDT (Catalysis Digital Twin), a self-evolving multi-agent system that builds an autonomous digital twin of a working catalyst, unifying gas-solid and liquid-solid modeling....