textual CoT
Related Articles from SNS
LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning
arXiv:2602.07075v5 Announce Type: replace Abstract: Current chemical large language models (LLMs) predominantly rely on explicit Chain-of-Thought (CoT) to solve complex reasoning problems. However, forcing nonverbal tacit chemical logic into discrete natural language imposes a fundamental "modality mismatch," creating an artificial bottleneck for reasoning. We introduce LatentChem, a reasoning interface that decouples chemical logic from linguistic generation, enabling the model to process...
VTI-CoT: Visual-Textual Interleaved Chain of Thought for Video Reasoning
Announce Type: new Abstract: Video reasoning aims to understand complex temporal events and causal relationships within videos. Recently, Chain-of-Thought (CoT) has been introduced to this field to enhance reasoning accuracy. However, existing CoT-based video reasoning methods primarily rely on text-only information for logical deduction, overlooking critical visual information during the inference process.
iVGR: Internalizing Visually Grounded Reasoning for MLLMs with Reinforcement Learning
arXiv:2605.31096v1 Announce Type: new Abstract: While visually grounded Chain-of-Thought (CoT) has emerged as a promising paradigm to enhance fine-grained perception in multimodal large language models (MLLMs), its efficacy during the inference phase remains underexplored. In this work, we empirically find that mandating explicit object boxes in visually grounded CoT during inference often degrades performance compared to standard textual CoT, which reasons without explicit visual grounding....
Latent Reasoning with Normalizing Flows
arXiv:2606.06447v1 Announce Type: new Abstract: Large language models often improve reasoning by generating explicit chain-of-thought (CoT), demonstrating the importance of intermediate computation. However, textual CoT forces this computation through a discrete, serial, and communication-oriented token stream: each reasoning step must be verbalized before the model can proceed, even when the underlying update is semantic, uncertain, or only partially formed. Latent reasoning offers a...
Why Thinking Hurts: Diagnosing and Rectifying Linguistic Inertia in Large Language Models for Recommendation
Announce Type: replace Abstract: Chain-of-Thought (CoT) reasoning is widely used to improve LLM performance, and recent foundation recommender models adopt it by generating textual reasoning before predicting target items represented by Semantic IDs (SIDs). However, we observe that enabling thinking mode in models such as OpenOneRec can degrade recommendation quality by up to 25%. We investigate this failure and identify Linguistic Inertia: when a textual CoT segment is inserted before SID...
TVI-CoT: Text-Visual Interleaved Chain-of-Thought Reasoning for Multimodal Understanding
arXiv:2606.08464v1 Announce Type: new Abstract: Chain-of-thought (CoT) reasoning has proven effective for enhancing problem-solving in large language models. However, when applied to multimodal LLMs (MLLMs), existing CoT approaches suffer from a fundamental limitation: they perform reasoning entirely in text without accessing visual features during the reasoning process. After initial visual encoding, image information becomes inaccessible, forcing models to reason based solely on whatever...
Attention-guided Fine-tuning of Multimodal Large Language Models Improves Chain-of-Thought Reasoning
arXiv:2606.01558v1 Announce Type: new Abstract: The effectiveness of Chain-of-Thought (CoT) prompting in Multimodal Large Language Models (MLLMs) remains uncertain: across several visual reasoning benchmarks, CoT prompting often degrades performance compared to direct prompting. In this paper, we provide a systematic analysis of CoT behavior in three modern MLLM families across model scales on datasets requiring step-wise visual evidence. Our analysis identifies two recurring failure modes:...
Adaptive Latent Agentic Reasoning
arXiv:2606.02871v1 Announce Type: new Abstract: Large reasoning models improve performance by generating extended chain-of-thought (CoT) reasoning, but this behavior becomes inefficient when applied to LLM agents. Current LLM agents often generate verbose textual reasoning at every decision step and allocate reasoning effort nearly uniformly across turns, leading to substantial inefficiency in multi-turn agentic trajectories. We propose Adaptive Latent Agentic Reasoning (ALAR), a dual-mode...
DyCo-RL: Dynamic Cross-Modal Coordination for Visual Reasoning
Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a leading paradigm for enhancing visual reasoning in Multimodal Large Language Models (MLLMs). However, existing RLVR methods optimize primarily for the reasoning outcome, fundamentally overlooking the fine-grained cross-modal coordination required during the generation process. Through token-level analyses and controlled interventions, we reveal that during Chain-of-Thought (CoT) reasoning,...