Home Knowledge Base Qwen2

Qwen2

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

TurtleAI: Benchmarking Multimodal Models for Visual Programming in Turtle Graphics

arXiv:2606.03626v1 Announce Type: new Abstract: Vision-language models (VLMs) have been explored for visual programming, where they generate code to solve visual tasks. However, most prior work focuses on visual programming for productivity; it remains unclear how well current VLMs perform on education-oriented visual programming and what factors limit their performance. To bridge this gap, we introduce TurtleAI, a benchmark containing 823 tasks curated based on real-world visual programming...

arXiv CS 7d ago

Reconstructing Content via Collaborative Attention to Improve Multimodal Embedding Quality

arXiv:2603.01471v2 Announce Type: replace Abstract: Multimodal embedding models, rooted in multimodal large language models (MLLMs), have yielded significant performance improvements across diverse tasks such as retrieval and classification. However, most existing approaches rely heavily on large-scale contrastive learning, with limited exploration of how the architectural and training paradigms of MLLMs affect embedding quality. While effective for generation, the causal attention and...

arXiv CS 8d ago

Reconstructing Content with Collaborative Attention for Universal Multimodal Representation Learning

arXiv:2603.01471v3 Announce Type: replace Abstract: Multimodal embedding models, rooted in multimodal large language models (MLLMs), have yielded significant performance improvements across diverse tasks such as retrieval and classification. However, most existing approaches rely heavily on large-scale contrastive learning, with limited exploration of how the architectural and training paradigms of MLLMs affect embedding quality. While effective for generation, the causal attention and...

arXiv CS 7d ago

Learning What Matters: Probabilistic Task Selection via Mutual Information for Model Finetuning

arXiv:2507.12612v3 Announce Type: replace Abstract: Supervised fine-tuning performance for large language models depends strongly on how training budget is distributed across a heterogeneous set of tasks. In practice, mixtures are often fixed using simple heuristics (e.g., uniform or size-proportional sampling) that ignore task interactions, which can hurt transfer and waste budget on redundant sources. We introduce TaskPGM, a framework for learning continuous task mixtures via an...

arXiv CS 5d ago

How Transformers Reject Wrong Answers: Rotational Dynamics of Factual Constraint Processing

arXiv:2603.13259v2 Announce Type: replace Abstract: When a decoder-only transformer is forced to process matched correct and incorrect single-token continuations of a factual query, the two pathways through hidden-state space diverge in a specific way: displacement vectors from the query-only representation maintain approximately equal magnitude but rotate apart in direction. The angular separation grows through mid-depth, and late layers resolve the asymmetric outcome -a logit-lens...

arXiv CS 1d ago

Learning Task Mixtures from Task Affinities: A Probabilistic Graphical Model for Supervised Fine-Tuning

arXiv:2507.12612v4 Announce Type: replace Abstract: Supervised fine-tuning performance for large language models depends strongly on how training budget is distributed across a heterogeneous set of tasks. In practice, mixtures are often fixed using simple heuristics (e.g., uniform or size-proportional sampling) that ignore task interactions, which can hurt transfer and waste budget on redundant sources. We introduce TaskPGM, a framework for learning continuous task mixtures via an...

arXiv CS 1d ago

Ask4VG: Risk-Aware Question Selection for Reducing Prior-Driven Answers in Medical VQA

Announce Type: new Abstract: Medical visual question answering requires models to ground their responses in image evidence, because visually unsupported answers can mislead downstream interpretation. However, many medical VQA questions are generic, template-like, or highly similar in form, which can encourage models to learn question-answer shortcuts instead of image-dependent reasoning and thereby increase the risk of hallucinated responses. We propose Ask4VG, a label-free pilot framework...

arXiv CS 8d ago

Overview of the EReL@MIR 2025 Multimodal Document Retrieval Challenge (Track 1)

Announce Type: new Abstract: Retrieval over visually-rich documents, pages that interleave text with figures, tables, and charts, is essential for multimodal retrieval-augmented generation, yet most retrievers still discard the visual channel. The \emph{Multimodal Document Retrieval Challenge}, Track~1 of the MIR Challenge at the first EReL@MIR workshop, co-located with The Web Conference 2025, asks participants to build a \emph{single} retrieval system that handles two complementary...

arXiv CS 6d ago