Qwen

Related Articles from SNS

From fried chicken to flight plans: Alibaba wants Qwen to become China’s digital fixer

As Tencent prepares a rival WeChat agent, Alibaba is moving quickly to turn Qwen from a chatbot into a digital concierge for everyday life. The company announced on Wednesday that it was opening Qwen’s ecosystem to third-party partners’ agents. Fast-food giant KFC, tech-driven coffee chain Luckin Coffee, beverage chain Mixue Group and China Eastern Airlines are among the first companies to test their...

South China Morning Post 7d ago

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

arXiv:2605.30280v2 Announce Type: replace Abstract: Embodied intelligence is often studied through specialized models for individual tasks such as manipulation or navigation, resulting in fragmented capabilities and limited generalization across tasks, environments, and robot embodiments. In this work, we study whether heterogeneous embodied decision-making problems can be unified within a single vision-language-action model. We present Qwen-VLA, a unified embodied foundation model that...

arXiv CS 8d ago

Qwen-Image-Flash: Beyond Objective Design

arXiv:2606.03746v1 Announce Type: new Abstract: Few-step distillation has become an effective strategy for accelerating advanced visual generative models, yet prior work has largely focused on distillation objectives. In this work, we revisit few-step distillation from a complementary perspective, focusing on the training recipe that critically shapes student performance. Using Qwen-Image-2.0 as a representative case, we systematically investigate three factors in unified text-to-image...

arXiv CS 7d ago

Decision-Aware Memory Cards: Counterfactual-Inspired Context Selection and Compression for Tool-Using LLM Agents

arXiv:2606.08151v1 Announce Type: new Abstract: Tool-using LLM agents often fail not because relevant text is absent, but because decisive evidence is not selected, compressed, or surfaced at action time. We present CICL, a decision-aware context layer that turns instance evidence into a context graph, routes deterministic, Opus-assisted, Qwen, Codex/GPT-5.5, and Qwen-QLoRA judgments through a shared eight-field schema, scores units by action shift, outcome uplift, necessity, and...

arXiv CS 1d ago

Beyond Accuracy: Behavioral Dynamics of Agentic Multi-Hunk Repair

arXiv:2511.11012v2 Announce Type: replace Abstract: Automated program repair has traditionally focused on single-hunk defects, overlooking multi-hunk bugs that are prevalent in real-world systems. Repairing these bugs requires coordinated edits across multiple, disjoint code regions, posing substantially greater challenges. We present the first systematic study of LLM-driven coding agents (Claude Code, Codex, Gemini-cli, and Qwen Code) on this task.

arXiv CS 1d ago

From Hazard Functions to Language Space: Cox-Supervised Distillation of Survival Risk into a Large Language Model

Announce Type: new Abstract: We investigate whether information about time-to-event risk estimated by a Cox proportional hazards model can be transferred into a generative large language model. We propose a text-based survival modelling pipeline in which structured clinical covariates are converted into text prompts and a Qwen-based large language model is fine-tuned to generate patient-specific survival risk using Cox model predictions as a training target. Across GBSG2, ACTG320, and WHAS500, the model...

arXiv CS 1d ago

Sparse Memory Finetuning as a Low-Forgetting Alternative to LoRA and Full Finetuning

Announce Type: replace Abstract: Adapting a pretrained language model to a new task often hurts the general capabilities it already had, a problem known as catastrophic forgetting. Sparse Memory Finetuning (SMF) tries to avoid this by adding key-value memory layers to the model and, on each training step, updating only the small set of memory rows that the current batch reads most heavily. We re-implement SMF on Qwen-2.5-0.5B-Instruct and compare it with LoRA and full finetuning on MedMCQA,...

arXiv CS 1d ago

Textual Supervision Enhances Geospatial Representations in Vision-Language Models

Announce Type: new Abstract: Geospatial understanding is a critical yet underexplored dimension in the development of machine learning systems for tasks such as image geolocation and spatial reasoning. In this work, we analyze the geospatial representations acquired by three model families: vision-only architectures (e.g., ViT), vision-language models (e.g., CLIP), and large-scale multimodal foundation models (e.g., LLaVA, Qwen, and Gemma). By evaluating across image clusters, including...

arXiv CS 2d ago

OPRD: On-Policy Representation Distillation

arXiv:2606.06021v1 Announce Type: new Abstract: On-policy distillation (OPD) supervises the student only in output space by matching next-token probabilities. This output-only paradigm has two limits: (1) sampling variance from Monte Carlo KL estimates over large vocabularies (e.g., Qwen's ~150k tokens) persists throughout training, and (2) it treats the teacher as a black-box, discarding all intermediate hidden states after the LM head. We propose On-Policy Representation Distillation...

arXiv CS 5d ago