Semantic Router

Related Articles from SNS

vLLM Semantic Router: Signal Driven Decision Routing for Mixture-of-Modality Models

arXiv:2603.04444v4 Announce Type: replace Abstract: As large language models (LLMs) diversify across modalities, capabilities, and cost profiles, the problem of intelligent request routing -- selecting the right model for each query at inference time -- has become a critical systems challenge. We present vLLM Semantic Router, a signal-driven decision routing framework for Mixture-of-Modality (MoM) model deployments. The architecture follows two complementary Shannon-inspired views.

arXiv CS 6d ago

vLLM Semantic Router: Signal Driven Decision Routing for Mixture-of-Modality Models

arXiv:2603.04444v3 Announce Type: replace Abstract: As large language models (LLMs) diversify across modalities, capabilities, and cost profiles, the problem of intelligent request routing -- selecting the right model for each query at inference time -- has become a critical systems challenge. We present vLLM Semantic Router, a signal-driven decision routing framework for Mixture-of-Modality (MoM) model deployments. The central innovation is composable signal orchestration: the system...

arXiv CS 7d ago

Microskill Architecture: A Modular Skill-Driven Framework for AI-Native Code Generation

Announce Type: new Abstract: Large language models and AI coding agents have reshaped software development, but the path to fully AI-native systems faces structural challenges. Chief among them is managing context windows without losing accuracy or efficiency. When developers inject full project documentation and code into a model's memory, the model loses mid-sequence information, token costs spiral, and architecture drifts.

arXiv CS 5d ago

Schedule-Level Shared-Prefix Reuse for LLM RL Training

Announce Type: replace Abstract: GRPO- and PPO-style LLM post-training commonly sample multiple trajectories from the same prompt and then train on the resulting group. In long-context RL workloads, this shared prompt-side prefix can contain retrieved passages, visual tokens, tool schemas, system instructions, or task context, while the full rollout group is still too large to pack into one training microbatch. Standard dense trainers therefore recompute the same prefix forward and backward...

arXiv CS 7d ago

Schedule-Level Shared-Prefix Reuse for LLM RL Training

Announce Type: replace Abstract: GRPO-based LLM post-training commonly samples multiple trajectories from the same prompt and then trains on the resulting group. In long-context GRPO workloads, this shared prompt-side prefix can contain retrieved passages, visual tokens, tool schemas, system instructions, or task context, while the full rollout group is still too large to pack into one training microbatch. Standard dense trainers therefore recompute the same prefix forward and backward for...

arXiv CS 6d ago

Schedule-Level Shared-Prefix Reuse for LLM RL Training

arXiv:2606.01143v1 Announce Type: new Abstract: GRPO- and PPO-style LLM post-training commonly sample multiple trajectories from the same prompt and then train on the resulting group. In long-context RL workloads, this shared prompt-side prefix can contain retrieved passages, visual tokens, tool schemas, system instructions, or task context, while the full rollout group is still too large to pack into one training microbatch. Standard dense trainers therefore recompute the same prefix...

arXiv CS 8d ago

memorywire: A Vendor-Neutral Wire Format for Agent Memory Operations

arXiv:2606.01138v2 Announce Type: replace Abstract: Agent-memory frameworks -- mem0, Letta/MemGPT, Cognee, Zep/Graphiti, MemoryOS, MemTensor -- each ship their own SDK, storage layout, and operational vocabulary. There is no shared wire format: every integration is bespoke, every migration rebuilds memory from scratch, and no framework ships a governance surface that lets a human review writes before they enter long-term storage. We present memorywire, a JSON-Schema 2020-12 wire format for...

arXiv CS 6d ago

AMP: A Vendor-Neutral Wire Format for Agent Memory Operations

arXiv:2606.01138v1 Announce Type: new Abstract: Agent-memory frameworks -- mem0, Letta/MemGPT, Cognee, Zep/Graphiti, MemoryOS, MemTensor -- each ship their own SDK, storage layout, and operational vocabulary. There is no shared wire format: every integration is bespoke, every migration rebuilds memory from scratch, and no framework ships a governance surface that lets a human review writes before they enter long-term storage. We present memorywire, a JSON-Schema 2020-12 wire format for five...

arXiv CS 8d ago

From Internal Diagnosis to External Auditing: A VLM-Driven Paradigm for Data-Free Online Backdoor Defense

arXiv:2601.19448v2 Announce Type: replace Abstract: Deep Neural Networks remain inherently vulnerable to backdoor attacks. Traditional test-time defenses largely operate under the paradigm of internal diagnosis methods like model repairing or input robustness, yet these approaches are often fragile under advanced attacks as they remain entangled with the victim model's corrupted parameters. We propose a paradigm shift from Internal Diagnosis to External Semantic Auditing, arguing that...

arXiv CS 9d ago

MoG: Mixture of Experts for Graph-based Retrieval-Augmented Generation

arXiv:2605.31010v1 Announce Type: new Abstract: Retrieval-augmented generation is intensively studied to ground large language models on external evidence. However, retrieving from a unified knowledge base could inevitably introduce irrelevant information that may mislead generation for complex reasoning. Inspired by the conditional computation of mixture of experts (MoE), where a router sparsely selects specialized experts alongside shared ones for each input, we propose Mixture...

arXiv CS 9d ago