FFN

Related Articles from SNS

FocusDiT: Masking Queries in Diffusion Transformers for Fine-grained Image Generation

arXiv:2606.02090v1 Announce Type: new Abstract: Diffusion transformer (DiT) has been widely adopted in the generative diffusion field, advancing the denoising of query tokens through attention and Feed-Forward (FFN) layers. FFN actually acts as the key-value vocabulary for decoding visual contents where the value embeds the visual semantical knowledge. We present that focusing on critical query tokens corresponding to more complex details and encouraging the model to improve these...

arXiv CS 8d ago

Less is MoE: Trimming Experts in Domain-Specialist Language Models

arXiv:2606.05538v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) models achieve strong performance through conditional computation, but their large parameter footprint poses deployment challenges. Prior MoE compression approaches catastrophically fail when evaluated on general-purpose benchmarks beyond commonsense reasoning.

arXiv CS 5d ago

UAOR: Uncertainty-aware Observation Reinjection for Vision-Language-Action Models

arXiv:2602.18020v2 Announce Type: replace Abstract: Vision-Language-Action (VLA) models leverage pretrained Vision-Language Models (VLMs) as backbones to map images and instructions to actions, demonstrating remarkable potential for generalizable robotic manipulation. To enhance performance, existing methods often incorporate extra observation cues (e.g., depth maps, point clouds) or auxiliary modules (e.g., object detectors, encoders) to enable more precise and reliable task execution, yet...

arXiv CS 1d ago

Inheritance Between Feedforward and Convolutional Networks via Model Projection

arXiv:2602.06245v2 Announce Type: replace-cross Abstract: Neural-network techniques are often transferred across architecture families by analogy, but such transfer is valid only when the assumptions required by a technique are preserved. We introduce this idea as inheritance between model classes. Using a unified node-level framework with tensor-valued activations, we prove that generalized feedforward networks (GFFNs) form a strict subset of generalized convolutional networks (GCNNs), so...

arXiv CS 2d ago

KV-Control: Parameter-Efficient K/V Injection for Trajectory-Controlled Text-to-Motion

arXiv:2606.05624v1 Announce Type: new Abstract: Text-conditioned 3D human motion models now synthesize plausible motions from prompts, but practical animation and embodied-agent workflows rarely stop at text: a character may need to follow a sketched root path, hit an end-effector target, or satisfy a multi-joint trajectory while still preserving the gait, style, and intent described by language. This exposes a control trade-off. A trajectory controller should be precise without overwriting...

arXiv CS 5d ago

LoopMoE: Unifying Iterative Computation with Mixture-of-Experts for Language Modeling

arXiv:2606.04438v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) and looped architectures scale models along two orthogonal axes, namely parameter capacity and effective depth. However, mainstream looped architectures rely on dense backbones that couple parameter count with per-token FLOPs, which makes it impossible to isolate the effect of iterative computation under matched budgets. To this end, we present LoopMoE, a looped MoE language model that integrates sparse routing with...

arXiv CS 6d ago

Pruning and Distilling Mixture-of-Experts into Dense Language Models

arXiv:2605.28207v2 Announce Type: replace Abstract: Mixture-of-Experts (MoE) is now the dominant architecture for frontier language models, yet it requires all expert parameters to be loaded in memory, making it less preferable for memory-constrained deployment. Existing compression methods reduce the number of experts but the output remains an MoE model with the same fundamental limitation. We present the first systematic framework for converting a trained MoE into a standard fully dense...

arXiv CS 1d ago

HyperParallel-MoE: Multi-Core Interleaved Scheduling for Fast MoE Training on Ascend NPUs

arXiv:2605.23764v2 Announce Type: replace Abstract: Modern Mixture-of-Experts (MoE) models increasingly rely on large-scale AI accelerator clusters for efficient training. Ascend NPUs expose heterogeneous on-chip compute resources, including matrix-oriented AIC units and vector-oriented AIV units with explicit cross-queue synchronization support. However, existing training frameworks largely execute MoE operators in a serialized kernel-by-kernel manner, leaving substantial heterogeneous...

arXiv CS 8d ago

From Unfamiliar to Familiar: Detecting Pre-training Data via Gradient Deviations in Large Language Models

arXiv:2603.04828v2 Announce Type: replace Abstract: Pre-training data detection for LLMs is essential for addressing copyright concerns and mitigating benchmark contamination. Existing methods mainly focus on the likelihood-based statistical features or heuristic signals before and after fine-tuning, but the former are susceptible to word frequency bias in corpora, and the latter strongly depend on the similarity of fine-tuning data. From an optimization perspective, we observe that during...

arXiv CS 8d ago