MTP Acceleration
Related Articles from SNS
FlashTTS: Fast Streaming TTS with MTP Acceleration and X-pred Mean Flow Distillation
arXiv:2606.09141v1. Abstract: Recent progress in speech dialogue systems requires Text-to-Speech (TTS) models to be faster and more responsive. Modern speech dialogue systems impose two primary requirements on TTS models: low latency and support for streaming inputs and outputs. However, most existing single-codebook LLM-based TTS methods rely on multi-stage pipelines that lack native streaming capabilities.
Gemma 4 QAT models: Optimizing compression for mobile and laptop efficiency
Since releasing Gemma 4 two months ago, we've been continuously working to expand its capabilities. First, we introduced Multi-Token Prediction (MTP) to accelerate inference, and just a couple of days ago, we released a 12B model to bridge the gap between our E4B and 26B MoE models. Today, we are releasing new checkpoints optimized with Quantization-Aware Training (QAT) to make Gemma 4 even more efficient, so...