Home Knowledge Base MTP

MTP

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

A 10 year old Xeon is all you need (for 26B-A4B MTP Drafters without GPU)

A 10 year old Xeon is all you need 17 minutes read The previous post covered getting Gemma 4’s MTP drafters quantized and paired with a verifier. This one is about running the result on a machine that has no business running it. I have a recycled server.

Hacker News 9d ago

FlashTTS: Fast Streaming TTS with MTP Acceleration and X-pred Mean Flow Distillation

arXiv:2606.09141v1 Announce Type: cross Abstract: Recent progress in speech dialogue systems requires Text-to-Speech (TTS) models to be faster and more responsive. Modern speech dialogue systems impose two primary requirements on TTS models: low latency and support for streaming inputs and outputs. However, most existing single-codebook LLM-based TTS methods rely on multi-stage pipelines that lack native streaming capabilities.

arXiv CS 1d ago

Fast and Expressive Multi-Byte Prediction with Probabilistic Circuits

arXiv:2511.11346v2 Announce Type: replace Abstract: Multi-token prediction (MTP) is a prominent strategy to significantly speed up generation in large language models (LLMs), especially in byte-level LLMs, which are tokeniser-free but prohibitively slow. However, many existing MTP methods either assume independence between future tokens, sacrificing expressiveness, or generate tokens one at a time within the window, increasing latency. In this work, we investigate the trade-off between...

arXiv CS 7d ago

Pair-In, Pair-Out: Latent Multi-Token Prediction for Efficient LLMs

arXiv:2605.27255v2 Announce Type: replace Abstract: Long chain-of-thought reasoning has made autoregressive decoding the dominant inference cost of modern large language models. Existing methods target either the input side (latent compression) or the output side (speculative decoding and multi-token prediction, MTP), but the two lines of work have been pursued independently. Moreover, output-side methods must incur an expensive verifier pass to validate the unreliable draft tokens predicted...

arXiv CS 9d ago

Gemma 4 QAT models: Optimizing compression for mobile and laptop efficiency

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency Since releasing Gemma 4 two months ago, we've been continuously working to expand its capabilities. First, we introduced Multi-Token Prediction (MTP) to accelerate inference, and just a couple of days ago, we released a 12B model to bridge the gap between our E4B and 26B MOE models. Today, we are releasing new checkpoints optimized with Quantization-Aware Training (QAT) to make Gemma 4 even more efficient, so...

Hacker News 5d ago

World's most travelled people meet in Portugal

Most Traveled People (MTP) gathered in Portugal around 200 people who take travel extremely seriously; half have already visited every UN‑recognised country. Jack Wheeler proudly shows the passport issued in the 1960s, among all the passports he has had throughout his life. He has brought the documents to prove that he has already visited every country in the world.

Euronews 2d ago

I Put a Datacenter GPU in My Gaming PC for £200

I Put a Datacenter GPU in My Gaming PC for £200 I already had an RTX 4080. Good enough for gaming, not good enough for the models I wanted to run locally. The next step up in GPU land is either spend a fortune on a card with more VRAM, or find another way.

Hacker News 10d ago

Targeted Linguistic Analysis of Sign Language Models with Minimal Translation Pairs

arXiv:2604.27232v2 Announce Type: replace Abstract: Models of sign language have historically lagged behind those for spoken language (text and speech). Recent work has greatly improved their performance on tasks like sign language translation and isolated sign recognition. However, it remains unclear to what extent existing models capture various linguistic phenomena of sign language, and how well they use cues from the multiple articulators used in sign language (hands, upper body, face).

arXiv CS 7d ago

F3-Tokenizer: Taming Audio Autoencoder Latents for Understanding and Generation

arXiv:2606.06357v1 Announce Type: new Abstract: Continuous audio autoencoders reconstruct waveforms well but often produce latents with weak structure for understanding, while self-supervised audio encoders capture semantics but are not directly decodable. This mismatch complicates a single audio tokenizer that must support both understanding and generation.

arXiv CS 5d ago

Gemma 4 12B: A unified, encoder-free multimodal model

Introducing Gemma 4 12B: a unified, encoder-free multimodal model Today, we are introducing Gemma 4 12B, our latest model designed to bring agentic multimodal intelligence directly to laptops. Bridging the gap between our edge-friendly E4B and our more advanced 26B Mixture of Experts (MoE), Gemma 4 12B packages powerful capabilities inside a reduced memory footprint. It is also our first mid-sized model to feature native audio inputs.

Hacker News 7d ago