ChannelTok: Efficient Flexible-Length Vision Tokenization

arXiv CS, Thursday 04 June 2026, 04:00 UTC. By Sukriti Paul, Arpit Bansal, Tom Goldstein

arXiv:2606.04461v1

Abstract: Leading flexible vision tokenizers achieve SOTA quality at an extreme cost, relying on parameter-heavy backbones and slow, multi-step generative decoders. We depart from this complex, spatial-token paradigm and introduce a simple, lightweight, and fast channel-wise flexible-length tokenizer. Our method treats each latent channel as a visual token, enabling a parameter-efficient CNN-Transformer hybrid backbone. Furthermore, employing a stochastic tail-dropping paradigm during training naturally forces channels to organize by semantic importance. This allows for flexible compression at inference by simply retaining the first $k$ channels, and naturally enables variable-length autoregressive image generation. We validate our approach through extensive experiments on ImageNet, demonstrating consistent quality across diverse token budgets. The results establish a new quality-efficiency frontier: our model achieves state-of-the-art perceptual quality (rFID 2.92) while being $8.6\times$ faster in decoding and $2.1\times$ smaller (159M params) than the next-best alternative. Our work establishes channel-wise tokenization as a powerful and practical paradigm for efficient visual representation. Project page: https://channeltok.github.io
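The tail-dropping idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the uniform sampling of the keep count, and the choice to zero out dropped channels are all assumptions made for illustration, and the latent is modeled as a plain list of channels, each holding that channel's flattened values.

```python
import random


def sample_keep_count(num_channels, rng):
    """Pick how many leading channels survive this training step.

    Assumption: uniform over 1..num_channels. The abstract does not
    state which distribution the authors use.
    """
    return rng.randint(1, num_channels)


def tail_drop(channels, k):
    """Training-time masking: keep the first k channels, zero the tail.

    Assumption: dropped channels are replaced by zeros so the latent
    keeps its shape. Returns a new list; the input is not modified.
    """
    if not 1 <= k <= len(channels):
        raise ValueError("k must be between 1 and the number of channels")
    return [
        list(ch) if i < k else [0.0] * len(ch)
        for i, ch in enumerate(channels)
    ]


def keep_first_k(channels, k):
    """Inference-time compression: retain only the first k channel tokens."""
    if not 1 <= k <= len(channels):
        raise ValueError("k must be between 1 and the number of channels")
    return [list(ch) for ch in channels[:k]]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical latent: 8 channels, each a flattened 4x4 feature map.
    latent = [[rng.random() for _ in range(16)] for _ in range(8)]
    k = sample_keep_count(len(latent), rng)
    masked = tail_drop(latent, k)    # same shape, tail channels zeroed
    tokens = keep_first_k(latent, 3)  # a 3-token budget at inference
    print(k, len(masked), len(tokens))
```

Because every training step cuts the latent at a random point and always discards the tail, earlier channels are present in more training steps than later ones, which is the pressure the abstract describes as organizing channels by importance. At inference the same prefix rule gives a token budget of any size $k$.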

Originally published by arXiv CS.
