MergeTok: Unified Continuous and Discrete Visual Tokenization via Token Merging

arXiv CS, Monday 01 June 2026, 04:00 UTC. By Luyuan Zhang, Siyuan Li, Zedong Wang, Qingsong Xie, Cheng Tan, Anna Wang, Yanhao Zhang, Chen Chen, Haonan Lu, Haoqian Wang

Key Points

arXiv:2605.30904v1

Abstract: Most visual tokenizers for image generation fall into two families with complementary limitations: continuous VAEs offer high-fidelity reconstruction but suffer from dense, entangled latents that are poorly suited for semantic control, whereas discrete VQ-based models enable autoregressive generation yet struggle with gradient sparsity, unstable training, and codebook collapse. In this work, we introduce MergeTok, a unified tokenizer that jointly optimizes continuous (VAE) and discrete (VQ) tokenizers within an encoder-decoder architecture, leveraging token merging techniques as a semantic bridge. By clustering similar tokens during encoding, MergeTok establishes a structural prior that provides dual supervision signals: (i) it imposes merged-token semantic alignment in the VAE branch, regularizing its latent space toward disentangled, semantic-aware representations; (ii) it derives group-wise constraints, promoting intra-group diversity and inter-group exclusivity that stabilize VQ training. MergeTok shows competitive reconstruction and generation performance on ImageNet-256, with substantially lower rFID than strong VAE and VQ models under matched token budgets, while producing semantically organized token representations compatible with both autoregressive and diffusion generators. This shows that a single architecture can endow visual tokenizers with robust semantic organization and generator-friendly discreteness.
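The abstract does not say which merging algorithm MergeTok uses, so the following is only a minimal sketch of the general idea of "clustering similar tokens during encoding". It assumes a simple greedy scheme that repeatedly merges the two most cosine-similar token groups and averages their members. The function names `cosine` and `merge_tokens` and the `target` parameter are hypothetical and are not taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge_tokens(tokens, target):
    """Greedily merge tokens until only `target` groups remain.

    Illustrative assumption, not the paper's method. Returns
    (merged, groups): merged[k] is the mean of the original tokens
    whose indices are listed in groups[k].
    """
    if not 1 <= target <= len(tokens):
        raise ValueError("target must be between 1 and len(tokens)")

    # Keep a running sum per group; cosine similarity is scale-invariant,
    # so comparing sums is equivalent to comparing means.
    sums = [list(t) for t in tokens]
    groups = [[i] for i in range(len(tokens))]

    while len(groups) > target:
        best = None  # (similarity, i, j) of the most similar pair so far
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                sim = cosine(sums[i], sums[j])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        # Fold group j into group i, then drop group j.
        sums[i] = [a + b for a, b in zip(sums[i], sums[j])]
        groups[i] = groups[i] + groups[j]
        del sums[j]
        del groups[j]

    merged = [[s / len(g) for s in vec] for vec, g in zip(sums, groups)]
    return merged, groups


if __name__ == "__main__":
    toy_tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    merged, groups = merge_tokens(toy_tokens, target=2)
    print(groups)
    print(merged)
```

In a setup like the one the abstract describes, the `groups` assignment is the kind of structure that could serve as a prior for both branches: merged vectors as alignment targets for the continuous latents, and group membership as the basis for group-wise constraints on the discrete codes. How MergeTok actually derives and applies these signals is not specified in the abstract.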


Originally published by arXiv CS.
