Deep Interest Mining for Intent-Enriched Semantic IDs in Multimodal Generative Recommendation

arXiv CS | Tuesday 02 June 2026, 04:00 UTC | By Yangchen Zeng, Jinze Wang


arXiv:2604.20861v3

Abstract: Semantic IDs (SIDs) provide the discrete item vocabulary used by generative recommendation, but their quality depends on what item evidence is preserved before quantization. In product recommendation, surface metadata often misses latent usage intent, visual evidence may be only weakly reflected in text, and downstream policy learning provides sparse feedback about whether a generated SID corresponds to a semantically useful item. We introduce DeepInterestGR, an intent-enriched SID framework for generative recommendation. Before SID quantization, CMSA enriches item representations through two complementary evidence paths: recommendation-oriented VLM captions and projected image embeddings. DCIM then uses an LLM to mine item-side intent descriptors (latent usage motivations implied by product content rather than personalized user states). During policy training over the constructed SIDs, QARM adds a relevance-gated semantic-quality bonus on top of standard SID rewards, applying the bonus only when the generated SID decodes to the target item. Thus, semantic quality cannot reward a fluent but irrelevant item prediction. Experiments on three Amazon Product Review categories (Beauty, Sports, and Instruments) show that DeepInterestGR improves over competitive generative and RL-based baselines, with relative gains of up to 15.1% in NDCG@5 and 13.9% in NDCG@10 over the strongest per-metric baseline. Component ablations, CMSA branch analyses, reward variants, and SID-level case studies support a bounded claim: enriching pre-quantization item evidence with visual cues and item-side intent descriptors, together with relevance-gated semantic rewards, improves SID-based generative recommendation under the evaluated settings.
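The relevance gate that the abstract attributes to QARM can be illustrated with a small Python sketch. This is not the authors' implementation: the abstract does not give the reward formula, so the additive form, the function names (decode_sid, relevance_gated_reward), the bonus_weight parameter, the toy catalog, and the assumption that the semantic-quality score is a scalar supplied by the caller are all hypothetical. The sketch only shows the stated property that the bonus is applied when, and only when, the generated SID decodes to the target item.

```python
from typing import Mapping, Optional, Sequence, Tuple

SID = Tuple[int, ...]


def decode_sid(sid: Sequence[int], sid_to_item: Mapping[SID, str]) -> Optional[str]:
    """Map a generated SID to an item ID, or None if it matches no item."""
    return sid_to_item.get(tuple(sid))


def relevance_gated_reward(
    generated_sid: Sequence[int],
    target_item: str,
    base_reward: float,
    semantic_quality: float,
    sid_to_item: Mapping[SID, str],
    bonus_weight: float = 0.5,  # hypothetical weight, not taken from the paper
) -> float:
    """Add a semantic-quality bonus only when the SID decodes to the target.

    Assumed form: reward = base_reward + bonus_weight * gate * semantic_quality,
    where gate is 1 if the decoded item equals the target item and 0 otherwise.
    """
    decoded = decode_sid(generated_sid, sid_to_item)
    gate = 1.0 if decoded == target_item else 0.0
    return base_reward + bonus_weight * gate * semantic_quality


# Toy catalog with made-up SIDs and item IDs, for illustration only.
catalog = {(3, 17, 5): "item_A", (3, 17, 9): "item_B"}

# Correct item: the bonus is added on top of the base reward.
hit = relevance_gated_reward((3, 17, 5), "item_A", 1.0, 0.8, catalog)

# Wrong item with a high quality score: the gate zeroes the bonus.
miss = relevance_gated_reward((3, 17, 9), "item_A", 0.0, 0.9, catalog)

# SID that decodes to no item at all: also no bonus.
invalid = relevance_gated_reward((1, 1, 1), "item_A", 0.0, 0.9, catalog)
```

Under these assumptions, a prediction with a high semantic-quality score but the wrong item receives only its base reward, which matches the abstract's statement that semantic quality cannot reward a fluent but irrelevant prediction.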


Originally published by arXiv CS.

