Semantic IDs

Related Articles from SNS

Decoupled Residual Quantization for Robust Semantic IDs in Recommendation

Announce Type: new Abstract: Semantic IDs represent items as shared discrete token sequences and have become a practical tool for recommendation and retrieval. Yet it remains difficult to tell why a tokenizer fails: poor quality may come from codebook underutilization, unstable decision boundaries, or geometric distortion of the embedding space. This paper develops a quantitative framework for diagnosing these failures through expected codeword overlap and effective codebook capacity.

arXiv CS 8d ago

Deep Interest Mining for Intent-Enriched Semantic IDs in Multimodal Generative Recommendation

Announce Type: replace Abstract: Semantic IDs (SIDs) provide the discrete item vocabulary used by generative recommendation, but their quality depends on what item evidence is preserved before quantization. In product recommendation, surface metadata often misses latent usage intent, visual evidence may be only weakly reflected in text, and downstream policy learning provides sparse feedback about whether a generated SID corresponds to a semantically useful item. We introduce...

arXiv CS 8d ago

Understanding Generative Recommendation with Semantic IDs from a Model-scaling View

arXiv:2509.25522v3 Announce Type: replace Abstract: Recent advancements in generative models have allowed the emergence of a promising paradigm for recommender systems (RS), known as Generative Recommendation (GR), which tries to unify rich item semantics and collaborative filtering signals. One popular modern approach is to use semantic IDs (SIDs), which are discrete codes quantized from the embeddings of modality encoders (e.g., large language or vision models), to represent items in an...

arXiv CS 2d ago

Gryphon: A Unified Architecture for Semantic-ID Generation and Item-Level Scoring in Industrial Recommendations

arXiv:2606.08604v1 Announce Type: new Abstract: Generative retrieval (GR) has become a scalable approach to candidate generation: each item is assigned a short hierarchical token sequence called a Semantic ID (SID), and the next item's SID is decoded autoregressively. A practical limitation is that the decoder's beam search optimizes the likelihood of token sequences, not the relevance of the underlying items. These objectives diverge when sequence likelihood is poorly calibrated due to beam...

arXiv CS 1d ago

Quantizing Intent: Cross-Domain Semantic IDs from Organic Activity for Industrial Ranking

arXiv:2606.01396v1 Announce Type: new Abstract: Ads click-through rate (CTR) prediction is constrained by sparse user supervision: most users engage with ads infrequently while generating dense behavioral evidence in organic surfaces such as feed. Transferring these cross-domain signals into ads ranking is difficult due to domain mismatch, serving cost, and production complexity. We introduce cross-domain user Semantic IDs (SIDs) derived from organic feed activity and show that behavioral...

arXiv CS 8d ago

Taiji: Pareto Optimal Policy Optimization with Semantics-IDs Trade-off for Industrial LLM-Enhanced Recommendation

arXiv:2606.03866v1 Announce Type: new Abstract: Scaling recommender systems via large language models (LLMs) has become a prominent trend in the industry. However, aligning the LLM's semantic space with the recommender's ID space via post-training (e.g., SFT and RL) remains challenging. Existing LLM4Rec paradigms are bottlenecked by two main issues: (1) the difficulty of measuring and improving chain-of-thought (CoT) quality in open-domain recommendation during SFT, and (2) the neglect of...

arXiv CS 7d ago

SSRLive: Live Streaming Recommendation with Dynamic Semantic ID

Announce Type: new Abstract: Live streaming has emerged as one of the fastest-growing forms of online media, enabling instant content broadcasting and real-time engagement between users and streamers. Despite the effectiveness of existing recommendation algorithms in this domain, they often suffer from limited utilization of computational resources, with low FLOPs that hinder further performance enhancement. Generative recommendation techniques, which have gained traction in various...

arXiv CS 2d ago

TriAlignGR: Triangular Multitask Alignment with Multimodal Deep Interest Mining for Generative Recommendation

arXiv:2605.05249v3 Announce Type: replace Abstract: We introduce TriAlignGR, a unified multitask-multimodal framework for generative recommendation that establishes two-stage multimodal semantic propagation: (i) encoding visual semantics directly into SIDs via multimodal embeddings, and (ii) enabling the model to decode these semantics through visual description tasks. Existing Semantic ID (SID) pipelines suffer from two fundamental but underexplored problems: SID Content Degradation...

arXiv CS 7d ago

DREAM: Dynamic Refinement of Early Assignment Mappings

arXiv:2606.06947v1 Announce Type: new Abstract: Generative recommendation advances item retrieval by reformulating it as autoregressive generation of Semantic IDs (SIDs), compact token sequences that encode item semantics. While SIDs offer a strong semantic prior, current SID-based methods assign each item a single static identifier through offline tokenization before sufficient user feedback is observed. For cold-start items, this one-shot commitment produces poorly discriminative codes,...

arXiv CS 2d ago

Why Thinking Hurts: Diagnosing and Rectifying Linguistic Inertia in Large Language Models for Recommendation

Announce Type: replace Abstract: Chain-of-Thought (CoT) reasoning is widely used to improve LLM performance, and recent foundation recommender models adopt it by generating textual reasoning before predicting target items represented by Semantic IDs (SIDs). However, we observe that enabling thinking mode in models such as OpenOneRec can degrade recommendation quality by up to 25%. We investigate this failure and identify Linguistic Inertia: when a textual CoT segment is inserted before SID...

arXiv CS 8d ago