SOLARIS: Speculative Offloading of Latent-bAsed Representation for Inference Scaling

arXiv CS, Friday 05 June 2026, 04:00 UTC. By Zikun Liu, Liang Luo, Qianru Li, Zhengyu Zhang, Wei Ling, Jingyi Shen, Zeliang Chen, Yaning Huang, Jingxian Huang, Abdallah Aboelela, Chonglin Sun, Feifan Gu, Fenggang Wu, Hang Qu, Huayu Li, Jill Pan, Kaidi Pei, Laming Chen, Longhao Jin, Qin Huang, Tongyi Tang, Varna Puvvada, Wenlin Chen, Xiaohan Wei, Xu Cao, Yantao Yao, Yuan Jin, Yunchen Pu, Yuxin Chen, Zijian Shen, Zhengkai Zhang, Jing Zhu, Dong Liang, Ellie Wen

Key Points

arXiv:2604.12110v2 (announce type: replace)

Abstract: Recent advances in recommendation scaling laws have led to foundation models of unprecedented complexity. While these models offer superior performance, their computational demands make real-time serving impractical, often forcing practitioners to rely on knowledge distillation, which compromises serving quality for efficiency. To address this challenge, we present SOLARIS (Speculative Offloading of Latent-bAsed Representation for Inference Scaling), a novel framework inspired by speculative decoding. SOLARIS proactively precomputes user-item interaction embeddings by predicting which user-item pairs are likely to appear in future requests and asynchronously generating their foundation model representations ahead of time. This approach decouples the costly foundation model inference from the latency-critical serving path, enabling real-time knowledge transfer from models previously considered too expensive for online use. Deployed across Meta's advertising system, which serves billions of daily requests, SOLARIS achieves a 0.67% gain in revenue-driving top-line metrics, demonstrating its effectiveness at scale.
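The precompute-then-lookup idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `foundation_model_embed`, `predict_future_pairs`, `SpeculativeEmbeddingCache`, and `serve` are hypothetical, the "foundation model" is a toy stand-in, the predictor simply assumes recently seen pairs will recur, and precomputation runs synchronously here where a real system would presumably run it in an asynchronous background job.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Pair = Tuple[str, str]  # (user_id, item_id); hypothetical key format
Embedding = List[float]


def foundation_model_embed(pair: Pair) -> Embedding:
    """Toy stand-in for the expensive foundation model (assumption)."""
    user_id, item_id = pair
    return [float(len(user_id)), float(len(item_id))]


def predict_future_pairs(history: Iterable[Pair]) -> List[Pair]:
    """Toy predictor (assumption): recently seen pairs are likely to recur."""
    return list(dict.fromkeys(history))  # de-duplicate while keeping order


class SpeculativeEmbeddingCache:
    """Holds embeddings that were computed before any request asked for them."""

    def __init__(self) -> None:
        self._store: Dict[Pair, Embedding] = {}
        self.hits = 0
        self.misses = 0

    def precompute(self, pairs: Iterable[Pair]) -> None:
        """Off the serving path: run the expensive model ahead of time."""
        for pair in pairs:
            self._store[pair] = foundation_model_embed(pair)

    def lookup(self, pair: Pair) -> Optional[Embedding]:
        """On the serving path: a dictionary read that never calls the model."""
        embedding = self._store.get(pair)
        if embedding is None:
            self.misses += 1
        else:
            self.hits += 1
        return embedding


def serve(cache: SpeculativeEmbeddingCache, pair: Pair) -> Embedding:
    """Return the cached embedding, or a zero vector on a misprediction."""
    embedding = cache.lookup(pair)
    return embedding if embedding is not None else [0.0, 0.0]


# Speculative step: predict pairs from past traffic and precompute them.
cache = SpeculativeEmbeddingCache()
history = [("u1", "ad42"), ("u1", "ad42"), ("u22", "ad7")]
cache.precompute(predict_future_pairs(history))

# Serving step: a predicted pair hits the cache; an unpredicted one falls back.
hit_result = serve(cache, ("u1", "ad42"))
miss_result = serve(cache, ("u9", "ad1"))
```

In this sketch the serving function never invokes the model, so its latency does not depend on model cost; the price of a wrong prediction is a fallback embedding rather than a slow request. How SOLARIS itself handles mispredictions is not stated in the abstract.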


Originally published by arXiv CS.
