
TQP

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Reversible Foundations: Training a 120B Sparse MoE through State-Preserving Scaling

Abstract: This paper reports on training a hundred-billion-parameter sparse mixture of experts on a single eight-GPU node, end to end. LightningLM 0.1V is a recurrence-backbone language model family grown in four stages from a small dense seed, through a 5B and a 9B mixture of experts, to a 120B model with 460 routed experts under top-12 routing.

arXiv CS 2d ago