the First Scaling Laws
Related Articles from SNS
Principled Synthetic Data Enables the First Scaling Laws for LLMs in Recommendation
arXiv:2602.07298v3. Abstract: Large Language Models (LLMs) represent a promising frontier for recommender systems, yet their development has been impeded by the absence of predictable scaling laws, which are crucial for guiding research and optimizing resource allocation. We hypothesize that this may be attributed to the inherent noise, bias, and incompleteness of raw user interaction data in prior continual pre-training (CPT) efforts. This paper introduces a novel,...
Scaling Laws for Masked-Reconstruction Transformers on Single-Cell Transcriptomics
Abstract: Neural scaling laws -- power-law relationships between loss, model size, and data -- have been extensively documented for language and vision transformers, yet their existence in single-cell genomics remains largely unexplored. We present the first systematic study of scaling behaviour for masked-reconstruction transformers trained on single-cell RNA sequencing (scRNA-seq) data. Using expression profiles from the CELLxGENE Census, we construct two...
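As a rough illustration of the power-law relationship the abstract above refers to, the sketch below fits loss ≈ a * size^(-alpha) by linear regression in log-log space. The function name `fit_power_law` and the data points are hypothetical and synthetic; they are not taken from any of the listed papers, and real studies typically use richer functional forms (for example, an added irreducible-loss term).

```python
import numpy as np

def fit_power_law(sizes, losses):
    """Fit losses ~ a * sizes**(-alpha) via least squares on log-log data.

    Returns (a, alpha). Assumes all inputs are strictly positive.
    """
    slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
    return float(np.exp(intercept)), float(-slope)

# Synthetic, made-up data generated from an exact power law (assumption:
# a = 5.0, alpha = 0.1); real measurements would be noisy.
sizes = np.array([1e6, 1e7, 1e8, 1e9])
losses = 5.0 * sizes ** -0.1

a, alpha = fit_power_law(sizes, losses)
print(f"a = {a:.3f}, alpha = {alpha:.3f}")
```

On a log-log plot a power law is a straight line, so the fitted slope gives the exponent directly.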
Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime
Abstract: Neural scaling laws underlie many of the recent advances in deep learning, yet their theoretical understanding remains largely confined to linear models. In this work, we present a systematic analysis of scaling laws for quadratic and diagonal neural networks in the feature learning regime. Leveraging connections with matrix compressed sensing and LASSO, we derive a detailed phase diagram for the scaling exponents of the excess risk as a function of sample...
Predictable Scaling Laws of Optimal Hyperparameters for LLM Continued Pre-training
arXiv:2606.05610v1. Abstract: The efficacy of continued pre-training for Large Language Models (LLMs) hinges upon hyperparameter configurations, such as learning rate and batch size. However, current practices often rely on heuristics or grid searches, leading to training instability and excessive costs. In this work, we first empirically discover that optimal hyperparameters follow stable and predictable scaling laws throughout the continued pre-training process.
Spectral Scaling Laws of Muon
arXiv:2606.04058v2. Abstract: Orthonormalized update rules have rapidly become a leading choice of optimizer for training large language models, with recent open-source state-of-the-art models adopting Muon. To keep these updates tractable, Muon performs the orthonormalization with the Newton-Schulz (NS) iteration. Since NS is only approximate, directions with small singular values fail to be orthonormalized.
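To make the last point of the abstract above concrete, here is a minimal sketch of a Newton-Schulz orthonormalization. It uses the classical cubic iteration X <- 1.5 X - 0.5 X Xᵀ X, which is an assumption made for illustration and not necessarily the polynomial Muon uses. The function name `newton_schulz`, the step count, and the test matrix are hypothetical.

```python
import numpy as np

def newton_schulz(G, steps=5):
    """Approximately orthonormalize G with the classical cubic NS iteration.

    Each singular value s is mapped to 1.5*s - 0.5*s**3 per step, which
    pushes values in (0, 1] toward 1; tiny values grow only ~1.5x per step.
    """
    X = G / np.linalg.norm(G)  # Frobenius norm, so all singular values <= 1
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

# Build a matrix with known singular values (made-up numbers).
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((3, 3)))
V, _ = np.linalg.qr(rng.standard_normal((3, 3)))
G = U @ np.diag([1.0, 0.5, 1e-3]) @ V.T

# Singular values of the result, in descending order.
out_s = np.linalg.svd(newton_schulz(G), compute_uv=False)
print(out_s)
```

With only a few steps, the two larger singular values end up close to 1 while the smallest stays far below 1, which is the "fails to be orthonormalized" behaviour in this simplified setting.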
Human-Like Neural Nets by Catapulting
Speculative proposal to create artificial neural nets with human-like performance by high-learning-rate/regularization training of overparameterized NNs to trigger catapulting/grokking. Over-parameterization as a route to true generalization would resolve many outstanding mysteries of artificial versus natural intelligence. There are many mysteries about deep learning and human intelligence, but we could describe the biggest anomaly this way: why are...
Dario Amodei speaks on leaving Sam Altman's OpenAI to start Anthropic
Dario Amodei, CEO and co-founder of Anthropic, has revealed the two core convictions that drove him to leave OpenAI and build what is now one of its most formidable rivals—scaling laws and safety. In a candid conversation on investor Nikhil Kamath's podcast WTF Is, Amodei traced his departure back to 2019, when early experiments with GPT-2 began showing him something most of his colleagues weren't ready to accept. "You find incredible increases in performance," he said, describing what...
Measuring the Symmetry-Data Exchange Rate
arXiv:2606.01090v1. Abstract: Equivariance theory predicts that an architectural symmetry prior reduces sample complexity by a factor of |G|; this is widely cited but rarely measured as a scaling law with controls that separate the prior from its confounds. On a controlled C_n-symmetric task, we report three findings. First, a wrong-group control with identical orbit size and matched compute is worse than no constraint (joint pairwise CI [+0.79, +3.26] excludes zero,...
CS336: Language Modeling from Scratch
Course Staff
Logistics
- Lectures: Monday/Wednesday 3:00-4:20pm in Skilling Auditorium
- Recordings: YouTube playlist
- Office hours:
  - Percy Liang: Fridays 11am-12pm in Gates 366
  - Tatsu Hashimoto: Tuesdays 11-12am in Gates 364
  - Marcel Rød: Tuesdays 4:30-5:30pm in Gates 498, Wednesdays 4:30-5:30pm in Gates 415
  - Herman Brunborg: Wednesdays 1:30-2:30pm, Fridays 1:30-2:30pm, location Gates 392
  - Steven Cao: Mondays 4:30-5:30pm, Thursdays 9:30-10:30am, Gates 200
- Contact: Students should ask...