Deep Linear Networks
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
Gradient Descent with Large Step Size Restores Symmetry in Deep Linear Networks with Multi-Pathway
new Abstract: Recent analyses of multi-pathway Deep Linear Networks use Gradient Flow to predict a "winner-takes-all" specialization in which path symmetry breaks and each feature concentrates in a single pathway. In this work, we show that discrete Gradient Descent (GD) with a large step size tells a different story. We prove that single-path solutions are sharp minima, whereas distributing signals across pathways reduces sharpness by a factor that decreases with both the number of pathways...
Deciphering Two Training Clocks in Grokking via Deep Linear Network Theory with Conditional ReLU Reduction
arXiv:2606.05863v1 Announce Type: new Abstract: Grokking suggests that fitting the training data and learning a simple underlying rule may occur on different time scales. We formalize this phenomenon by separating the fast decay of the classification loss from the slower simplification of the learned representation, and we call the resulting pair of stopping times two training clocks. For deep linear networks, we show that a post-margin gap-growth or one-step tail-contraction condition...
R2DN: Scalable Parameterization of Contracting and Lipschitz Recurrent Deep Networks
arXiv:2504.01250v2 Announce Type: replace Abstract: This paper presents the Robust Recurrent Deep Network (R2DN), a scalable parameterization of robust recurrent neural networks for machine learning and data-driven control. We construct R2DNs as the feedback interconnection of a linear time-invariant system and a 1-Lipschitz deep feedforward network, and directly parameterize the weights so that our models are stable (contracting) and robust to small input perturbations (Lipschitz) by...
SaluNet: Enabling Total Plasticity in Normalization-Free Deep Networks
Announce Type: new Abstract: Normalization layers such as BatchNorm and LayerNorm have long been considered essential for stable training in deep networks. This work demonstrates that they can be fully replaced by a single learnable activation mechanism. We identify a plasticity suppression effect induced by standard normalization: learnable activation parameters rapidly lose adaptability when paired with normalization layers.
Bayesian Inference with Shaped Deep Non-linear MLPs
arXiv:2605.30860v1 Announce Type: cross Abstract: A central aim of deep learning theory is to characterize how neural networks make predictions in the regime of simultaneously large model and training set size. Since the limits of diverging number of model parameters and dataset size do not commute it is not clear a priori what limits exist. In this work, we shed new light on these questions by studying Bayesian inference in deep non-linear MLPs in the regime where the number of training...
Can Local Learning Match Self-Supervised Backpropagation?
arXiv:2601.21683v2 Announce Type: replace Abstract: While end-to-end self-supervised learning with backpropagation (global BP-SSL) has become central for training modern AI systems, theories of local self-supervised learning (local-SSL) have struggled to build functional representations in deep neural networks. To establish a link between global and local rules, we first develop a theory for deep linear networks: we identify conditions for local-SSL algorithms (like Forward-forward or CLAPP)...
Neural Network Verification using Partial Multi-Neuron Relaxation
arXiv:2605.30155v3 Announce Type: replace Abstract: The increasing integration of deep neural networks in critical systems has spawned a theoretical and practical interest in formally guaranteeing safety properties about their behavior. To achieve this, contemporary verification algorithms rely on computing linear relaxations for a network's non-linear activation functions. Existing approaches for linear relaxations typically fall into one of two categories: single-neuron relaxation, in...
Neural Network Verification using Partial Multi-Neuron Relaxation
arXiv:2605.30155v2 Announce Type: replace Abstract: The increasing integration of deep neural networks in critical systems has spawned a theoretical and practical interest in formally guaranteeing safety properties about their behavior. To achieve this, contemporary verification algorithms rely on computing linear relaxations for a network's non-linear activation functions. Existing approaches for linear relaxations typically fall into one of two categories: single-neuron relaxation, in...
Augmented Lagrangian Predictive Coding
arXiv:2605.31022v1 Announce Type: new Abstract: Predictive coding (PC) is a local-learning alternative to backpropagation (BP), training deep networks via local energy-minimization dynamics rather than a global backward pass. We introduce Augmented Lagrangian Predictive Coding (PC-ALM), which maintains PC's inference budget but aligns each weight update toward BP by accumulating per-layer constraint errors into a layer-local Lagrange multiplier. In linear PC networks, PC-ALM converges to an...
Spectral Collapse Drives Loss of Plasticity in Deep Continual Learning
Announce Type: replace Abstract: We investigate why deep neural networks suffer from loss of plasticity in continual learning, and thus fail to learn new tasks without reinitializing parameters. We show that this failure is preceded by Hessian spectral collapse at new-task initialization, where meaningful curvature directions vanish and gradient descent becomes ineffective. Analyzing a linearized ReLU network, we derive explicit $\epsilon$-rank conditions for successful training and prove...