Smooth Activations
Related Articles from SNS
Mitigating the Curse of Dimensionality in Uniform Convergence of Deep Neural Networks via Smooth Activations
arXiv:2606.05599v1 Announce Type: new Abstract: This paper establishes a theoretical framework for the uniform convergence of smoothly activated deep neural network (DNN) estimators. While standard ReLU networks achieve minimax-optimal rates in the $L^2(P)$ norm for various nonparametric regression tasks, we establish a theoretical lower bound demonstrating that least-squares ReLU estimators can suffer from the curse of dimensionality in their uniform convergence behavior. Motivated by the...
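The contrast this abstract draws is between ReLU and smooth activations. The abstract does not say which smooth activations the paper uses, so the minimal sketch below assumes softplus purely as an illustration. It compares the two functions and their derivatives near the origin. The helper names are hypothetical, and `beta` is an assumed sharpness parameter.

```python
import math


def relu(x: float) -> float:
    """ReLU activation max(0, x): continuous, but its slope jumps at x = 0."""
    return max(0.0, x)


def relu_grad(x: float) -> float:
    """Slope of ReLU (not differentiable at 0; this sketch returns 0 there by convention)."""
    return 1.0 if x > 0 else 0.0


def softplus(x: float, beta: float = 1.0) -> float:
    """Softplus (1/beta) * log(1 + exp(beta * x)): an infinitely differentiable ReLU surrogate."""
    z = beta * x
    # Stable rewrite of log(1 + e^z) that avoids overflow for large |z|.
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / beta


def softplus_grad(x: float, beta: float = 1.0) -> float:
    """Slope of softplus: the logistic sigmoid of beta * x, continuous everywhere."""
    z = beta * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Near the origin the ReLU slope jumps from 0 to 1, while the softplus slope
# passes smoothly through 0.5.
for x in (-0.01, 0.0, 0.01):
    print(f"x={x:+.2f}  relu'={relu_grad(x):.1f}  softplus'={softplus_grad(x, beta=10.0):.3f}")
```

The gap between softplus and ReLU is largest at x = 0, where it equals log(2)/beta. A larger `beta` therefore trades smoothness for closeness to ReLU.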
Optimal Rates for Generalization of Gradient Descent for Deep ReLU Classification
arXiv:2510.02779v4 Announce Type: replace Abstract: Recent advances have significantly improved our understanding of the generalization performance of gradient descent (GD) methods in deep neural networks. A natural and fundamental question is whether GD can achieve generalization rates comparable to the minimax optimal rates established in the kernel setting. Existing results either yield suboptimal rates of $O(1/\sqrt{n})$, or focus on networks with smooth activation functions, incurring...
A Geometric Characterization of the Stationary Plateau for Two-Layer Neural Networks
arXiv:2606.04327v1 Announce Type: new Abstract: We investigate the geometric structure of stationary plateaus that arise in the loss landscape of two-layer neural networks with smooth activation functions. We focus on the phenomenon of "neuron splitting" where duplicating a hidden neuron yields an affine set of stationary points in a wider network. We provide a comprehensive classification of all stationary points on these plateaus, determining under what conditions they constitute local...
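The starting point of this abstract is that duplicating a hidden neuron leaves the network's function unchanged, and this can be checked directly. The minimal sketch below assumes a scalar-output two-layer network f(x) = Σ_j a_j tanh(w_j · x + b_j). The choice of tanh as the smooth activation and the helper names `forward` and `split_neuron` are illustrative assumptions.

The split keeps neuron k's input weights and divides its outgoing weight into λ·a_k and (1 − λ)·a_k. Every λ yields the same function, so the split parameters trace an affine line in the wider network's parameter space. Which points on that line are stationary points or local minima is the paper's subject and is not reproduced here.

```python
import math
from typing import List, Tuple

# Hypothetical parameter layout: (input weights w[j], biases b[j], output weights a[j]).
Params = Tuple[List[List[float]], List[float], List[float]]


def forward(params: Params, x: List[float]) -> float:
    """Scalar output of a two-layer tanh network: sum_j a_j * tanh(w_j . x + b_j)."""
    w, b, a = params
    return sum(
        a_j * math.tanh(sum(wi * xi for wi, xi in zip(w_j, x)) + b_j)
        for w_j, b_j, a_j in zip(w, b, a)
    )


def split_neuron(params: Params, k: int, lam: float) -> Params:
    """Duplicate hidden neuron k; the original keeps lam * a_k and the copy gets (1 - lam) * a_k."""
    w, b, a = params
    new_w = [list(row) for row in w] + [list(w[k])]  # the copy shares neuron k's input weights
    new_b = list(b) + [b[k]]
    new_a = list(a) + [(1.0 - lam) * a[k]]
    new_a[k] = lam * a[k]
    return new_w, new_b, new_a


base: Params = ([[0.5, -1.0], [2.0, 0.3]], [0.1, -0.2], [1.5, -0.7])
x = [0.3, 0.8]
for lam in (-1.0, 0.0, 0.25, 1.0, 2.5):
    wider = split_neuron(base, k=0, lam=lam)
    # Both outputs agree up to floating-point rounding for every lam.
    print(lam, forward(base, x), forward(wider, x))
```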
Dropout Universality: Scaling Laws and Optimal Scheduling at the Edge-of-Chaos
arXiv:2605.21648v2 Announce Type: replace Abstract: We develop a mean-field theory of dropout as a perturbation of critical signal propagation at the edge of chaos, and show that it predicts a simple, no-cost change to standard practice: front-loaded dropout schedules cut test loss by 18 to 35% over constant dropout in MLPs and Vision Transformers at fixed budget. The theoretical mechanism is that dropout shifts the perfect-alignment fixed point, making the depth scale for...
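The excerpt does not give the exact shape of the paper's front-loaded schedules. The minimal sketch below shows only the general idea: a hypothetical helper decays the dropout rate linearly from `p_start` to `p_end`, so regularization is strongest early in training. The linear shape and the defaults 0.3 and 0.0 are assumptions, not the paper's settings. The mask uses standard inverted dropout.

```python
import random
from typing import List


def front_loaded_dropout(step: int, total_steps: int, p_start: float = 0.3, p_end: float = 0.0) -> float:
    """Hypothetical front-loaded schedule: dropout rate decays linearly from p_start to p_end."""
    if total_steps <= 1:
        return p_end
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)  # training progress in [0, 1]
    return p_start + (p_end - p_start) * frac


def inverted_dropout(values: List[float], p: float, rng: random.Random) -> List[float]:
    """Zero each entry with probability p (assumed 0 <= p < 1) and rescale survivors by 1 / (1 - p)."""
    if p <= 0.0:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)
total = 5
for step in range(total):
    p = front_loaded_dropout(step, total)
    print(step, round(p, 3), inverted_dropout([1.0, 1.0, 1.0, 1.0], p, rng))
```

A constant-dropout baseline corresponds to `p_start == p_end`.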
Crazy Taxi World Tour will offer more freedom, bite-sized missions and fishing with a car
A crazy ride in the era of ride shares. It's part of Sega's push to bring its greatest hits and franchises to a new generation of gamers: refreshed, remade and presented in widescreen. It's also aimed at the game's superfans, with a soundtrack pulled straight out of 1999.
Rethinking Neural Network Learning Rates: A Stackelberg Perspective
arXiv:2605.15530v3 Announce Type: replace Abstract: Neural networks are typically trained with a single learning rate across all layers. While recent empirical evidence suggests that assigning layer-specific learning rates can accelerate training, a principled understanding of the conditions and mechanisms under which non-uniform learning rates are beneficial remains limited. In this work, we investigate non-uniform learning rates through the lens of Stackelberg optimization.
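The minimal sketch below shows what layer-specific learning rates mean mechanically. It does not attempt the paper's Stackelberg analysis. A hypothetical helper applies one plain SGD step in which each named layer may have its own step size, falling back to a shared default. The layer names and values are illustrative.

```python
from typing import Dict, List


def sgd_step_per_layer(
    params: Dict[str, List[float]],
    grads: Dict[str, List[float]],
    lrs: Dict[str, float],
    default_lr: float,
) -> Dict[str, List[float]]:
    """One SGD step, theta <- theta - lr_layer * grad, with an optional per-layer learning rate."""
    updated: Dict[str, List[float]] = {}
    for name, theta in params.items():
        lr = lrs.get(name, default_lr)  # layers without an override use the shared rate
        updated[name] = [t - lr * g for t, g in zip(theta, grads[name])]
    return updated


params = {"hidden": [1.0, 2.0], "output": [0.5]}
grads = {"hidden": [0.5, -1.0], "output": [2.0]}
print(sgd_step_per_layer(params, grads, lrs={"hidden": 0.1}, default_lr=0.01))
```

Leaving `lrs` empty recovers the usual single-learning-rate update.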
Generalization in Deep Neural Networks: Minimax Rates for Gradient Methods
Announce Type: cross Abstract: Understanding the generalization performance of over-parameterized neural networks has become a central topic in deep learning theory. While recent advances, particularly works under the Neural Tangent Kernel (NTK) regime, have shed light on the behavior of shallow architectures, the statistical generalization properties of deep neural networks (DNNs), especially in regression tasks, remain far less understood.
Source Side Mitigation of AI Datacenter Power Fluctuations with a Hybrid Energy Storage System and Residual Differentiable Predictive Control
arXiv:2606.04869v1 Announce Type: new Abstract: The rapid growth of hyperscale AI datacenters introduces structured, workload-driven active-power fluctuations at the point of interconnection. These fluctuations appear to the grid as time-varying disturbance injections that cannot be captured by conventional peak- or average-load representations. To reduce the residual power disturbance before it propagates into the bulk power system, this paper proposes a hybrid energy storage system with...
Bayesian Inference with Shaped Deep Non-linear MLPs
arXiv:2605.30860v1 Announce Type: cross Abstract: A central aim of deep learning theory is to characterize how neural networks make predictions in the regime of simultaneously large model and training-set sizes. Since the limits of a diverging number of model parameters and a diverging dataset size do not commute, it is not clear a priori what limits exist. In this work, we shed new light on these questions by studying Bayesian inference in deep non-linear MLPs in the regime where the number of training...
Distance Mapping and Variable-Specific Geometry of Goal-Relevant Frames in the Retrosplenial Cortex
Goal-directed navigation requires animals to continuously update their position relative to an unmarked goal. Here, we recorded retrosplenial cortex (RSC) activity in freely moving rats during goal-directed navigation and random foraging. We found that RSC neurons encoded the Euclidean distance to the goal, and that this distance representation was selectively biased toward the goal during navigation.