Smooth Activations

Related Articles from SNS

Mitigating the Curse of Dimensionality in Uniform Convergence of Deep Neural Networks via Smooth Activations

arXiv:2606.05599v1 Announce Type: new Abstract: This paper establishes a theoretical framework for the uniform convergence of smoothly activated deep neural network (DNN) estimators. While standard ReLU networks achieve minimax-optimal rates in the $L^2(P)$ norm for various nonparametric regression tasks, we establish a theoretical lower bound demonstrating that least-squares ReLU estimators can suffer from the curse of dimensionality in their uniform convergence behavior. Motivated by the...

arXiv CS 5d ago

Optimal Rates for Generalization of Gradient Descent for Deep ReLU Classification

arXiv:2510.02779v4 Announce Type: replace Abstract: Recent advances have significantly improved our understanding of the generalization performance of gradient descent (GD) methods in deep neural networks. A natural and fundamental question is whether GD can achieve generalization rates comparable to the minimax optimal rates established in the kernel setting. Existing results either yield suboptimal rates of $O(1/\sqrt{n})$, or focus on networks with smooth activation functions, incurring...

arXiv CS 7d ago

A Geometric Characterization of the Stationary Plateau for Two-Layer Neural Networks

arXiv:2606.04327v1 Announce Type: new Abstract: We investigate the geometric structure of stationary plateaus that arise in the loss landscape of two-layer neural networks with smooth activation functions. We focus on the phenomenon of "neuron splitting" where duplicating a hidden neuron yields an affine set of stationary points in a wider network. We provide a comprehensive classification of all stationary points on these plateaus, determining under what conditions they constitute local...

arXiv CS 6d ago

Dropout Universality: Scaling Laws and Optimal Scheduling at the Edge-of-Chaos

arXiv:2605.21648v2 Announce Type: replace Abstract: We develop a mean-field theory of dropout as a perturbation of critical signal propagation at the edge of chaos, and show that it predicts a simple, no-cost change to standard practice: front-loaded dropout schedules cut test loss by 18-35% over constant dropout in MLPs and Vision Transformers at fixed budget. The theoretical mechanism is that dropout shifts the perfect-alignment fixed point, making the depth scale for...

arXiv CS 8d ago

Crazy Taxi World Tour will offer more freedom, bite-sized missions and fishing with a car

A crazy ride in the era of ride shares. It's part of Sega's push to bring its greatest hits and franchises to a new generation of gamers: refreshed, remade and presented in widescreen. It's also aimed at the game's superfans, with a soundtrack pulled straight out of 1999.

Engadget 2d ago

Rethinking Neural Network Learning Rates: A Stackelberg Perspective

arXiv:2605.15530v3 Announce Type: replace Abstract: Neural networks are typically trained with a single learning rate across all layers. While recent empirical evidence suggests that assigning layer-specific learning rates can accelerate training, a principled understanding of the conditions and mechanisms under which non-uniform learning rates are beneficial remains limited. In this work, we investigate non-uniform learning rates through the lens of Stackelberg optimization.

arXiv CS 9d ago

Generalization in Deep Neural Networks: Minimax Rates for Gradient Methods

Announce Type: cross Abstract: Understanding the generalization performance of over-parameterized neural networks has become a central topic in deep learning theory. While recent advances, particularly works under the Neural Tangent Kernel (NTK) regime, have shed light on the behavior of shallow architectures, the statistical generalization properties of deep neural networks (DNNs), especially in regression tasks, remain far less understood.

arXiv CS 2d ago

Source Side Mitigation of AI Datacenter Power Fluctuations with a Hybrid Energy Storage System and Residual Differentiable Predictive Control

arXiv:2606.04869v1 Announce Type: new Abstract: The rapid growth of hyperscale AI datacenters introduces structured, workload-driven active-power fluctuations at the point of interconnection. These fluctuations appear to the grid as time-varying disturbance injections that cannot be captured by conventional peak- or average-load representations. To reduce the residual power disturbance before it propagates into the bulk power system, this paper proposes a hybrid energy storage system with...

arXiv CS 6d ago

Bayesian Inference with Shaped Deep Non-linear MLPs

arXiv:2605.30860v1 Announce Type: cross Abstract: A central aim of deep learning theory is to characterize how neural networks make predictions in the regime of simultaneously large model and training set size. Since the limits of diverging number of model parameters and dataset size do not commute it is not clear a priori what limits exist. In this work, we shed new light on these questions by studying Bayesian inference in deep non-linear MLPs in the regime where the number of training...

arXiv CS 9d ago

Distance Mapping and Variable-Specific Geometry of Goal-Relevant Frames in the Retrosplenial Cortex

Goal-directed navigation requires animals to continuously update their position relative to an unmarked goal. Here, we recorded retrosplenial cortex (RSC) activity in freely moving rats during goal-directed navigation and random foraging. We found that RSC neurons encoded the Euclidean distance to the goal, and that this distance representation was selectively biased toward the goal during navigation.

bioRxiv 7d ago