Bounded Hyperbolic Tanh
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
Bounded Hyperbolic Tangent: A Stable and Efficient Alternative to Pre-Layer Normalization in Large Language Models
arXiv:2601.09719v3 Announce Type: replace Abstract: Pre-Layer Normalization (Pre-LN) is the de facto choice for large language models (LLMs) and is crucial for stable pretraining and effective transfer learning. However, Pre-LN incurs repeated statistical-computation overhead and remains vulnerable to the curse of depth, where hidden-state magnitudes and variances grow as the number of layers increases, destabilizing training. Efficiency-oriented normalization-free methods such as Dynamic...
One-Shot Klein Cutting Planes for Lipschitz Geodesically Convex Optimization in Hyperbolic Space
arXiv:2605.17540v4 Announce Type: replace Abstract: Motivated by the COLT 2023 open problem of Criscitiello, Mart\'inez-Rubio, and Boumal on deterministic first-order methods for Lipschitz geodesically convex optimization on Hadamard manifolds, we study hyperbolic space \[ \HH^d_{-\kappaC^2} =\{X\in\R^{d+1}:\ipL{X}{X}=-1,\ X_0>0\}, \qquad \ip{U}{V}_X=\kappaC^{-2}\ipL{U}{V}. For every geodesically convex $M$-Lipschitz function \[ f:\bar B_{\HH}(x_0,r)\to\R,\qquad s=\kappaC r, \] we give a...