the Fisher Hessian
Related Articles from SNS
On the Superlinear Relationship between SGD Noise Covariance and Loss Landscape Curvature
Abstract: Stochastic Gradient Descent (SGD) introduces anisotropic noise that is correlated with the local curvature of the loss landscape, thereby biasing optimization toward flat minima. Prior work often assumes an equivalence between the Fisher Information Matrix and the Hessian for negative log-likelihood losses, leading to the claim that the SGD noise covariance $\mathbf{C}$ is proportional to the Hessian $\mathbf{H}$. We show that this assumption holds only under...
The macroscopic Kaehler metric of Geometric Thermodynamics versus the microscopic one on the Event Manifold: Exact Partition Functions on CV manifolds. Extended Souriau temperatures and spontaneous magnetizations
arXiv:2606.09438v1 Abstract: In this paper we clarify the relation between Geometric Thermodynamics and Information Geometry based on the Fisher matrix. On the macroscopic odd-dimensional contact manifold of thermodynamic variables, we introduce for the first time a metric whose pull-back on the isoentropic symplectic submanifolds transverse to the Reeb field is Kählerian. The pull-back of this metric on equilibrium states, which are Lagrangian submanifolds, is the...
Relaxation Kernel, Spectral Dissipation, and Global Convergence of Blahut–Arimoto Dynamics
arXiv:2604.25106v3 Abstract: We develop a spectral theory for continuous- and discrete-time Blahut–Arimoto (BA) dynamics, centered on the relaxation kernel $\G = \E_p[K^*_X \otimes K^*_X]$. Five main results are established. Along the continuous-time BA flow, the free energy satisfies the exact $\chi^2$-dissipation identity $\dot F_\beta = -\D(q)$, where $\D(q)=\chi^2(\T q \| q)$ is the Pearson $\chi^2$-divergence.
Inconsistency-Aware Minimization: Improving Generalization with Unlabeled Data
Abstract: Estimating the generalization gap and developing optimization methods that improve generalization are crucial for deep learning models, for both theoretical understanding and practical applications. Leveraging unlabeled data for these purposes offers significant advantages in real-world scenarios. This paper introduces a novel generalization measure, local inconsistency, derived from an information-geometric perspective on the parameter space of neural networks.