the Fisher Hessian

Related Articles from SNS

On the Superlinear Relationship between SGD Noise Covariance and Loss Landscape Curvature

Abstract: Stochastic Gradient Descent (SGD) introduces anisotropic noise that is correlated with the local curvature of the loss landscape, thereby biasing optimization toward flat minima. Prior work often assumes an equivalence between the Fisher Information Matrix and the Hessian for negative log-likelihood losses, leading to the claim that the SGD noise covariance $\mathbf{C}$ is proportional to the Hessian $\mathbf{H}$. We show that this assumption holds only under...

arXiv CS 1d ago

The macroscopic Kaehler metric of Geometric Thermodynamics versus the microscopic one on the Event Manifold: Exact Partition Functions on CV manifolds. Extended Souriau temperatures and spontaneous magnetizations

arXiv:2606.09438v1. Abstract: In this paper we clarify the relation between Geometric Thermodynamics and Information Geometry based on the Fisher matrix. On the macroscopic odd-dimensional contact manifold of thermodynamic variables, we introduce for the first time a metric whose pull-back on the isoentropic symplectic submanifolds transverse to the Reeb field is Kählerian. The pull-back of this metric on equilibrium states, which are Lagrangian submanifolds, is the...

arXiv CS 1d ago

Relaxation Kernel, Spectral Dissipation, and Global Convergence of Blahut--Arimoto Dynamics

arXiv:2604.25106v3. Abstract: We develop a spectral theory for continuous- and discrete-time Blahut--Arimoto (BA) dynamics, centered on the relaxation kernel $\G = \E_p[K^*_X \otimes K^*_X]$. Five main results are established. Along the continuous-time BA flow, the free energy satisfies the exact $\chi^2$-dissipation identity $\dot F_\beta = -\D(q)$, where $\D(q)=\chi^2(\T q \| q)$ is the Pearson $\chi^2$-divergence.

arXiv CS 8d ago

Inconsistency-Aware Minimization: Improving Generalization with Unlabeled Data

Abstract: Estimating the generalization gap and developing optimization methods that improve generalization are crucial for deep learning models, for both theoretical understanding and practical applications. Leveraging unlabeled data for these purposes offers significant advantages in real-world scenarios. This paper introduces a novel generalization measure, local inconsistency, derived from an information-geometric perspective on the parameter space of neural networks.

arXiv CS 9d ago