GD
Related Articles from SNS
Uniform Stability and Generalization Error of GD and SGD on Fixed-Point Parameters
Announce Type: new Abstract: We analyze generalization error, uniform stability, and uniform argument stability of gradient descent (GD) and stochastic gradient descent (SGD) over discrete parameter spaces, where each update involves deterministic or stochastic rounding. We show that deterministic rounding degrades the generalization error of GD on convex, Lipschitz, and smooth loss functions, increasing the rate from $O(T/n)$ to $O(T/\sqrt{n})$, and establish matching lower bounds. We...
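The abstract contrasts deterministic and stochastic rounding of GD iterates onto a discrete parameter space. Below is a minimal sketch of the two update rules. It assumes a uniform fixed-point grid with spacing `delta` and a caller-supplied gradient function, both hypothetical choices made for illustration rather than the paper's setup. It shows only the mechanics of the updates, not the stability or generalization bounds.

```python
import numpy as np

def round_deterministic(w, delta):
    # Round each coordinate to the nearest multiple of delta (illustrative grid).
    return delta * np.round(w / delta)

def round_stochastic(w, delta, rng):
    # Round each coordinate down or up to an adjacent grid point, choosing "up"
    # with probability equal to the fractional part, so E[result] = w (unbiased).
    scaled = w / delta
    lower = np.floor(scaled)
    p_up = scaled - lower
    return delta * (lower + (rng.random(w.shape) < p_up))

def gd_fixed_point(grad, w0, lr, steps, delta, stochastic=False, seed=0):
    # Gradient descent in which every iterate is rounded back onto the grid.
    rng = np.random.default_rng(seed)
    w = round_deterministic(np.asarray(w0, dtype=float), delta)
    for _ in range(steps):
        w = w - lr * grad(w)
        w = round_stochastic(w, delta, rng) if stochastic else round_deterministic(w, delta)
    return w
```

On $f(w) = \|w\|^2/2$ (gradient `w`) with `w0 = 1.0`, `delta = 0.1`, and `lr = 0.01`, every deterministic step rounds 0.99 back to 1.0, so the iterate never moves. Stochastic rounding is unbiased, so the iterate still drifts toward 0 in expectation.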
GD-MIL: Grade-Disentangled Multiple Instance Learning for Multimodal Biochemical Recurrence Prediction in Prostate Cancer
arXiv:2606.09453v1 Announce Type: new Abstract: Biochemical recurrence (BCR) after radical prostatectomy is a critical endpoint in prostate cancer, yet risk stratification relies almost entirely on variables dominated by Gleason grade. Whether H&E whole slide images (WSIs) carry prognostic signal beyond grade, and whether multiple instance learning (MIL) can recover it, remains unsettled.
Optimal Rates for Generalization of Gradient Descent for Deep ReLU Classification
arXiv:2510.02779v4 Announce Type: replace Abstract: Recent advances have significantly improved our understanding of the generalization performance of gradient descent (GD) methods in deep neural networks. A natural and fundamental question is whether GD can achieve generalization rates comparable to the minimax optimal rates established in the kernel setting. Existing results either yield suboptimal rates of $O(1/\sqrt{n})$, or focus on networks with smooth activation functions, incurring...
Full-Batch Gradient Descent Outperforms One-Pass SGD: Sample Complexity Separation in Single-Index Learning
arXiv:2602.02431v2 Announce Type: replace-cross Abstract: It is folklore that reusing training data more than once can improve the statistical efficiency of gradient-based learning. While this phenomenon has been extensively studied in linear regression, the benefit of multi-pass gradient descent (GD, which reuses all the data) over one-pass stochastic gradient descent (online SGD, which uses each data point only once) is not well-understood in nonlinear and non-convex settings, except for a...
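The sketch below makes the two data-access patterns concrete. It assumes a least-squares loss as a stand-in; the paper studies single-index models, which this sketch does not implement. One-pass SGD touches each sample exactly once, while full-batch GD reuses all $n$ samples at every step. It illustrates the procedures only and does not reproduce the sample-complexity separation.

```python
import numpy as np

def one_pass_sgd(X, y, lr):
    # Online SGD on squared loss: each (x_i, y_i) is used exactly once, in order.
    w = np.zeros(X.shape[1])
    for x_i, y_i in zip(X, y):
        w -= lr * (x_i @ w - y_i) * x_i
    return w

def full_batch_gd(X, y, lr, steps):
    # Full-batch GD on mean squared loss: every step reuses the entire dataset.
    n = X.shape[0]
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / n
    return w
```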
'Leaving my fate in the hands of Constitution': CJP founder heads to India
NEW DELHI: Cockroach Janta Party founder Abhijeet Dipke on Friday said he was on his way to India ahead of his proposed protest at Delhi's Jantar Mantar. He plans to demand the resignation of union education minister Dharmendra Pradhan over alleged irregularities in several national-level examinations, including NEET, CUET, CBSE and SSC GD. Sharing an update on X, Dipke wrote: "On my way to India.
Muon in Associative Memory Learning: Training Dynamics and Scaling Laws
arXiv:2602.05725v3 Announce Type: replace Abstract: Muon updates matrix parameters via the matrix sign of the gradient and has shown strong empirical gains, yet its dynamics and scaling behavior remain unclear in theory. We study Muon in a linear associative memory model with softmax retrieval and a hierarchical frequency spectrum over query-answer pairs, with and without label noise. In this setting, we show that Gradient Descent (GD) learns frequency components at highly imbalanced rates,...
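Muon's core operation is the matrix sign of the gradient. The sketch below is simplified. It assumes the matrix sign is the orthogonal polar factor $UV^\top$ of $G = U\Sigma V^\top$ and computes it with an exact SVD. Practical Muon implementations add momentum and approximate this factor with Newton-Schulz iterations; both are omitted here. `muon_style_step` is a hypothetical name.

```python
import numpy as np

def matrix_sign(G):
    # For full-rank G = U diag(s) V^T, return U V^T: every singular value set to 1.
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

def muon_style_step(W, G, lr):
    # Simplified update: no momentum, exact SVD instead of Newton-Schulz.
    return W - lr * matrix_sign(G)
```

For a diagonal gradient such as `diag(3, -2)`, the matrix sign is `diag(1, -1)`. The update size no longer depends on how large each singular value of the gradient is, only on its direction.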
Flatland: The Adventures of Gradient Descent with Large Step Sizes
Announce Type: new Abstract: The training of neural networks often entails objective functions that are not globally $L$-smooth. For these functions, it is both theoretically and practically difficult to reply to the question: what is the largest possible step size that ensures the convergence of gradient descent (GD)? We address this longstanding open question in deep learning by providing a unifying definition of "large" step sizes that requires only local Lipschitz (or even Hölder)...
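For context on the question the abstract raises, the classical answer for a globally $L$-smooth function is easiest to see on the quadratic $f(x) = \frac{L}{2}x^2$. A GD step gives $x_{t+1} = (1 - \eta L)\,x_t$, which converges to 0 (for $x_0 \neq 0$) if and only if $0 < \eta < 2/L$. Below is a minimal sketch of that baseline. It is not the paper's local-Lipschitz definition of "large" step sizes.

```python
def gd_on_quadratic(L, eta, x0=1.0, steps=100):
    # f(x) = (L / 2) * x**2 has gradient L * x, so each GD step
    # multiplies x by the constant factor (1 - eta * L).
    x = x0
    for _ in range(steps):
        x -= eta * L * x
    return x
```

With `L = 4.0`, the threshold is $2/L = 0.5$. A step size of 0.4 gives the factor $-0.6$ and the iterate shrinks, 0.6 gives $-1.4$ and it blows up, and exactly 0.5 makes it oscillate between $+1$ and $-1$.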
Gradient Descent with Large Step Size Restores Symmetry in Deep Linear Networks with Multi-Pathway
Announce Type: new Abstract: Recent analyses of multi-pathway Deep Linear Networks use Gradient Flow to predict a "winner-takes-all" specialization in which path symmetry breaks and each feature concentrates in a single pathway. In this work, we show that discrete Gradient Descent (GD) with a large step size tells a different story. We prove that single-path solutions are sharp minima, whereas distributing signals across pathways reduces sharpness by a factor that decreases with both the number of pathways...
Gallantry awards: Prez Murmu honours 51 winners; Gaganyaan astronaut among recipients
NEW DELHI: President Droupadi Murmu on Monday conferred 7 Kirti Chakras, including two posthumous, 15 Vir Chakras, including three posthumous, and 29 Shaurya Chakras, including one posthumous, to personnel of the Armed Forces, Central Armed Police Forces and State/UT Police during Phase-I of the Defence Investiture Ceremony 2026 held at Rashtrapati Bhavan in New Delhi. The awards were...
The intricacies of modern camera lens repair (2024)
Sigma 45mm f/2.8 Lens Repair & Analysis [05.12.24]
I have a camera gear collection problem and, as part of my personal 12-step plan, I restrict myself from purchasing functioning lenses. This sounds illogical, and frankly it is, but it's very hard for me to resist heavily discounted lenses.