Home › Knowledge Base › Linear Regression

Linear Regression

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Multi-task Linear Regression without Eigenvalue Lower Bounds: Adaptivity, Robustness, and Safety

arXiv:2605.17126v2 Announce Type: replace-cross Abstract: We study the multi-task linear regression problem in the presence of contaminated tasks. We address the setting where the unknown parameters of a majority of tasks are close in the $\ell_2$-norm, while a fraction of tasks are arbitrary outliers. Existing theoretical frameworks for this problem rely heavily on the assumption that the empirical second moment of each task has a minimum eigenvalue bounded away from zero (order $\Omega(1)$).

arXiv CS 9d ago

Kling-Gupta linear regression

arXiv:2606.09391v1 Announce Type: cross Abstract: Although the Kling-Gupta efficiency ($\mathrm{KGE}$) is widely adopted for model evaluation in hydrology, its properties as a statistical estimator remain unexplored. Investigating these properties is necessary because parameter estimation and forecast evaluation are inherently linked. To address this, we formalize the negatively oriented Kling-Gupta loss $L_\mathrm{KG} = (1 - \mathrm{KGE})^2$ within an extremum estimation framework...

arXiv Physics 1d ago

What Makes a Strong Model? A Unified Spectral Analysis of Knowledge Transfer over High-dimensional Linear Regression

arXiv:2606.01292v1 Announce Type: new Abstract: Teacher-Student Knowledge Transfer (KT) is ubiquitous in modern machine learning, ranging from classical model compression via Knowledge Distillation (KD) to the emergent phenomenon of Weak-to-Strong (W2S) generalization. While existing studies offer isolated insights, a unified theoretical framework explaining the efficacy of KT across these disparate regimes remains lacking. In this work, we establish a unified spectral analysis of SGD...

arXiv CS 8d ago

Improving the Accuracy of Forensic Age Estimation Through Bias Reduction

Chronological age estimation can provide supporting information in forensic casework when traditional identification methods are limited. DNA methylation, a stable epigenetic mark, has emerged as a promising tool for predicting chronological age from trace samples. However, many existing age estimation models rely on linear regression approaches, which often yield biased prediction errors across the age distribution (i.e. model residuals show a significant age dependence).

bioRxiv 7d ago

SIRT7 regulates dosage compensation and safeguards the female X chromosome

Abstract Sirtuins are deacetylases implicated in stress responses and longevity in mammals1,2. Although their differential impact on disease for the two sexes has been noted3,4,5,6,7, the underlying reasons are unclear. Here, using Sirt7 as a model in mice, we examine the mechanisms leading to sex differences and find that Sirt7−/− female mice have decreased fitness throughout their lifespan.

Nature 19h ago

The Fast Mixing Mechanism for Differential Privacy

Announce Type: new Abstract: Randomized sketching is a central tool for compressing large-scale optimization problems while preserving accuracy. In particular, sketches that are based on structured matrices, such as the Hadamard matrix, can be applied efficiently and often yield solutions that approximate those of the original problem at much lower computational cost. In differential privacy (DP), Gaussian sketching has been used to solve DP linear regression, beginning with...

arXiv CS 9d ago

To Grok Grokking: Provable Grokking in Ridge Regression

arXiv:2601.19791v3 Announce Type: replace Abstract: We study grokking, the onset of generalization long after overfitting, in a classical ridge regression setting. We prove end-to-end grokking results for learning over-parameterized linear regression models using gradient descent with weight decay. Specifically, we prove that the following stages occur: (i) the model overfits the training data early during training; (ii) poor generalization persists long after overfitting has manifested; and...

arXiv CS 9d ago

Convergence of Steepest Descent and Adam under Non-Uniform Smoothness

arXiv:2605.30648v1 Announce Type: new Abstract: Recent work has analyzed the convergence of first-order methods under non-uniform smoothness assumptions that better model the loss landscape in machine learning tasks. We generalize this assumption to objectives whose curvature is an affine function of the objective value. This property is satisfied by a broad class of problems, including logistic regression, generalized linear models with a logistic link function, softmax policy gradient in...

arXiv CS 9d ago

An Asymptotic Theory of Chain-of-Thought in In-Context Learning

arXiv:2606.03217v1 Announce Type: cross Abstract: Chain-of-thought (CoT) reasoning has become a widely used mechanism for eliciting multi-step reasoning in large language models by generating intermediate reasoning steps at inference time. Yet the scaling behavior of generalization with CoT depth remains poorly understood. To address this question, we study a theoretically solvable model of CoT for in-context weight prediction in linear regression, where test-time reasoning is represented as...

arXiv CS 7d ago

Full-Batch Gradient Descent Outperforms One-Pass SGD: Sample Complexity Separation in Single-Index Learning

arXiv:2602.02431v2 Announce Type: replace-cross Abstract: It is folklore that reusing training data more than once can improve the statistical efficiency of gradient-based learning. While this phenomenon has been extensively studied in linear regression, the benefit of multi-pass gradient descent (GD, which reuses all the data) over one-pass stochastic gradient descent (online SGD, which uses each data point only once) is not well-understood in nonlinear and non-convex settings, except for a...

arXiv CS 1d ago