Hyperparameter Importance

Related Articles from SNS

Conditional PED-ANOVA: Hyperparameter Importance in Hierarchical & Dynamic Search Spaces

arXiv:2601.20800v3. We propose conditional PED-ANOVA (condPED-ANOVA), a principled framework for estimating hyperparameter importance (HPI) in conditional search spaces, where the presence or domain of a hyperparameter can depend on other hyperparameters. Although the original PED-ANOVA provides a fast and efficient way to estimate HPI within the top-performing regions of the search space, it assumes a fixed, unconditional search space and therefore cannot...

arXiv CS 6d ago

Scale-Invariant Neural Network Optimization: Norm Geometry and Heavy-Tailed Noise

arXiv:2605.18528v2. A growing lesson from neural network optimization is that optimizer design should respect how the model is parametrized. Scale-invariant methods become important because their normalized layerwise updates can not only support hyperparameter transfer across model sizes but also exploit input-output matrix norm geometry. At the same time, stochastic gradient noise in deep learning is often far from sub-Gaussian and may exhibit heavy tails.

arXiv CS 8d ago

ShaplEIG: Bayesian Experimental Design for Shapley Value Estimation

arXiv:2606.02247v1. Shapley values are a principled attribution measure widely used in interpretable machine learning, but their exact computation scales exponentially with the number of players, motivating a wide range of approximation methods based on value function evaluations of sampled coalitions. This raises the question of whether approximation accuracy can be improved by adaptively selecting coalitions for evaluation based on previous evaluations. This...

arXiv CS 8d ago
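
For context on the ShaplEIG abstract's remark that exact Shapley computation scales exponentially with the number of players, here is a minimal sketch of the textbook exact formula, which enumerates every coalition. It is not the paper's method; the function `exact_shapley` and the toy game `toy_value` are hypothetical names invented for illustration.

```python
from itertools import combinations
from math import factorial


def exact_shapley(players, value):
    """Exact Shapley values by enumerating all coalitions.

    `value` maps a frozenset of players to a number. Each player requires
    2^(n-1) marginal-contribution evaluations, so the cost is exponential in n.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_value(coalition):
    """Hypothetical game: additive payoffs plus a bonus when a and b cooperate."""
    base = {"a": 1.0, "b": 2.0, "c": 3.0}
    bonus = 1.5 if {"a", "b"} <= coalition else 0.0
    return sum(base[p] for p in coalition) + bonus


phi = exact_shapley(["a", "b", "c"], toy_value)
```

In this toy game the cooperation bonus is split evenly between `a` and `b`, and the values sum to the payoff of the full coalition. With three players the enumeration is trivial, but the inner loops double in size with every added player, which is the cost that sampling-based approximations aim to avoid.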

Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging

arXiv:2505.22934v2. Fine-tuning large language models (LMs) for individual tasks yields strong performance but is expensive for deployment and storage. Recent works explore model merging to combine multiple task-specific models into a single multi-task model without additional training. However, existing merging methods often fail for models fine-tuned with low-rank adaptation (LoRA), due to significant performance degradation.

arXiv CS 9d ago

Weight Decay Improves Language Model Plasticity

arXiv:2602.11137v2. Large language models are typically trained in two broad phases: pretraining to produce a base model, followed by further training to improve downstream performance. However, hyperparameter optimization and scaling laws are studied primarily from the perspective of the base model's validation loss, overlooking a crucial model property: downstream adaptability. In this work, we study pretraining from the perspective of model plasticity, that...

arXiv CS 9d ago

Physics-constrained Gaussian Processes for Predicting Shockwave Hugoniot Curves

arXiv:2601.06655v2. A physics-constrained Gaussian Process regression framework is developed for predicting shocked material states and their associated uncertainties along the Hugoniot curve using data from a small number of shockwave simulations. The proposed Gaussian process is constrained by the Rankine-Hugoniot jump conditions between the various shocked material states to construct a thermodynamically consistent covariance function. This leads to the...

arXiv CS 5d ago

Physics-constrained Gaussian Processes for Predicting Shockwave Hugoniot Curves

arXiv:2601.06655v2. A physics-constrained Gaussian Process regression framework is developed for predicting shocked material states and their associated uncertainties along the Hugoniot curve using data from a small number of shockwave simulations. The proposed Gaussian process is constrained by the Rankine-Hugoniot jump conditions between the various shocked material states to construct a thermodynamically consistent covariance function. This leads to...

arXiv Physics 5d ago

REFINE: Super-efficient 3D Gaussian Splatting Pruning via Rendering-Free Primitive Importance

Existing pruning methods for 3D Gaussian splatting (3DGS) suffer from either severe quality degradation or prohibitive computational overhead. In this paper, we propose REFINE, a highly accelerated 3DGS pruning framework centered on a novel rendering-free primitive importance metric. Our approach leverages an analytically approximated, rendering-aware Hessian field to quantify the expected perceptual error induced by the removal of individual primitives.

arXiv CS 1d ago

Iterated Population Based Training with Task-Agnostic Restarts

Hyperparameter Optimization (HPO) can lift the burden of tuning hyperparameters (HPs) of neural networks. HPO algorithms from the Population Based Training (PBT) family are efficient thanks to dynamically adjusting HPs every few steps of the weight optimization.

arXiv CS 8d ago

CRePE: Convolution-aware Relative Importance in Post-training Pruning with Efficient Search

Deploying Large Language Models (LLMs) in practice incurs substantial memory and computational costs. Post-training pruning (PTP) is an effective approach to reducing these costs by removing weights without additional training. Among existing methods, RIA introduces relative importance scores normalized by row and column sums, achieving state-of-the-art accuracy.

arXiv CS 8d ago