Hyperparameter Importance
Related Articles from SNS
Conditional PED-ANOVA: Hyperparameter Importance in Hierarchical & Dynamic Search Spaces
arXiv:2601.20800v3 Announce Type: replace Abstract: We propose conditional PED-ANOVA (condPED-ANOVA), a principled framework for estimating hyperparameter importance (HPI) in conditional search spaces, where the presence or domain of a hyperparameter can depend on other hyperparameters. Although the original PED-ANOVA provides a fast and efficient way to estimate HPI within the top-performing regions of the search space, it assumes a fixed, unconditional search space and therefore cannot...
Scale-Invariant Neural Network Optimization: Norm Geometry and Heavy-Tailed Noise
arXiv:2605.18528v2 Announce Type: replace-cross Abstract: A growing lesson from neural network optimization is that optimizer design should respect how the model is parametrized. Scale-invariant methods become important because their normalized layerwise updates can not only support hyperparameter transfer across model sizes but exploit input-output matrix norm geometry. At the same time, stochastic gradient noises in deep learning are often far from sub-Gaussian and may exhibit heavy tails.
ShaplEIG: Bayesian Experimental Design for Shapley Value Estimation
arXiv:2606.02247v1 Announce Type: cross Abstract: Shapley values are a principled attribution measure widely used in interpretable machine learning, but their exact computation scales exponentially with the number of players, motivating a wide range of approximation methods based on value function evaluations of sampled coalitions. This raises the question of whether approximation accuracy can be improved by adaptively selecting coalitions for evaluation based on previous evaluations. This...
Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging
arXiv:2505.22934v2 Announce Type: replace Abstract: Fine-tuning large language models (LMs) for individual tasks yields strong performance but is expensive for deployment and storage. Recent works explore model merging to combine multiple task-specific models into a single multi-task model without additional training. However, existing merging methods often fail for models fine-tuned with low-rank adaptation (LoRA), due to significant performance degradation.
Weight Decay Improves Language Model Plasticity
arXiv:2602.11137v2 Announce Type: replace Abstract: Large language models are typically trained in two broad phases: pretraining to produce a base model, followed by further training to improve downstream performance. However, hyperparameter optimization and scaling laws are studied primarily from the perspective of the base model's validation loss, overlooking a crucial model property: downstream adaptability. In this work, we study pretraining from the perspective of model plasticity, that...
Physics-constrained Gaussian Processes for Predicting Shockwave Hugoniot Curves
arXiv:2601.06655v2 Announce Type: replace Abstract: A physics-constrained Gaussian Process regression framework is developed for predicting shocked material states and their associated uncertainties along the Hugoniot curve using data from a small number of shockwave simulations. The proposed Gaussian process is constrained by the Rankine-Hugoniot jump conditions between the various shocked material states to construct a thermodynamically consistent covariance function. This leads to the...
REFINE: Super-efficient 3D Gaussian Splatting Pruning via Rendering-Free Primitive Importance
Announce Type: new Abstract: Existing pruning methods for 3D Gaussian splatting (3DGS) suffer from either severe quality degradation or prohibitive computational overhead. In this paper, we propose REFINE, a highly accelerated 3DGS pruning framework centered on a novel rendering-free primitive importance metric. Our approach leverages an analytically approximated, rendering-aware Hessian field to quantify the expected perceptual error induced by the removal of individual primitives.
Iterated Population Based Training with Task-Agnostic Restarts
Announce Type: replace Abstract: Hyperparameter Optimization (HPO) can lift the burden of tuning hyperparameters (HPs) of neural networks. HPO algorithms from the Population Based Training (PBT) family are efficient thanks to dynamically adjusting HPs every few steps of the weight optimization.
CRePE: Convolution-aware Relative Importance in Post-training Pruning with Efficient Search
Announce Type: new Abstract: Deploying Large Language Models (LLMs) in practice incurs substantial memory and computational costs. Post-training pruning (PTP) is an effective approach to reducing these costs by removing weights without additional training. Among existing methods, RIA introduces relative importance scores normalized by row and column sums, achieving state-of-the-art accuracy.