Efficient Hyperparameter Optimization

Related Articles from SNS

Efficient Hyperparameter Optimization for LLM Reinforcement Learning

arXiv:2606.03073v1 Announce Type: new Abstract: Reinforcement learning (RL) for large language models (LLMs) is highly sensitive to hyperparameter configurations, making hyperparameter optimization (HPO) essential yet computationally expensive. Existing multi-fidelity HPO methods remain inefficient for LLM RL due to the massive model scale and resource-intensive training cycles. In this paper, we propose Joint Fidelity Hyperparameter Optimization (JF-HPO), which simultaneously adapts both...

arXiv CS 7d ago

Provably Reduced Sample Cost in Prior-Guided Hyperparameter Optimization

arXiv:2606.04866v1 Announce Type: new Abstract: Large-scale hyperparameter optimization (HPO) in automated machine learning (AutoML) consumes substantial computational resources, raising growing concerns about scalability and energy efficiency. Existing methods use prior information heuristically to accelerate both black-box and multi-fidelity settings, but they lack a characterization of how prior informativeness quantitatively reduces sample complexity. In this work, we provide the first...

arXiv CS 6d ago

Predictable Scaling Laws of Optimal Hyperparameters for LLM Continued Pre-training

arXiv:2606.05610v1 Announce Type: new Abstract: The efficacy of continued pre-training for Large Language Models (LLMs) hinges upon hyperparameter configurations, such as learning rate and batch size. However, current practices often rely on heuristics or grid searches, leading to training instability and excessive costs. In this work, we first empirically discover that optimal hyperparameters follow stable and predictable scaling laws throughout the continued pre-training process.

arXiv CS 5d ago

ZOAF: Towards Efficient Zeroth-Order Optimization for Analog/RF Circuit Design

Announce Type: new Abstract: Circuit optimization is an indispensable step in analog/RF IC design. Classical fast gradient-based optimization methods are typically infeasible due to lack of access to simulator source code and the technical barriers to implementing adjoint methods. Therefore, surrogate-based black-box optimization is widely used in practice; however, it can be costly to build and sensitive to hyperparameters, whereas population heuristics often suffer from slow convergence...

arXiv CS 7d ago

Iterated Population Based Training with Task-Agnostic Restarts

Announce Type: replace Abstract: Hyperparameter Optimization (HPO) can lift the burden of tuning hyperparameters (HPs) of neural networks. HPO algorithms from the Population Based Training (PBT) family are efficient thanks to dynamically adjusting HPs every few steps of the weight optimization.

arXiv CS 8d ago

LLM-Guided ANN Index Optimization for Human-Object Interaction Retrieval

Announce Type: new Abstract: Retrieval systems underpin modern AI applications -- spanning visual search, recommendation engines, and multi-modal question answering. Modern multi-stage retrieval systems require the joint optimization of highly coupled parameters, yet traditional hyperparameter optimization (HPO) methods -- including Tree-structured Parzen Estimators (TPE) and Gaussian Process Bayesian Optimization -- rely on an independence assumption that fundamentally prevents them from...

arXiv CS 5d ago

S$^3$LDBO: A Snapshot Single-Loop Algorithm for Decentralized Bilevel Optimization

arXiv:2605.31311v1 Announce Type: cross Abstract: Networked AI systems increasingly rely on multiple agents that collaboratively learn and adapt models over communication networks. In such systems, bilevel formulations naturally arise in hyperparameter optimization, data cleaning, and meta-learning, but the repeated evaluation of gradients, Jacobians, and Hessians can impose a substantial computational burden on individual agents. To address this challenge, we propose Snapshot-SLDBO...

arXiv CS 9d ago

CRePE: Convolution-aware Relative Importance in Post-training Pruning with Efficient Search

Announce Type: new Abstract: Deploying Large Language Models (LLMs) in practice incurs substantial memory and computational costs. Post-training pruning (PTP) is an effective approach to reducing these costs by removing weights without additional training. Among existing methods, RIA introduces relative importance scores normalized by row and column sums, achieving state-of-the-art accuracy.

arXiv CS 8d ago

Rethinking Evaluation Paradigms in IBP-based Certified Training

Announce Type: new Abstract: Deep neural networks achieve strong performance on many supervised learning tasks but remain vulnerable to adversarial perturbations. Neural network verification provides mathematically rigorous robustness guarantees, yet at substantial computational cost.

arXiv CS 8d ago

Accelerating Bidiagonalization of Banded Matrices through Memory-Aware Bulge-Chasing on GPUs

arXiv:2510.12705v3 Announce Type: replace Abstract: The reduction of a banded matrix to bidiagonal form is a critical step in the calculation of Singular Values, a cornerstone of scientific computing and AI. Although inherently parallel, this step has traditionally been considered unsuitable for GPUs due to its memory-bound nature. However, recent advances in GPU architectures, such as increased L1 memory per Streaming Multiprocessor or Compute Unit and larger L2 caches, have shifted this...

arXiv CS 1d ago