Hyperparameter Bias
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
Beware of the Batch Size: Hyperparameter Bias in Evaluating LoRA
Announce Type: replace Abstract: Low-rank adaptation (LoRA) is a standard approach for fine-tuning large language models, yet its many variants report conflicting empirical gains, often on the same benchmarks. We show that these contradictions arise from a single overlooked factor: the batch size. When properly tuned, vanilla LoRA often matches the performance of more complex variants.
DISCO: Mitigating Bias in Deep Learning with Conditional Distance Correlation
arXiv:2506.11653v3 Announce Type: replace Abstract: Dataset bias often leads deep learning models to exploit spurious correlations instead of task-relevant signals. We introduce the Standard Anti-Causal Model (SAM), a unifying causal framework that characterizes bias mechanisms and yields a conditional independence criterion for causal stability. Building on this theory, we propose DISCO$_m$ and sDISCO, efficient and scalable estimators of conditional distance correlation that enable...
Primitive Subspaces Mediate Few-Shot Transfer in VLAs
Announce Type: new Abstract: Deploying vision-language-action (VLA) policies in industrial environments requires the ability to teach new tasks at low cost, a property current VLAs lack, since each new task requires fine-tuning. We investigate whether primitive-aware training produces a transferable artifact: a learned library of sub-skills that can be composed at inference time, conditioned on a small number of demonstrations, to perform tasks the policy was never trained on. We train two...
Structure-Preserving Correction Learning for Sparse Bayesian Inference in Brain Source Imaging
Announce Type: new Abstract: Classical sparse Type-II Bayesian methods for M/EEG brain imaging support joint estimation of source and noise hyperparameters, but rely on fixed iterative update rules. Although these updates are principled and interpretable, their dynamics cannot be adapted from data. We propose to learn the update mechanism itself while preserving the underlying Bayesian structure by unfolding a classical joint hyperparameter-learning solver into a trainable neural...
A hitchhiker's guide to Poisson gradient estimation
arXiv:2602.03896v2 Announce Type: replace-cross Abstract: Poisson-distributed latent variable models are widely used in computational neuroscience, but differentiating through discrete stochastic samples remains challenging. Two approaches address this: *Exponential Arrival Time* (EAT) simulation and *Gumbel-SoftMax* (GSM) relaxation. We provide the first systematic comparison of these methods, along with practical guidance for practitioners.
Routing on the Stiefel Manifold: When Does Adaptive Subspace Selection Help for Cross-Domain EEG Decoding?
arXiv:2605.31043v1 Announce Type: cross Abstract: Cross-domain EEG decoding remains challenging despite advances in Riemannian deep learning: covariance matrices from different subjects occupy systematically distinct regions of the SPD manifold, yet existing domain adaptation methods either require target-domain calibration data or learn subject-specific components that cannot generalise across domains. We propose dynamic Stiefel routing: a pool of $K$ expert projection filters on the...
Learning Self-Interpretation from Interpretability Artifacts: Training Lightweight Adapters on Vector-Label Pairs
Announce Type: replace Abstract: Self-interpretation methods prompt language models to describe their own internal states, but remain unreliable due to hyperparameter sensitivity. We show that training lightweight adapters on interpretability artifacts, while keeping the LM entirely frozen, yields reliable self-interpretation across tasks and model families. A scalar affine adapter with just $d_\text{model}+1$ parameters suffices: trained adapters generate sparse autoencoder feature labels...
Human-Like Neural Nets by Catapulting
Human-like Neural Nets by Catapulting Speculative proposal to create artificial neural nets with human-like performance by high-learning-rate/regularization training of overparameterized NNs to trigger catapulting/grokking. Over-parameterization as a route to true generalization would resolve many outstanding mysteries of artificial versus natural intelligence. There are many mysteries about deep learning and human intelligence, but we could describe the biggest anomaly this way: why are...
Deep learning four decades of human migration
Abstract Human migration is a fundamental driver of global demographic change, shaping population structure, labour markets and social policy across countries1,2,3. Although long-term migration patterns are often linked to economic development4, they can shift rapidly in response to shocks such as conflict, environmental crises and political change5. Despite its importance, migration remains difficult to measure consistently: existing data are sparse, concentrated in high-income settings and...