Breaking the Scale Barrier: One-Shot Knowledge Transfer via Frequency Transform

arXiv:2603.07523v3

Abstract: Transferring knowledge by fine-tuning large-scale pre-trained networks has become a standard paradigm for downstream tasks, yet the knowledge of a pre-trained model is tightly coupled to its monolithic architecture, which restricts flexible reuse across models of varying scales. In response to this challenge, recent approaches typically resort either to parameter selection, which fails to capture the interdependent structure of this knowledge, or to parameter prediction using generative models, which depends on impractical access to large collections of networks. In this paper, we identify the low-frequency components of model weights as the concrete carrier of foundational, task-agnostic knowledge (the model's "learngene") and validate this by demonstrating that downstream models and tasks inherit it efficiently. Based on this insight, we propose FRONT (FRequency dOmain kNowledge Transfer), a novel framework that uses the Discrete Cosine Transform (DCT) to isolate the low-frequency learngene. This learngene can be adapted to initialize models of arbitrary size via simple truncation or padding, a process that is entirely training-free. For enhanced performance, we propose an optional low-cost refinement process that introduces a spectral regularizer to further improve the learngene's transferability. Extensive experiments demonstrate that FRONT achieves state-of-the-art performance, accelerates convergence by up to 15× in vision tasks, and reduces training FLOPs by an average of 40.5% in language tasks. Code is available at https://github.com/LUcy0505/FRONT.
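To make the truncation-or-padding idea concrete, here is a minimal NumPy sketch of how a low-frequency DCT block of one weight matrix could be resized to initialize a matrix of a different shape. It is an illustration under stated assumptions, not the authors' implementation: the function names (`extract_learngene`, `init_from_learngene`), the square `k × k` low-frequency block, and the amplitude rescaling by the ratio of matrix sizes are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Return the orthonormal DCT-II matrix C of size n x n (C @ C.T == I)."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)    # DC row needs an extra 1/sqrt(2) for orthonormality
    return c


def to_spectrum(w: np.ndarray) -> np.ndarray:
    """2-D DCT of a weight matrix: transform rows and columns."""
    rows, cols = w.shape
    return dct_matrix(rows) @ w @ dct_matrix(cols).T


def from_spectrum(s: np.ndarray) -> np.ndarray:
    """Inverse 2-D DCT (the transpose, because the matrices are orthonormal)."""
    rows, cols = s.shape
    return dct_matrix(rows).T @ s @ dct_matrix(cols)


def extract_learngene(w: np.ndarray, k: int) -> np.ndarray:
    """Keep only the top-left k x k (lowest-frequency) DCT coefficients.

    Assumption: a square block; the paper may select coefficients differently.
    """
    return to_spectrum(w)[:k, :k].copy()


def init_from_learngene(gene, src_shape, target_shape) -> np.ndarray:
    """Truncate or zero-pad the coefficients to target_shape, then invert.

    Assumption: coefficients are rescaled by sqrt(target size / source size)
    so that weight magnitudes stay comparable across shapes. This is a choice
    made for this sketch, not something stated in the abstract.
    """
    spec = np.zeros(target_shape)
    r = min(gene.shape[0], target_shape[0])
    c = min(gene.shape[1], target_shape[1])
    spec[:r, :c] = gene[:r, :c]  # truncation if smaller, zero-padding if larger
    scale = np.sqrt(np.prod(target_shape) / np.prod(src_shape))
    return from_spectrum(spec * scale)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_pretrained = rng.normal(size=(64, 64))   # stand-in for a pre-trained layer
    gene = extract_learngene(w_pretrained, k=16)
    w_small = init_from_learngene(gene, w_pretrained.shape, (32, 48))
    w_large = init_from_learngene(gene, w_pretrained.shape, (128, 96))
    print(gene.shape, w_small.shape, w_large.shape)
```

No gradient step is involved anywhere in this sketch, which is what "training-free" means here: the target weights come purely from a forward transform, an index slice or zero-pad, and an inverse transform.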
Originally published by arXiv CS.