Mining Useful General Data for Low-Resource Domain Adaptation

arXiv CS, Monday 08 June 2026, 04:00 UTC

By Pingjie Wang, Hongcheng Liu, Yusheng Liao, Ziqing Fan, Yaxin Du, Shuo Tang, Yanfeng Wang, Yu Wang

Key Points

arXiv:2511.07380v2

Abstract: Adapting large language models (LLMs) to low-resource domains remains challenging due to the scarcity of domain-specific data. While in-domain data is limited, there exists a vast amount of general-domain data that shares similar question-answer formats and reasoning patterns with domain tasks. This observation raises an important question: can useful general-domain data be mined to improve low-resource domain adaptation? Our initial findings show that general-domain chain-of-thought data contains useful auxiliary signals for domain adaptation, even without careful selection. This observation motivates a new paradigm for domain adaptation beyond exclusive reliance on domain-specific data. To systematically identify the most beneficial general-domain samples, we propose NTK-Selector, motivated by the Neural Tangent Kernel's ability to capture alignment in training dynamics. Since directly applying NTK to pretrained LLMs is impractical, we introduce a Jacobian-free NTK approximation and empirically demonstrate stable NTK-like behavior during fine-tuning. Extensive experiments across medical, financial, legal, and psychological domains demonstrate that NTK-Selector consistently outperforms domain-only fine-tuning and existing data selection baselines. In particular, NTK-Selector achieves gains of +8.7 and +5.1 points on Llama3-8B-Instruct and Qwen3-8B, respectively, compared to only +0.8 and +0.9 points from domain-only fine-tuning.
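To make the idea of kernel-based selection concrete, here is a minimal sketch of the textbook empirical Neural Tangent Kernel on a toy two-layer network, used to rank general-domain samples by their average kernel value against a handful of domain samples. This is not the paper's method: the abstract does not describe how the Jacobian-free approximation works, and this sketch computes explicit parameter gradients, which is exactly the step the authors call impractical for pretrained LLMs. The network, the helper names (`forward`, `param_gradient`, `ntk`, `select_top_k`), the mean-kernel scoring rule, and the toy vectors are all assumptions made for illustration.

```python
import math

def forward(x, W1, w2):
    """Toy model (assumed for illustration): f(x) = w2 . tanh(W1 x)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return sum(w * h for w, h in zip(w2, hidden))

def param_gradient(x, W1, w2):
    """Gradient of f(x) w.r.t. all parameters: W1 flattened row by row, then w2."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    grad = []
    for w2_i, h_i in zip(w2, hidden):
        scale = w2_i * (1.0 - h_i * h_i)      # d f / d pre-activation_i
        grad.extend(scale * xi for xi in x)   # d f / d W1[i][j]
    grad.extend(hidden)                       # d f / d w2[i]
    return grad

def ntk(x_a, x_b, W1, w2):
    """Empirical NTK: inner product of the two parameter gradients."""
    g_a = param_gradient(x_a, W1, w2)
    g_b = param_gradient(x_b, W1, w2)
    return sum(a * b for a, b in zip(g_a, g_b))

def select_top_k(general, domain, W1, w2, k):
    """Rank general samples by mean kernel value against the domain samples
    (a hypothetical scoring rule, not taken from the paper)."""
    scores = [
        sum(ntk(g, d, W1, w2) for d in domain) / len(domain)
        for g in general
    ]
    ranked = sorted(range(len(general)), key=lambda i: scores[i], reverse=True)
    return ranked[:k], scores

# Toy usage: one domain sample, three general-domain candidates.
W1 = [[0.5, -0.2], [0.1, 0.3]]
w2 = [1.0, -1.0]
domain = [[1.0, 0.0]]
general = [[0.9, 0.1], [-1.0, 0.0], [0.0, 1.0]]
chosen, scores = select_top_k(general, domain, W1, w2, k=1)
print(chosen, scores)
```

A large positive kernel value means a gradient step on the general sample would move the model's output on the domain sample in the same direction as a step on the domain sample itself, which is the sense in which the kernel captures alignment in training dynamics. For a real LLM the gradient vectors have billions of entries, so any practical selector needs some approximation in place of `param_gradient`.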


Originally published by arXiv CS.
