Home Knowledge Base KletterMix

KletterMix

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

KletterMix: Climbing Toward High-Quality German Pretraining Data

arXiv:2606.03773v1 Announce Type: new Abstract: High-quality pretraining data is a central ingredient in modern language models, but German-language resources remain far less developed than their English counterparts: they are often smaller, less carefully curated, weakly documented, and rarely validated through controlled training experiments. We introduce KletterMix, a high-quality German corpus for language model pretraining and annealing, designed as a reusable dataset artifact for the...

arXiv CS 7d ago