
KliniskVestBERT: BERT Model Specialised to Norwegian Clinical Texts

Key Points


arXiv:2606.01904v2 Announce Type: replace

Abstract: The increasing application of Natural Language Processing (NLP) in healthcare demands language models specifically attuned to the complexities of clinical language. This work introduces KliniskVestBERT, a suite of three BERT-based encoder models pre-trained on a substantial corpus of real-world, de-identified Norwegian clinical texts from Helse Vest. We continue pretraining the existing language models Nb-BERT-large, NorBERT3-large and ModernBERT on our specialized clinical dataset. This dataset is based on a representative population of Helse Vest patients. The included document types were carefully curated to cover a broad clinical spectrum in both bokmål and nynorsk, including discharge summaries, surgical reports and nursing notes, among others, ensuring comprehensive representation of the linguistic landscape within Norwegian healthcare settings. Evaluation on three synthetic Norwegian clinical benchmark datasets and two real-world problems demonstrates that each of our clinically specialized models consistently outperforms its baseline counterpart, highlighting the significant benefit of domain-specific pre-training for NLP tasks within the clinical domain. The project was a joint effort by all Helse Vest entities (Helse Bergen, Helse Fonna, Helse Førde and Helse Stavanger) together with DIPS, under the project lead of Helse Vest ICT.
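Continued pretraining of BERT-family encoders is commonly done with a masked language modelling (MLM) objective: some input tokens are hidden and the model learns to recover them from context, here the clinical notes. The abstract does not say which masking scheme, masking rate or tokeniser the authors used, so the following is only a minimal sketch assuming the original BERT recipe (15% of non-special tokens selected; of those, 80% replaced by a [MASK] id, 10% by a random token, 10% left unchanged). The `mask_tokens` helper, its parameters and the toy token ids are hypothetical; the masking rate is exposed as a parameter because published recipes differ.

```python
import random
from typing import List, Optional, Sequence, Set, Tuple

# Labels with this value are skipped by the loss (PyTorch's CrossEntropyLoss default ignore_index).
IGNORE_INDEX = -100


def mask_tokens(
    token_ids: Sequence[int],
    mask_id: int,
    vocab_size: int,
    special_ids: Set[int],
    mlm_probability: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: BERT-style 80/10/10 masking of one tokenised sequence.

    Returns (inputs, labels). labels holds the original id at selected
    positions and IGNORE_INDEX everywhere else.
    """
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(token_ids):
        # Never select special tokens such as [CLS], [SEP] or padding.
        if tok in special_ids or rng.random() >= mlm_probability:
            continue
        labels[i] = tok  # the model is trained to predict the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id  # 80%: replace with the [MASK] id
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: replace with a random token id
        # Remaining 10%: keep the original token in the input.
    return inputs, labels


if __name__ == "__main__":
    # Toy ids (assumed): 2 = [CLS], 3 = [SEP], 4 = [MASK], vocabulary of 50 ids.
    seq = [2, 17, 23, 5, 41, 3]
    inputs, labels = mask_tokens(seq, mask_id=4, vocab_size=50, special_ids={2, 3}, rng=random.Random(0))
    print(inputs)
    print(labels)
```

In a real setup the masked `inputs` are fed to the encoder and the cross-entropy loss is computed only at positions whose label is not `IGNORE_INDEX`; leaving some selected tokens unchanged or randomised keeps the model from relying on the [MASK] token, which never appears at fine-tuning time.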
Originally published by arXiv CS.