Cross Paraphrastic Invariance Learning for Hallucination Detection

arXiv CS Tuesday 09 June 2026, 04:00 UTC By Shanshan Lin, Dongsheng Hong, Sibo Ju, Chao Chen, Sihong Xie, Xiangwen Liao

Key Points

arXiv:2606.08157v1. Abstract: Large language models (LLMs) frequently generate hallucinations, i.e., content that is unsupported by a source document. To avoid costly LLM-as-evaluator pipelines and the heavy annotation demands of existing classifiers, we propose CPIL (Cross Paraphrastic Invariance Learning), a two-stage Siamese framework that maximizes the utility of existing labeled data. Concretely, CPIL constructs informative training pairs by: (i) generating paraphrastic views of each document-claim example as positives and explicitly aligning their representations to enforce invariance to surface form; and (ii) mining same-document, opposite-label pairs as hard negatives to sharpen document-sensitive decision boundaries. CPIL then trains the model in two stages: Stage 1 performs contrastive pretraining to learn a paraphrase-invariant, grounding-aware embedding space, and Stage 2 attaches a lightweight classifier for binary groundedness prediction. On the LLM-AggreFact benchmark (11 tasks), CPIL surpasses strong baselines in F1 score while using only ~1% of the labeled data, demonstrating both its predictive performance and its label efficiency.
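The pair construction and Stage 1 objective described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the loss, so an InfoNCE-style contrastive loss is swapped in as an assumption, and the names `build_pairs`, `info_nce`, the `paraphrase` callback, and the example fields (`doc_id`, `claim`, `label`) are all hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def build_pairs(examples, paraphrase):
    """Build training pairs from labeled document-claim examples.

    Assumed (hypothetical) schema: each example is a dict with keys
    `doc_id`, `claim`, and `label` (1 = grounded, 0 = not grounded).
    `paraphrase` is any callable that returns a reworded copy of an example.
    """
    positives, hard_negatives = [], []
    by_doc = defaultdict(list)
    for ex in examples:
        by_doc[ex["doc_id"]].append(ex)
        # (i) A paraphrastic view of the same example is a positive.
        positives.append((ex, paraphrase(ex)))
    # (ii) Same document, opposite label: a hard negative pair.
    for group in by_doc.values():
        for a, b in combinations(group, 2):
            if a["label"] != b["label"]:
                hard_negatives.append((a, b))
    return positives, hard_negatives


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed stand-in for the Stage 1 objective).

    Low when the anchor is closer to its positive than to its negatives.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]
```

In this sketch, the embeddings passed to `info_nce` would come from the shared (Siamese) encoder; the Stage 2 classifier would then be trained on top of those embeddings and is not shown here.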


Originally published by arXiv CS

