MC-PDD

Related Articles from SNS

MC-PDD: Masked Corpus-Level Pretraining Data Detection for Black-Box Large Language Models

arXiv:2606.07996v1 (announce type: new). Abstract: Pretraining is fundamental to the development of Large Language Models (LLMs), yet the opacity of pretraining data complicates model analysis and raises ethical, legal, and fairness concerns. Detecting whether specific datasets were used during pretraining is, therefore, critical. Existing state-of-the-art methods typically rely on access to model probability distributions, making them unsuitable for closed-source LLMs that provide only...

arXiv CS 1d ago