From Text to Source: Results in Detecting Large Language Model-Generated Content

Wissam Antoun; Djamé Seddah; Benoît Sagot

Conference paper, Year: 2023

Wissam Antoun

Role: Author
PersonId: 1214021
IdHAL: wissam-antoun
ORCID: 0000-0001-8021-5834

Automatic Language Modelling and ANAlysis & Computational Humanities

Djamé Seddah

Role: Author
PersonId: 11545
IdHAL: djameseddah
IdRef: 086185136

Automatic Language Modelling and ANAlysis & Computational Humanities

Benoît Sagot

Role: Author
PersonId: 1461
IdHAL: bsagot
ORCID: 0000-0002-0107-8526
IdRef: 177454229

Automatic Language Modelling and ANAlysis & Computational Humanities

Abstract

The widespread use of Large Language Models (LLMs), celebrated for their ability to generate human-like text, has raised concerns about misinformation and ethical implications. Addressing these concerns requires robust methods to detect and attribute text generated by LLMs. This paper investigates "Cross-Model Detection" by evaluating whether a classifier trained to distinguish text generated by a source LLM from human-written text can also detect text from a target LLM without further training. The study explores a range of LLM sizes and families, and assesses the impact of conversational fine-tuning, quantization, and watermarking on classifier generalization. It also explores Model Attribution, which covers source model identification, model family classification, and model size classification, as well as quantization and watermarking detection. Our results reveal several key findings. There is a clear inverse relationship between classifier effectiveness and model size: larger LLMs are harder to detect, especially when the classifier is trained on data from smaller models. Training on data from similarly sized LLMs can improve detection of text from larger models but may reduce performance on smaller models. Additionally, the model attribution experiments show promising results in identifying source models and model families, which points to detectable signatures in LLM-generated text. Watermarking detection gives particularly strong results, while no detectable signatures of quantization were observed. Overall, our study contributes insights into the interplay of model size, family, and training data in LLM detection and attribution.
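The cross-model protocol described in the abstract (train a detector on one source LLM versus human text, then evaluate it unchanged on every target LLM) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the marker-word detector, and the toy sentences are all hypothetical stand-ins, and the resulting numbers say nothing about the paper's findings.

```python
from typing import Callable, Dict, List, Tuple

# (text, label) pairs: label 1 = machine-generated, 0 = human-written.
Example = Tuple[str, int]
Detector = Callable[[str], int]


def train_marker_word_detector(examples: List[Example]) -> Detector:
    """Hypothetical stand-in classifier (NOT the paper's model).

    It flags a text as machine-generated if it contains any word that
    appeared only in machine-generated training examples.
    """
    machine_words, human_words = set(), set()
    for text, label in examples:
        target = machine_words if label == 1 else human_words
        target.update(text.lower().split())
    markers = machine_words - human_words
    return lambda text: int(any(w in markers for w in text.lower().split()))


def accuracy(detector: Detector, examples: List[Example]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(1 for text, label in examples if detector(text) == label)
    return correct / len(examples)


def cross_model_matrix(
    train: Callable[[List[Example]], Detector],
    train_sets: Dict[str, List[Example]],
    test_sets: Dict[str, List[Example]],
) -> Dict[str, Dict[str, float]]:
    """Train once per source model, then score on every target model.

    The detector is never retrained or adapted on target data, which is
    the defining constraint of the cross-model setting.
    """
    results: Dict[str, Dict[str, float]] = {}
    for source, train_examples in train_sets.items():
        detector = train(train_examples)
        results[source] = {
            target: accuracy(detector, test_examples)
            for target, test_examples in test_sets.items()
        }
    return results


# Made-up toy data for two hypothetical models, "small" and "large".
train_sets = {
    "small": [("delve into the topic", 1), ("i went to the market", 0)],
    "large": [("the topic is nuanced", 1), ("i went to the market", 0)],
}
test_sets = {
    "small": [("we delve deeper", 1), ("the market was busy", 0)],
    "large": [("a nuanced view", 1), ("the market was busy", 0)],
}

matrix = cross_model_matrix(train_marker_word_detector, train_sets, test_sets)
for source, row in matrix.items():
    print(source, row)
```

Each row of `matrix` corresponds to a source model and each column to a target model, so the diagonal holds in-distribution scores and the off-diagonal entries hold the cross-model scores. Swapping in a real classifier only requires passing a different `train` function with the same signature.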

Domains

Computation and Language [cs.CL]

Main file

jqjkvxpwgxhvpqdrgmkzhnygcyyxnqfx.pdf (615.4 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-04264050

Submitted on: Wednesday, March 27, 2024, 12:12:01

Last modified on: Wednesday, March 27, 2024, 13:34:49

Dates and versions

hal-04264050, version 1 (27-03-2024)

License

Attribution

Identifiers

HAL Id: hal-04264050, version 1
arXiv: 2309.13322

Cite

Wissam Antoun, Djamé Seddah, Benoît Sagot. From Text to Source: Results in Detecting Large Language Model-Generated Content. The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024, Torino, Italy. ⟨hal-04264050⟩

Collections

INRIA INRIA2 ANR PRAIRIE-IA

43 views

2 downloads
