Automatic Source Attribution Evaluation: Application to Parliamentary Debates

#LLM #Generative AI #Retrieval Augmented Generation

Abstract

With the growing use of large language models (LLMs), effective and automatic source retrieval has become essential, especially when dealing with historical documents. The ability of LLMs to identify relevant sources is no longer merely one link in a chain whose end goal is answer generation; it now stands as a core analytical challenge in its own right, deserving dedicated evaluation. Which strategies, models, and parameters give historians the best tools for exploring large and noisy corpora? This article offers a first attempt at evaluating the retriever component of a retrieval-augmented generation (RAG) framework applied to the parliamentary debates of the French Third Republic.