Aurélien Pellet

PhD Student

Team

Digital Methods in the Humanities and Social Sciences

Campus

Paris

Contact

aurelien.pellet@epita.fr

Research Areas

Retrieval-Augmented Generation (RAG)
Natural Language Processing (NLP)
Digital Humanities
Evaluation Methodologies

In brief:

  • PhD Student in Artificial Intelligence and Digital Humanities at the LRE.
  • Data Science engineer with a background in Applied Mathematics and Probability (Université Paris Cité, UPMC).
  • My research focuses on adapting Retrieval-Augmented Generation (RAG) systems for historical corpora.
  • I am the co-author of the HistoriQA-ThirdRepublic dataset for multi-hop reasoning in historical research.

Short Bio

I hold a Master’s degree in Random Modeling and Data Science (M2MO) from Université Paris Cité and a Master’s degree in Probability and Random Models from Sorbonne University (formerly UPMC). After working as a Research Engineer at the LRE and as a Data Scientist and Pedagogical Manager for the MSc Data & IA at Epitech, I began my PhD in September 2025.

I am conducting my doctoral research under the direction of Laurent Romary (Inria) and the supervision of Julien Perez (EPITA) and Marie Puren (EPITA). My thesis, titled Contributions to Retrieval-Augmented Generation and Application to Historical Research, explores the intersection of Natural Language Processing (NLP) and Digital Humanities.

My work specifically addresses the challenges of applying Large Language Models (LLMs) to specialized historical documents. I have contributed to the development of methods for multi-hop question answering and the evaluation of RAG architectures using parliamentary debates and press articles from the French Third Republic.

Research

Research Interests

My research focuses on Artificial Intelligence applied to the Humanities, with a particular emphasis on overcoming the limitations of generic LLMs when dealing with long, heterogeneous, and historical contexts. My current work includes:

  • Retrieval-Augmented Generation (RAG): Improving retrieval and generation strategies for long-context documents and domain-specific corpora.
  • LLM Evaluation: Developing protocols for evaluating hallucination, factual consistency, and reasoning capabilities in AI models, including LLM-as-a-judge approaches.
  • Digital History: Automatic question generation and information extraction from French parliamentary debates and historical newspapers.

Publications

A list of my publications is available on:

Teaching

I am involved in pedagogical engineering and teaching at Epitech, specifically for the MSc Data & IA program.

MSc Data & IA

  • Data Visualization: Design of the training program and project supervision.
  • AI & Big Data Projects: Supervision of student projects focusing on Artificial Intelligence and data processing.
  • Educational Data Analysis: Supervision of cohort studies and analysis of educational data.