Generative Models in Manuscript Studies: From Simulated Evolution to Digital Restoration

Abstract

The seminar will explore the use of generative networks to create artificial manuscripts and scripts. It will highlight how these models help restore damaged documents and simulate paleographic evolution. We will focus on two use cases. The first concerns the Turin manuscript, a burned document that is difficult to read and to analyze with computational methods. For this manuscript, we propose a hypothetical visual reconstruction and use image-to-image translation techniques to improve layout analysis models. The second involves generating synthetic handwritten data, designed both to improve Handwritten Text Recognition (HTR) performance and to explore the historical evolution of Armenian writing by simulating the transformation of its scripts.

Chahan Vidal-Gorène is a lecturer and researcher at the École nationale des chartes-PSL, and the director of the Digital Humanities Master's program at PSL University. His research focuses on computational paleography and the analysis of under-resourced languages. He is a member of the Agence Nationale de la Recherche project DALiH and the DISTAM consortium for data creation and NLP model development for non-Latin languages (mainly Armenian, Arabic and Chinese). He is the CEO and founder of Calfa, a startup specializing in document analysis and information extraction for non-Western scripts.

Additional Information

The seminar will be given in 🇫🇷 French 🇫🇷. You can attend either in person in the LRE Meeting Room or online.