Abstract

Many book databases support research in natural language processing and digital humanities. However, most of them do not link books to the behavior of their readers. Were these books read? Enjoyed? Reviewed? If so, by whom? Were they the subject of critical attention or documentation? Being able to associate the content of popular books with their audience would allow us to study them not only as literary objects, but also as witnesses to and agents of social and cultural representations.

To address this need, the project presented here aggregates several data sources, including underground libraries, OpenLibrary, Goodreads, and Wikidata, to build a multilingual catalog of popular literature.

This is a work in progress, expected to be completed by June 2025. I will present my methodological choices, as well as the characteristics and biases of the resulting database, in order to open a discussion on its potential uses.

Additional Information

The seminar will be given in 🇫🇷 French 🇫🇷. You can attend either online or in person in the LRE Meeting Room.