DIGITAL HUMANITIES INITIATIVE & SLAVIC LANGUAGES – "Using Machine Learning to Study a Large Collection of Russian Diaries"

This talk explores how machine learning, specifically transformer-based large language models (LLMs), can be used to analyze an extensive collection of Russian historical diaries (1800–2018). By creating numerical representations of the texts, LLMs enable computational methods such as semantic text similarity and clustering. These methods can reveal significant topics and subjects within the diaries, such as prices and weather, offering new possibilities for digital scholarship. Drawing on John Unsworth’s concept of scholarly primitives, especially comparison, the talk will evaluate the potential of semantic text similarity methods and the challenges they pose for research in the digital humanities.
Andy Janco (Princeton University) holds a PhD in Russian history from the University of Chicago and a master’s in library and information science from the University of Illinois at Urbana-Champaign. He is the co-director of an NEH Institute for Advanced Topics in the Digital Humanities at Princeton University in partnership with the Library of Congress Labs and the European Union’s Digital Research Infrastructure for the Arts and Humanities (DARIAH).
This event is sponsored by the IHGC's Digital Humanities Initiative, the Department of Slavic Languages & Literatures, and the Center for Russian, East European, and Eurasian Studies.