- Computing Cultural Heritage
- Collections as Data
- Digital Humanities
Research Areas
- LIS 598 - Special Topics In Information And Library Science
Benjamin Charles Germain Lee is an incoming Assistant Professor in the Information School at the University of Washington (starting Autumn 2024), where he is starting the Lab for Computing Cultural Heritage. His research reimagines how we search and interpret cultural heritage collections using machine learning and computation. This research has three central goals: (1) developing large-scale search and discovery systems for digitized and born-digital collections; (2) leveraging these systems to advance research in the digital humanities and cultural heritage; and (3) studying the ethical and sociotechnical implications of applying machine learning in this context.
Lee is currently a Kluge Fellow in Digital Studies at the Library of Congress, where he is working with the Library of Congress’s Web Archiving Team. He received his Ph.D. in Computer Science & Engineering from the University of Washington, supported by a National Science Foundation graduate research fellowship in machine learning. During his Ph.D., he developed Newspaper Navigator, a machine learning supported system that enables searching for text and images in 16.3 million historical newspaper pages. He has served as an Innovator in Residence at the Library of Congress, the inaugural Digital Humanities Associate Fellow at the United States Holocaust Memorial Museum, a Visiting Fellow in Harvard’s History Department, and a Richard and Ina Willner Memorial Fellow in the Stroum Center for Jewish Studies at the University of Washington.
Lee is a General Editor at Digital Humanities Quarterly. His public writing has appeared in WIRED, Gawker, Current Affairs, Jacobin, Real Life, GoldFlakePaint, Protean, and Bright Wall/Dark Room.
He is currently recruiting Ph.D. students for Autumn 2024.
- Ph.D., Computer Science & Engineering, University of Washington, 2023
- MS, Computer Science & Engineering, University of Washington, 2020
- BA, Astrophysics and Mathematics, Harvard College, 2017
- Best Digital Humanities Dataset - 2020 DH Awards, 2021
- Best Resource Paper Runner-up - CIKM, 2020
- summa cum laude - Harvard College, 2017
- Thomas T. Hoopes Prize, 2017
- John Harvard Scholar, 2015-2016
- Phi Beta Kappa, 2016
- Herchel Smith Harvard Undergraduate Science Research Fellow, 2015
- Philip Hofer Prize for Collecting Books or Art - Harvard University, 2014
- National Merit Scholarship, 2013
Publications and Contributions
Journal Article, Professional Journal. LIMEADE: From AI Explanations to Advice Taking (2023). ACM Transactions on Interactive Intelligent Systems, Special Issue: “Human-Centered Explainable AI”, 13(4)
Journal Article, Academic Journal. The “Collections as ML Data” checklist for machine learning and cultural heritage (2023). Journal of the Association for Information Science and Technology
Journal Article, Professional Journal. Grappling with the Scale of Born-Digital Government Publications: Toward Pipelines for Processing and Searching Millions of PDFs (2022). International Journal of Digital Humanities
Magazine/Trade Publication. Manufacturing Nostalgia (2022). Current Affairs
Book, Scholarly-New. The Digital Humanities and the Ladino Press: Using Machine Learning to Extract and Analyze Visual Content in Historic Ladino Newspapers (2022). Jewish Studies in the Digital Age
Journal Article, Academic Journal. Towards an Experimental Bibliography of Hemispheric Reconstruction Newspapers (2022). Criticism, 64(3, Article 15)
Commissioned Reports. A Landscape of Data Sources: Findings & Recommendations, A Report Commissioned by the Library of Congress (2021). Library of Congress Contract LCLBN20E0018
Journal Article, Professional Journal. Compounded Mediation: A Data Archaeology of the Newspaper Navigator Dataset (2021). Digital Humanities Quarterly, 15(4)
Conference Paper. LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis (2021). ICDAR 2021, pp. 131–146
Journal Article, Professional Journal. Machine Learning and the Social Studies (2021). Social Education, 85(2)
Conference Paper. Navigating the Mise-en-Page: Interpretive Machine Learning Approaches to the Visual Layouts of Multi-Ethnic Periodicals (2021). Computational Humanities Research (CHR) 2021, pp. 61
Magazine/Trade Publication. Putting the ‘Capitalism’ in ‘Surveillance Capitalism’ (2021). Current Affairs
Magazine/Trade Publication. Speaking for the Past (2021). Real Life
Journal Article, Public or Trade Journal. Anne Frank’s Ghost All Around (2020). GoldFlakePaint
Demo. Newspaper Navigator: Open Faceted Search for 1.5 Million Images (2020). Adjunct Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology, pp. 120–122
EuropeanaTech Publication. Newspaper Navigator: Putting Machine Learning in the Hands of Library Users (2020). EuropeanaTech Insight
Conference Paper
Magazine/Trade Publication. The Singularity Prophets (2020). Current Affairs
Journal Article, Professional Journal. Machine Learning, Template Matching, and the International Tracing Service Digital Archive: Automating the Retrieval of Death Certificate Reference Cards from 40 Million Document Scans (2019). Digital Scholarship in the Humanities, 34(3)
Journal Article, Professional Journal. Improved Point-source Detection in Crowded Fields Using Probabilistic Cataloging (2017). The Astronomical Journal, 154(4)
Workshop Paper. Line Detection in Binary Document Scans: A Case Study with the International Tracing Service Archives (2017). 2017 IEEE International Conference on Big Data, pp. 2256–2261
Journal Article, Professional Journal. Galaxy Redshifts from Discrete Optimization of Correlation Functions (2016). The Astronomical Journal, 152(6)
A Year of AI & Public Libraries
2025 Knight Library Leaders Conference - Miami, FL
Re-imagining Large-Scale Search & Discovery for Millions of Born-Digital Government Publications
Archives as Data Conference - New York, NY
#WhyWebArchiving: Preserving Internet Content for Research Use
Library of Congress & International Internet Preservation Consortium - Virtual
Computing on Cultural Heritage: Reports from an LC Labs Experiment
American Historical Association 2023 - Philadelphia, PA
Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning
Digital History Research Colloquium - Humboldt-Universität Berlin (Virtual)
#WhyWebArchiving: Preserving Internet Content for Research Use
Strategic Visioning Workshop for Digital Strategy at the Library of Congress - Virtual
A Computational Periodicals Unconference: Exploring New Opportunities for Critical and Collaborative Inquiry
DH Unbound 2022 - Virtual
Compounded Mediation: A Data Archaeology of the Newspaper Navigator Dataset
Digital Humanities + iSchool (DHIS) Collective Meeting, University of Illinois Urbana-Champaign - Virtual
Compounded Mediation: Excavating the Newspaper Navigator Dataset
Fifth Annual GHI Conference on Digital Humanities and Digital History, German Historical Institute Washington - Virtual
How to Get Published in Academic Journals
Critical Digital Methods Institute CDMI Workshop Series, University of Toronto - Virtual
CCC Artificial Intelligence Roadmap Workshop 2 - Denver, CO
New Directions for Interdisciplinary Collaborations in Periodical Studies
Research Society for American Periodicals - Virtual
Newspaper Navigator: Hosting the Dataset and Deploying the Search Application
Designing Storage Architectures for Digital Collections 2022, Library of Congress - Virtual
Newspaper Navigator: Open Faceted Search for 1.5 Million Images
Computer Science & Engineering HCI Seminar, The University of Washington - Virtual
Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning
Digital Humanities 2022 - Virtual
Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning
Fantastic Futures 2021 - Virtual
Novel Machine Learning Methods for Computing Cultural Heritage: An Interdisciplinary Approach
Center for Digital Humanities, Princeton University - Virtual
Serving Researchers With Public Web Archive Datasets in the Cloud
IIPC Web Archiving Conference - Virtual
The Digital Humanities and the Ladino Press: Unlocking Historic Ladino Newspapers with Machine Learning
ucLADINO Conference 2022 - Virtual
Using Machine Learning to Extract and Analyze Advertisements in Historic Ladino Newspapers, 1890-1948
Studying Advertisements in pre-1939 Jewish Press: Methods and Challenges Workshop, University of Wrocław - Virtual
Compounded Mediation: A Data Archaeology of the Newspaper Navigator Dataset
Digital History of Science Working Group, Consortium for the History of Science, Technology, and Medicine - Virtual
Cowboys, Computers, and Cartoons: Excavating and Explicating America’s Political Cartoons
Association for Documentary Editing Annual Meeting - Virtual
Data & Technologies
Collective Wisdom Workshop - Virtual
From Chronicling America to Newspaper Navigator: Improving Access to Historic Newspaper Photos at the Library of Congress through Machine Learning
NewsEye International Conference - Virtual
Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning
Digital History Seminar Series, Institute of Historical Research, University of London School of Advanced Study - Virtual
Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning
Discovery Series, Harvard University - Virtual
Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning
Honors Program Speaker Series, Texas A&M University-Corpus Christi - Virtual
Sephardic Experiences of Modernity: Newspapers, Migrants and Midwives
Stroum Center for Jewish Studies Colloquium, The University of Washington - Virtual
The Digital Humanities and the Ladino Press: Using Machine Learning to Extract and Analyze Visual Content in Historic Ladino Newspapers
#DHJewish 2021 - Jewish Studies in the Digital Age Conference - Virtual
Newspaper Navigator Data Jam
Library of Congress - Virtual
Newspaper Navigator: An Introduction & Demo
Living with Machines Group, The Alan Turing Institute & British Library - Virtual
Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning
Drexel University - Virtual
Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning
Collections as Data Discussion Series, Center for Digital Humanities & Princeton University Library - Virtual
Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning
Computer Vision for Digital Heritage, The Alan Turing Institute - Virtual
Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning
Digital History Workshop, The Johns Hopkins University - Virtual
Newspaper Navigator: Reimagining Historic Newspapers with Machine Learning
Data Dialogue Series, Duke University - Virtual
Newspaper Navigator: Reimagining Library Search and Discovery with Machine Learning
ULS Technology in University Libraries Committee Tech Forum - Virtual
Seeing Editors: Metadata, Machine Learning, and the Shapes of Social Justice
National Endowment for the Humanities & Library of Congress - Virtual
Teaching Computers to Read Ladino
Stroum Center for Jewish Studies, The University of Washington - Virtual
Data Visualization
DIRAC Institute Lunch Seminar, University of Washington - Seattle, WA
Mapping the University of Washington’s Sephardic Studies Collection
GIS Symposium, University of Washington - Seattle, WA
Needles in a Digital Haystack: Improving Digital Archive Research
Digital Futures Discovery Series, Harvard University - Cambridge, MA
The International Tracing Service and Machine Learning
Machine Learning + Libraries Summit, Library of Congress - Washington, D.C.
Applying Digital Humanities Research Methods to Holocaust Studies: A Case Study of the Roman Catholic Clerical Prisoners in the Dachau Concentration Camp
Committee on Ethics, Religion, and the Holocaust, United States Holocaust Memorial Museum - Washington, D.C.
ITS and Machine Learning
The International Tracing Service Archive - Virtual
Q&A Session
UW CSE NSF Graduate Research Fellowship Program - Seattle, WA
The Clergy in Dachau and Digital Humanities
The Jack, Joseph, and Morton Mandel Center for Advanced Holocaust Studies, United States Holocaust Memorial Museum - Washington, D.C.
The International Tracing Service Archive and Machine Learning
The Jack, Joseph, and Morton Mandel Center for Advanced Holocaust Studies, United States Holocaust Memorial Museum - Washington, D.C.
The International Tracing Service Archive and Machine Learning
Data Science Seminar, Smithsonian Data Science Lab - Washington, D.C.
The International Tracing Service Archive and Machine Learning
LC Labs, Library of Congress - Washington, D.C.
Using Computer Vision and Machine Learning to Classify ITS CNI Cards
Improving Access to the ITS Digital Archive Workshop, Wiener Holocaust Library - Virtual