Specializations
- Computing Cultural Heritage
- Collections as Data
- Digital Humanities
Biography
Benjamin Charles Germain Lee is an incoming Assistant Professor in the Information School at the University of Washington (starting Autumn 2024), where he is launching the Lab for Computing Cultural Heritage. His research reimagines how we search and interpret cultural heritage collections using machine learning and computation. This research has three central goals: (1) developing large-scale search and discovery systems for digitized and born-digital collections; (2) leveraging these systems to advance research in the digital humanities and cultural heritage; and (3) studying the ethical and sociotechnical implications of applying machine learning in this context.
Lee is currently a Kluge Fellow in Digital Studies at the Library of Congress, where he works with the Library's Web Archiving Team. He received his Ph.D. in Computer Science & Engineering from the University of Washington, supported by a National Science Foundation Graduate Research Fellowship in machine learning. During his Ph.D., he developed Newspaper Navigator, a machine-learning-supported system that enables searching for text and images in 16.3 million historical newspaper articles. He has served as an Innovator in Residence at the Library of Congress, the inaugural Digital Humanities Associate Fellow at the United States Holocaust Memorial Museum, a Visiting Fellow in Harvard’s History Department, and a Richard and Ina Willner Memorial Fellow in the Stroum Center for Jewish Studies at the University of Washington.
Lee is a General Editor at Digital Humanities Quarterly. His public writing has appeared in WIRED, Gawker, Current Affairs, Jacobin, Real Life, GoldFlakePaint, Protean, and Bright Wall/Dark Room.
He is currently recruiting Ph.D. students for Autumn 2024.
Education
- Ph.D., Computer Science & Engineering, University of Washington, 2023
- MS, Computer Science & Engineering, University of Washington, 2020
- BA, Astrophysics and Mathematics, Harvard College, 2017
Awards
- Best Digital Humanities Dataset - 2020 DH Awards, 2021
- Best Resource Paper Runner-up - CIKM, 2020
- summa cum laude - Harvard College, 2017
- Thomas T. Hoopes Prize, 2017
- John Harvard Scholar, 2015-2016
- Phi Beta Kappa, 2016
- Herchel Smith Harvard Undergraduate Science Research Fellow, 2015
- Philip Hofer Prize for Collecting Books or Art - Harvard University, 2014
- National Merit Scholarship, 2013
Publications and Contributions
-
Journal Article, Professional Journal
LIMEADE: From AI Explanations to Advice Taking
(2023)
ACM Transactions on Interactive Intelligent Systems, Special Issue: “Human-Centered Explainable AI”, 13(4)
-
Journal Article, Academic Journal
The “Collections as ML Data” checklist for machine learning and cultural heritage
(2023)
Journal of the Association for Information Science and Technology
-
Journal Article, Professional Journal
Grappling with the Scale of Born-Digital Government Publications: Toward Pipelines for Processing and Searching Millions of PDFs
(2022)
International Journal of Digital Humanities
-
Magazine/Trade Publication
Manufacturing Nostalgia
(2022)
Current Affairs
-
Book, Scholarly-New
The Digital Humanities and the Ladino Press: Using Machine Learning to Extract and Analyze Visual Content in Historic Ladino Newspapers
(2022)
Jewish Studies in the Digital Age
-
Journal Article, Academic Journal
Towards an Experimental Bibliography of Hemispheric Reconstruction Newspapers
(2022)
Criticism, 64(3, Article 15)
-
Commissioned Reports
A Landscape of Data Sources: Findings & Recommendations, A Report Commissioned by the Library of Congress
(2021)
Library of Congress Contract LCLBN20E0018
-
Journal Article, Professional Journal
Compounded Mediation: A Data Archaeology of the Newspaper Navigator Dataset
(2021)
Digital Humanities Quarterly, 15(4)
-
Conference Paper
LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis
(2021)
ICDAR 2021, pp. 131–146
-
Journal Article, Professional Journal
Machine Learning and the Social Studies
(2021)
Social Education, 85(2)
-
Conference Paper
Navigating the Mise-en-Page: Interpretive Machine Learning Approaches to the Visual Layouts of Multi-Ethnic Periodicals
(2021)
Computational Humanities Research (CHR) 2021, pp. 61
-
Magazine/Trade Publication
Putting the ‘Capitalism’ in ‘Surveillance Capitalism’
(2021)
Current Affairs
-
Magazine/Trade Publication
Speaking for the Past
(2021)
Real Life
-
Journal Article, Public or Trade Journal
Anne Frank’s Ghost All Around
(2020)
GoldFlakePaint
-
Demo
Newspaper Navigator: Open Faceted Search for 1.5 Million Images
(2020)
Adjunct Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology, pp. 120–122
-
EuropeanaTech Publication
Newspaper Navigator: Putting Machine Learning in the Hands of Library Users
(2020)
EuropeanaTech Insight
-
Conference Paper
-
Magazine/Trade Publication
The Singularity Prophets
(2020)
Current Affairs
-
Journal Article, Professional Journal
Machine Learning, Template Matching, and the International Tracing Service Digital Archive: Automating the Retrieval of Death Certificate Reference Cards from 40 Million Document Scans
(2019)
Digital Scholarship in the Humanities, 34(3)
-
Journal Article, Professional Journal
Improved Point-source Detection in Crowded Fields Using Probabilistic Cataloging
(2017)
The Astronomical Journal, 154(4)
-
Workshop Paper
Line Detection in Binary Document Scans: A Case Study with the International Tracing Service Archives
(2017)
2017 IEEE International Conference on Big Data, pp. 2256–2261
-
Journal Article, Professional Journal
Galaxy Redshifts from Discrete Optimization of Correlation Functions
(2016)
The Astronomical Journal, 152(6)
Presentations
-
#WhyWebArchiving: Preserving Internet Content for Research Use
(2023)
Library of Congress & International Internet Preservation Consortium - Virtual
-
Computing on Cultural Heritage: Reports from an LC Labs Experiment
(2023)
American Historical Association 2023 - Philadelphia, PA
-
Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning
(2023)
Digital History Research Colloquium - Humboldt-Universität Berlin (Virtual)
-
#WhyWebArchiving: Preserving Internet Content for Research Use
(2022)
Strategic Visioning Workshop for Digital Strategy at the Library of Congress - Virtual
-
A Computational Periodicals Unconference: Exploring New Opportunities for Critical and Collaborative Inquiry
(2022)
DH Unbound 2022 - Virtual
-
Compounded Mediation: A Data Archaeology of the Newspaper Navigator Dataset
(2022)
Digital Humanities + iSchool (DHIS) Collective Meeting, University of Illinois Urbana-Champaign - Virtual
-
Compounded Mediation: Excavating the Newspaper Navigator Dataset
(2022)
Fifth Annual GHI Conference on Digital Humanities and Digital History, German Historical Institute Washington - Virtual
-
How to Get Published in Academic Journals
(2022)
Critical Digital Methods Institute CDMI Workshop Series, University of Toronto - Virtual
-
Interaction
(2022)
CCC Artificial Intelligence Roadmap Workshop 2 - Denver, CO
-
New Directions for Interdisciplinary Collaborations in Periodical Studies
(2022)
Research Society for American Periodicals - Virtual
-
Newspaper Navigator: Hosting the Dataset and Deploying the Search Application
(2022)
Designing Storage Architectures for Digital Collections 2022, Library of Congress - Virtual
-
Newspaper Navigator: Open Faceted Search for 1.5 Million Images
(2022)
Computer Science & Engineering HCI Seminar, The University of Washington - Virtual
-
Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning
(2022)
Fantastic Futures 2021 - Virtual
-
Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning
(2022)
Digital Humanities 2022 - Virtual
-
Novel Machine Learning Methods for Computing Cultural Heritage: An Interdisciplinary Approach
(2022)
Center for Digital Humanities, Princeton University - Virtual
-
Serving Researchers With Public Web Archive Datasets in the Cloud
(2022)
IIPC Web Archiving Conference - Virtual
-
The Digital Humanities and the Ladino Press: Unlocking Historic Ladino Newspapers with Machine Learning
(2022)
ucLADINO Conference 2022 - Virtual
-
Using Machine Learning to Extract and Analyze Advertisements in Historic Ladino Newspapers, 1890-1948
(2022)
Studying Advertisements in pre-1939 Jewish Press: Methods and Challenges Workshop, University of Wrocław - Virtual
-
Compounded Mediation: A Data Archaeology of the Newspaper Navigator Dataset
(2021)
Digital History of Science Working Group, Consortium for the History of Science, Technology, and Medicine - Virtual
-
Cowboys, Computers, and Cartoons: Excavating and Explicating America’s Political Cartoons
(2021)
Association for Documentary Editing Annual Meeting - Virtual
-
Data & Technologies
(2021)
Collective Wisdom Workshop - Virtual
-
From Chronicling America to Newspaper Navigator: Improving Access to Historic Newspaper Photos at the Library of Congress through Machine Learning
(2021)
NewsEye International Conference - Virtual
-
Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning
(2021)
Honors Program Speaker Series, Texas A&M University-Corpus Christi - Virtual
-
Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning
(2021)
Digital History Seminar Series, Institute of Historical Research, University of London School of Advanced Study - Virtual
-
Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning
(2021)
Discovery Series, Harvard University - Virtual
-
Sephardic Experiences of Modernity: Newspapers, Migrants and Midwives
(2021)
Stroum Center for Jewish Studies Colloquium, The University of Washington - Virtual
-
The Digital Humanities and the Ladino Press: Using Machine Learning to Extract and Analyze Visual Content in Historic Ladino Newspapers
(2021)
#DHJewish 2021 - Jewish Studies in the Digital Age Conference - Virtual
-
Newspaper Navigator Data Jam
(2020)
Library of Congress - Virtual
-
Newspaper Navigator: An Introduction & Demo
(2020)
Living with Machines Group, The Alan Turing Institute & British Library - Virtual
-
Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning
(2020)
Drexel University - Virtual
-
Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning
(2020)
Collections as Data Discussion Series, Center for Digital Humanities & Princeton University Library - Virtual
-
Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning
(2020)
Computer Vision for Digital Heritage, The Alan Turing Institute - Virtual
-
Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning
(2020)
Digital History Workshop, The Johns Hopkins University - Virtual
-
Newspaper Navigator: Reimagining Historic Newspapers with Machine Learning
(2020)
Data Dialogue Series, Duke University - Virtual
-
Newspaper Navigator: Reimagining Library Search and Discovery with Machine Learning
(2020)
ULS Technology in University Libraries Committee Tech Forum - Virtual
-
Seeing Editors: Metadata, Machine Learning, and the Shapes of Social Justice
(2020)
National Endowment for the Humanities & Library of Congress - Virtual
-
Teaching Computers to Read Ladino
(2020)
Stroum Center for Jewish Studies, The University of Washington - Virtual
-
Data Visualization
(2019)
DIRAC Institute Lunch Seminar, University of Washington - Seattle, WA
-
Mapping the University of Washington’s Sephardic Studies Collection
(2019)
GIS Symposium, University of Washington - Seattle, WA
-
Needles in a Digital Haystack: Improving Digital Archive Research
(2019)
Digital Futures Discovery Series, Harvard University - Cambridge, MA
-
The International Tracing Service and Machine Learning
(2019)
Machine Learning + Libraries Summit, Library of Congress - Washington, D.C.
-
Applying Digital Humanities Research Methods to Holocaust Studies: A Case Study of the Roman Catholic Clerical Prisoners in the Dachau Concentration Camp
(2018)
Committee on Ethics, Religion, and the Holocaust, United States Holocaust Memorial Museum - Washington, D.C.
-
ITS and Machine Learning
(2018)
The International Tracing Service Archive - Virtual
-
Q&A Session
(2018)
UW CSE NSF Graduate Research Fellowship Program - Seattle, WA
-
The Clergy in Dachau and Digital Humanities
(2018)
The Jack, Joseph, and Morton Mandel Center for Advanced Holocaust Studies, United States Holocaust Memorial Museum - Washington, D.C.
-
The International Tracing Service Archive and Machine Learning
(2018)
Data Science Seminar, Smithsonian Data Science Lab - Washington, D.C.
-
The International Tracing Service Archive and Machine Learning
(2018)
LC Labs, Library of Congress - Washington, D.C.
-
The International Tracing Service Archive and Machine Learning
(2018)
The Jack, Joseph, and Morton Mandel Center for Advanced Holocaust Studies, United States Holocaust Memorial Museum - Washington, D.C.
-
Using Computer Vision and Machine Learning to Classify ITS CNI Cards
(2017)
Improving Access to the ITS Digital Archive Workshop, Wiener Holocaust Library - Virtual