Specializations

  • Computing Cultural Heritage
  • Collections as Data
  • Digital Humanities

Biography

Benjamin Charles Germain Lee is an incoming Assistant Professor in the Information School at the University of Washington (starting Autumn 2024), where he is founding the Lab for Computing Cultural Heritage. His research reimagines how we search and interpret cultural heritage collections using machine learning and computation. This research has three central goals: (1) developing large-scale search and discovery systems for digitized and born-digital collections; (2) leveraging these systems to advance research in the digital humanities and cultural heritage; and (3) studying the ethical and sociotechnical implications of applying machine learning in this context.

Lee is currently a Kluge Fellow in Digital Studies at the Library of Congress, where he works with the Library’s Web Archiving Team. He received his Ph.D. in Computer Science & Engineering from the University of Washington, supported by a National Science Foundation graduate research fellowship in machine learning. During his Ph.D., he developed Newspaper Navigator, a machine-learning-supported system that enables searching for text and images in 16.3 million historical newspaper pages. He has served as an Innovator in Residence at the Library of Congress, the inaugural Digital Humanities Associate Fellow at the United States Holocaust Memorial Museum, a Visiting Fellow in Harvard’s History Department, and a Richard and Ina Willner Memorial Fellow in the Stroum Center for Jewish Studies at the University of Washington.

Lee is a General Editor at Digital Humanities Quarterly. His public writing has appeared in WIRED, Gawker, Current Affairs, Jacobin, Real Life, GoldFlakePaint, Protean, and Bright Wall/Dark Room.

He is currently recruiting Ph.D. students for Autumn 2024.

Education

  • Ph.D., Computer Science & Engineering, University of Washington, 2023
  • MS, Computer Science & Engineering, University of Washington, 2020
  • BA, Astrophysics and Mathematics, Harvard College, 2017

Awards

  • Best Digital Humanities Dataset - 2020 DH Awards, 2021
  • Best Resource Paper Runner-up - CIKM, 2020
  • summa cum laude - Harvard College, 2017
  • Thomas T. Hoopes Prize, 2017
  • John Harvard Scholar, 2015-2016
  • Phi Beta Kappa, 2016
  • Herchel Smith Harvard Undergraduate Science Research Fellow, 2015
  • Philip Hofer Prize for Collecting Books or Art - Harvard University, 2014
  • National Merit Scholarship, 2013

Publications and Contributions

  • Journal Article, Academic Journal
    The “Collections as ML Data” checklist for machine learning and cultural heritage (2023)
    Journal of the Association for Information Science and Technology Author: Benjamin Lee
  • Journal Article, Professional Journal
    Grappling with the Scale of Born-Digital Government Publications: Toward Pipelines for Processing and Searching Millions of PDFs (2022)
    International Journal of Digital Humanities, 3 Authors: Benjamin Lee, Trevor Owens
  • Magazine/Trade Publication
    Manufacturing Nostalgia (2022)
    Current Affairs Author: Benjamin Lee
  • Book, Scholarly-New
    The Digital Humanities and the Ladino Press: Using Machine Learning to Extract and Analyze Visual Content in Historic Ladino Newspapers (2022)
    Jewish Studies in the Digital Age Author: Benjamin C.G. Lee
  • Journal Article, Academic Journal
    Towards an Experimental Bibliography of Hemispheric Reconstruction Newspapers (2022)
    Criticism, 64(3, Article 15) Authors: Joshua Ortiz Baco, Benjamin Lee, Jim Casey, Sarah H. Salter
  • Commissioned Reports
    A Landscape of Data Sources: Findings & Recommendations, A Report Commissioned by the Library of Congress (2021)
    Library of Congress Contract LCLBN20E0018 Author: Benjamin C.G. Lee
  • Journal Article, Professional Journal
    Compounded Mediation: A Data Archaeology of the Newspaper Navigator Dataset (2021)
    Digital Humanities Quarterly, 15(4) Author: Benjamin C.G. Lee
  • Conference Paper
    LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis (2021)
    pp. 131–146 Authors: Zejiang Shen, Ruochen Zhang, Melissa Dell, Benjamin Lee, Jacob Carlson, Weining Li
  • Journal Article, Professional Journal
    Machine Learning and the Social Studies (2021)
    Social Education, 85(2) Authors: Benjamin Lee, Ilene R. Berson, Michael J. Berson
  • Conference Paper
    Navigating the Mise-en-Page: Interpretive Machine Learning Approaches to the Visual Layouts of Multi-Ethnic Periodicals (2021)
    p. 61 Authors: Benjamin Lee, Joshua Ortiz Baco, Sarah Salter, Jim Casey
  • Magazine/Trade Publication
    Putting the ‘Capitalism’ in ‘Surveillance Capitalism’ (2021)
    Current Affairs Author: Benjamin Lee
  • Magazine/Trade Publication
    Speaking for the Past (2021)
    Real Life Author: Benjamin Lee
  • Journal Article, Public or Trade Journal
    Anne Frank’s Ghost All Around (2020)
    GoldFlakePaint Author: Benjamin Lee
  • Demo
    Newspaper Navigator: Open Faceted Search for 1.5 Million Images (2020)
    Adjunct Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology, pp. 120–122 Authors: Benjamin C.G. Lee, Daniel S. Weld
  • EuropeanaTech Publication
    Newspaper Navigator: Putting Machine Learning in the Hands of Library Users (2020)
    EuropeanaTech Insight Authors: Benjamin Lee, Jaime Mears, Eileen Jakeway, Meghan Ferriter, Abigail Potter
  • Conference Paper
    The Newspaper Navigator Dataset: Extracting Headlines and Visual Content from 16 Million Historic Newspaper Pages in Chronicling America (2020)
    Authors: Benjamin Lee, Jaime Mears, Eileen Jakeway, Meghan Ferriter, Chris Adams, Nathan Yarasavage, Deborah Thomas, Kate Zwaard, Daniel S. Weld
  • Magazine/Trade Publication
    The Singularity Prophets (2020)
    Current Affairs Author: Benjamin Lee
  • Journal Article, Professional Journal
    Machine Learning, Template Matching, and the International Tracing Service Digital Archive: Automating the Retrieval of Death Certificate Reference Cards from 40 Million Document Scans (2019)
    Digital Scholarship in the Humanities, 34(3) Author: Benjamin C.G. Lee
  • Journal Article, Professional Journal
    Improved Point-source Detection in Crowded Fields Using Probabilistic Cataloging (2017)
    The Astronomical Journal, 154(4) Authors: Stephen K.N. Portillo, Benjamin Lee, Tansu Daylan, Douglas Finkbeiner
  • Workshop Paper
    Line Detection in Binary Document Scans: A Case Study with the International Tracing Service Archives (2017)
    2017 IEEE International Conference on Big Data, pp. 2256–2261 Author: Benjamin C.G. Lee
  • Journal Article, Professional Journal
    Galaxy Redshifts from Discrete Optimization of Correlation Functions (2016)
    The Astronomical Journal, 152(6) Authors: Benjamin C.G. Lee, Tamás Budavári, Amitabh Basu, Mubdi Rahman

Presentations

  • #WhyWebArchiving: Preserving Internet Content for Research Use (2023)
    Library of Congress & International Internet Preservation Consortium - Virtual
  • Computing on Cultural Heritage: Reports from an LC Labs Experiment (2023)
    American Historical Association 2023 - Philadelphia, PA
  • Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning (2023)
    Digital History Research Colloquium - Humboldt-Universität Berlin (Virtual)
  • #WhyWebArchiving: Preserving Internet Content for Research Use (2022)
    Strategic Visioning Workshop for Digital Strategy at the Library of Congress - Virtual
  • A Computational Periodicals Unconference: Exploring New Opportunities for Critical and Collaborative Inquiry (2022)
    DH Unbound 2022 - Virtual
  • Compounded Mediation: A Data Archaeology of the Newspaper Navigator Dataset (2022)
    Digital Humanities + iSchool (DHIS) Collective Meeting, University of Illinois Urbana-Champaign - Virtual
  • Compounded Mediation: Excavating the Newspaper Navigator Dataset (2022)
    Fifth Annual GHI Conference on Digital Humanities and Digital History, German Historical Institute Washington - Virtual
  • How to Get Published in Academic Journals (2022)
    Critical Digital Methods Institute CDMI Workshop Series, University of Toronto - Virtual
  • Interaction (2022)
    CCC Artificial Intelligence Roadmap Workshop 2 - Denver, CO
  • New Directions for Interdisciplinary Collaborations in Periodical Studies (2022)
    Research Society for American Periodicals - Virtual
  • Newspaper Navigator: Hosting the Dataset and Deploying the Search Application (2022)
    Designing Storage Architectures for Digital Collections 2022, Library of Congress - Virtual
  • Newspaper Navigator: Open Faceted Search for 1.5 Million Images (2022)
    Computer Science & Engineering HCI Seminar, The University of Washington - Virtual
  • Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning (2022)
    Fantastic Futures 2021 - Virtual
  • Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning (2022)
    Digital Humanities 2022 - Virtual
  • Novel Machine Learning Methods for Computing Cultural Heritage: An Interdisciplinary Approach (2022)
    Center for Digital Humanities, Princeton University - Virtual
  • Serving Researchers With Public Web Archive Datasets in the Cloud (2022)
    IIPC Web Archiving Conference - Virtual
  • The Digital Humanities and the Ladino Press: Unlocking Historic Ladino Newspapers with Machine Learning (2022)
    ucLADINO Conference 2022 - Virtual
  • Using Machine Learning to Extract and Analyze Advertisements in Historic Ladino Newspapers, 1890-1948 (2022)
    Studying Advertisements in pre-1939 Jewish Press: Methods and Challenges Workshop, University of Wrocław - Virtual
  • Compounded Mediation: A Data Archaeology of the Newspaper Navigator Dataset (2021)
    Digital History of Science Working Group, Consortium for the History of Science, Technology, and Medicine - Virtual
  • Cowboys, Computers, and Cartoons: Excavating and Explicating America’s Political Cartoons (2021)
    Association for Documentary Editing Annual Meeting - Virtual
  • Data & Technologies (2021)
    Collective Wisdom Workshop - Virtual
  • From Chronicling America to Newspaper Navigator: Improving Access to Historic Newspaper Photos at the Library of Congress through Machine Learning (2021)
    NewsEye International Conference - Virtual
  • Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning (2021)
    Discovery Series, Harvard University - Virtual
  • Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning (2021)
    Honors Program Speaker Series, Texas A&M University-Corpus Christi - Virtual
  • Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning (2021)
    Digital History Seminar Series, Institute of Historical Research, University of London School of Advanced Study - Virtual
  • Sephardic Experiences of Modernity: Newspapers, Migrants and Midwives (2021)
    Stroum Center for Jewish Studies Colloquium, The University of Washington - Virtual
  • The Digital Humanities and the Ladino Press: Using Machine Learning to Extract and Analyze Visual Content in Historic Ladino Newspapers (2021)
    #DHJewish 2021 - Jewish Studies in the Digital Age Conference - Virtual
  • Newspaper Navigator: An Introduction & Demo (2020)
    Living with Machines Group, The Alan Turing Institute & British Library - Virtual
  • Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning (2020)
    Collections as Data Discussion Series, Center for Digital Humanities & Princeton University Library - Virtual
  • Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning (2020)
    Computer Vision for Digital Heritage, The Alan Turing Institute - Virtual
  • Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning (2020)
    Digital History Workshop, The Johns Hopkins University - Virtual
  • Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning (2020)
    Drexel University - Virtual
  • Newspaper Navigator: Reimagining Historic Newspapers with Machine Learning (2020)
    Data Dialogue Series, Duke University - Virtual
  • Newspaper Navigator: Reimagining Library Search and Discovery with Machine Learning (2020)
    ULS Technology in University Libraries Committee Tech Forum - Virtual
  • Seeing Editors: Metadata, Machine Learning, and the Shapes of Social Justice (2020)
    National Endowment for the Humanities & Library of Congress - Virtual
  • Teaching Computers to Read Ladino (2020)
    Stroum Center for Jewish Studies, The University of Washington - Virtual
  • Data Visualization (2019)
    DIRAC Institute Lunch Seminar, University of Washington - Seattle, WA
  • Mapping the University of Washington’s Sephardic Studies Collection (2019)
    GIS Symposium, University of Washington - Seattle, WA
  • Needles in a Digital Haystack: Improving Digital Archive Research (2019)
    Digital Futures Discovery Series, Harvard University - Cambridge, MA
  • The International Tracing Service and Machine Learning (2019)
    Machine Learning + Libraries Summit, Library of Congress - Washington, D.C.
  • Applying Digital Humanities Research Methods to Holocaust Studies: A Case Study of the Roman Catholic Clerical Prisoners in the Dachau Concentration Camp (2018)
    Committee on Ethics, Religion, and the Holocaust, United States Holocaust Memorial Museum - Washington, D.C.
  • ITS and Machine Learning (2018)
    The International Tracing Service Archive - Virtual
  • The Clergy in Dachau and Digital Humanities (2018)
    The Jack, Joseph, and Morton Mandel Center for Advanced Holocaust Studies, United States Holocaust Memorial Museum - Washington, D.C.
  • The International Tracing Service Archive and Machine Learning (2018)
    Data Science Seminar, Smithsonian Data Science Lab - Washington, D.C.
  • The International Tracing Service Archive and Machine Learning (2018)
    LC Labs, Library of Congress - Washington, D.C.
  • The International Tracing Service Archive and Machine Learning (2018)
    The Jack, Joseph, and Morton Mandel Center for Advanced Holocaust Studies, United States Holocaust Memorial Museum - Washington, D.C.
  • Using Computer Vision and Machine Learning to Classify ITS CNI Cards (2017)
    Improving Access to the ITS Digital Archive Workshop, Wiener Holocaust Library - Virtual
  • Newspaper Navigator Data Jam
    Library of Congress - Virtual