Specializations
- Natural language processing
- Multimodal models, including vision language models, speech, and diffusion systems
- Multilingually and multiculturally representative systems
Biography
Michael Saxon is a Siegel Postdoctoral Fellow with the Tech Policy Lab and the Information School at the University of Washington. His research sits at the intersection of generative model benchmarking, multimodality, and AI ethics. He is particularly interested in the difficult evaluation questions that arise in multimodal systems, and in developing methods that make systems performant and authentically user-responsive across languages and cultures. Saxon earned his bachelor’s in Electrical Engineering and his master’s in Computer Engineering at Arizona State University, and his Ph.D. in Computer Science at the University of California, Santa Barbara, where he was advised by William Wang.
Education
- Ph.D., Computer Science, University of California, Santa Barbara, 2025
- MS, Computer Engineering, Arizona State University, 2020
- BS, Electrical Engineering, Arizona State University, 2018
Awards
- Neal Fenzi—Resonant Founder Fellowship - University of California, Santa Barbara, 2024
- Rising Star in Generative AI - UMass Amherst Rising Stars Workshop, 2024
- Outstanding Reviewer Award - ACL 2023, 2023
- Center for Responsible Machine Learning Fellowship - University of California, Santa Barbara, 2020
- Graduate Division Central Fellowship - University of California, Santa Barbara, 2020
- National Science Foundation Graduate Research Fellowship - National Science Foundation (NSF GRFP), 2020
Publications and Contributions
- Conference Paper: Can Vision Language Models Understand Mimes? (2025). Findings of the Association for Computational Linguistics
- Conference Paper: Culture is Everywhere: A Call for Intentionally Cultural Evaluation (2025). Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2025)
- Conference Paper: Do You Know About My Nation? Investigating Multilingual Language Models’ Cultural Literacy Through Factual Knowledge (2025). Conference on Empirical Methods in Natural Language Processing (EMNLP 2025)
- Conference Paper: TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation (2025). Findings of the Association for Computational Linguistics
- Conference Paper: ThoughtTerminator: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models (2025). Second Conference on Language Modeling (COLM 2025)
- Conference Paper: VSP: Assessing the dual challenges of perception and reasoning in spatial planning tasks for VLMs (2025). Proc. International Conference on Computer Vision (ICCV 2025)
- Journal Article, Academic Journal: Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies (2024). Transactions of the Association for Computational Linguistics (TACL), 12(2024), pp. 484-506
- Conference Paper: Benchmarks as Microscopes: A Call for Model Metrology (2024). Proceedings of the Conference on Language Modeling (COLM 2024)
- Conference Paper: Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts (2024). Proceedings of the Empirical Methods in Natural Language Processing Conference (EMNLP 2024)
- Conference Paper: Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts (2024). Proceedings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024)
- Conference Paper: Who Evaluates the Evaluations? Assessing the Faithfulness and Consistency of Text-to-Image Evaluation Metrics with T2ISCORESCORE (2024). Proceedings of the 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024)
- Conference Paper: Causal Balancing for Domain Generalization (2023). Proceedings of the Eleventh International Conference on Learning Representations (ICLR 2023)
- Conference Paper: CausalDialogue: Modeling Utterance-level Causality in Conversations (2023). Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)
- Conference Workshop Paper: Data Augmentation for Diverse Voice Conversion in Noisy Environments (2023). Interspeech 2023 Show and Tell
- Conference Paper: Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning (2023). Proceedings of the 37th Annual Conference on Neural Information Processing Systems (NeurIPS 2023)
- Conference Paper: Let’s Think Frame by Frame with VIP: A Video Infilling and Prediction Dataset for Evaluating Video Chain-of-Thought (2023). Proceedings of the Empirical Methods in Natural Language Processing Conference (EMNLP 2023)
- Conference Paper: Multilingual Conceptual Coverage in Text-to-Image Models (2023). Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)
- Conference Paper: PECO: Examining Single Sentence Label Leakage in Natural Language Inference Datasets (2023). Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), pp. 3061-3074
- Conference Paper: WikiWhy: Answering and Explaining Cause-and-Effect Questions (2023). Proceedings of the Eleventh International Conference on Learning Representations (ICLR 2023)
- Conference Paper: Not All Errors are Equal: Learning Text Generation Metrics using Stratified Error Synthesis (2022). Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)
- Conference Paper: Self-Supervised Knowledge Assimilation for Expert-Layman Style Transfer (2022). Proceedings of the 36th edition of the Association for the Advancement of Artificial Intelligence Conference (AAAI 2022)
- Conference Paper: Counterfactual Maximum Likelihood Estimation for Training Deep Networks (2021). Proceedings of the Thirty-Fifth Annual Conference on Neural Information Processing Systems (NeurIPS 2021)
- Conference Paper: End-to-End Spoken Language Understanding for Generalized Voice Assistants (2021). Interspeech 2021, pp. 4738-4742
- Conference Paper: Investigating Memorization of Conspiracy Theories in Text Generation (2021). Findings of the Association for Computational Linguistics (ACL 2021)
- Conference Paper: Modeling Disclosive Transparency in NLP Application Descriptions (2021). Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), pp. 2023-2037
- Journal Article, Academic Journal: Robust Estimation of Hypernasality in Dysarthria (2020). IEEE Transactions on Audio, Speech, and Language Processing, pp. 2511-2522
- Conference Paper: Semantic Complexity in End-to-End Spoken Language Understanding (2020). Interspeech 2020, pp. 4273-4277
- Conference Paper: UncommonVoice: A Crowdsourced Dataset of Dysphonic Speech (2020). Interspeech 2020, pp. 2532-2536
- Conference Paper: Objective Measures of Plosive Nasalization in Hypernasal Speech (2019). Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (IEEE ICASSP 2019), pp. 6520-6524
- Conference Paper: Say what? A dataset for exploring the error patterns that two ASR engines make (2019). Interspeech 2019, pp. 2528-2532
- Conference Workshop Paper: Word Pair Convolutional Model for Happy Moment Classification (2019). AAAI AffCon Workshop 2019, pp. 111-119
- Conference Paper: 2D Grating Pitch Mapping of a through Silicon Via (TSV) and Solder Ball Interconnect Region Using Laser Diffraction (2016). Proceedings of the IEEE 66th Electronic Components and Technology Conference (ECTC 2016), pp. 2222-2227
Presentations
- How to nitpick multimodal evaluations (2025). CVPR 2025 - Virtual
- Multilingual multimodal evaluation: how and why (2025). Google Translate Research - Mountain View, CA
- Rigorous measurement in text-to-image systems (2024). UMD CLIP Seminar - College Park, MD
- Rigorous measurement in text-to-image systems (2024). Stanford SALT Group - Palo Alto, CA
- Rigorous measurement in text-to-image systems (2024). Georgetown University - Washington, DC
- Disparities in Text-to-Image Model Conceptual Knowledge Across Languages (2023). 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT) - Chicago, IL
- Rigorous measurement in text-to-image systems (2023). USC Information Sciences Institute - Marina Del Rey, CA
- Rigorous measurement in text-to-image systems (2023). Arizona State University - Tempe, AZ