Affiliate Positions

  • Adjunct Associate Professor, University of Washington Computer Science & Engineering
  • Adjunct Associate Professor, University of Washington Department of Electrical Engineering
  • Program Director and Faculty Chair, University of Washington Data Science Masters Degree


Research Area

  • Data Management
  • Data Science for Social Good
  • Scientific Databases and Visualization

Courses

  • INFO 330 - Databases and Data Modeling


I am an Associate Professor in the Information School, Adjunct Associate Professor in Computer Science & Engineering, and Associate Director and Senior Data Science Fellow at the UW eScience Institute. I am a co-founder of Urban@UW, and, with support from the MacArthur Foundation and Microsoft, I lead UW's participation in the MetroLab Network. I created one of the first MOOCs on data science, offered through Coursera, and I led the creation of the UW Data Science Masters Degree, for which I serve as the first Program Director and Faculty Chair. I serve on the Steering Committee of the Center for Statistics in the Social Sciences.

My group's research aims to make the techniques and technologies of data science dramatically more accessible, particularly at scale. Our methods are rooted in database models and languages, though we sometimes work in machine learning, visualization, HCI, and high-performance computing. We are an applied, systems-oriented group, frequently sourcing projects through collaborations in the physical, life, and social sciences.


Education

  • PhD, Computer Science, Portland State University, 2007
  • BS, Industrial and Systems Engineering, Georgia Institute of Technology, 1999


Awards

  • Best Paper Runner-Up, Experiment, Analysis & Benchmark Track - VLDB 2023
  • Research Highlights - ACM SIGMOD Record, 2023
  • Innovation of the Month - MetroLab Network, 2021

Publications and Contributions

  • Conference Paper
    Does a Fair Model Produce Fair Explanations? Relating Distributive and Procedural Fairness (2024)
    Proceedings of the 57th Hawaii International Conference on System Sciences, pp. 6868-6877 Authors: Yiwei Yang, William G Howe
  • Conference Paper
    Geospatial Imputation of Urban Mobility Data with Self-Supervised Learning (2024)
    Proceedings of the 57th Hawaii International Conference on System Sciences, pp. 5619-5628 Authors: Bin Han, William G Howe
  • Conference Poster
    Label-Efficient Group Robustness via Out-of-Distribution Concept Curation (2024)
    Conference on Computer Vision and Pattern Recognition (CVPR 2024) Authors: Yiwei Yang, Anthony Zhe Liu, Robert Wolfe, Aylin Caliskan, William G Howe
  • Conference Paper
    Contrastive Language-Vision AI Models Pretrained on Web-Scraped Multimodal Data Exhibit Sexual Objectification Bias (2023)
    Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, pp. 1174–1185 Authors: Robert Wolfe, Yiwei Yang, William G Howe, Aylin Caliskan
  • Conference Paper
    Epistemic Parity: Reproducibility as an Evaluation Metric for Differential Privacy (2023)
    Proceedings of the VLDB Endowment (VLDB), 16(11), pp. 3178-3191 Authors: Lucas Rosenblatt, Bernease Herman, Anastasia Holovenko, Wonkwon Lee, Joshua Loftus, Elizabeth McKinnie, Taras Rumezhak, Andrii Stadnik, William G Howe, Julia Stoyanovich
  • Conference Poster
    Regularizing Model Gradients with Concepts to Improve Robustness to Spurious Correlations (2023)
    Fortieth International Conference on Machine Learning Workshop on Spurious Correlations, Invariance, and Stability (ICML SCIS 2023) Authors: Yiwei Yang, Anthony Zhe Liu, Robert Wolfe, Aylin Caliskan, William G Howe
  • Journal Article, Academic Journal
    Integrative urban AI to expand coverage, access, and equity of urban data (2022)
    The European Physical Journal Special Topics, pp. 1-12 Authors: William G Howe, J.M. Brown, B. Han, B. Herman, Nicholas Weber, A. Yan, Y. Yang
  • Conference Paper
    Ontologue: Declarative Benchmark Construction for Ontological Multi-Label Classification (2022)
    Conference on Neural Information Processing Systems (NeurIPS), pp. 14 Authors: Sean Yang, Bernease Herman, William G Howe
  • Journal Article, Academic Journal
    Responsible Data Management (2022)
    Communications of the ACM, 65(6), pp. 64-74 Authors: Julia Stoyanovich, Serge Abiteboul, William G Howe, H. V. Jagadish, Sebastian Schelter
  • Conference Paper
    Surj: Ontological Learning for Fast, Accurate, and Robust Hierarchical Multi-label Classification (2022)
    Companion Proceedings of the Web Conference (WWW), pp. 1106–1114 Authors: Sean Yang, William G Howe
  • Invited Paper Review
    Technical perspective: Visualization search: from sketching to natural language (2022)
    Communications of the ACM, 65(7), pp. 84 Author: William G Howe
  • Journal Article, Academic Journal
    Covid-19 brings data equity challenges to the fore (2021)
    Digital Government: Research and Practice, 2(2), pp. 1-7 Authors: H. V. Jagadish, Julia Stoyanovich, William G Howe
  • Conference Paper
    EquiTensors: Learning Fair Integrations of Heterogeneous Urban Data (2021)
    International Conference on Management of Data (SIGMOD) Authors: An Yan, William G Howe
  • Conference Paper
    JECL: Joint Embedding and Cluster Learning for Image-Text Pairs (2021)
    International Conference on Pattern Recognition (ICPR) Authors: Sean T. Yang, K. H. Huang, William G Howe
  • Conference Paper
    SPORES: Sum-Product Optimization via Relational Equality Saturation for Large Scale Linear Algebra (2021)
    Proceedings of the VLDB Endowment (VLDB), 13(12), pp. 3474-3488 Authors: Y. R. Wang, S. Hutchison, J. Leang, William G Howe, D. Suciu
  • Conference Paper
    Technical Perspective: From Sketching to Natural Language: Expressive Visual Querying for Accelerating Insight (2021)
    SIGMOD Record 2021 Author: William G Howe
  • Workshop Paper
    The many facets of data equity (2021)
    EDBT/ICDT Workshops Authors: H. V. Jagadish, Julia Stoyanovich, William G Howe
  • Report
    CUAC Program Report (2020)
    Authors: Jonathan Fink, Raymond Ng, William G Howe, Emily Keller
  • Conference Paper
    Database Repair Meets Algorithmic Fairness (2020)
    ACM SIGMOD Record (2020), 49(1), pp. 34-41 Authors: Babak Salimi, William G Howe, Dan Suciu
  • Conference Paper
    Digital Government: Research and Practice (2020)
    Digital Government: Research and Practice (2020) Authors: Julia Stoyanovich, William G Howe, H. V. Jagadish
  • Conference Paper
    Fairness-Aware Demand Prediction for New Mobility (2020)
    The AAAI Conference on Artificial Intelligence (AAAI) (2020) Authors: An Yan, William G Howe
  • Conference Paper
    Responsible data management (2020)
    Proceedings of the VLDB Endowment (VLDB), 13(12), pp. 3474-3488 Authors: Julia Stoyanovich, William G Howe, H. V. Jagadish
  • Conference Paper
    Beyond Open vs. Closed: Balancing Individual Privacy and Public Accountability in Data Sharing (2019)
    ACM Conference on Fairness, Accountability, and Transparency (ACM FAT*) Authors: Margaret Young, Luke Rodriguez, Emily Keller, Feiyang Sun, Boyang Sa, Jan Whittington, William G Howe
  • Conference Paper
    Capuchin: Causal Database Repair for Algorithmic Fairness (2019)
    Proceedings of the 2019 International Conference on Management of Data (SIGMOD '19) Authors: William G Howe, Luke Rodriguez, Babak Salimi, Dan Suciu
  • Technical Report
    Data Management for Causal Algorithmic Fairness (2019)
    pp. 12 Authors: Babak Salimi, William G Howe, Dan Suciu
  • Journal Article, Academic Journal
    Data Management for Causal Algorithmic Fairness (2019)
    IEEE Data Eng. Bull., 42(3) Authors: Babak Salimi, William G Howe, Dan Suciu
  • Conference Paper
    Database-Agnostic Workload Management (2019)
    Conference on Innovative Data Systems Research (CIDR) Authors: Shrainik Jain, Jiaqi Yan, Thierry Cruanes, William G Howe
  • Technical Report
    Delineating Knowledge Domains in the Scientific Literature Using Visual Information (2019)
    pp. 10 Authors: Sean Yang, Poshen Lee, Jevin West, William G Howe
  • Conference Paper
    FairST: Equitable Spatial and Temporal Demand Prediction for New Mobility Systems (2019)
    Proceedings of the 27th ACM International Conference on Advances in Geographic Information Systems (SIGSPATIAL), pp. 10 Authors: An Yan, William G Howe
  • Journal Article, Academic Journal
    Fairness in Practice: A Survey on Equity in Urban Mobility (2019)
    IEEE Data Eng. Bull., 42(3) Authors: An Yan, William G Howe
  • Conference Demonstration Paper
    GraviTIE: Exploratory Analysis of Large-Scale Heterogeneous Image Collections (2019)
    The World Wide Web Conference (2019), pp. 3605-3609 Authors: Sean T. Yang, Luke Rodriguez, Jevin West, William G Howe
  • Conference Paper
    Identifying the Central Figure of a Scientific Paper (2019)
    International Conference on Document Analysis and Recognition (ICDAR), pp. 1063-1070 Authors: Sean T Yang, Po-Shen Lee, Lia Kazakova, Abhishek Joshi, Bum Mook Oh, Jevin West, William G Howe
  • Technical Report
    In Defense of Synthetic Data (2019)
    pp. 3 Authors: Luke Rodriguez, William G Howe
  • Conference Paper
    Interventional Fairness: Causal Database Repair for Algorithmic Fairness (2019)
    International Conference on Management of Data (SIGMOD) Authors: Babak Salimi, Luke Rodriguez, William G Howe, Dan Suciu
  • Conference Demonstration Paper
    MithraLabel: Flexible dataset nutritional labels for responsible data science (2019)
    Proceedings of the 28th ACM International Conference on Information and Knowledge Management (CIKM), pp. 2893–2896 Authors: Chenkai Sun, Abolfazl Asudeh, HV Jagadish, William G Howe, Julia Stoyanovich
  • Technical Report
    MultiDEC: Multi-Modal Clustering of Image-Caption Pairs (2019)
    pp. 9 Authors: Sean T. Yang, Kuan-Hao Huang, William G Howe
  • Journal Article, Academic Journal
    Nutritional Labels for Data and Models (2019)
    IEEE Data Eng. Bull., 42(3) Authors: William G Howe, Julia Stoyanovich
  • Opinion Piece
    Protect the public from bias in automated decision systems (2019)
    Seattle Times Author: William G Howe
  • Journal Article, Academic Journal
    The principles of tomorrow's university (2019)
    F1000Research, 7:1926 Authors: Daniel S. Katz, Gabrielle Allen, Lorena A. Barba, Devin R. Berg, Holly Bik, Carl Boettiger, Christine L. Borgman, C. Titus Brown, Stuart Buck, Randy Burd, Anita de Waard, Martin Paul Eve, Brian E. Granger, Josh Greenberg, Adina Howe, William G Howe, May Khanna, Timothy L. Killeen, Matthew Mayernik, Erin McKiernan, Chris Mentzel, Nirav Merchant, Kyle E. Niemeyer, Laura Noren, Sarah M. Nusser, Daniel A. Reed, Edward Seidel, MacKenzie Smith, Jeffrey R. Spies, Matt Turk, John D. Van Horn, Jay Walsh
  • Conference Workshop Paper
    Classifying digitized art type and time period (2018)
    Workshop on Data Science for Digital Art History Authors: Sean Yang, Bum Mook Oh, Daniel Merchant, William G Howe, Jevin West
  • Conference Demonstration Paper
    A Nutritional Label for Rankings (2018)
    ACM Conference on Management of Data (SIGMOD) Authors: Ke Yang, Julia Stoyanovich, Abolfazl Asudeh, William G Howe, H.V. Jagadish, Gerome Miklau
  • Conference Paper
    Delineating Disciplines Using Visual Information in Scientific Literature (2018)
    KDD 2018 BigScholar 2018: The 5th Workshop on Big Scholarly Data Authors: William G Howe, Sean Yang, Poshen Lee, Jevin West
  • Conference Paper
    EZLearn: Exploiting Organic Supervision in Large-Scale Data Annotation (2018)
    International Joint Conference on Artificial Intelligence (IJCAI) Authors: Maxim Grechkin, Hoifung Poon, William G Howe
  • Conference Paper
    Formalizing Visualization Design Knowledge as Constraints: Actionable and Extensible Models in Draco (2018)
    IEEE Information Visualization (InfoVis) Authors: Dominik Moritz, Chenglong Wang, Greg L. Nelson, Halden Lin, Adam M. Smith, William G Howe, Jeffrey Heer
  • Conference Workshop Paper
    MobilityMirror: Bias-Adjusted Transportation Datasets (2018)
    Workshop on Big Social Data and Urban Computing (BiDU) Authors: Luke Rodriguez, Babak Salimi, Julia Stoyanovich, William G Howe
  • Technical Report
    Query2Vec: An Evaluation of NLP Techniques for Generalized Workload Analytics (2018)
    pp. 14 Authors: Shrainik Jain, William G Howe
  • Conference Demonstration Paper
    DataSynthesizer: Privacy-Preserving Synthetic Datasets (2017)
    ACM Scientific and Statistical Database Management Conference (SSDBM) Authors: Haoyue Ping, Julia Stoyanovich, William G Howe
  • Conference Workshop Paper
    Deep Mapping of the Visual Literature (2017)
    Proceedings of the 26th International Conference on World Wide Web Companion (WWW): Big Scholar Workshop, pp. 1273-1277 Authors: William G Howe, Po-shen Lee, Maxim Grechkin, Sean T Yang, Jevin West
  • Conference Workshop Paper
    EZLearn: Exploiting Organic Supervision in Large-Scale Data Annotation (2017)
    Learning with Limited Labeled Data: Weak Supervision and Beyond, 2017 NIPS Conference Authors: Maxim Grechkin, Hoifung Poon, William G Howe
  • Conference Paper
    Fides: Towards a Platform for Responsible Data Science (2017)
    ACM Conference on Scientific and Statistical Database Management (SSDBM) Authors: Julia Stoyanovich, William G Howe, Serge Abiteboul, Gerome Miklau, Arnaud Sahuguet, Gerhard Weikum
  • Conference Workshop Paper
    LaraDB: A Minimalist Kernel for Linear and Relational Algebra Computation (2017)
    BeyondMR workshop, 2017 ACM SIGMOD conference Authors: Dylan Hutchison, William G Howe, Dan Suciu
  • Conference Workshop Paper
    Profiling a GPU database implementation: a holistic view of GPU resource utilization on TPC-H queries (2017)
    Thirteenth International Workshop on Data Management on New Hardware, SIGMOD Authors: Emily Furst, Mark Oskin, William G Howe
  • Journal Article, Academic Journal
    Scalable and Efficient Flow-Based Community Detection for Large-Scale Graph Analysis (2017)
    ACM Transactions on Knowledge Discovery from Data (TKDD), 11(3) Authors: Seung-Hee Bae, Daniel Halperin, Jevin West, Martin Rosvall, William G Howe
  • Workshop Paper
    Synthetic Data for Social Good (2017)
    Bloomberg Data for Good Exchange Authors: William G Howe, Julia Stoyanovich, Haoyue Ping, Bernease Herman, Matt Gee
  • Conference Paper
    The Myria Big Data Management and Analytics System and Cloud Services (2017)
    Conference on Innovative Data Systems Research (CIDR) Authors: Jingjing Wang, Tobin Baker, Magdalena Balazinska, Daniel Halperin, Brandon Haynes, William G Howe, Dylan Hutchison
  • Journal Article, Academic Journal
    Viziometrics: Analyzing Visual Information in the Scientific Literature (2017)
    IEEE Transactions on Big Data, PP(99) Authors: Po-shen Lee, Jevin West, William G Howe
  • Conference Poster
    Viziometrics: Identifying Central Figures in Scientific Papers (2017)
    Authors: Olga Kazakova, Po-shen Lee, Bum Mook Oh, Sean T. Yang, Jevin West, William G Howe
  • Conference Paper
    Voyager 2: Augmenting Visual Analysis with Partial View Specifications (2017)
    ACM Human Factors in Computing Systems (CHI) Authors: Kanit Wongsuphasawat, Zening Qu, Dominik Moritz, Riley Chang, Felix Ouk, Anushka Anand, Jock Mackinlay, William G Howe, Jeffrey Heer
  • Journal Article, Academic Journal
    Wide-Open: Accelerating public data release by automating detection of overdue datasets (2017)
    PLOS Biology, 15(6) Authors: Maxim Grechkin, Hoifung Poon, William G Howe
  • Conference Paper
    Data Cleaning in the Wild: Reusable Curation Idioms from a Multi-Year SQL Workload (2016)
    Proceedings of the 11th International Workshop on Quality in Databases (QDB'16) Authors: Shrainik Jain, William G Howe
  • Journal Article, Academic Journal
    Deciphering Ocean Carbon in a Changing World (2016)
    Proceedings of the National Academy of Sciences, 113(12), ISSN: 0027-8424 Authors: Mary Ann Moran, Elizabeth B Kujawinski, Aron Stubbins, Rob Fatland, Lihini I Aluwihare, Alison Buchan, Byron C Crump, Pieter C Dorrestein, Sonya T Dyhrman, Nancy J Hess, William G Howe, Krista Longnecker, Patricia M Medeiros, Jutta Niggemann, Ingrid Obernosterer, Daniel J Repeta, Jacob R Waldbauer
  • Conference Paper
    From NoSQL Accumulo to NewSQL Graphulo: Design and utility of graph algorithms inside a BigTable database (2016)
    Proceedings of the High Performance Extreme Computing Conference (HPEC 2016), pp. 1-9 Authors: Dylan Hutchison, Jeremy Kepner, Vijay Gadepally, William G Howe
  • Conference Workshop Paper
    High Variety Cloud Databases (2016)
    Proceedings of the 2016 IEEE Cloud Data Management Workshop Authors: Shrainik Jain, Dominik Moritz, William G Howe
  • Conference Paper
    MusicDB: Relational Approach for Numeric Longitudinal Music Analytics (2016)
    Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR 2016), pp. 702-708 Authors: Jeremy Hyrkas, William G Howe
  • Conference Paper
    SQLShare: Results from a Multi-Year SQL-as-a-Service Experiment (2016)
    SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, pp. 281-293 Authors: Shrainik Jain, Dominik Moritz, William G Howe, Ed Lazowska
  • Journal Article, Academic Journal
    Scalable clustering algorithms for continuous environmental flow cytometry (2016)
    Bioinformatics, 32(3), pp. 417–423 Authors: Jeremy Hyrkas, Sophie Clayton, Francois Ribalet, Daniel Halperin, E. Virginia Armbrust, William G Howe
  • Conference Paper
    VizioMetrix: A Platform for Analyzing the Visual Information in Big Scholarly Data (2016)
    BigScholar Workshop (Third WWW Workshop on Big Scholarly Data: Towards the Web of Scholars) Authors: Poshen Lee, Jevin West, William G Howe
  • Conference Paper
    Voyager: Exploratory analysis via faceted browsing of visualization recommendations (2016)
    IEEE Transactions on Visualization and Computer Graphics, 22(1), pp. 649–658 Authors: Kanit Wongsuphasawat, Dominik Moritz, Anushka Anand, Jock Mackinlay, William G Howe, Jeffrey Heer
  • Journal Article, Academic Journal
    A Demonstration of the BigDAWG Polystore System (2015)
    Proceedings of the VLDB Endowment (PVLDB), 8(12) Authors: Aaron Elmore, Jennie Duggan, Michael Stonebraker, Magdalena Balazinska, Ugur Cetintemel, Vijay Gadepally, Jeffrey Heer, William G Howe, Jeremy Kepner, Tim Kraska, Samuel Madden, David Maier, Timothy Mattson, Stavros Papadopoulos, Jeff Parkhurst, Nesime Tatbul, Manasi Vartak, Stan Zdonik
  • Conference Proceeding
    Big Data Science Needs Big Data Middleware (2015)
    CIDR 2015, Seventh Biennial Conference on Innovative Data Systems Research (lightning talk) Author: William G Howe
  • Conference Proceeding
    Building an Urban Data Science Summer Program at the University of Washington eScience Institute (2015)
    Authors: Ariel Rokem, Brittany Fiore-Gartland, Bernease Herman, Micaela Parker, Cecilia Aragon, Bryna Hazelton, William G Howe, Valentina Staneva, Anthony Arendt, Joseph Hellerstein, Ed Lazowska, Sarah Stone, Anissa Tanweer, Jacob Vanderplas
  • Conference Proceeding
    Detecting and Dismantling Composite Visualizations in the Scientific Literature (2015)
    Pattern Recognition Applications and Methods - 4th International Conference, ICPRAM 2015, Lisbon, Portugal, January 10-12, 2015, Revised Selected Papers, pp. 247–266 Authors: Poshen Lee, William G Howe
  • Conference Proceeding
    Dismantling Composite Visualizations in the Scientific Literature (2015)
    4th International Conference on Pattern Recognition Applications and Methods (ICPRAM) Authors: Poshen Lee, William G Howe
  • Conference Proceeding
    Gaussian mixture models use-case: in-memory analysis with myria (2015)
    Proceedings of the 3rd VLDB Workshop on In-Memory Data Management and Analytics, pp. 3 Authors: Ryan Maas, Jeremy Hyrkas, Olivia Grace Telford, Magdalena Balazinska, Andrew Connolly, William G Howe
  • Conference Proceeding
    GossipMap: a distributed community detection algorithm for billion-edge directed graphs (2015)
    Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Supercomputing 2015, Austin, TX, USA, November 15-20, 2015, pp. 27:1–27:12 Authors: Seung-Hee Bae, William G Howe
  • Conference Proceeding
    Perfopticon: Visual query analysis for distributed databases (2015)
    Computer Graphics Forum, 34(3), pp. 71–80 Authors: Dominik Moritz, Daniel Halperin, William G Howe, Jeffrey Heer
  • Journal Article, Academic Journal
    Query-based data pricing (2015)
    Journal of the ACM (JACM), 62(5), pp. 43 Authors: Paraschos Koutris, Prasang Upadhyaya, Magdalena Balazinska, William G Howe, Dan Suciu
  • Journal Article, Academic Journal
    The BigDAWG Polystore System (2015)
    SIGMOD Record, 44(2), pp. 11–16 Authors: Jennie Duggan, Aaron J. Elmore, Michael Stonebraker, Magdalena Balazinska, William G Howe, Jeremy Kepner, Sam Madden, David Maier, Tim Mattson, Stanley B. Zdonik
  • Conference Proceeding
    Time-varying clusters in large-scale flow cytometry (2015)
    IAAI Conference Authors: Jeremy Hyrkas, Daniel Halperin, William G Howe
  • Conference Proceeding
    Towards automated prediction of relationships among scientific datasets (2015)
    Proceedings of the 27th International Conference on Scientific and Statistical Database Management, SSDBM ’15, La Jolla, CA, USA, June 29 - July 1, 2015, pp. 35:1–35:5 Authors: Abdussalam Alawini, David Maier, Kristin Tufte, William G Howe, Rashmi Nandikur
  • Conference Proceeding
    Helping scientists reconnect their datasets (2014)
    Proceedings of the 26th International Conference on Scientific and Statistical Database Management, pp. 29 Authors: Abdussalam Alawini, David Maier, Kristin Tufte, William G Howe
  • Conference Proceeding
    Should we all be teaching intro to data science instead of intro to databases? (2014)
    Panel at the 2014 ACM SIGMOD international conference on management of data, pp. 917–918 Authors: William G Howe, Michael J Franklin, Juliana Freire, James Frew, Tim Kraska, Raghu Ramakrishnan
  • Journal Article, Academic Journal
    The database group at the University of Washington (2014)
    SIGMOD Record, 43(1), pp. 39–44 Authors: Magdalena Balazinska, William G Howe, Dan Suciu
  • Book, Chapter in Scholarly Book-New
    A discussion on pricing relational data (2013)
    In Search of Elegance in the Theory and Practice of Computation, pp. 167–173 Authors: Magdalena Balazinska, William G Howe, Paraschos Koutris, Dan Suciu, Prasang Upadhyaya
  • Journal Article, Academic Journal
    Collaborative science workflows in SQL (2013)
    Computing in Science and Engineering, 15(3), pp. 22–31 Authors: William G Howe, Daniel Halperin, Francois Ribalet, Sagar Chitnis, E Virginia Armbrust
  • Conference Proceeding
    Compiled Plans for In-Memory Path-Counting Queries (2013)
    Proceedings of the 1st International Workshop on In Memory Data Management and Analytics, IMDM 2013, Riva Del Garda, Italy, August 26, 2013, pp. 25–37 Authors: Brandon Myers, Jeremy Hyrkas, Daniel Halperin, William G Howe
  • Journal Article, Academic Journal
    Hadoop’s adolescence: an analysis of Hadoop usage in scientific workloads (2013)
    Proceedings of the VLDB Endowment, 6(10), pp. 853–864 Authors: Kai Ren, YongChul Kwon, Magdalena Balazinska, William G Howe
  • Journal Article, Academic Journal
    Managing Skew in Hadoop (2013)
    IEEE Data Eng. Bull., 36(1), pp. 24–33 Authors: YongChul Kwon, Kai Ren, Magdalena Balazinska, William G Howe, Jerome Rolia
  • Conference Proceeding
    Massive scale cyber traffic analysis: a driver for graph database research (2013)
    First International Workshop on Graph Data Management Experiences and Systems, pp. 3 Authors: Cliff Joslyn, Sutanay Choudhury, David Haglin, William G Howe, Bill Nickless, Bryan Olsen
  • Conference Proceeding
    Real-time collaborative analysis with (almost) pure SQL: a case study in biogeochemical oceanography (2013)
    Proceedings of the 25th International Conference on Scientific and Statistical Database Management, pp. 28 Authors: Daniel Halperin, Francois Ribalet, Konstantin Weitz, Mak A Saito, William G Howe, E Armbrust
  • Journal Article, Academic Journal
    SQLShare: Scientific workflow via relational view sharing (2013)
    Computing in Science and Engineering, Special Issue on Science Data Management, 15(2) Authors: William G Howe, Francois Ribalet, Daniel Halperin, Sagar Chitnis, E Virginia Armbrust
  • Conference Proceeding
    Scalable Flow-Based Community Detection for Large-Scale Network Analysis (2013)
    Proceedings of IEEE International Conference on Data Mining Workshops (ICDMW 2013) Authors: Seung-Hee Bae, Daniel Halperin, Jevin West, Martin Rosvall, William G Howe
  • Conference Proceeding
    Stop That Query! The Need for Managing Data Use (2013)
    CIDR Authors: Prasang Upadhyaya, Nick R Anderson, Magdalena Balazinska, William G Howe, Raghav Kaushik, Ravishankar Ramamurthy, Dan Suciu
  • Conference Proceeding
    The power of data use management in action (2013)
    Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 1117–1120 Authors: Prasang Upadhyaya, Nick Anderson, Magdalena Balazinska, William G Howe, Raghav Kaushik, Ravi Ramamurthy, Dan Suciu
  • Conference Proceeding
    Toward practical query pricing with QueryMarket (2013)
    Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 613–624 Authors: Paraschos Koutris, Prasang Upadhyaya, Magdalena Balazinska, William G Howe, Dan Suciu
  • Conference Proceeding
    VizDeck: Streamlining exploratory visual analytics of scientific data (2013)
    iConference Authors: Daniel B Perry, William G Howe, Alicia MF Key, Cecilia Aragon
  • Conference Proceeding
    Hadoop’s Adolescence: A Comparative Workload Analysis from Three Research Clusters (2012)
    SC Companion, pp. 1452 Authors: Kai Ren, Garth Gibson, YongChul Kwon, Magdalena Balazinska, William G Howe
  • Conference Proceeding
    Optimizing large-scale semi-naive datalog evaluation in hadoop (2012)
    Proceedings of the Second International Conference on Datalog in Academia and Industry Authors: Marianne Shaw, Paraschos Koutris, William G Howe, Dan Suciu
  • Conference Proceeding
    Query-based data pricing (2012)
    Proceedings of the 31st symposium on Principles of Database Systems (PODS) Authors: Paraschos Koutris, Prasang Upadhyaya, Magdalena Balazinska, William G Howe, Dan Suciu
  • Journal Article, Academic Journal
    QueryMarket demonstration: Pricing for online data markets (2012)
    Proceedings of the VLDB Endowment, 5(12), pp. 1962–1965 Authors: Paraschos Koutris, Prasang Upadhyaya, Magdalena Balazinska, William G Howe, Dan Suciu
  • Conference Proceeding
    Skewtune: mitigating skew in mapreduce applications (2012)
    Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pp. 25–36 Authors: YongChul Kwon, Magdalena Balazinska, William G Howe, Jerome Rolia
  • Journal Article, Academic Journal
    The HaLoop approach to large-scale iterative data analysis (2012)
    The VLDB Journal—The International Journal on Very Large Data Bases, 21(2), pp. 169–190 Authors: Yingyi Bu, William G Howe, Magdalena Balazinska, Michael D Ernst
  • Journal Article, Academic Journal
    Virtual appliances, cloud computing, and reproducible research (2012)
    Computing in Science and Engineering, 14(4), pp. 36–41 Author: William G Howe
  • Conference Proceeding
    VizDeck: A Card Game Metaphor for Fast Visual Data Exploration (2012)
    CHI ’12 Extended Abstracts on Human Factors in Computing Systems, pp. 1667–1672, ISBN/ISSN: 978-1-4503-1016-1 Authors: William G Howe, Alicia Key, Daniel Perry, Cecilia Aragon
  • Conference Proceeding
    Vizdeck: Self-organizing dashboards for visual analytics (2012)
    Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (demo), pp. 681–684 Authors: Alicia Key, William G Howe, Daniel Perry, Cecilia Aragon
  • Journal Article, Academic Journal
    A study of skew in mapreduce applications (2011)
    Open Cirrus Summit Authors: YongChul Kwon, Magdalena Balazinska, William G Howe, Jerome Rolia
  • Journal Article, Academic Journal
    Astronomy in the cloud: using mapreduce for image co-addition (2011)
    Publications of the Astronomical Society of the Pacific, 123(901), pp. 366 Authors: Keith Wiley, Andrew Connolly, Jeff Gardner, S Krughoff, Magdalena Balazinska, William G Howe, Y Kwon, Yingyi Bu
  • Conference Proceeding
    Automatic example queries for ad hoc databases (2011)
    Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2011, Athens, Greece, June 12-16, 2011, pp. 1319–1322 Authors: William G Howe, Garrett Cole, Nodira Khoussainova, Leilani Battle
  • Journal Article, Academic Journal
    Bioinformatics and data-intensive scientific discovery in the beginning of the 21st century (2011)
    Omics: a journal of integrative biology, 15(4), pp. 199–201 Authors: Roger Barga, William G Howe, David Beck, Stuart Bowers, William Dobyns, Winston Haynes, Roger Higdon, Chris Howard, Christian Roth, Elizabeth Stewart, et al.
  • Journal Article, Academic Journal
    Data markets in the cloud: An opportunity for the database community (2011)
    Proc. of the VLDB Endowment, 4(12), pp. 1482–1485 Authors: Magdalena Balazinska, William G Howe, Dan Suciu
  • Conference Proceeding
    Database-as-a-service for long-tail science (2011)
    Scientific and Statistical Database Management, pp. 480–489 Authors: William G Howe, Garret Cole, Emad Soroush, Paraschos Koutris, Alicia Key, Nodira Khoussainova, Leilani Battle
  • Conference Proceeding
    Parallel visualization on large clusters using MapReduce (2011)
    Large Data Analysis and Visualization (LDAV), 2011 IEEE Symposium on, pp. 81–88 Authors: Huy T Vo, Jonathan Bronson, Brian Summa, Joao Luiz Dihl Comba, Juliana Freire, William G Howe, Valerio Pascucci, Cláudio T Silva
  • Journal Article, Academic Journal
    Astronomy in the Cloud: Using MapReduce for Image Coaddition (2010)
    CoRR, abs/1010.1015 Authors: Keith Wiley, Andrew J. Connolly, Jeffrey P. Gardner, K. Simon Krughoff, Magdalena Balazinska, William G Howe, YongChul Kwon, Yingyi Bu
  • Conference Proceeding
    Client+ cloud: evaluating seamless architectures for visual data analytics in the ocean sciences (2010)
    Scientific and Statistical Database Management, pp. 114–131 Authors: Keith Grochow, William G Howe, Mark Stoermer, Roger Barga, Ed Lazowska
  • Journal Article, Academic Journal
    HaLoop: efficient iterative data processing on large clusters (2010)
    Proceedings of the VLDB Endowment, 3(1-2), pp. 285–296 Authors: Yingyi Bu, William G Howe, Magdalena Balazinska, Michael D Ernst
  • Conference Proceeding
    SQL is dead; long live SQL: Lightweight query services for ad hoc research data (2010)
    4th Microsoft eScience Workshop Authors: William G Howe, Garret Cole
  • Conference Proceeding
    Scalable clustering algorithm for N-body simulations in a shared-nothing cluster (2010)
    Scientific and Statistical Database Management, pp. 132–150 Authors: YongChul Kwon, Dylan Nunley, Jeffrey P Gardner, Magdalena Balazinska, William G Howe, Sarah Loebman
  • Conference Proceeding
    Skew-resistant parallel processing of feature-extracting scientific user-defined functions (2010)
    Proceedings of the 1st ACM symposium on Cloud computing, pp. 75–86 Authors: YongChul Kwon, Magdalena Balazinska, William G Howe, Jerome Rolia
  • Conference Proceeding
    Analyzing massive astrophysical datasets: Can Pig/Hadoop or a relational DBMS help? (2009)
    Cluster Computing and Workshops, 2009. CLUSTER’09. IEEE International Conference on, pp. 1–10 Authors: Sarah Loebman, Dylan Nunley, YongChul Kwon, William G Howe, Magdalena Balazinska, Jeffrey P Gardner
  • Conference Proceeding
    Embracing Uncertainty in Large-Scale Computational Astrophysics (2009)
    MUD, pp. 63–77 Authors: Dan Suciu, Andrew J Connolly, William G Howe
  • Conference Proceeding
    Query-driven visualization in the cloud with mapreduce (2009)
    Proceedings of the Fourth Annual Workshop on Ultrascale Visualization Authors: William G Howe, Huy Vo, Claudio Silva, J Freire
  • Conference Proceeding
    Scientific mashups: runtime-configurable data product ensembles (2009)
    Scientific and Statistical Database Management, pp. 19–36 Authors: William G Howe, Harrison Green-Fishback, David Maier
  • Conference Proceeding
    End-to-end escience: Integrating workflow, query, visualization, and provenance at an ocean observatory (2008)
    eScience, 2008. eScience’08. IEEE Fourth International Conference on, pp. 127–134 Authors: William G Howe, Peter Lawson, Renee Bellinger, Erik Anderson, Emanuele Santos, Juliana Freire, Carlos Scheidegger, António Baptista, Cláudio Silva
  • Conference Proceeding
    Quarrying dataspaces: Schemaless profiling of unfamiliar information sources (2008)
    Workshop on Information Integration Methods, Architectures, and Systems (IIMAS 08), pp. 270–277 Authors: William G Howe, David Maier, Nicolas Rayner, James Rucker
  • Journal Article, Academic Journal
    SciDB Examples from Environmental Observation and Modeling (2008)
    Center for Coastal Margin Observation and Prediction Author: William G Howe
  • Conference Proceeding
    Scientific Mashups: Runtime-Configurable Data Product Ensembles (2008)
    eScience, 2008. eScience’08. IEEE Fourth International Conference on, pp. 442–443 Authors: Harrison Green-Fishback, William G Howe
  • Journal Article, Academic Journal
    Scientific exploration in the era of ocean observatories (2008)
    Computing in Science and Engineering, 10(3), pp. 53–58 Authors: António Baptista, William G Howe, Juliana Freire, David Maier, Cláudio T Silva
  • Conference Proceeding
    Smoothing the ROI Curve for Scientific Data Management Applications (2007)
    CIDR, pp. 185–195 Authors: William G Howe, David Maier, Laura Bright
  • Conference Proceeding
    The Ocean Appliance: Complete Platform Provisioning for Low-Cost Data Sharing (2007)
    OCEANS 2007, pp. 1–10 Authors: William G Howe, Antonio Baptista, Nicholas Hagerty, Charles Seaton, Ethan Van Matre, Paul Turner, David Maier
  • Ph.D. Thesis
    Gridfields: model-driven data transformation in the physical sciences (2006)
    Author: William G Howe
  • Conference Proceeding
    Managing the Forecast Factory (2006)
    Data Engineering Workshops, 2006. Proceedings. 22nd International Conference on, p. 64 Authors: Laura Bright, David Maier, William G Howe
  • Journal Article, Academic Journal
    Algebraic manipulation of scientific datasets (2005)
    The VLDB Journal, 14(4), pp. 397–416 Authors: William G Howe, David Maier
  • Journal Article, Academic Journal
    GridFields: Model-Driven Query Services for Simulation Results in the Physical Sciences (2005)
    Author: William G Howe
  • Conference Proceeding
    Querying and Visualizing Gridded Datasets for e-Science (2005)
    Proceedings of the 21st International Conference on Data Engineering, ICDE 2005, 5-8 April 2005, Tokyo, Japan, pp. 1106–1107 Authors: William G Howe, David Maier
  • Conference Proceeding
    Retrofitting a Data Model to Existing Environmental Data (2005)
    SSDBM, pp. 3–13 Authors: William G Howe, David Maier
  • Book, Chapter in Scholarly Book-New
    Emergent semantics: Towards self-organizing scientific metadata (2004)
    Semantics of a Networked World. Semantics for Grid Databases, pp. 177–198 Authors: William G Howe, Kuldeep Tanna, Paul Turner, David Maier
  • Journal Article, Academic Journal
    Logical and Physical Data Independence for Native Scientific Data Repositories (2004)
    IEEE Data Eng. Bull., 27(4), pp. 29–36 Authors: William G Howe, David Maier
  • Journal Article, Academic Journal
    A language for spatial data manipulation (2003)
    Journal of Environmental Informatics, 2(2), pp. 23–37 Authors: William G Howe, David Maier, Antonio Baptista
  • Conference Proceeding
    Modeling data product generation (2002)
    Workshop on Data Derivation and Provenance, Chicago Authors: William G Howe, Dave Maier
  • Conference Proceeding
    Representing, exploiting, and extracting metadata using metadata++ (2002)
    Proceedings of the 2002 annual national conference on Digital government research, pp. 1–7 Authors: Mathew Weaver, William G Howe, Lois Delcambre, Tim Tolle, Dave Maier


  • Applied AI in High-Expertise Settings, or Curation as Programming (2023)
    Engineering IDBE Seminar Series, NYU Abu Dhabi - Virtual
  • Data Curation as Programming (2023)
    15th Alberto Mendelzon International Workshop on Foundations of Data Management - Santiago, Chile
  • Applied AI in High-Expertise Settings, or Curation as Programming (2022)
    AI2 - Tahoma, WA (Virtual)
  • Ethical AI in the Public Sector: Towards A Semi-Synthetic Data Fabric for AI Evaluation (2022)
    Cisco Systems, Inc. - Virtual
  • Data-Centric AI: Reuse, Integration, and Synthesis of Weakly Structured Data (2021)
    Northeastern - Boston, MA
  • Equitensors: Learning Fair Integration of Urban Mobility Data (2021)
    Berkeley Institute for Transportation Studies - Berkeley, CA
  • Introspection and Interventions in Data Equity Systems (2021)
    Provenance and Visualisation Workshop - Virtual
  • Data Equity Systems (2020)
    DEEM Workshop - Virtual
  • Public interest research in data management and machine learning (2020)
    NYU - New York, NY
  • Special Session: A Technical Research Agenda in Data Ethics and Responsible Data Management (2018)
    SIGMOD - Houston, TX
  • Bias and Ethics in City Services Data Science (2017)
    Bloomberg Data for Good Exchange - New York, NY
  • Big Data + Big Sim: Query Processing over Unstructured CFD Models (2017)
    ISIM Research Workshop - Durham, UK
  • Data Analysis and Visualization Workshop (2017)
    Schloss Dagstuhl – Leibniz-Zentrum für Informatik - Dagstuhl, Germany
  • Data in the Humanities Panel (2017)
    2017 ICDE (IEEE International Conference on Data Engineering) - San Diego, CA
  • Data, Responsibly: The Next Decade of Data Science (2017)
    iSchool Founding Board - Seattle, WA
  • Deep Curation (2017)
    Tandon School of Engineering, New York University - New York, NY
  • Epistemic Issues in Data Science (2017)
    University of Massachusetts, Amherst - Amherst, Massachusetts
  • Fake Data for Social Good (2017)
    Bloomberg Data for Good Exchange - New York, NY
  • Responsible Urban Data Science (2017)
    Redondo Beach, CA
  • The Information War: Fake News, Privacy, and Big Data (2017)
    eScience Institute, University of Washington - Seattle, WA
  • The Next Decade of Data Science (2017)
    Distinguished Colloquium, University of Maryland - College Park, Maryland
  • Viziometrics: Mining the Visual Literature (2017)
    Visualization Seminar, Scientific Computing Institute, University of Utah - Salt Lake City, UT
  • Workshop on Science and Technology for Washington State: Advising the Legislature (2017)
    Seattle, WA
  • Democratizing Data in the Cloud (2016)
    Workshop on Cloud Data Management (CloudDM) (co-located with ICDE) - Helsinki, Finland
  • Going "Deep" with Computational Data Curation (2016)
    NSF-JST Meeting on Big Data, AI, IoT, and Cybersecurity for a New Society - USA
  • Responsible Data Science and Reproducibility (2016)
    Dagstuhl Seminar on “Data, Responsibly,” Schloss Dagstuhl – Leibniz Center for Informatics - Germany
  • Urban Analytics and Responsible Data Science (2016)
    SciTech Northwest - Seattle, WA
  • Data Equity Systems
    Northwest Database Society - Virtual