Capstone Projects | Information School | University of Washington

Voices: Data from an Expansive Christian Community

Voices: Data from an Expansive Christian Community is a structured repository of people, published media, and affiliate organizations that expands beyond the hegemonic voices of white American Evangelicalism and represents a more diverse, ecumenical, and global vision of Christianity. It provides a metadata schema and selection criteria for a curated dataset intended to support further research and analysis. The structure aims to elevate diverse content creators across disciplines who engage in smart and significant conversation about Christianity and its intersections with other faiths, race, social justice, gender, sexuality, and environmentalism.

Analyzing Drug Administration in Africa

The Gates Foundation partnered with ESPEN to create an open-access data portal that gives Ministries of Health and other partners the information they need to invest efficiently in neglected tropical disease treatment. This project aims to address low utilization of the ESPEN portal by developing new metrics and dashboards from which health officials can derive actionable insights. By leveraging publicly available ESPEN data and consolidating many relevant data points, we have provided tools that help health officials identify and act on points of concern in drug procurement and administration more effectively.

Behance Recommendation Engine

Behance is a social media platform owned by Adobe that allows artists to develop and share their artwork with other users. With more than 10 million users searching for inspiration, personalized recommendations play a pivotal role in the overall user experience. The current recommendation algorithm at Behance requires substantial user activity to generate recommendations and tends to surface artwork with more likes. To fine-tune these recommendations, we developed an algorithm using Cypher that analyzes the activity of all related users and gives weight to project themes instead of likes. Our algorithm also reduced recommendation generation time by 50%.

Collections as Data: Building a Framework for George Mason University's Special Collections

“Collections as data” (CaD) goes beyond traditional archival practices to analyze cultural heritage collections that support computationally driven research. We analyzed George Mason University’s (GMU) Special Collections Resource Center’s (SCRC) procedures and metadata, drafted a report for the SCRC, and presented our findings. The team liaised between the SCRC and GMU’s Digital Scholarship Center (DiSC), a stakeholder in CaD initiatives and digital scholarship. This project modified SCRC’s workflows, procedures, and standards, improving access to data-driven digital scholarship. The emerging strategic partnership between the SCRC and DiSC will provide researchers with new opportunities to interact with special collections materials.

Digital Preservation at the Seattle Asian Art Museum: Creating the John Grimes Travel Slides of Japan Collection

This project contributes to the Seattle Art Museum (SAM)’s Historical Media Collection, preserving materials stored in outdated formats. Due to limited resources, the Seattle Asian Art Museum (SAAM) lacks a digital preservation plan. A donation of 1,000+ slides, taken to document John Grimes’ cultural tour around Japan from 1987-88 and offering a rare glimpse at the ceremonies, architecture, and exhibits he observed, has remained unprocessed and inaccessible. To preserve and provide access to them, this project involved arranging and describing all 1,000+ slides, digitizing 200+ slides from over 20 geographical locations, and creating an Omeka exhibit for SAM’s Digital Collections.

GreenDubs: A Citizen Science Platform

GreenDubs is a web platform that aims to enhance data infrastructure for the citizen science community. It is targeted at science volunteers and project managers, who will use the platform for data aggregation and analysis. Our solution is an open-source web application designed specifically for citizen science data collection, sharing, and collaboration. Key features include an integrated deep learning model for image classification, which reduces the manual effort of collecting and tagging images. The platform also includes email integration, which should save time and effort in collating data from different volunteers.

Improving Trust & Interoperability: Metadata for Data Refuge's Open Data Catalog

The Data Refuge Data Catalog archives federal climate and environmental data. It provides historical snapshots of datasets released on government data portals, which are vulnerable to deletion. Yet the metadata associated with these Data Refuge records have been minimal, and their relationship with the source records has not been clearly defined. To address this, we investigated crosswalking solutions, improved the metadata of target datasets, customized an extensible schema, standardized tagging with a controlled vocabulary, and documented a workflow for future-phase implementation. The results improve trustworthiness and interoperability, facilitate more seamless data discovery and retrieval, and meet the needs of both archivists and researchers.

Nantucket Biodiversity Digital Repository

Since 2005, Nantucket Biodiversity Initiative has sponsored over 70 different research projects, but the reports and datasets from these projects are not easily available. We have designed a workflow to curate, label, and upload files to a searchable digital repository, and have built a documentation website to house the workflow and process documents. This supports NBI in becoming an open science leader among small science nonprofits, streamlines NBI grant reporting, ensures that researcher reports and data can be cited, and opens the possibility of research funded by NBI contributing to larger scientific studies and new knowledge creation.

Open Data Wagon: Opening Up Mobile Services Data in Public Libraries

Though public libraries have shared internal data, such as circulation data, publicly via open data portals, bookmobile data sharing has been limited. Sponsored by the Washington State Library, and using data from North Central Regional Library as a pilot, the Open Data Wagon project researched, collected, and published library bookmobile data openly on data.wa.gov, along with a reusable dataset template. Because bookmobile operations can be expensive, this project aimed to encourage information sharing among libraries, expand funding opportunities for mobile services by supplying additional data, and heighten the value of library mobile services. More info: https://opendatawagon.github.io

Promoting Information Literacy Through Indexing

This project involved cataloging a 10,000-document collection of newly digitized, multilingual Law Library of Congress treaties and international agreements. The collection was classified by metadata elements in a master spreadsheet. To support users’ collocating and finding bibliographic objectives, LCSH and indexer-derived keywords were appended to each record. Inconsistencies between records were resolved using OpenRefine, and blog posts containing trend graphs and search tips were written to promote the collection. When published online, these materials will encourage information literacy by helping the organization’s increasingly networked stakeholders navigate and interact with the collection for the first time.

iSchool Capstone

2021