Capstone Projects | Information School | University of Washington

Deepcare

Deepcare is working with SEIU 775 Benefits Group, Washington’s leading homecare benefits provider, to uncover key causes of turnover in the homecare industry. The annual turnover rate is approaching a staggering 60%, meaning that in Washington alone, 30,000 additional aides will be needed in the next decade. We built a data mart and analyzed its data to uncover the causes and predictors of this high turnover rate, and found that much of what was conventionally thought to be true about this industry was not. These results will let SEIU 775 Benefits Group teach and train workers more efficiently.

Enhanced Pathogen Detection Through Metagenome Assembly Reference Optimization

Project Premonition aims to detect pathogens before they cause outbreaks by turning mosquitoes into devices that collect data from animals in the environment. First, mosquitoes are collected using robots. Then, each metagenome sample goes through DNA sequencing. Finally, we use metagenome alignment to identify which pathogens exist in each sample. For my capstone project, I optimized this metagenome alignment step by using data science to reduce redundancy and contamination in our reference genome database. This dataset is both large and complex, containing over 600,000 genomes and 1.4 Tbp (tera base pairs).

FaceLiftr

People of various backgrounds suffer from different facial skin issues. Although there are many skincare products, skin varies and may react differently to each product. FaceLiftr is an “all-in-one” platform for identifying and fixing skincare problems. By taking a photo of the user, FaceLiftr evaluates facial health issues and recommends products to improve complexion and skincare. These products can be imported into a regimen builder that tracks the user’s daily skincare routine. Users can share their routines with others or use public routines. Regular checkups and new photos also allow a user to track their progress closely and get new recommendations over time.

From Datum to Data: A Qualitative Research Protocol for Studying Data Science in the Public Sector

This capstone project is a protocol intended for researchers conducting qualitative sociotechnical research on data science in the public sector. Its purpose is twofold: first, to design an interpretive qualitative study that furthers understanding of the as yet understudied area of data science workers who operate in the public sector; second, to produce a protocol that addresses persistent challenges in transparency within qualitative sociotechnical research by providing a toolkit for open, credible, and reproducible collection and analysis of qualitative data. As with data science research itself, this protocol borrows from multiple disciplines to allow for a high degree of usability.

Health Equity of Home Care Aides: Investigating Chronic Disease Prevalence

Home Care Aides are the future of long-term care in the USA, allowing older adults and people with disabilities to live in the community and age in place. Unfortunately, a significant portion of the HCA workforce suffers from high rates of chronic diseases. Our team has partnered with SEIU 775 Benefits Group to understand and communicate the prevalence of chronic diseases in the HCA workforce through statistical analysis and evaluation. By utilizing medical insurance claims data, our team has generated an interactive report that provides information on a vulnerable group of healthcare providers in order to support a healthy workforce.

Misusing Science: Investigating the Science Misinformation Pipeline

“Potentially Predatory Journals” (PPJs), or journals that may not adhere to proper academic peer review, could give false credibility to questionable or pseudo-scientific papers. These potentially questionable papers can sometimes leak into the public via social networks, where their results are cherry-picked or emotionally charged to further specific political or ideological causes. With the proliferation of “foreign information campaigns” (FICs), or state-backed disinformation operations, these social media posts are often amplified or distorted. The goal of this project is to investigate these two sources of science misinformation and lay the foundation for future work in this space.

MuscleUp: Your personal AI workout tailor

With over 100,000 workout programs on the internet, it can take months of research for the layperson to find the best program that fits their own specific needs and fitness aspirations. MuscleUp is a mobile application that analyzes user workout data to recommend the best exercise programs for users with similar personal characteristics and goals. We use AI/ML prediction models to suggest optimal programs for users based on the unique personal and exercise data they provide. MuscleUp is designed to help people achieve their fitness goals more efficiently by bypassing the search for a personalized and effective fitness regimen entirely.

Natural Language Search

PitchBook Natural Language Search (NLS) is a solution that improves the general search experience through natural language understanding. Currently, the PitchBook platform requires selecting from more than 50 criteria checkboxes in advanced search for precise company searches. With our advanced NLP framework, this solution enables users to search for companies quickly and efficiently using a single sentence. The NLS general search dashboard extracts the targeted criteria from the sentence the user types in, and returns the identified criteria and relevant company results from the database. This project aims to improve the user search experience and explore more AI possibilities for PitchBook in the long term.

Performance Dashboards in Power BI

The Fluke Global Materials Team developed an Azure data warehouse to improve data analysis and retention and to address the lack of standardization in data collection between sites. The factories produce printed one-off reports and record data on handwritten forms, which are discarded at the end of the month. Our project creates dashboards so that employees have access to the right information. Users can interact with visualizations, dig deeper into the data using Data Analysis Expressions (DAX), and make informed business decisions from the insights.

Phone Application Behavior in Network Traffic

Network traffic classification has grown tremendously in the past few decades, primarily because affordable data storage and rapidly growing computing power have brought machine learning to the forefront of optimization and capacity control. In this project, we aim to classify network traffic into different categories of mobile apps to analyze customer usage and optimize network traffic. Through iterative cycles of development, we have generated, analyzed, engineered, and modeled network traffic to obtain a final product that our client can scale and implement on real-world traffic.

iSchool Capstone

2019