iSchool Capstone

2019

Categorization: Generating Clusters of Related Businesses

We are working with PitchBook, a company that provides a financial database and platform for private equity and venture capital professionals. VCs today spend more than 20% of their time searching for the right opportunity to invest in. To do so, they must sift through thousands of companies and more than 100 KPIs. To make this process easier and less time-consuming, we built Investoscope, which applies machine learning to cluster similar companies according to different KPIs and augments PitchBook’s ability to deliver data with an intuitive and intelligent UX. Using Investoscope, users can better understand volatile markets when making crucial investment decisions.

Data-Driven Parking in Seattle's Belltown North Neighborhood

An average driver in Seattle spends 58 hours a year looking for parking. To reduce this burden, city officials must first understand on-street parking occupancy and parking behavior. The challenge is that pay station transaction data doesn’t reflect actual parking occupancy throughout the year. To reconcile this problem, we used publicly available data to build statistical models that predict paid parking occupancy in Belltown North. Our machine learning model, our analysis of factors related to occupancy, and our documented process are moving the needle toward a citywide system of policies and driver tools that streamline the parking experience.

Deepcare

Deepcare is working with SEIU 775 Benefits Group, Washington’s leading homecare benefits provider, to uncover key causes of turnover in the homecare industry. The annual turnover rate is approaching a staggering 60%, meaning that in Washington alone, 30,000 additional aides will be needed in the next decade. We built a data mart and analyzed the data to uncover the causes and predictors of this high turnover rate, and found that much of the conventional wisdom about this industry does not hold. These results will help SEIU 775 Benefits Group teach and train workers more efficiently.

Enhanced Pathogen Detection Through Metagenome Assembly Reference Optimization

Project Premonition aims to detect pathogens before they cause outbreaks by turning mosquitoes into devices that collect data from animals in the environment. First, mosquitoes are collected using robots. Then, each metagenome sample is sequenced. Finally, we use metagenome alignment to identify which pathogens exist in each sample. For my capstone project, I optimized this metagenome alignment step by using data science to reduce redundancy and contamination in our reference genome database. This database is both large and complex, containing over 600,000 genomes and 1.4 Tbp (terabase pairs).

FaceLiftr

People of various backgrounds suffer from different facial skin issues. Although there are many skincare products, skin varies and may react differently to them. FaceLiftr is an “all-in-one” platform for identifying and fixing skincare problems. From a photo of the user, FaceLiftr evaluates facial skin health issues and recommends products to improve complexion and skin care. These products can be imported into a regimen builder that tracks the user’s daily skincare routine. Users can share their routines with others or adopt public routines. Regular checkups and new photos also let users track their progress closely and receive new recommendations over time.

From Datum to Data: A Qualitative Research Protocol for Studying Data Science in the Public Sector

This capstone project is a protocol intended for researchers conducting qualitative sociotechnical research on data science in the public sector. Its purpose is twofold. First, it designs an interpretive qualitative study to further understanding of an as-yet understudied group: data science workers who operate in the public sector. Second, it addresses persistent challenges in transparency within qualitative sociotechnical research by providing a toolkit for open, credible, and reproducible collection and analysis of qualitative data. As with data science research itself, this protocol borrows from multiple disciplines to allow for a high degree of usability.

Health Equity of Home Care Aides: Investigating Chronic Disease Prevalence

Home care aides (HCAs) are the future of long-term care in the USA, allowing older adults and people with disabilities to live in the community and age in place. Unfortunately, the HCA workforce suffers from high rates of chronic disease. Our team has partnered with SEIU 775 Benefits Group to understand and communicate the prevalence of chronic diseases in the HCA workforce through statistical analysis and evaluation. Using medical insurance claims data, our team has built an interactive report on this vulnerable group of healthcare providers in order to support a healthy workforce.

Misusing Science: Investigating the Science Misinformation Pipeline

“Potentially Predatory Journals” (PPJs), or journals that may not adhere to proper academic peer review, can give false credibility to questionable or pseudo-scientific papers. These papers can leak to the public through social networks, where posts cherry-pick or sensationalize results to further specific political or ideological causes. With the proliferation of “foreign information campaigns” (FICs), or state-backed disinformation operations, these social media posts are often amplified or distorted. The goal of this project is to investigate these two sources of science misinformation and lay the foundation for future work in this space.

MuscleUp: Your personal AI workout tailor

With over 100,000 workout programs on the internet, it can take a layperson months of research to find the program that best fits their specific needs and fitness aspirations. MuscleUp is a mobile application that analyzes workout data from users with similar personal characteristics and goals to recommend the best exercise programs. We use AI/ML prediction models to suggest optimal programs based on the unique personal and exercise data each user provides. MuscleUp is designed to help people achieve their fitness goals more efficiently by eliminating the search for a personalized and effective fitness regimen entirely.

Natural Language Search

PitchBook Natural Language Search (NLS) uses natural language understanding to improve PitchBook's general search experience. Currently, precise company searches on the PitchBook platform require selecting from more than 50 criteria checkboxes in advanced search. With our NLP framework, users can instead search for companies quickly and efficiently by typing a single sentence. The NLS general search dashboard extracts the targeted criteria from the sentence the user types and returns the identified criteria along with relevant company results from the database. This project aims to improve the user search experience and to explore further AI possibilities for PitchBook in the long term.