MSIM Capstone teams wrangle messy state data

By Mary Lynn Lyke Friday, May 26, 2023

Washington State’s open data portal provides citizens with an abundance of free, high-value information, from interactive maps and crime statistics to budget proposals, campaign contribution sources, health provider credentials and electric vehicle registration numbers … or is that electric vehicles registration?

That little “s” points to a messy problem. The portal, www.data.wa.gov, started as a largely decentralized, crowd-sourced, do-your-own-thing collection site, with individual publishers at different agencies creating an estimated 1,200 tags to help people access government information. After more than 10 years, the keyword tags have become riddled with inconsistent singulars and plurals, misspellings, duplications and term variations that can make it harder for users to find needed information.

Washington Technology Solutions (WaTech), the state IT agency that operates the portal, wanted a tag cleanup and some guiding rules for agency publishers, as well as an overall portal performance update. Managers enlisted two separate Capstone teams from the iSchool’s Master of Science in Information Management (MSIM) program to take on these challenges.

The four Early-Career residential students from Team {Range} tackled the tags, putting to work a mix of skills in data science, information architecture and user experience design. They also consulted with MSIM professors for expertise in data taxonomy. “Our professors were phenomenal. They were there for us,” says team member Isabella Eriksen.

After researching best practices for the cleanup, the team manually inspected each tag, cleared up typos and other inconsistencies, identified tags that needed to be improved and checked each for relevancy. “With just our initial analysis, we were able to reduce the tags by 40 percent,” says team member Alana Montoya.

Team {Range} created guidelines for portal publishers – including making terms plural for all categories. And, to make sure the same mess didn’t happen again, they developed a dashboard to help sponsors monitor tags going forward. The final task was creating recommendations for further refinements. “In a year or two, if the portal was left to its own devices, it would be the exact same problem,” says team member Ken Junichi Masumoto.

{Range}’s project sponsor describes them as a model team, in both quality of work and effective management. “We now have a really clear set of recommendations and a protocol for tags because of this team. It couldn’t have worked better,” says iSchool alumna Kathleen Sullivan, the open data librarian for the Washington State Library, which contracts with WaTech.

Team Tech Husky took on WaTech’s performance update challenge. “Washington’s open data collection has a lot of assets, but sponsors don’t know how it is performing or how many people are using it,” says Jia Jia Yu, one of three team members, all Early-Career residential students from Taiwan. Yu serves as the team’s data analyst, with expertise in visualizations.

To determine if the open data program was on track with its goals, they created data pipelines, a dashboard and visualizations for examining various performance indicators. The indicators measure such areas as the number of assets a particular agency publishes and which datasets are of the greatest interest to the public. “With the dashboard, our sponsors and the public can see what users are most curious about and what kind of data is most useful,” says Raymond Su, project manager.

The team dramatically increased productivity with their automatic data pipeline, which can quickly fetch information and keep the portal’s datasets constantly updated. “Before we started, WaTech was manually updating their datasets with Excel, which was challenging and time-consuming,” says Frank Lai, software engineer for the team.

Team Tech Husky’s project sponsor says she was excited to see the team figuring out these basic structures and automating the pipeline. “They were really smart, asked good questions, communicated well, and were self-organizing,” says Cathi Greenwood, open data program manager.

Adds Greenwood: “Both teams did this not just for us, but for the public good. The teams’ solutions will be published on GitHub (a code hosting platform), where any government using this type of open data portal will be able to use them for their own portals.”

Looking back at their hard work, the Capstone students say they enjoyed watching iSchool lessons play out and evolve in real-life settings. “I saw so many things we learned at the iSchool come to life,” says Team {Range} member Max Lieberman. “I realized that information management is not just about finding problems, but about establishing what practices should be put in place and finding ways to keep those practices sustainable long-term.”

Pictured at top, from left, are MSIM students Raymond Su, Frank Lai, Jia Jia Yu, Isabella Eriksen, Max Lieberman, Ken Masumoto and Alana Montoya.