iSchool's Walsh applies data science to literary world

By Kayla Pohl
Wednesday, September 6, 2023

Bad book sales data is like a bad credit score: it can stymie an author’s career and constrain future book contracts and publishing opportunities. Yet book sales data is proprietary — it’s purposefully hidden from the public, as well as researchers, humanities scholars and librarians.

Access to this data, along with tools to understand it, is essential in holding the mainstream publishing industry accountable for who gets book deals, which largely determines whose books are housed in libraries and available to library patrons.

In fact, according to Melanie Walsh, who has moved into a new role as an assistant professor in the University of Washington Information School, more cultural data should be open-access and available to anyone wanting to better understand culture or interrogate the practices of cultural industries. Walsh started in 2021 as an assistant teaching professor at the iSchool, and her areas of interest include data science, digital humanities, contemporary literature, libraries and cultural analytics. 

This fall, she’s teaching Introduction to Data Science, which provides an overview of key concepts, skills and technologies for students in the Master of Library and Information Science program. 

“I’m also co-teaching a new course next year about text reuse and artificial intelligence and the history of people borrowing and stealing other people’s texts, from pastiche in really old literary texts to ChatGPT,” Walsh said.

Walsh is committed to teaching cultural workers, such as humanities scholars and librarians, how to use computational methods to collect and understand literary and cultural data. Scholars in the humanities are rightfully suspicious of data. Data is inherently reductive, whereas art and literature foreground and celebrate the complexity of human experience. 

But for Walsh, data isn’t just a tool to understand literature and culture; data and algorithms shape contemporary art and culture, sometimes in harmful ways. Ignoring or resisting the influence of data strips cultural workers of the power to critique, interrogate and hold accountable cultural industries’ use of data.

However, Walsh said, “When used in probing, creative, subversive ways, data can actually help us create a more just, equitable future, not just for literature and culture and libraries, but also more broadly.”

During her postdoctoral work at Cornell University, she developed an award-winning open-access textbook, Introduction to Cultural Analytics and Python. “It’s been used all over the world,” said David Mimno, who was Walsh’s postdoctoral advisor, “and it’s been pretty widely recognized as one of the leading source materials for the introductory-level application of computing to humanities.”

She’s also a co-editor of the Post45 Data Collective, which houses open-access peer-reviewed literary and cultural data from 1945 onward. “It seems like it’s very boring and unsexy, but it ends up being hugely important for understanding patterns,” Walsh said. 

To find those patterns, researchers need access to cultural data that hasn’t previously been available, either because it’s been purposely obfuscated or because it hasn’t been a focal point of data curators. For example, the Post45 Data Collective hosts data about winners and judges of literary prizes worth over $10,000 from the past 100 years. “Before, this data didn’t exist in any kind of comprehensive way, and now we can see patterns in race and gender in literary prizes,” Walsh said.

Her undergraduate and graduate degrees are in English literature, and one of her research interests is how people talk about literature on social media. When Walsh was in graduate school at Washington University in St. Louis, Michael Brown was shot and killed by police officer Darren Wilson. “I was following the Ferguson protests really closely both on the ground and on Twitter [now X], and I noticed that lots of people were tweeting James Baldwin quotations,” Walsh said. She used computational methods to figure out how quotes from Baldwin, a civil rights activist and writer, were being used: How many people were tweeting Baldwin quotes? Which version of Baldwin was resonating with early Black Lives Matter activism?

This inspired her project Tweets of a Native Son, which examines how social media users were citing Baldwin. Even though he died in 1987, he’s been a muse for the Black Lives Matter movement, and his works have been consistently revisited and reanimated on social media. 

“She’s brilliant at finding intersections between these fields, digging in deep to find the most interesting and important research questions,” said Maria Antoniak, a researcher at the Allen Institute for AI who has collaborated with Walsh on a few projects.

Walsh also brings to the iSchool her commitment to an ethical approach to data handling. When she worked on a Goodreads project collecting data about crowdsourced criticism and reader reception of “classics” in the literary canon, she insisted each individual Goodreads reviewer be contacted for their consent. 

“This isn’t standard practice in data science,” said Antoniak, who collaborated with Walsh on the Goodreads project. “A different researcher might not have considered the reviewers at all. But Melanie reached out to every reviewer on Goodreads and paved the way for other cultural analytics scholars to follow.”