In DataLab, students find truth in numbers

Monday, April 3, 2017

Those stories about successful head transplants and bull semen in energy drinks? Clearly bogus.

But what about the peer-reviewed journal article on DNA sequencing that uses fabricated data? How about a scientific line graph that shows zero change in global temperatures over 135 years? Or a hyped-up article published in a predatory journal? How do you detect what’s fake, break it down, analyze, evaluate, and debunk it?

Teaching UW students those sophisticated reasoning skills is a prime goal at the iSchool’s DataLab, where a team of visionary researchers is working to build a more data-literate society, starting with students. “If we can teach students to read a scientific paper and be able to critique it, that will be more important than any technical skills we impart to them,” says iSchool Assistant Professor Jevin West (pictured second from right above).

The lab, now in its fourth year, pulls together big-data researchers who create mathematical models to examine human behavior at scale. Analyzing large sets of data, they can study the connections and patterns and flow of information, whether it’s the rapid spread of misinformation in online communities or some surprising trends in scholarly citations viewed over more than a century.

The researchers’ goals are as big as their datasets: They want to smarten up the world and, in the process, make it more just. “Data for Social Good is our theme over the next several years,” says West, who co-directs the lab with colleague Emma Spiro (pictured above). “It’s the idea that we can use data analysis and artificial intelligence and statistical methods and all these things we do on methods and apply them to real social issues.”

The lab is young, but already making waves, with stories on breakthrough research in big-name publications such as The Washington Post and Huffington Post. Theirs is a brand-new field, and it’s moving incredibly fast, says West. “Sociology, biology, the physical world we live in, it’s all messy. Disentangling that messiness is one of the things we’re after.”

Diverse faculty, students

The faculty comes to the lab from a wide range of disciplines, including data curation, computer science, applied mathematics, information science, sociology, theoretical biology, and information visualization, a field that does a double-take on those data-driven graphs and tables users too often accept without question. 

“Everyone at the DataLab has a slightly different approach to similar concerns, so we have a cross-pollination of ideas that is really unique,” says Assistant Professor Nicholas Weber (pictured at left above), who specializes in data curation. “We are all learning about each other’s work, which informs and builds everyone’s work.”

Students, undergraduate to post-doc, are also academically diverse. They come from sociology, bioengineering, neuroscience, economics, computer science, information science, and other fields. Their weekly lab sessions with professors are lively idea fests.

“All these different students, you put them in a room together and great things happen,” says Spiro, assistant professor at the iSchool and adjunct assistant professor in the Department of Sociology. “The conversations that result from multidisciplinary perspectives and feedback enrich all of our research.”

Analyzing big social data collected from Twitter, Facebook, and other social media, Spiro and collaborators have conducted leading studies examining how rumors spread during times of crisis, how social media users attempt to correct false information, and the critical role officials can play by quickly getting online and responding to stories. The researchers are developing breakthrough algorithms that can detect rumors even as events unfold; they’re also studying how network-based interventions might contain the spread of misinformation.

Jevin West, Nicholas Weber, Emma Spiro

Spiro sees similar dynamics in the intentional spread of fake news, including the propagation of politicized conspiracy theories throughout the 2016 presidential election: Hillary Clinton did not suffer that alleged “mysterious illness” during the campaign, nor did a bot write Donald Trump’s inauguration speech. “It is important to understand how and why such stories spread through networked populations, as well as the role of information technology in this phenomenon, if we want to support people in the task of discerning information credibility,” says Spiro.

Students at all levels have helped with collecting and curating data, and analyzing information spread, adds the researcher. “Working with curious and creative students is one of my favorite parts of the work we do here at the DataLab.”

Spiro’s colleague Jessica Hullman leads information visualization research at the lab, teaching students to be critically aware of how, and with what intent, those flowcharts, scatterplots, and interactive timelines are created and interpreted. Just because they see it on a graph doesn’t mean it’s true. “I am passionate about understanding how people create, make sense of, and communicate with data and visualizations,” states the iSchool assistant professor, who, with research colleagues, was honored this year with an ACM CHI (Computer-Human Interaction) best paper award.

Transparency is key

Another key to 21st-century data literacy is open data, a specialty of lab researcher Weber. “Transparency in government is one of the pillars we think of in a democratic society,” he says. “The problem is, the general population cannot do much with the raw data that is open to them. We need to make that information more accessible and available to the people who need it most.”

Weber and DataLab colleague Carole Palmer, associate dean for research at the iSchool, launched a project that sends iSchool students to work on open-data initiatives in public agencies, including the Washington State Department of Transportation and the City of Seattle. Public institutions often collect large quantities of data without compiling and curating the information, making it hard to find and hard to use. Under Weber and Palmer’s direction, iSchool students will gather and standardize the institutions’ large datasets and put them in a central place so that they can be easily discovered and explored, tomorrow or 10 years from now.

Students will also work with the Seattle Public Library to figure out the kinds of questions people at individual branches are trying to ask city, county, and state governments, whether it’s how to get a pothole filled or why their favorite stream is closed to fishing. The goal is to create data portals to help them find their answers. “We are trying to serve the general population in a way that is meaningful,” says Weber.

Associate Professor Bill Howe (pictured in top photo at right), who joined the iSchool and the DataLab last fall, concentrates on making data work for positive change in cities. He’s leading the UW’s UrbAnalytics project and a partnership with the University of British Columbia that focuses on applying data to solve urban problems.

“In this urban space, in Seattle there’s a lot of energy around how we can use these methods and the objective, open data to help manage cities,” Howe says. The key, he says, is to use data responsibly and ethically, and to train others how to look for biases inherent in the data. “At some point it’s not about what you can do – it’s what you should do. If you let people ask arbitrary questions over arbitrary datasets, the probability is you’re going to get unreliable results coming out in a blog post that gets picked up by the press and becomes ‘fact.’”

That’s a phenomenon West and UW biology professor Carl Bergstrom aim to combat with their provocatively named new class, “Calling Bullshit in the Age of Big Data.” The cheeky syllabus for the course, which debuted this spring quarter, announces: “Our world is saturated with bullshit. Learn to detect and defuse it.” Within days of the syllabus being posted, 100,000 users had visited the course website. “This is not the first critical reasoning class,” says West, “but our focus will be on data reasoning, which is what we do at the DataLab.”

Fertile ground for research

West is also deep into academic research, examining the vast vaults of scholarly literature that expand daily, fed by thousands of new articles. The assistant professor is helping build tools that allow scholars to find relevant articles, search knowledge areas, and understand the relative importance of research material in those masses of data.

“We have millions of papers published over the centuries. Those papers house the remedies of many of our ills in society, whether it’s health, economy, or engineering. The difficulty is in finding them. We need new tools to mine the literature at scale,” he says. “The better we can access it as scientists and researchers, the more efficient we will be at innovating.”

West finds scholarly literature, and the billions of citations that connect it, a ripe field for analysis and discovery. “It is an incredibly interesting sandbox to play in, given its beautiful structure and the fact that it has been curated for centuries,” says West. “I consider it one of humanity’s most valuable resources. To work on it daily is pretty neat.”

But it is a sandbox increasingly littered with falsehood and fakery, too often dressed up as scholarship in fancy algorithms and visuals. “Not all academic sources and venues are equal. People not from science may see a claim on CNN, tweet and retweet it, and follow it to a predatory journal that has no reputation in science. That’s one way we’re getting fooled,” says West.

If the truth is still out there, the passionate researchers at the iSchool’s DataLab are determined to help our society find it. “Our democracy depends on an informed citizenry,” says West. “If we can provide a set of tools and methods and case studies for our students and the public, then that’s a social mission I want to be part of.”