Researchers prompt generative AI to help with firearms study

By Hallie Schwartz Wednesday, January 22, 2025

Researchers studying firearm-related incidents face significant challenges in gathering relevant data. Critical resources, such as police and hospital records, are often incomplete, inconsistently filled out, or lacking the detail needed for thorough analysis.

In a previous study, University of Washington Information School researchers used machine learning methods to scour those records. Now they are working with large language models (LLMs) to see whether they can produce better results. 

More accurate data on firearm-related incidents can play a crucial role in informing policy and prevention efforts. In a 2024 study, Assistant Teaching Professor Ott Toomet was part of a team that used machine learning methods to classify criminal cases as firearm-related incidents or not, significantly speeding up the process compared to manual review. The use of LLMs represents a new approach, and Toomet is now investigating the differences between the two methods.  

In the previous study, Toomet's research team converted photocopies of nearly 1,500 court records from misdemeanor and felony cases into text and reviewed them to determine whether firearms were involved. By using machine learning methods to search each case for certain keywords, they identified the most efficient way to sort the data. A system built on six key terms (gun, shoot, handgun, bullet, revolver and rifle) proved to be the fastest.
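
The study's code isn't reproduced here, but the keyword approach is straightforward to sketch. In the minimal Python illustration below, the six terms come from the 2024 study; the function name and matching details are assumptions, not the team's implementation:

```python
import re

# The six key terms the 2024 study found most efficient (per the article).
KEYWORDS = ["gun", "shoot", "handgun", "bullet", "revolver", "rifle"]

# Match any keyword at the start of a word, case-insensitively, so that
# "shooting" and "guns" are caught but "begun" is not.
PATTERN = re.compile(r"\b(?:" + "|".join(KEYWORDS) + r")", re.IGNORECASE)

def is_firearm_related(case_text: str) -> bool:
    """Flag a digitized court record if any key term appears in it."""
    return PATTERN.search(case_text) is not None
```

Sorting the digitized records is then a single pass, along the lines of `[r for r in records if is_firearm_related(r)]`.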

LLMs, which have grown in use and popularity over the last few years, are a type of artificial intelligence that takes in text, in the form of a prompt or a command, and generates a response; ChatGPT is a common example, producing human-like replies to user prompts. Unlike the earlier machine learning approach, where written code explicitly defines what to look for, LLMs are steered with question prompts.

Using Llama 3.2 3B, a model comparable to those behind ChatGPT but less powerful, Toomet's team is discovering which prompts work best for classifying the court cases.
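
The article doesn't specify how the team runs the model; one plausible setup, sketched below, loads Llama 3.2 3B through the Hugging Face transformers library. The model ID, helper function and generation settings are assumptions rather than details from the study:

```python
from transformers import pipeline

# Llama 3.2 3B Instruct, as published on Hugging Face (a gated model;
# the team's actual runtime and settings aren't described in the article).
pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-3B-Instruct",
    device_map="auto",
)

def ask(prompt: str, max_new_tokens: int = 256) -> str:
    """Send one user prompt to the model and return its text reply."""
    out = pipe([{"role": "user", "content": prompt}],
               max_new_tokens=max_new_tokens)
    # With chat-style input, the pipeline returns the whole conversation;
    # the final message is the model's reply.
    return out[0]["generated_text"][-1]["content"]
```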

Jinrui Fang and Runhan Chen, both Informatics ‘23, started working on the LLM research after taking Toomet’s classes and working with him as teaching assistants. 

First, Chen and Fang developed a system that uses the LLM to summarize the court records before feeding the summaries back into the model. Summarizing the cases was essential, as the originally converted court records were too lengthy to be processed effectively by the LLM. Chen explained, "Since the data is too long and not very structured, the model cannot capture the main meaning of the data. So one innovative point is to summarize the text first in concise summaries to help the model understand."

After summarizing the case, they asked the model, “Does this involve firearms?”
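
Neither the full prompt wording nor the team's code is published here, but the two-step flow might look like the sketch below, reusing the hypothetical ask() helper from the previous sketch; everything beyond the final question is an assumption:

```python
def classify_case(case_text: str) -> str:
    """Hedged sketch of the summarize-then-classify flow (not the team's code)."""
    # Step 1: condense the lengthy converted record so the small model
    # can process it; the summarization prompt wording is a guess.
    summary = ask(
        "Summarize the following court record in a few concise sentences:\n\n"
        + case_text
    )
    # Step 2: feed the summary back with the question quoted in the article.
    return ask(summary + "\n\nDoes this involve firearms? Answer yes or no.")
```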

So far, the results have been comparable to those from the earlier research, though with some unexpected differences. The most notable distinction from machine learning is that LLMs move away from the precision of written computer code: unlike code, they interpret English words with all their nuanced meanings. To illustrate this, Toomet explained, “My favorite example is the word ‘we’ or ‘us.’ They have two meanings. One is me and someone else. The other is me and you and someone else.”

This makes him think that while LLMs have a wide range of abilities, they do not replace classical programming.

“You look at the code and you know exactly what it does. Run it, and it does exactly what you want it to do, so everything is very well defined,” he said. “Large language models are not like that. It is more like talking to humans — ask the same question from three different people and you get three different answers.” 

Toomet and his team are still in the early stages of their research using LLMs. Despite some unforeseen challenges, the LLM is showing potential to outperform traditional machine learning. It seems to understand the context of the cases, which a keyword-based classifier cannot, and Fang noted that “large language models can understand and know the context of the report and do more clear classification.”

These early observations suggest that although LLMs may lack the precision of traditional code, their human-like handling of language and expanding range of applications make them worth exploring further.