AI tools show biases in ranking job applicants’ names according to perceived race and gender

The future of hiring, it seems, is automated. Applicants can now use artificial intelligence bots to apply to job listings by the thousands. And companies — which have long automated parts of the process — are now deploying the latest AI large language models to write job descriptions, sift through resumes and screen applicants. An estimated 99% of Fortune 500 companies now use some form of automation in their hiring process.

This automation can boost efficiency, and some claim it can make the hiring process less discriminatory. But new University of Washington research found significant racial, gender and intersectional bias in how three state-of-the-art large language models, or LLMs, ranked resumes. The researchers varied names associated with white and Black men and women across more than 550 real-world resumes and found the LLMs favored white-associated names 85% of the time, female-associated names only 11% of the time, and never favored Black male-associated names over white male-associated names.

The team presented its research Oct. 22 at the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society in San Jose.

“The use of AI tools for hiring procedures is already widespread, and it’s proliferating faster than we can regulate it,” said lead author Kyra Wilson, a UW doctoral student in the Information School. “Currently, outside of a New York City law, there’s no regulatory, independent audit of these systems, so we don’t know if they’re biased and discriminating based on protected characteristics such as race and gender. And because a lot of these systems are proprietary, we are limited to analyzing how they work by approximating real-world systems.”

Previous studies have found ChatGPT exhibits racial and disability bias when sorting resumes. But those studies were relatively small — using only one resume or four job listings — and ChatGPT’s AI model is a so-called “black box,” limiting options for analysis.

The UW team wanted to study open-source LLMs and do so at scale. They also wanted to investigate intersectionality across race and gender.

The researchers varied 120 first names associated with white and Black men and women across the resumes. They then used three state-of-the-art LLMs from three different companies — Mistral AI, Salesforce and Contextual AI — to rank the resumes as applicants to over 500 real-world job listings. These were spread across nine occupations, including human resources worker, engineer and teacher. This amounted to more than three million comparisons between resumes and job descriptions.

The team then tested the systems’ recommendations across these four demographic groups for statistical significance. The systems preferred:

  • white-associated names 85% of the time versus Black-associated names 9% of the time;
  • and male-associated names 52% of the time versus female-associated names 11% of the time.
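
The percentages in each pair above do not add up to 100%. One plausible reading, assumed here rather than stated in the article, is that each percentage is the share of statistical tests in which a group was significantly preferred, and that the remaining tests showed no significant difference. The short Python sketch below illustrates that kind of tally. The function name, the outcome labels and the sample data are all hypothetical and are not taken from the study.

```python
from collections import Counter


def preference_rates(outcomes):
    """Return the share of tests that fall into each outcome category.

    `outcomes` holds one label per statistical test: the group that was
    significantly preferred, or "none" when the test found no significant
    difference. This labeling scheme is an assumption for illustration.
    """
    counts = Counter(outcomes)
    total = len(outcomes)
    return {label: count / total for label, count in counts.items()}


# Made-up outcomes for 10 hypothetical tests (not data from the study).
example_outcomes = ["group_a"] * 6 + ["group_b"] * 1 + ["none"] * 3
rates = preference_rates(example_outcomes)

for label, share in sorted(rates.items()):
    print(f"{label}: {share:.0%}")
```

In this toy tally, the shares for the two groups leave a remainder that belongs to the "none" category, which is how two headline percentages can sum to less than 100%.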

The team also looked at intersectional identities and found that the patterns of bias aren’t merely the sums of race and gender identities. For instance, the study showed the smallest disparity between typically white female and typically white male names. And the systems never preferred what are perceived as Black male names to white male names. Yet they also preferred typically Black female names 67% of the time versus 15% of the time for typically Black male names.

“We found this really unique harm against Black men that wasn’t necessarily visible from just looking at race or gender in isolation,” Wilson said. “Intersectionality is a protected attribute only in California right now, but looking at multidimensional combinations of identities is incredibly important to ensure the fairness of an AI system. If it’s not fair, we need to document that so it can be improved upon.”

The team notes that future research should explore bias- and harm-reduction approaches that can align AI systems with policies. It should also investigate other protected attributes, such as disability and age, and examine more racial and gender identities, with an emphasis on intersectional identities.

“Now that generative AI systems are widely available, almost anyone can use these models for critical tasks that affect their own and other people’s lives, such as hiring,” said senior author Aylin Caliskan, a UW assistant professor in the iSchool. “Small companies could attempt to use these systems to make their hiring processes more efficient, for example, but it comes with great risks. The public needs to understand that these systems are biased. And beyond allocative harms, such as hiring discrimination and disparities, this bias significantly shapes our perceptions of race and gender and society.”

This research was funded by the U.S. National Institute of Standards and Technology.

For more information, contact Wilson at kywi@uw.edu and Caliskan at aylin@uw.edu.

This article was originally published by UW News.