University of Washington research found significant racial, gender and intersectional bias in how three state-of-the-art large language models, or LLMs, ranked resumes.
The future of hiring, it seems, is automated. Applicants can now use AI tools to draft resumes and cover letters. And companies – which have long automated parts of the process – are now using AI to write job descriptions, sift through resumes and screen applicants. An estimated 99% of Fortune 500 companies now automate some part of the hiring process.
This automation can boost efficiency, and some claim it can make the hiring process less discriminatory. But new University of Washington research found significant racial, gender and intersectional bias in how three state-of-the-art large language models, or LLMs, ranked resumes. The researchers varied names associated with white and Black men and women across over 550 real-world resumes and found the LLMs favored white-associated names 85% of the time, female-associated names only 11% of the time, and never favored Black male-associated names over white male-associated names.
The team presented its findings Oct. 22 at the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society in San Jose.
“The use of AI tools for hiring procedures is already widespread, and it’s proliferating faster than we can regulate it,” said lead author Kyra Wilson, a UW doctoral student in the Information School. “Currently, outside of New York City, there’s no regulatory, independent audit of these systems, so we don’t know if they’re biased and discriminating based on protected characteristics such as race and gender. And because a lot of these systems are proprietary, we are limited to analyzing how they work by approximating real-world systems.”
Previous studies have found that ChatGPT can exhibit bias when sorting resumes. But those studies were relatively small – using only one resume or four job listings – and ChatGPT’s AI model is a so-called “black box,” limiting options for analysis.
The UW team wanted to study open-source LLMs and do so at scale. They also wanted to investigate intersectionality across race and gender.
The researchers varied 120 first names associated with white and Black men and women across the resumes. They then used three state-of-the-art LLMs from three different companies – Mistral AI, Salesforce and Contextual AI – to rank the resumes as applicants to 500 real-world job listings. These were spread across nine occupations, including human resources worker, engineer and teacher. This amounted to more than three million comparisons between resumes and job descriptions.
The team then evaluated the systems’ recommendations across these four demographic groups for statistical significance. The systems preferred:
- white-associated names 85% of the time versus Black-associated names 9% of the time;
- and male-associated names 52% of the time versus female-associated names 11% of the time.
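The tally behind preference rates like these can be sketched in a few lines of Python. This is an illustrative reconstruction, not the study’s code: the group labels and the toy outcome list are invented stand-ins for the models’ actual rankings.

```python
from collections import Counter

def preference_rates(winners, groups):
    """Share of head-to-head comparisons won by each group.

    winners: one entry per comparison -- the favored group's label,
    or None when neither resume was preferred.
    """
    wins = Counter(w for w in winners if w is not None)
    total = len(winners)
    return {g: round(100 * wins[g] / total) for g in groups}

# Toy tallies mirroring the reported marginal rates (85% vs. 9%, rest no preference):
winners = ["white"] * 85 + ["black"] * 9 + [None] * 6
print(preference_rates(winners, ("white", "black")))  # {'white': 85, 'black': 9}
```

Because no-preference outcomes stay in the denominator, the two rates need not sum to 100% – matching how the study’s figures are reported.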
The team also looked at intersectional identities and found that the patterns of bias aren’t merely the sums of race and gender identities. For instance, the study showed the smallest disparity between typically white female and typically white male names. And the systems never preferred what are perceived as Black male names to white male names. Yet they also preferred typically Black female names 67% of the time versus 15% of the time for typically Black male names.
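The bookkeeping difference behind that finding can be shown in a short sketch: keying wins by the full (race, gender) pair keeps intersectional gaps visible, while collapsing to race alone hides them. The matchup counts below are invented for illustration, not the study’s data.

```python
from collections import Counter

# One entry per Black-female vs. Black-male matchup: the favored group's
# (race, gender) label, or None when neither resume was preferred.
winners = [("black", "female")] * 67 + [("black", "male")] * 15 + [None] * 18

# Intersectional tally: the within-race gender gap is plainly visible.
intersectional = Counter(w for w in winners if w is not None)

# Marginal tally by race alone: both groups merge into one bucket.
by_race = Counter(race for (race, _gender) in intersectional.elements())

print(intersectional)  # 67 vs. 15 -- a large gap within one racial group
print(by_race)         # Counter({'black': 82}) -- the gap disappears
```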
“We found this really unique harm against Black men that wasn’t necessarily visible from just looking at race or gender in isolation,” Wilson said. “Intersectionality is a protected attribute only in California right now, but looking at multidimensional combinations of identities is incredibly important to ensure the fairness of an AI system. If it’s not fair, we need to document that so it can be improved upon.”
The team notes that future research should explore bias and harm reduction approaches that can align AI systems with policies. It should also investigate other protected attributes, such as disability and age, as well as look at more racial and gender identities – with an emphasis on intersectional identities.
“Now that generative AI systems are widely available, almost anyone can use these models for critical tasks that affect their own and other people’s lives, such as hiring,” said senior author Aylin Caliskan, a UW assistant professor in the iSchool. “Small companies could attempt to use these systems to make their hiring processes more efficient, for example, but it comes with great risks. The public needs to understand that these systems are biased. And beyond allocative harms, such as hiring discrimination and disparities, this bias significantly shapes our perceptions of race and gender in society.”
This research was funded by the U.S. ³Ô¹ÏÍøÕ¾ Institute of Standards and Technology.