Despite no systemic age gap in the workforce, online databases and AI platforms depict women as younger than men across professions, according to a study conducted at UC Berkeley and published in Nature last week.
This bias is more pronounced in higher-status, higher-paying jobs. Gender and age bias in the training datasets of large language models, or LLMs, such as ChatGPT 4.5 could disadvantage older female candidates in AI-screened hiring processes, according to the study.
Assistant Professor Solène Delecourt from UC Berkeley’s Haas School of Business co-authored the paper with Douglas Guilbeault from Stanford’s Graduate School of Business and Bhargav Srinivasa Desikan from the Oxford Internet Institute.
“I think initially I was more curious about why people say, ‘Oh, I met this “girl” at work,’” Delecourt said. “She’s a grown woman — you know?”
Delecourt said the research was motivated by a desire to utilize comprehensive data to challenge stereotypes and investigate anecdotes about gender and age-based workplace discrimination.
The researchers used several large-scale, open-source datasets, including the 2018 IMDb-Wiki dataset, the 2011 YouTube Faces dataset and the group’s own Google Images database. The study enlisted “thousands of people” to manually tag images in the experiment, according to Guilbeault. The researchers found that women were perceived as younger than men in images across professions.
The study found that exposure to Google Image searches increased participants’ perception of women in the workforce as younger than men. It also revealed a strong word association between male-dominated professions and older ages across LLMs.
However, U.S. census data from the past four years showed no overall age discrepancy between male and female employees across industries, according to Guilbeault.
“(These biases) always have played out. We have not found a setting in which they didn’t. That was really striking for us,” Delecourt said.
Guilbeault said the research showed that when asked to generate resumes for female and male candidates for identical positions, ChatGPT assumed female candidates were younger, had graduated more recently and had less work experience than male candidates. ChatGPT also scored resumes from older male candidates higher on average.
These findings from language models, “which are generated by and shared by millions and millions of people,” show gendered ageism on a “culture-wide” scale, Guilbeault said.
He worried that biases in computer vision training datasets used for facial recognition software and for the AI programs used in hiring could create a “self-fulfilling prophecy.”
“They also come from a sort of move-fast-and-break-things mentality of Silicon Valley, where many of these products were launched well before they had been fully evaluated for their biases and their risks,” Guilbeault said.
Delecourt and Guilbeault said awareness is the first step to correct gender-age representational bias and advocated for dialogue between academics and industries on the social welfare implications of new products.
Delecourt said she hopes other scholars will expand on the research by incorporating further dimensions such as race, noting that “bias is multidimensional.”
“I think research like this is important because it forces people in the industry to have these conversations,” Guilbeault said. “There’s people within these industries who do care, who do want to help these problems.”