LLMs Develop Covert Racial Biases Despite Training

A new study finds that large language models hold deep-seated racial prejudices against speakers of African American English (AAE), even after anti-bias training. Models associated AAE text with negative stereotypes, assuming that its speakers hold less prestigious jobs, commit crimes at higher rates, and deserve harsher punishments than speakers of Standard English. Larger models displayed greater bias even though they understood AAE better. Current debiasing techniques conceal but don’t eliminate the societal biases that models learn, raising concerns about real-world harm if the models are deployed without rigorous auditing.
