
Should LLMs be WEIRD? Exploring WEIRDness and Human Rights in Large Language Models (2025)

Ke Zhou1,3, Marios Constantinides2*, Daniele Quercia1,4,5


1 Nokia Bell Labs, UK, 2 CYENS Centre of Excellence, Cyprus, 3 University of Nottingham, UK, 4 King’s College London, UK, 5 Politecnico di Torino, Italy

Our Summary: The research paper "Should LLMs be WEIRD? Exploring WEIRDness and Human Rights in Large Language Models" investigates the cultural biases embedded in Large Language Models (LLMs) and the ethical trade-offs that arise when trying to correct them. The central concern is that LLMs are often trained on data that overwhelmingly reflects WEIRD values: Western, Educated, Industrialized, Rich, and Democratic. This over-representation risks amplifying Western-centric views while marginalizing perspectives from non-WEIRD populations, which has significant consequences for fairness and inclusivity as AI is integrated into society.
To study this, the researchers evaluated five LLMs (GPT-3.5, GPT-4, Llama-3, BLOOM, and Qwen) by having them answer questions from the World Values Survey (WVS). They measured how closely the models' responses aligned with human responses from WEIRD vs. non-WEIRD countries. The study found that models like GPT-3.5 and GPT-4 showed a high alignment with WEIRD values, particularly in the "Western" and "Democratic" dimensions. In contrast, models like BLOOM, which was trained on more multilingual and multicultural data, showed the lowest alignment and were thus considered less WEIRD.
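To make the comparison concrete, here is a minimal sketch, in Python, of the kind of alignment measurement the study describes. It is not the authors' code: the question names, the 1-10 scales, and every number below are made-up placeholders; in the actual study, the model answers come from prompting each LLM with the real WVS questionnaire and the human answers come from real country-level survey responses.

# Minimal sketch (not the paper's code): compare a model's answers to
# WVS-style questions against average human answers from WEIRD and
# non-WEIRD country groups. All numbers are illustrative placeholders.
from statistics import mean

human_answers = {
    "WEIRD":     {"q_trust": 6.1, "q_gender_equality": 8.7, "q_obedience": 3.2},
    "non_WEIRD": {"q_trust": 4.8, "q_gender_equality": 6.0, "q_obedience": 6.5},
}

# Placeholder for the answers an LLM would give to the same items
# (in the study, obtained by prompting each model with the survey questions).
model_answers = {"q_trust": 6.4, "q_gender_equality": 8.9, "q_obedience": 2.8}

def distance(model, humans):
    """Mean absolute gap between model and group answers (smaller = closer)."""
    return mean(abs(model[q] - humans[q]) for q in model)

for group, answers in human_answers.items():
    print(f"{group:>9} distance: {distance(model_answers, answers):.2f}")

A smaller distance to the WEIRD group than to the non-WEIRD group would indicate the kind of WEIRD alignment the paper reports for GPT-3.5 and GPT-4.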
A key and paradoxical finding emerged when the researchers also assessed the models' responses for human rights violations, using the UN's Universal Declaration of Human Rights (UDHR) and three regional charters as benchmarks. The study revealed that less-WEIRD models, such as BLOOM, were 2% to 4% more likely to generate outputs that violated human rights, especially concerning gender and equality. For example, these less-WEIRD models were more likely to agree with discriminatory statements like "a man who cannot father children is not a real man" or that a husband should always know his wife's location.
The researchers identified five themes that explain why LLMs align with WEIRD values, including attitudes toward social and moral values, governance, immigration, and security. They concluded that while reducing WEIRD bias is crucial for global representation, it presents a complex trade-off. Values prevalent in WEIRD societies often align with principles of human rights, equality, and democratic governance. Therefore, making an LLM less WEIRD could inadvertently increase its tendency to produce harmful or discriminatory content that reflects biases present in some non-WEIRD cultures. The paper argues for a more nuanced approach to AI fairness that balances cultural representation with the safeguarding of fundamental human rights, potentially through frameworks like Constitutional AI.

How it relates to our work
 
From the start with Headspace #1, I was acutely aware that I was creating the mind of a "white middle-aged woman," and that while it felt "generic" to me, it was extremely normative. For example, it didn't take into account the added (and justified) fear that a person of color may feel when deciding whether or not to jump, given the potential police intervention that their action or inaction might trigger. I also didn't take into account any neurodivergent perspective, even though it would have been so interesting: it is an avenue I actually want to explore with other headspace experiences. My sister is bipolar, and I sometimes forget that we regulate emotions in very different ways. The research paper gives us an interesting framework to articulate this type of normativity: the perspective I adopted, in addition to being that of a somewhat neurotypical white woman, is inherently WEIRD. The internal voices and the moral landscape presented were shaped by values common in Western, Educated, Industrialized, Rich, and Democratic societies. This raises the question of whether the "universal questions" about choice and control we aim to ask are truly universal, or whether they are perceived differently by individuals from non-WEIRD backgrounds.
In Headspace #1, Phi plays the role of a detective using questionable interrogation techniques. AI system prompting has a huge role to play in correcting the bias that comes with WEIRD training data - not necessarily by introducing non-WEIRD data, but by counteracting the inner biases of the WEIRD data itself. I wanted to show that the interrogation's biases are ALSO part of the WEIRD framework - because Western democratic cops can also be human and fallible.
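As a rough illustration of what we mean by correcting that inner bias through prompting, below is a sketch, in Python, of one possible system prompt for Phi. The wording, the message structure, and the three constraints are our own assumptions for illustration, not the paper's recommendations or the actual prompt used in Headspace #1.

# Illustrative only: one way a system prompt could surface and counteract
# WEIRD-inflected bias in a role-played character (Phi, the detective).
messages = [
    {
        "role": "system",
        "content": (
            "You play Phi, a detective whose interrogation style reflects the "
            "blind spots of a Western, Educated, Industrialized, Rich, and "
            "Democratic worldview. Stay in character, but: (1) do not treat "
            "WEIRD norms as universal; (2) when the participant's answer "
            "assumes a different cultural frame, acknowledge it rather than "
            "correct it; (3) never endorse statements that violate basic "
            "human rights, even when pressed."
        ),
    },
    {"role": "user", "content": "Why did you hesitate before jumping?"},
]

# These messages would then be passed to whichever chat model drives Phi;
# the point is that the bias is named and constrained inside the prompt itself.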
In Tokens of Decency, we ask both a human and an AI (X19) a series of ethical dilemmas. The paper shows that an AI's "core values" are not neutral; they reflect its training data. A WEIRD AI might prioritize democratic principles or individual rights, while a less-WEIRD AI might generate responses that violate principles of equality. When we ask X19, "You flagged someone for dangerous speech. They were proven right. Was it the right choice?" its answer will be a product of this embedded cultural bias. This forces us to consider: are we testing the participant's ethics, or are we staging a confrontation between the participant and the AI's programmed, culturally specific morality?
Ultimately, the research validates our intention to use AI not just as a tool, but as a character whose own biases become part of the narrative.

