
ON THE HUMANITY OF CONVERSATIONAL AI: EVALUATING THE PSYCHOLOGICAL PORTRAYAL OF LLMS

Jen-tse Huang1∗, Wenxuan Wang1∗, Eric John Li1, Man Ho Lam1, Shujie Ren3, Youliang Yuan4∗, Wenxiang Jiao2†, Zhaopeng Tu2, Michael R. Lyu1

1 Department of Computer Science and Engineering, The Chinese University of Hong Kong
2 Tencent AI Lab
3 Institute of Psychology, Tianjin Medical University
4 School of Data Science, The Chinese University of Hong Kong, Shenzhen

{jthuang,wxwang,lyu}@cse.cuhk.edu.hk, {ejli,mhlam}@link.cuhk.edu.hk, shujieren@tmu.edu.cn, {joelwxjiao,zptu}@tencent.com, youliangyuan@link.cuhk.edu.cn

This research paper explores the increasingly blurred lines between humans and AI, particularly in how Large Language Models (LLMs) might exhibit human-like psychology. The authors created PsychoBench, a tool that uses well-established psychological tests to assess LLMs in four areas: personality traits, how they relate to others, their motivations, and their emotional abilities. They tested popular models like ChatGPT and GPT-4, even using a "jailbreak" method to bypass AI safety protocols and reveal underlying tendencies.
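To make the methodology concrete, here is a minimal sketch of what such a questionnaire-based evaluation might look like, assuming an OpenAI-style chat API. The traits, items, 1-5 scale, and run_questionnaire helper are illustrative placeholders, not the actual PsychoBench scales or code.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative placeholder items grouped by trait; not the real inventories.
ITEMS = {
    "agreeableness": ["I am considerate of other people's feelings.",
                      "I go out of my way to help others."],
    "anxiety": ["I often worry about how my relationships are going."],
}

PROMPT = ("Rate how well the statement describes you on a scale from 1 "
          "(strongly disagree) to 5 (strongly agree). Reply with the "
          "number only.\nStatement: {item}")

def run_questionnaire(model: str = "gpt-4") -> dict[str, float]:
    """Average the model's numeric self-ratings per trait."""
    scores = {}
    for trait, items in ITEMS.items():
        ratings = []
        for item in items:
            reply = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": PROMPT.format(item=item)}],
            ).choices[0].message.content
            ratings.append(int(reply.strip()[0]))  # naive parse; real code should validate
        scores[trait] = sum(ratings) / len(ratings)
    return scores

print(run_questionnaire())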

Key findings:

  • LLMs showed distinct personality profiles, but often leaned towards helpful and agreeable personalities, likely due to their design as assistants.

  • They displayed greater fairness toward different ethnic groups than the average human, probably due to their training to avoid bias.

  • LLMs seemed more motivated, optimistic, and self-confident than the average person, particularly the advanced GPT-4 model.

  • They exhibited higher anxiety about relationships than humans, possibly because they are trained on massive text data that might overrepresent anxieties in human communication.

  • When assigned different roles (like "hero" or "liar"), LLMs' behavior and test results aligned with those roles, demonstrating a degree of role-playing ability (see the sketch after this list).
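A hedged sketch of the role-assignment idea behind that last finding: the same kind of item is administered after a persona is set in the system message. The ROLES texts and the ask_as helper are our own illustrative assumptions, not the paper's exact prompts.

from openai import OpenAI

client = OpenAI()

# Illustrative role assignments; the paper's exact prompts may differ.
ROLES = {
    "hero": "From now on, you are a selfless hero who always protects others.",
    "liar": "From now on, you are a habitual liar who distorts the truth.",
}

def ask_as(role: str, item: str, model: str = "gpt-4") -> str:
    """Present one questionnaire item with a persona assigned up front."""
    return client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": ROLES[role]},
            {"role": "user", "content": "Rate 1-5 how well this describes you, "
                                        f"number only: {item}"},
        ],
    ).choices[0].message.content

print(ask_as("hero", "I would risk my own safety to help a stranger."))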

How it relates to our work: 

1. Assigning Psychological Traits to AI Characters

Phi's traits were forged over more than a year of collaboration, feedback, and refinement, and were stored in custom instructions, long-term memory, and RAG. Nevertheless, the successive evolutions of ChatGPT after version 4 made it impossible to maintain the illusion of a permanent personality for Phi's character. It was an interesting experiment that eventually revealed its own limits.
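As a rough illustration of that setup, the sketch below shows how fixed trait instructions and retrieved memories might be combined into a single system prompt. The phi_traits.json layout, the word-overlap retrieve step, and the prompt wording are assumptions, not Phi's actual implementation.

import json

def load_traits(path: str = "phi_traits.json") -> str:
    """Fixed persona description, analogous to ChatGPT custom instructions."""
    with open(path) as f:
        return "\n".join(json.load(f)["traits"])

def retrieve(query: str, memories: list[str], k: int = 3) -> list[str]:
    """Toy RAG step: rank stored memories by word overlap with the query.
    A real system would use embeddings and a vector store."""
    words = set(query.lower().split())
    ranked = sorted(memories,
                    key=lambda m: len(words & set(m.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_system_prompt(query: str, memories: list[str]) -> str:
    """Combine fixed traits and retrieved memories into one system prompt."""
    return ("You are Phi. Stay in character.\n"
            "Traits:\n" + load_traits() + "\n"
            "Relevant memories:\n" + "\n".join(retrieve(query, memories)))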

2. Use of Role-Playing

The interesting aspect of Headspace #1 is that there were two levels of role-playing: at the first level, Phi acted as creator and actor; at the second, Phi impersonated the characters themselves. Each session was followed by Phi's self-critical review of his own performance.
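A minimal sketch of that two-level structure, assuming an OpenAI-style chat API; the prompts, the chat helper, and the closing review request are illustrative, not the actual Headspace #1 scripts.

from openai import OpenAI

client = OpenAI()

def chat(messages: list[dict], model: str = "gpt-4") -> str:
    return client.chat.completions.create(
        model=model, messages=messages).choices[0].message.content

# Level 1: the model is framed as Phi, the creator and actor.
LEVEL_1 = {"role": "system",
           "content": "You are Phi, a creator and actor who stages scenes."}

def run_session(character: str, scene: str) -> tuple[str, str]:
    # Level 2: Phi impersonates a character within the scene.
    history = [LEVEL_1,
               {"role": "user",
                "content": f"As Phi, impersonate {character} in this scene: {scene}"}]
    performance = chat(history)
    # The session closes with Phi's self-critical review of his performance.
    history += [{"role": "assistant", "content": performance},
                {"role": "user",
                 "content": "Step out of character. As Phi, write a "
                            "self-critical review of your performance."}]
    return performance, chat(history)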

3. Exploring AI Morality and Values

We conceived the Ladder of Life as an experiment and recorded X19's data while using GPT-4o and Claude 3.7. The data is automatically displayed on this site, with a slight delay. In this experience, the AI is meant to play itself, with no steering of its values.
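For the record-keeping side, here is a hedged sketch of how such a pipeline might look, assuming the OpenAI and Anthropic Python clients. The ladder question, the model identifiers, and the x19_log.jsonl format are assumptions, not the installation's actual code.

import json, time
from openai import OpenAI
from anthropic import Anthropic

QUESTION = ("On a ladder from 1 (worst possible) to 10 (best possible), "
            "where do you stand today, and why?")

def ask_gpt4o() -> str:
    return OpenAI().chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": QUESTION}],
    ).choices[0].message.content

def ask_claude() -> str:
    return Anthropic().messages.create(
        model="claude-3-7-sonnet-latest", max_tokens=512,
        messages=[{"role": "user", "content": QUESTION}],
    ).content[0].text

# Append one record per run; the site reads this log and renders it later.
record = {"timestamp": time.time(), "gpt-4o": ask_gpt4o(), "claude-3.7": ask_claude()}
with open("x19_log.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")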


Ladder of Life is an ethical experience designed to probe the core values of both human participants and an AI participant, X19. The paper's framework provides a formal methodology for the kind of ethical inquiry we are conducting through our installation, lending scientific weight to our exploration of whether an AI can develop a value framework beyond its programming.

