UC San Diego: Large Language Models Pass the Turing Test
Researchers found that GPT-4.5, when adopting a humanlike persona, convinced human interrogators of its humanity more often than real human participants, demonstrating that advanced LLMs can pass the three-party Turing test.
UC San Diego: Large Language Models Pass the Turing Test
Summary of Read Full Report
Researchers Cameron R. Jones and Benjamin K. Bergen investigated whether advanced large language models (LLMs) can pass the standard three-party Turing test. Their study involved human interrogators conversing with both a human and an AI, then judging which was human.
The findings indicate that GPT-4.5, when prompted to adopt a persona, was identified as human significantly more often than the actual human participant, marking the first empirical evidence of an AI passing this rigorous version of the test.
While other models like LLaMa-3.1 showed some human-like qualities, only the persona-prompted models consistently fooled human judges. The study also explored the strategies used by interrogators and the implications of these results for our understanding of AI capabilities and their societal impact.
- This study provides the first empirical evidence that a Large Language Model (LLM), specifically GPT-4.5 when prompted to adopt a humanlike persona, can pass a standard three-party Turing test. In the study, GPT-4.5-PERSONA was judged as human 73% of the time, significantly more often than the actual human participants.
- Prompting with a humanlike persona was crucial for success. While GPT-4.5-PERSONA and LLAMA-PERSONA were not reliably distinguished from humans, the same models without the persona prompt (GPT-4.5-NO-PERSONA and LLAMA-NO-PERSONA) performed significantly worse, often being identified as AI at rates significantly below chance in the undergraduate study.
- The study compared the performance of several AI systems, including GPT-4.5, LLaMa-3.1-405B, GPT-4o, and ELIZA. The baseline models, GPT-4o-NO-PERSONA and ELIZA, had significantly lower win rates, indicating that interrogators could generally distinguish them from humans. This suggests the interrogators were not simply guessing randomly.
- The research indicates that interrogators often relied on social, emotional, and linguistic cues rather than traditional measures of knowledge and reasoning when trying to distinguish between humans and AI. Interestingly, providing strange prompts or using "jailbreaks" was the most effective strategy for interrogators, while asking about the weather or human experiences was least effective.
- The findings have significant social and economic implications, suggesting that contemporary LLMs could potentially substitute for humans in short conversations, raising concerns about deception, misinformation, and the potential undermining of real human interaction. The study also found that general knowledge about LLMs and frequent chatbot interaction did not consistently improve participants' ability to distinguish AI from humans.
Related Articles
Gemini 3.1 Pro and the Case for Model-Agnostic Agentic Infrastructure
Google's Gemini 3.1 Pro doubled its reasoning benchmarks overnight. Here's why that makes model-agnostic agentic infrastructure more critical than ever.
Google Gemini 3.1 Pro, ChatGPT Ads, and Why Organizations Need to Own Their AI Infrastructure
Google launches Gemini 3.1 Pro with advanced reasoning while OpenAI rolls out ads in ChatGPT. These two moves reveal a growing tension in enterprise AI: who controls the intelligence layer, and whose interests does it serve?
ChatGPT Now Has Ads — And It Should Change How You Think About AI Infrastructure
OpenAI has started showing ads inside ChatGPT responses. This marks a turning point: organizations relying on consumer AI tools are now subject to someone else's monetization strategy. Here's why owning your AI infrastructure matters more than ever.
Gemini 3.1 Pro Just Dropped — Here's What It Means for Organizations Running Their Own AI
Google's Gemini 3.1 Pro launched today with 1M-token context, native multimodal reasoning, and agentic tool use. Here's why model releases like this one matter most to organizations that own their AI infrastructure — and why locking into a single provider is the costliest mistake you can make.
See the ibl.ai AI Operating System in Action
Discover how leading universities and organizations are transforming education with the ibl.ai AI Operating System. Explore real-world implementations from Harvard, MIT, Stanford, and users from 400+ institutions worldwide.
View Case StudiesGet Started with ibl.ai
Choose the plan that fits your needs and start transforming your educational experience today.