UC San Diego: Large Language Models Pass the Turing Test

Jeremy WeaverApril 4, 2025

Premium

Researchers found that GPT-4.5, when adopting a humanlike persona, convinced human interrogators of its humanity more often than real human participants, demonstrating that advanced LLMs can pass the three-party Turing test.

UC San Diego: Large Language Models Pass the Turing Test

https://www.podbean.com/player-v2/?from=embed&i=hzhap-186fcb8-pb&square=1&share=1&download=1&fonts=Arial&skin=1&font-color=auto&rtl=0&logo_link=episode_page&btn-skin=7&size=300</a>" loading="lazy" allowfullscreen="">

Summary of Read" class="text-blue-600 hover:text-blue-800" target="_blank" rel="noopener noreferrer">https://arxiv.org/pdf/2503.23674'>Read Full Report

Researchers Cameron R. Jones and Benjamin K. Bergen investigated whether advanced large language models (LLMs) can pass the standard three-party Turing test. Their study involved human interrogators conversing with both a human and an AI, then judging which was human.

The findings indicate that GPT-4.5, when prompted to adopt a persona, was identified as human significantly more often than the actual human participant, marking the first empirical evidence of an AI passing this rigorous version of the test.

While other models like LLaMa-3.1 showed some human-like qualities, only the persona-prompted models consistently fooled human judges. The study also explored the strategies used by interrogators and the implications of these results for our understanding of AI capabilities and their societal impact.

This study provides the first empirical evidence that a Large Language Model (LLM), specifically GPT-4.5 when prompted to adopt a humanlike persona, can pass a standard three-party Turing test. In the study, GPT-4.5-PERSONA was judged as human 73% of the time, significantly more often than the actual human participants.
Prompting with a humanlike persona was crucial for success. While GPT-4.5-PERSONA and LLAMA-PERSONA were not reliably distinguished from humans, the same models without the persona prompt (GPT-4.5-NO-PERSONA and LLAMA-NO-PERSONA) performed significantly worse, often being identified as AI at rates significantly below chance in the undergraduate study.
The study compared the performance of several AI systems, including GPT-4.5, LLaMa-3.1-405B, GPT-4o, and ELIZA. The baseline models, GPT-4o-NO-PERSONA and ELIZA, had significantly lower win rates, indicating that interrogators could generally distinguish them from humans. This suggests the interrogators were not simply guessing randomly.
The research indicates that interrogators often relied on social, emotional, and linguistic cues rather than traditional measures of knowledge and reasoning when trying to distinguish between humans and AI. Interestingly, providing strange prompts or using "jailbreaks" was the most effective strategy for interrogators, while asking about the weather or human experiences was least effective.
The findings have significant social and economic implications, suggesting that contemporary LLMs could potentially substitute for humans in short conversations, raising concerns about deception, misinformation, and the potential undermining of real human interaction. The study also found that general knowledge about LLMs and frequent chatbot interaction did not consistently improve participants' ability to distinguish AI from humans.

← PreviousElon University: Being Human in 2035 – How Are We Changing in the Age of AI?Next →Google: Agents Companion

Amazon's AI Agent Outage Is a Warning: Why Organizations Need Governed AI Infrastructure

Amazon's AI coding agent Kiro caused a 13-hour AWS outage by deleting and recreating a production environment. The incident reveals why organizations deploying AI agents need architectural governance — not just more human approvals.

ibl.aiMarch 12, 2026

An AI Agent Hacked McKinsey in 2 Hours — What It Means for Enterprise AI Security

An autonomous AI agent breached McKinsey's internal AI platform in under 2 hours — exposing 46.5 million chat messages and 57,000 employee accounts. Here's what every organization deploying AI needs to learn from it.

ibl.aiMarch 11, 2026

Amazon Now Requires Senior Sign-Off for AI-Generated Code — Here's Why Every Organization Should Take Note

Amazon's new policy requiring senior engineers to approve all AI-assisted code changes signals a turning point: organizations deploying AI agents need governance infrastructure, not just AI capabilities. Here's what it means for the future of agentic systems.

ibl.aiMarch 10, 2026

The Pentagon Blacklisted an AI Company. Here's What It Teaches Every Organization About AI Infrastructure.

When the Pentagon designated Anthropic a 'supply chain risk,' defense contractors scrambled to abandon Claude overnight. The lesson for every organization: if you don't own your AI stack, someone else controls your future.

ibl.aiMarch 9, 2026

See the ibl.ai AI Operating System in Action

Discover how leading universities and organizations are transforming education with the ibl.ai AI Operating System. Explore real-world implementations from Harvard, MIT, Stanford, and users from 400+ institutions worldwide.

View Case Studies

Get Started with ibl.ai

Choose the plan that fits your needs and start transforming your educational experience today.

ibl.ai AI Education Blog

Topics We Cover

Featured Research and Reports

For University Leaders