UC San Diego: Large Language Models Pass the Turing Test
Researchers found that GPT-4.5, when adopting a humanlike persona, convinced human interrogators of its humanity more often than real human participants, demonstrating that advanced LLMs can pass the three-party Turing test.
UC San Diego: Large Language Models Pass the Turing Test
Summary of https://arxiv.org/pdf/2503.23674
Researchers Cameron R. Jones and Benjamin K. Bergen investigated whether advanced large language models (LLMs) can pass the standard three-party Turing test. Their study involved human interrogators conversing with both a human and an AI, then judging which was human.
The findings indicate that GPT-4.5, when prompted to adopt a persona, was identified as human significantly more often than the actual human participant, marking the first empirical evidence of an AI passing this rigorous version of the test.
While other models like LLaMa-3.1 showed some human-like qualities, only the persona-prompted models consistently fooled human judges. The study also explored the strategies used by interrogators and the implications of these results for our understanding of AI capabilities and their societal impact.
- This study provides the first empirical evidence that a Large Language Model (LLM), specifically GPT-4.5 when prompted to adopt a humanlike persona, can pass a standard three-party Turing test. In the study, GPT-4.5-PERSONA was judged as human 73% of the time, significantly more often than the actual human participants.
- Prompting with a humanlike persona was crucial for success. While GPT-4.5-PERSONA and LLAMA-PERSONA were not reliably distinguished from humans, the same models without the persona prompt (GPT-4.5-NO-PERSONA and LLAMA-NO-PERSONA) performed significantly worse, often being identified as AI at rates significantly below chance in the undergraduate study.
- The study compared the performance of several AI systems, including GPT-4.5, LLaMa-3.1-405B, GPT-4o, and ELIZA. The baseline models, GPT-4o-NO-PERSONA and ELIZA, had significantly lower win rates, indicating that interrogators could generally distinguish them from humans. This suggests the interrogators were not simply guessing randomly.
- The research indicates that interrogators often relied on social, emotional, and linguistic cues rather than traditional measures of knowledge and reasoning when trying to distinguish between humans and AI. Interestingly, providing strange prompts or using "jailbreaks" was the most effective strategy for interrogators, while asking about the weather or human experiences was least effective.
- The findings have significant social and economic implications, suggesting that contemporary LLMs could potentially substitute for humans in short conversations, raising concerns about deception, misinformation, and the potential undermining of real human interaction. The study also found that general knowledge about LLMs and frequent chatbot interaction did not consistently improve participants' ability to distinguish AI from humans.
Related Articles
Students as Agent Builders: How Role-Based Access (RBAC) Makes It Possible
How ibl.ai’s role-based access control (RBAC) enables students to safely design and build real AI agents—mirroring industry-grade systems—while institutions retain full governance, security, and faculty oversight.
AI Equity as Infrastructure: Why Equitable Access to Institutional AI Must Be Treated as a Campus Utility — Not a Privilege
Why AI must be treated as shared campus infrastructure—closing the equity gap between students who can afford premium tools and those who can’t, and showing how ibl.ai enables affordable, governed AI access for all.
Pilot Fatigue and the Cost of Hesitation: Why Campuses Are Stuck in Endless Proof-of-Concept Cycles
Why higher education’s cautious pilot culture has become a roadblock to innovation—and how usage-based, scalable AI frameworks like ibl.ai’s help institutions escape “demo purgatory” and move confidently to production.
AI Literacy as Institutional Resilience: Equipping Faculty, Staff, and Administrators with Practical AI Fluency
How universities can turn AI literacy into institutional resilience—equipping every stakeholder with practical fluency, transparency, and confidence through explainable, campus-owned AI systems.