University of California Irvine: What Large Language Models Know and What People Think They Know
The study finds that users overestimate the accuracy of large language model answers: the models' internal confidence diverges from users' perception of it, and longer explanations or confident-sounding language boost user trust regardless of whether the answer is correct. Tailoring LLM responses to reflect internal uncertainty narrows this calibration gap, making AI-assisted decisions more trustworthy.
Summary
This study investigates how well large language models (LLMs) communicate their uncertainty to users and how human perception aligns with the LLMs' actual confidence. The research identifies a "calibration gap" where users overestimate LLM accuracy, especially with default explanations.
Longer explanations increase user confidence without improving accuracy, indicating shallow processing. By tailoring explanations to reflect the LLM's internal confidence, the study demonstrates a reduction in both the calibration and discrimination gaps, leading to improved user perception of LLM reliability.
The study underscores the importance of transparent uncertainty communication for trustworthy AI-assisted decision-making, advocating for explanations aligned with model confidence.
The study identifies gaps between LLM confidence and human confidence in LLM answers, and tests methods for narrowing them. Here are five key takeaways:
- Calibration and Discrimination Gaps: There is a notable difference between an LLM's internal confidence in its answers and how confident humans are in those same answers. Humans often overestimate the accuracy of LLM responses and, given default explanations, struggle to distinguish correct answers from incorrect ones.
- Explanation Length Matters: Longer explanations from LLMs tend to increase user confidence, even if the added length doesn't actually improve the accuracy or informativeness of the answer.
- Uncertainty Language Influences Perception: Human confidence is strongly influenced by the type of uncertainty language used in LLM explanations. Low-confidence statements lead to lower human confidence, while high-confidence statements lead to higher human confidence.
- Tailoring Explanations Reduces Gaps: By adjusting LLM explanations to better reflect the model's internal confidence, the calibration and discrimination gaps can be narrowed. This improves user perception of LLM accuracy.
- Limited User Expertise: Participants in the study generally lacked the expertise to accurately assess LLM responses independently. Even when users altered the LLM's answer, their accuracy was lower than the LLM's.
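The two gaps in the takeaways above can be made concrete with a toy calculation. The sketch below is not the authors' code or metrics; it assumes a simplified reading in which calibration is measured as stated confidence minus actual accuracy, and discrimination as the difference in confidence between correct and incorrect answers.

```python
def mean(xs):
    return sum(xs) / len(xs)

def calibration_gap(human_conf, model_conf, correct):
    """How much more overconfident the humans are than the model.

    Overconfidence = mean stated confidence minus actual accuracy;
    the gap is the human overconfidence minus the model's.
    """
    accuracy = mean(correct)
    return (mean(human_conf) - accuracy) - (mean(model_conf) - accuracy)

def discrimination(conf, correct):
    """Mean confidence on correct answers minus mean confidence on
    incorrect ones. Higher means better at telling right from wrong."""
    right = [c for c, ok in zip(conf, correct) if ok]
    wrong = [c for c, ok in zip(conf, correct) if not ok]
    return mean(right) - mean(wrong)

# Toy data: the model's confidence tracks correctness reasonably well,
# while the humans are uniformly confident (the pattern the study reports).
correct    = [1, 1, 0, 1, 0, 0]
model_conf = [0.9, 0.8, 0.4, 0.7, 0.3, 0.5]
human_conf = [0.9, 0.9, 0.8, 0.9, 0.7, 0.8]

print(round(calibration_gap(human_conf, model_conf, correct), 2))  # ≈ 0.23
print(round(discrimination(model_conf, correct), 2))               # ≈ 0.4
print(round(discrimination(human_conf, correct), 2))               # ≈ 0.13
```

With this toy data the humans show both a larger calibration gap and weaker discrimination than the model, mirroring the study's finding that tailoring explanations to the model's internal confidence should shrink both numbers.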