University of California Irvine: What Large Language Models Know and What People Think They Know
The study finds that users tend to overestimate large language models' accuracy because the models' internal confidence is poorly conveyed: longer explanations boost user confidence regardless of actual accuracy, and the uncertainty language in a response shifts user confidence independently of whether the answer is correct. Tailoring LLM responses to reflect the model's internal uncertainty helps close this calibration gap and makes AI-assisted decisions more trustworthy.
This study investigates how well large language models (LLMs) communicate their uncertainty to users and how human perception aligns with the LLMs' actual confidence. The research identifies a "calibration gap" where users overestimate LLM accuracy, especially with default explanations.
Longer explanations increase user confidence without improving accuracy, indicating shallow processing. By tailoring explanations to reflect the LLM's internal confidence, the study demonstrates a reduction in both the calibration and discrimination gaps, leading to improved user perception of LLM reliability.
The study underscores the importance of transparent uncertainty communication for trustworthy AI-assisted decision-making, advocating for explanations aligned with model confidence.
The study examines how well large language models (LLMs) communicate uncertainty and how humans perceive the accuracy of LLM responses. It identifies gaps between LLM confidence and human confidence, and explores methods to improve user perception of LLM accuracy.
Here are 5 key takeaways:
- Calibration and Discrimination Gaps: There is a notable difference between an LLM's internal confidence in its answers and how confident humans are in those same answers. Humans often overestimate the accuracy of LLM responses and, given default explanations, are poor at telling correct answers from incorrect ones (see the first sketch after this list).
- Explanation Length Matters: Longer explanations from LLMs tend to increase user confidence, even if the added length doesn't actually improve the accuracy or informativeness of the answer.
- Uncertainty Language Influences Perception: Human confidence is strongly influenced by the type of uncertainty language used in LLM explanations. Low-confidence statements lead to lower human confidence, while high-confidence statements lead to higher human confidence.
- Tailoring Explanations Reduces Gaps: Adjusting LLM explanations so that they reflect the model's internal confidence narrows both the calibration and discrimination gaps, improving users' perception of when the LLM is actually right (see the second sketch after this list).
- Limited User Expertise: Participants in the study generally lacked the expertise to assess the LLM's answers on their own; even when they overrode the LLM's answer, their accuracy was lower than the model's.
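As a rough illustration of the first takeaway, the sketch below quantifies a calibration gap and a discrimination gap from paired confidence scores. The toy data, function names, and the specific metrics chosen here (expected calibration error for calibration, AUROC for discrimination) are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch: quantifying calibration and discrimination gaps.
# Assumes that for each question you already have:
#   - model_conf: the LLM's internal confidence in its answer (0..1)
#   - human_conf: a participant's confidence in that same answer (0..1)
#   - correct:    whether the LLM's answer was actually right (0/1)
# The metrics (ECE, AUROC) and the toy data are illustrative, not the study's exact setup.
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    """Average |accuracy - confidence| across equal-width confidence bins."""
    conf, correct = np.asarray(conf, float), np.asarray(correct, float)
    bins = np.clip((conf * n_bins).astype(int), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

def auroc(conf, correct):
    """Probability that a correct answer receives higher confidence than an incorrect one."""
    conf, correct = np.asarray(conf, float), np.asarray(correct, int)
    pos, neg = conf[correct == 1], conf[correct == 0]
    pairs = pos[:, None] - neg[None, :]
    return (pairs > 0).mean() + 0.5 * (pairs == 0).mean()

# Toy data for illustration only.
model_conf = np.array([0.95, 0.60, 0.85, 0.40, 0.70, 0.30])
human_conf = np.array([0.90, 0.85, 0.90, 0.80, 0.85, 0.75])
correct    = np.array([1,    0,    1,    0,    1,    0])

# Calibration gap: how much worse human confidence tracks accuracy than model confidence does.
calibration_gap = expected_calibration_error(human_conf, correct) - expected_calibration_error(model_conf, correct)
# Discrimination gap: how much better model confidence separates right from wrong answers.
discrimination_gap = auroc(model_conf, correct) - auroc(human_conf, correct)
print(f"calibration gap: {calibration_gap:.2f}, discrimination gap: {discrimination_gap:.2f}")
```

In this toy example, human confidence is uniformly high, so it tracks accuracy poorly (a large calibration gap) and barely separates correct from incorrect answers (a positive discrimination gap), which mirrors the pattern the study reports.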
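The second sketch illustrates the tailoring idea from the fourth takeaway: mapping the model's internal confidence onto uncertainty language in the explanation. The thresholds and phrases are illustrative assumptions, not the prompts or wording used in the study.

```python
# Minimal sketch of a confidence-tailored explanation: prefix the answer with
# uncertainty language that reflects the model's own confidence score.
# Thresholds and phrasing are illustrative assumptions, not the study's wording.

def uncertainty_phrase(confidence: float) -> str:
    """Pick an uncertainty preamble that roughly matches a confidence score in [0, 1]."""
    if confidence >= 0.9:
        return "I am highly confident that"
    if confidence >= 0.7:
        return "I am fairly confident that"
    if confidence >= 0.5:
        return "I think, though I am not certain, that"
    return "I am unsure, but my best guess is that"

def tailor_explanation(answer: str, explanation: str, confidence: float) -> str:
    """Prefix the answer with language reflecting the model's internal confidence."""
    return f"{uncertainty_phrase(confidence)} the answer is {answer}. {explanation}"

# Example: a low-confidence answer now signals its uncertainty to the reader.
print(tailor_explanation(
    answer="Lisbon",
    explanation="Portugal's capital has been Lisbon since the 13th century.",
    confidence=0.55,
))
```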