Nature: LLMs Proficient at Solving & Creating Emotional Intelligence Tests

Jeremy Weaver, June 26, 2025


A new Nature paper reveals that advanced language models not only surpass human performance on emotional intelligence assessments but can also author psychometrically sound tests of their own.

LLMs Score High on Feelings—Literally

In the newly published Nature article “Large Language Models are Proficient in Solving and Creating Emotional Intelligence Tests,” researchers benchmarked six leading LLMs (ChatGPT-4, ChatGPT-o1, Gemini 1.5 Flash, Copilot 365, Claude 3.5 Haiku, and DeepSeek V3) on five well-validated emotional intelligence (EI) assessments. The models averaged 81% correct, well above the 56% mean reported for human participants in earlier studies. Two models, ChatGPT-o1 and DeepSeek V3, scored more than two standard deviations above the human average.
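
“Two standard deviations above the human average” is a statement about a z-score: z = (score - mean) / SD. The minimal sketch below shows that arithmetic. Only the 56% human mean and the 81% AI average come from the post; the human standard deviation of 15 points and the 88% model score are hypothetical placeholders chosen purely for illustration.

```python
def z_score(score: float, mean: float, sd: float) -> float:
    """Distance of a score from a reference mean, in standard deviations."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (score - mean) / sd


def exceeds_k_sd(score: float, mean: float, sd: float, k: float = 2.0) -> bool:
    """True if the score lies more than k standard deviations above the mean."""
    return z_score(score, mean, sd) > k


HUMAN_MEAN = 56.0  # human mean (% correct), as reported in this post
HUMAN_SD = 15.0    # HYPOTHETICAL placeholder; the post does not report the SD

for score in (81.0, 88.0):  # 81 = reported AI average; 88 = hypothetical model
    z = z_score(score, HUMAN_MEAN, HUMAN_SD)
    print(f"score {score:.0f}%: z = {z:.2f}, >2 SD: {exceeds_k_sd(score, HUMAN_MEAN, HUMAN_SD)}")
```

Under this placeholder SD, a score would need to exceed 56 + 2 × 15 = 86% to clear the two-SD bar. With the study's actual standard deviation, the cutoff would be different.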

Beyond Taking the Test: Writing It

The team then asked ChatGPT-4 to create entirely new EI items, including fresh scenarios, answer choices, and scoring keys. When these AI-generated tests were administered to human participants, they matched the original instruments in difficulty and produced scores strongly correlated with the originals (r = 0.46). Importantly, 88% of the AI-written scenarios were judged “mostly new,” which eases concerns that the model merely paraphrased existing items.
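
A correlation such as r = 0.46 is typically a Pearson coefficient computed over paired scores. The sketch below assumes each participant took both the original test and its AI-generated counterpart. The five score pairs are invented for illustration and are not data from the study.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists of paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (sx * sy)


# HYPOTHETICAL scores for five participants who took both test versions.
original_test = [12, 15, 9, 18, 14]
ai_generated_test = [13, 14, 11, 16, 12]
print(f"r = {pearson_r(original_test, ai_generated_test):.2f}")
```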

Psychometric Quality Holds Up

Minor gaps emerged, such as slightly lower perceived realism and content diversity, but all effect sizes were small (|Cohen’s d| < 0.25). Internal consistency, item discrimination, and cross-test correlations stayed within acceptable bounds. In short, an LLM can now crank out a draft EI assessment good enough to enter a formal validation pipeline.

Implications: Toward AI-Augmented Psychometrics

These findings suggest LLMs possess a form of cognitive empathy—the ability to reason accurately about emotions and regulation strategies. For test developers, this means:

Rapid Item Pool Expansion – Generate hundreds of plausible questions in minutes.
Cost-Efficient Prototyping – Reserve expensive human pilots for fine-tuning, not first drafts.
Domain Coverage – Use AI to ensure balanced representation of emotion-related constructs.

Of course, human experts must still conduct pilot studies, flag cultural bias, and verify reliability before adoption.
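
The domain-coverage point can be checked mechanically once each AI-drafted item is tagged with the construct it targets. The sketch below is hypothetical throughout: the construct labels, the tagging step, and the 20% minimum share are illustrative assumptions and are not taken from the study.

```python
from collections import Counter

# HYPOTHETICAL construct labels; a real test blueprint would define its own.
CONSTRUCTS = ["emotion recognition", "emotion understanding",
              "emotion regulation", "emotion management"]


def coverage_gaps(item_constructs: list[str], constructs: list[str],
                  min_share: float) -> dict[str, float]:
    """Return constructs whose share of the item pool is below min_share."""
    if not item_constructs:
        raise ValueError("item pool is empty")
    counts = Counter(item_constructs)  # missing constructs count as 0
    total = len(item_constructs)
    return {c: counts[c] / total for c in constructs if counts[c] / total < min_share}


# Construct tags for a HYPOTHETICAL batch of ten AI-drafted items.
pool = (["emotion understanding"] * 4 + ["emotion management"] * 4
        + ["emotion regulation"] * 2)
print(coverage_gaps(pool, CONSTRUCTS, min_share=0.2))
```

A check like this audits balance only. Expert review and piloting are still needed to confirm that each item actually measures the construct it is tagged with.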

What This Means for Learning Platforms

Tools like ibl.ai’s AI Mentor could harness EI-savvy LLMs to coach learners on emotional regulation, feedback reception, or teamwork—skills often harder to teach than math or coding. By embedding AI-generated EI exercises aligned with validated frameworks, platforms can give users consistent, personalized practice while educators supervise with confidence.

Caveats and Future Work

Context Matters – LLMs draw from massive text corpora; domain-specific nuances may still trip them up.
Authenticity vs. Performance – High test scores don’t guarantee genuine emotional resonance in daily interactions.
Ethical Oversight – Any AI-assisted assessment must protect privacy and avoid reinforcing bias.

Researchers call for broader datasets, cross-cultural replication, and longitudinal studies to see how AI-generated EI materials fare over time.

Final Thoughts

The study marks a milestone: large language models aren’t merely linguistic powerhouses—they’re emerging as credible partners in socio-emotional research and education. If guided responsibly, AI could democratize high-quality EI training and assessment, helping humans understand feelings—both theirs and others’—with newfound clarity.
