World Bank Group: From Chalkboards to Chatbots – Evaluating the Impact of Generative AI on Learning Outcomes in Nigeria
A World Bank working paper finds that a GPT-4-powered virtual tutor in Nigerian secondary schools significantly boosts English, digital, and AI skills, with stronger gains for female students, higher-performing students, and students from higher socioeconomic backgrounds. The learning gains were equivalent to 1.5–2 years of traditional schooling, and the intervention proved highly cost-effective, suggesting that scalable AI tutoring can enhance learning in low-resource settings, provided challenges like digital equity are addressed.
Summary of the full report (PDF): https://documents1.worldbank.org/curated/en/099548105192529324/pdf/IDU-c09f40d8-9ff8-42dc-b315-591157499be7.pdf
This is a Policy Research Working Paper from the World Bank's Education Global Department, published in May 2025. Titled "From Chalkboards to Chatbots: Evaluating the Impact of Generative AI on Learning Outcomes in Nigeria," it details a study on the effectiveness of using large language models, specifically Microsoft Copilot powered by GPT-4, as virtual tutors for secondary school students in Nigeria.
The research, conducted through a randomized controlled trial over six weeks, found that the intervention led to significant improvements in English, digital, and AI skills among participating students, particularly female students and those with higher initial academic performance.
The paper emphasizes the cost-effectiveness and scalability of this AI-powered tutoring approach in low-resource settings, although it also highlights the need to address potential inequities in access and digital literacy for broader implementation.
- Significant Positive Impact on Learning Outcomes: Over six weeks, the program, which used Microsoft Copilot (powered by GPT-4) as a virtual tutor for first-year senior secondary students in Nigeria, produced a significant improvement of 0.31 standard deviations on an assessment covering English language, artificial intelligence (AI), and digital skills. The effect on English skills, the main outcome of interest, was 0.23 standard deviations. These effect sizes are notably high compared with other randomized controlled trials (RCTs) in low- and middle-income countries.
- High Cost-Effectiveness: The intervention demonstrated substantial learning gains, estimated to be equivalent to 1.5 to 2 years of 'business-as-usual' schooling. A cost-effectiveness analysis found that the program ranks among the most cost-effective interventions for improving learning outcomes, achieving 3.2 equivalent years of schooling (EYOS) per $100 invested per participant. When long-term wage effects are considered, the estimated benefit-cost ratio is very high, ranging from 161 to 260.
- Heterogeneous Effects Identified: While the program yielded positive and statistically significant treatment effects across all levels of baseline performance, the effects were found to be stronger among students with better prior academic performance and those from higher socioeconomic backgrounds. Treatment effects were also stronger among female students, which the authors note appeared to compensate for a deficit in their baseline performance.
- Attendance Linked to Greater Gains: A strong linear association was found between the number of days a student attended the intervention sessions and improved learning outcomes. Based on attendance data, the estimated effect size was approximately 0.031 standard deviations per additional day of attendance. Further analysis predicts substantial gains (1.2 to 2.2 standard deviations) for students participating for a full academic year, depending on attendance rates.
- Key Policy Implications for Low-Resource Settings: The findings suggest that AI-powered tutoring using LLMs has transformative potential in the education sector in low-resource settings. Such programs can complement traditional teaching, enhance teacher productivity, and deliver personalized learning, particularly when designed and used properly with guided prompts, teacher oversight, and curriculum alignment. The use of free tools and local staff contributes to scalability, but policymakers must address potential inequities stemming from disparities in digital literacy and technology access through investments in infrastructure, teacher training, and inclusive digital education.
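The arithmetic behind two of the figures above can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration, not the paper's own method: it assumes the 0.031 standard deviations per day estimate extrapolates as a simple straight line, and the function names (`linear_gain`, `days_for_gain`, `cost_per_eyos`) are hypothetical helpers introduced here.

```python
# Figures quoted in the summary above.
SD_PER_DAY = 0.031        # estimated gain per additional day attended
EYOS_PER_100_USD = 3.2    # equivalent years of schooling per $100 per participant


def linear_gain(days_attended, sd_per_day=SD_PER_DAY):
    """Gain in standard deviations under a straight-line extrapolation (an assumption)."""
    return days_attended * sd_per_day


def days_for_gain(target_sd, sd_per_day=SD_PER_DAY):
    """Days of attendance implied by a target gain under the same linear assumption."""
    return target_sd / sd_per_day


def cost_per_eyos(eyos_per_100_usd=EYOS_PER_100_USD):
    """Dollars per participant for one equivalent year of schooling."""
    return 100.0 / eyos_per_100_usd


if __name__ == "__main__":
    print(f"Cost per EYOS: ${cost_per_eyos():.2f}")
    for target in (1.2, 2.2):
        print(f"{target} SD implies about {days_for_gain(target):.0f} days attended")
```

Under this simplification, the 1.2 to 2.2 standard deviation range for a full academic year corresponds to a range of attendance days rather than to a single figure, which fits the summary's note that the prediction depends on attendance rates.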