World Bank Group: From Chalkboards to Chatbots – Evaluating the Impact of Generative AI on Learning Outcomes in Nigeria
A World Bank working paper finds that using a GPT-4-powered virtual tutor in Nigerian secondary schools significantly boosts English, digital, and AI skills, with stronger gains for higher-performing, female, and higher-socioeconomic-status students. The intervention proved highly cost-effective, with learning gains equivalent to 1.5–2 years of traditional schooling, suggesting that scalable AI tutoring can enhance learning in low-resource settings, provided challenges like digital equity are addressed.
Summary of the Full Report (PDF)
This is a Policy Research Working Paper from the World Bank's Education Global Department, published in May 2025. Titled "From Chalkboards to Chatbots: Evaluating the Impact of Generative AI on Learning Outcomes in Nigeria," it details a study on the effectiveness of using large language models, specifically Microsoft Copilot powered by GPT-4, as virtual tutors for secondary school students in Nigeria.
The research, conducted through a randomized controlled trial over six weeks, found that the intervention led to significant improvements in English, digital, and AI skills among participating students, particularly female students and those with higher initial academic performance.
The paper emphasizes the cost-effectiveness and scalability of this AI-powered tutoring approach in low-resource settings, although it also highlights the need to address potential inequities in access and digital literacy for broader implementation.
- Significant Positive Impact on Learning Outcomes: The program, which used Microsoft Copilot (powered by GPT-4) as a virtual tutor in Nigerian secondary education, produced a significant improvement of 0.31 standard deviations on an assessment covering English language, artificial intelligence (AI), and digital skills for first-year senior secondary students over six weeks. The effect on English skills, the main outcome of interest, was 0.23 standard deviations. These effect sizes are notably high compared with other randomized controlled trials (RCTs) in low- and middle-income countries.
- High Cost-Effectiveness: The intervention demonstrated substantial learning gains, estimated to be equivalent to 1.5 to 2 years of 'business-as-usual' schooling. A cost-effectiveness analysis found that the program ranks among the most cost-effective interventions for improving learning outcomes, achieving 3.2 equivalent years of schooling (EYOS) per $100 invested per participant. When long-term wage effects are considered, the estimated benefit-cost ratio is very high, ranging from 161 to 260.
- Heterogeneous Effects Identified: While the program yielded positive and statistically significant treatment effects across all levels of baseline performance, the effects were found to be stronger among students with better prior academic performance and those from higher socioeconomic backgrounds. Treatment effects were also stronger among female students, which the authors note appeared to compensate for a deficit in their baseline performance.
- Attendance Linked to Greater Gains: A strong linear association was found between the number of days a student attended the intervention sessions and improved learning outcomes. Based on attendance data, the estimated effect size was approximately 0.031 standard deviation per additional day of attendance. Further analysis predicts substantial gains (1.2 to 2.2 standard deviations) for students participating for a full academic year, depending on attendance rates.
- Key Policy Implications for Low-Resource Settings: The findings suggest that AI-powered tutoring using LLMs has transformative potential in the education sector in low-resource settings. Such programs can complement traditional teaching, enhance teacher productivity, and deliver personalized learning, particularly when designed and used properly with guided prompts, teacher oversight, and curriculum alignment. The use of free tools and local staff contributes to scalability, but policymakers must address potential inequities stemming from disparities in digital literacy and technology access through investments in infrastructure, teacher training, and inclusive digital education.
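The attendance and cost-effectiveness figures in the list above lend themselves to a quick back-of-envelope check. The Python sketch below is illustrative only: it assumes the per-day attendance effect can be extrapolated linearly, which is a simplification and not necessarily how the authors produced their full-year projection. The function names and the 10-day input are hypothetical.

```python
# Back-of-envelope arithmetic on figures quoted in the summary.
# Assumption: the attendance effect is extrapolated linearly; the paper's
# own projection method may differ.

SD_PER_DAY = 0.031        # reported effect per additional day attended
EYOS_PER_100_USD = 3.2    # reported equivalent years of schooling per $100


def projected_gain_sd(days_attended: float) -> float:
    """Linear extrapolation: SD gain = per-day effect * days attended."""
    return SD_PER_DAY * days_attended


def days_for_gain(target_sd: float) -> float:
    """Days of attendance a purely linear model needs to reach target_sd."""
    return target_sd / SD_PER_DAY


def cost_per_eyos_usd() -> float:
    """Dollars per equivalent year of schooling implied by EYOS per $100."""
    return 100.0 / EYOS_PER_100_USD


if __name__ == "__main__":
    # 10 days is a hypothetical input chosen for illustration.
    print(round(projected_gain_sd(10), 2))
    # Days implied by the quoted 1.2 and 2.2 SD full-year range.
    print(round(days_for_gain(1.2), 1), round(days_for_gain(2.2), 1))
    print(round(cost_per_eyos_usd(), 2))
```

Under this linear reading, the quoted 1.2 to 2.2 standard deviation range would correspond to roughly 39 to 71 days of attendance, and 3.2 EYOS per $100 works out to about $31 per equivalent year of schooling. These are illustrative conversions of the reported figures, not results from the paper.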