Stanford University: Predicting Long-Term Student Outcomes from Short-Term EdTech Log Data
Models trained on short-term educational technology log data (2–5 hours of use) can effectively predict long-term student outcomes, performing similarly to models trained on full-period data. Key features like success rates and average attempts per problem are strong predictors, especially at the performance extremes, and combining these log features with pre-assessment scores further improves prediction accuracy.
Summary of https://arxiv.org/pdf/2412.15473
The paper investigates whether student log data from educational technology, specifically data from the first few hours of use, can predict long-term student outcomes such as end-of-year external assessments.
Using data from a literacy game in Uganda and two math tutoring systems in the US, the researchers explore whether machine learning models trained on this short-term data can effectively predict performance.
They examine the accuracy of different machine learning algorithms and identify some common predictive features across the diverse datasets. Additionally, the study analyzes the prediction quality for different student performance levels and the impact of including pre-assessment scores in the models.
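As a rough illustration of the setup described above, the sketch below truncates each student's log to a fixed usage horizon, extracts simple counting features, and scores a random forest regressor with cross-validated R². The column names (`student_id`, `usage_hours`, `attempts`, `correct`), the log schema, and the hyperparameters are illustrative assumptions, not the paper's actual data format or configuration.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Hypothetical log schema (assumption, not the paper's): one row per
# problem attempt, with a running count of hours spent in the tool.
# Columns: student_id, usage_hours, attempts, correct (0/1)

def extract_features(events: pd.DataFrame, horizon_hours: float) -> pd.DataFrame:
    """Keep only events inside the horizon, then aggregate per student."""
    window = events[events["usage_hours"] <= horizon_hours]
    return window.groupby("student_id").agg(
        pct_success=("correct", "mean"),    # share of problems solved
        avg_attempts=("attempts", "mean"),  # average attempts per problem
        n_problems=("correct", "size"),     # volume of activity
    )

def horizon_r2(events, outcomes, horizon_hours):
    """Cross-validated R^2 for a model restricted to one data horizon.

    `outcomes` is assumed to be a Series of end-of-year assessment
    scores indexed by student_id.
    """
    X = extract_features(events, horizon_hours)
    y = outcomes.loc[X.index]
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    return cross_val_score(model, X, y, scoring="r2", cv=5).mean()

# Compare a short horizon (first ~5 hours) against the full usage period:
# r2_short = horizon_r2(events, outcomes, horizon_hours=5)
# r2_full = horizon_r2(events, outcomes, horizon_hours=float("inf"))
```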
- Short-term log data (2–5 hours) can effectively predict long-term outcomes. The study found that machine learning models using data from a student's first few hours of usage with educational technology provided a useful predictor of end-of-school-year external assessments, with performance similar to models using data from the entire multi-month usage period. This finding was consistent across three diverse datasets from different educational contexts and tools. Interestingly, performance did not always improve monotonically with longer-horizon data; in some cases, accuracy estimates were higher using a shorter horizon.
- Certain log data features are consistently important predictors across different tools. Features like the percentage of problems completed successfully and the average number of attempts per problem were frequently selected as important by the random forest model across all three datasets and both short and full horizons. This suggests that these basic counting features, which are generally obtainable from log data across many educational platforms, are valuable signals for predicting long-term performance (both appear in the feature sketch above).
- While not perfectly accurate for individual students, the models show good precision at the performance extremes. The models struggled to accurately place students in the middle performance quintiles but showed relatively high precision when predicting students in the lowest (likely to struggle) or highest (likely to thrive) groups. For instance, the best model for CWTLReading was accurate 77% of the time when predicting that a student would fall in the lowest performance quintile (Q1) and 72% accurate for the highest (Q5). This suggests potential for using these predictions to identify students who might benefit from additional support or challenges (a precision-computation sketch follows this list).
- Using a set of features generally outperforms using a single feature. While single features like percentage success or average attempts per problem still beat a baseline, machine learning models trained on the full set of extracted log features generally outperformed models using only a single feature (see the comparison sketch after this list). This indicates that considering multiple aspects of student interaction captured in the log data provides additional predictive power.
- Pre-assessment scores are powerful indicators and can be combined with log data for enhanced prediction. Pre-test or pre-assessment scores alone were found to be strong predictors of long-term outcomes, often outperforming log data features alone. When available, combining pre-test scores with log features generally improved prediction performance (higher R² values) compared to using either data source alone (a combined-model sketch closes this list). However, the study notes that short-horizon log data can be a useful tool for prediction when pre-tests are unavailable or would take time away from instruction.
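The per-quintile precision figures above can be reproduced along these lines: bin true and predicted scores into quintiles, then ask what fraction of the students the model places in a given quintile actually belong there. This is a minimal sketch over hypothetical score arrays, not the paper's evaluation code.

```python
import numpy as np
import pandas as pd

def quintile_precision(y_true, y_pred, quintile):
    """Precision for one predicted quintile: of the students the model
    places in that quintile, what fraction truly belong there?"""
    true_q = pd.qcut(y_true, 5, labels=False)  # 0 = Q1 (lowest) ... 4 = Q5
    pred_q = pd.qcut(y_pred, 5, labels=False)
    flagged = pred_q == quintile
    if not flagged.any():
        return np.nan  # model placed nobody in this quintile
    return (true_q[flagged] == quintile).mean()

# Precision for the lowest (Q1) and highest (Q5) quintiles:
# p_q1 = quintile_precision(y_true, y_pred, quintile=0)
# p_q5 = quintile_precision(y_true, y_pred, quintile=4)
```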
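The single-feature-versus-full-set comparison amounts to fitting the same model on one column at a time and then on all columns together. A sketch, assuming a feature table `X` like the one built in the first sketch:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def compare_feature_sets(X, y):
    """Cross-validated R^2 for each single feature vs. the full set."""
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    scores = {
        col: cross_val_score(model, X[[col]], y, scoring="r2", cv=5).mean()
        for col in X.columns  # one model per individual feature
    }
    scores["all_features"] = cross_val_score(
        model, X, y, scoring="r2", cv=5
    ).mean()
    return scores
```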
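In the simplest formulation, combining pre-assessment scores with log features is just appending one more column before fitting. A sketch assuming a hypothetical `pretest` Series aligned on student ID:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def r2_with_pretest(log_features: pd.DataFrame, pretest: pd.Series, y):
    """R^2 for log features alone vs. log features plus pre-test score."""
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    r2_logs = cross_val_score(
        model, log_features, y, scoring="r2", cv=5
    ).mean()
    combined = log_features.assign(pretest=pretest)  # append pre-test column
    r2_combined = cross_val_score(
        model, combined, y, scoring="r2", cv=5
    ).mean()
    return r2_logs, r2_combined
```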