---
title: "University of Oxford: Who Should Develop Which AI Evaluations?"
slug: "university-of-oxford-who-should-develop-which-ai-evaluations"
author: "Jeremy Weaver"
date: "2025-02-11 00:21:41"
category: "Premium"
topics: "Taxonomy of Evaluation Development Approaches, Balancing Conflict of Interest and Expertise, Risk and Method Criteria for Developer Selection, Building a Market-Based Ecosystem for AI Evaluations, Addressing Operational and Ethical Challenges in AI Evaluation Practices"
summary: "The memo proposes a framework for assigning AI evaluation development to various actors—government, contractors, third-party organizations, and AI companies—by using four approaches and nine criteria that balance risk, method requirements, and conflicts of interest, while advocating for a market-based ecosystem to support high-quality evaluations."
banner: ""
thumbnail: ""
---
University of Oxford: Who Should Develop Which AI Evaluations?



Summary

This research memo examines which actors are best placed to develop AI model evaluations, weighing conflicts of interest against expertise requirements. It proposes a taxonomy of four development approaches (government-led development, government-contractor collaborations, grants to third-party organizations, and direct development by AI companies) and nine criteria for selecting developers.

The authors suggest a two-step sorting process for identifying suitable developers. They also recommend measures to build a market-based ecosystem that fosters diverse, high-quality evaluations, balancing public accountability against private-sector efficiency.

The memo also explores challenges like information sensitivity, model access, and the blurred boundaries between evaluation development, execution, and interpretation. Finally, it proposes several strategies for creating a sustainable market for AI model evaluations.

The authors of this document are Lara Thurnherr, Robert Trager, Amin Oueslati, Christoph Winter, Cliodhna Ní Ghuidhir, Joe O'Brien, Jun Shern Chan, Lorenzo Pacchiardi, Anka Reuel, Merlin Stein, Oliver Guest, Oliver Sourbut, Renan Araujo, Seth Donoughe, and Yi Zeng.

Here are five of the most impressive takeaways from the document: