CROME

Critical Review of Meta-analyses in Education

With the rise of evidence-based education policy over the past decades, a strong focus has been placed on conducting randomised controlled trials and quasi-experimental studies. This has been accompanied by a growing number of meta-analyses that quantitatively summarize the results of those studies. Meta-analyses are of particular interest for testing the generalisability of research results, but also for identifying sources of heterogeneity across apparently conflicting studies. Indeed, a study examining a teaching practice may show different results when the practice is implemented differently, in different contexts, with different populations, or evaluated through different outcomes (Cheung & Slavin, 2016; Kraft, 2020). Thus, the effect of a teaching practice quantified by a study may depend not only on the teaching practice itself, but also on several methodological features of the study. The extent to which the existing evidence base is biased because studies overestimate effects for purely methodological reasons is currently unknown.

In this project, we will conduct a meta-science study aimed at identifying study characteristics (also called moderators) that persistently affect effect sizes in education. We will use a meta-meta-analytic approach to account for within- and between-meta-analysis heterogeneity of effect sizes (Sterne et al., 2002). We will rerun the 247 meta-analyses (covering more than 11 000 studies) included in a recent meta-review (Pellegrini et al., 2024) and test whether moderator effects persist across meta-analyses. Because few meta-analyses share study-level data (only 6%), this work will require extensive data extraction at the study level. We will use Large Language Models to extract information from articles, mimicking the real-world two-reviewer process with a multi-agent system (Khan et al., 2025; Clark et al., 2025). The accuracy of automatic data extraction will be assessed on a sub-sample of studies, and the meta-meta-analyses will include probabilistic sensitivity analyses to account for possible misclassification of moderators (Fox et al., 2005). The resulting dataset will be made openly available on a collaborative platform as a community-augmented meta-analysis, allowing researchers worldwide to contribute by adding new studies, correcting automatically misclassified data, or re-running their own analyses (Kalmendal et al., 2025).
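
To make the modelling step concrete, one plausible formulation of the meta-meta-analysis (in the meta-epidemiological spirit of Sterne et al., 2002) is a three-level random-effects meta-regression with studies nested within meta-analyses; the notation below is illustrative, not the project's committed specification:

```latex
% Effect size y_{ij} of study i in meta-analysis j, with known sampling
% variance v_{ij} and a (binary) methodological moderator x_{ij}:
y_{ij} = \mu + \beta x_{ij} + u_j + w_{ij} + e_{ij},
\qquad
u_j \sim \mathcal{N}(0,\tau^2_{\mathrm{between}}),\quad
w_{ij} \sim \mathcal{N}(0,\tau^2_{\mathrm{within}}),\quad
e_{ij} \sim \mathcal{N}(0, v_{ij}).
```

Under this formulation, a moderator effect "persists across meta-analyses" when β differs from zero after both heterogeneity components are accounted for.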
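For the extraction pipeline, a minimal sketch of the two-reviewer, multi-agent workflow is given below. The `call_llm` wrapper, the prompts, and the `FIELDS` schema are illustrative assumptions, not the project's actual prompts or coding scheme:

```python
import json
from typing import Callable

# Illustrative extraction schema; the real coding scheme would follow the
# moderators defined in the included meta-analyses.
FIELDS = ["design", "sample_size", "outcome_type", "grade_level"]

def extract(call_llm: Callable[[str], str], article_text: str, role: str) -> dict:
    """One 'reviewer' agent: ask the model for a JSON record of study features."""
    prompt = (f"You are {role}. Extract the following study features as JSON "
              f"with keys {FIELDS}:\n\n{article_text}")
    return json.loads(call_llm(prompt))

def two_reviewer_extract(call_llm: Callable[[str], str], article_text: str):
    """Mimic dual coding: two independent passes, then field-level adjudication."""
    a = extract(call_llm, article_text, "reviewer A")
    b = extract(call_llm, article_text, "reviewer B")
    record, disagreements = {}, []
    for field in FIELDS:
        if a.get(field) == b.get(field):
            record[field] = a.get(field)   # agreement: accept the value
        else:
            record[field] = None           # conflict: leave blank and flag it
            disagreements.append(field)    # for a third agent or human adjudication
    return record, disagreements
```

The flagged disagreements are exactly the cases that would go to a third agent or to human review, mirroring how conflicts are resolved between two human coders.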
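Finally, a sketch of a probabilistic sensitivity analysis for a misclassified binary moderator, in the spirit of Fox et al. (2005): priors on extraction sensitivity and specificity are sampled, record-level predictive values are computed, and corrected moderator vectors are drawn for re-analysis. The Beta priors and the re-analysis step are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

def bias_adjusted_draws(x_obs: np.ndarray, n_draws: int = 2000):
    """Yield bias-corrected versions of an observed 0/1 moderator vector."""
    p_obs = x_obs.mean()      # observed proportion of studies coded 1
    for _ in range(n_draws):
        se = rng.beta(40, 4)  # assumed prior on extraction sensitivity (mean ~0.91)
        sp = rng.beta(40, 4)  # assumed prior on extraction specificity (mean ~0.91)
        if se + sp <= 1:      # non-identifiable draw; skip it
            continue
        p_true = (p_obs + sp - 1) / (se + sp - 1)    # back-corrected prevalence
        p_true = float(np.clip(p_true, 1e-6, 1 - 1e-6))
        ppv = se * p_true / (se * p_true + (1 - sp) * (1 - p_true))
        npv = sp * (1 - p_true) / (sp * (1 - p_true) + (1 - se) * p_true)
        # Reassign each study's moderator from its record-level predictive value
        probs = np.where(x_obs == 1, ppv, 1 - npv)
        yield rng.binomial(1, probs)

# Usage: refit the meta-regression once per corrected draw and summarise the
# spread of the moderator coefficient across draws.
x_obs = rng.binomial(1, 0.4, size=300)   # toy observed moderator codes
corrected = list(bias_adjusted_draws(x_obs))
```

Repeating the meta-regression over these corrected draws propagates the extraction uncertainty into the final moderator estimates.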

This project is conducted in collaboration with:

References:

  • Cheung, A. C. K., & Slavin, R. E. (2016). How Methodological Features Affect Effect Sizes in Education. Educational Researcher, 45(5), 283–292. https://doi.org/10.3102/0013189X16656615
  • Clark, J., Barton, B., Albarqouni, L., Byambasuren, O., Jowsey, T., Keogh, J., Liang, T., Moro, C., O’Neill, H., & Jones, M. (2025). Generative artificial intelligence use in evidence synthesis: A systematic review. Research Synthesis Methods, 1–19. https://doi.org/10.1017/rsm.2025.16
  • Fox, M. P., Lash, T. L., & Greenland, S. (2005). A method to automate probabilistic sensitivity analyses of misclassified binary variables. International Journal of Epidemiology, 34(6), 1370–1376. https://doi.org/10.1093/ije/dyi184
  • Kalmendal, A., Carlsson, R., & Batinovic, L. (2025). Living Review Platform for Educational Interventions: The Evidence in Learning Community Augmented Meta-analysis. PsyArXiv. https://doi.org/10.31234/osf.io/hjku5_v1
  • Khan, M. A., Ayub, U., Naqvi, S. A. A., Khakwani, K. Z. R., Sipra, Z. bin R., Raina, A., Zhou, S., He, H., Saeidi, A., Hasan, B., Rumble, R. B., Bitterman, D. S., Warner, J. L., Zou, J., Tevaarwerk, A. J., Leventakos, K., Kehl, K. L., Palmer, J. M., Murad, M. H., … Riaz, I. bin. (2025). Collaborative large language models for automated data extraction in living systematic reviews. Journal of the American Medical Informatics Association, ocae325. https://doi.org/10.1093/jamia/ocae325
  • Kraft, M. A. (2020). Interpreting Effect Sizes of Education Interventions. Educational Researcher, 49(4), 241–253. https://doi.org/10.3102/0013189X20912798
  • Pellegrini, M., Pigott, T., Chubb, C. S., Day, E., Pruitt, N., & Scarbrough, H. F. (2024). Protocol for a Meta-Review on Education Meta-Analyses: Exploring Methodological Quality and Potential Significance for Research Use in Practice. Nordic Journal of Systematic Reviews in Education, 2. https://doi.org/10.23865/njsre.v2.6169
  • Sterne, J. A. C., Jüni, P., Schulz, K. F., Altman, D. G., Bartlett, C., & Egger, M. (2002). Statistical methods for assessing the influence of study characteristics on treatment effects in “meta-epidemiological” research. Statistics in Medicine, 21(11), 1513–1524. https://doi.org/10.1002/sim.1184