CROME

Critical Review of Meta-analyses in Education

With the rise of evidence-based education policy over the past decades, a strong focus has been placed on conducting randomised controlled trials and quasi-experimental studies. This has been accompanied by a growing number of meta-analyses that quantitatively summarize the results of those studies. Meta-analyses are of particular interest for testing the generalisability of research results, but also for identifying sources of heterogeneity across apparently conflicting studies. Indeed, a study examining a teaching practice may show different results when the practice is implemented differently, in different contexts, with different populations, or evaluated through different outcomes (Cheung & Slavin, 2016; Kraft, 2020). Thus, the effect of a teaching practice quantified by a study may depend not only on the teaching practice itself, but also on several methodological features of the study. The extent to which the existing evidence base is biased because studies overestimate effects for purely methodological reasons is currently unknown.

In this project, we will conduct a meta-science study aimed at identifying study characteristics (also called moderators) that persistently affect effect sizes in education. We will use a meta-meta-analytic approach to account for within- and between-meta-analysis heterogeneity of effect sizes (Sterne et al., 2002). We will rerun the 247 meta-analyses (covering more than 11 000 studies) included in a recent meta-review (Pellegrini et al., 2024) and test whether moderator effects persist across meta-analyses. Because few meta-analyses share study-level data (only 6%), this work will require extensive data extraction at the study level. We will use Large Language Models to extract information from articles, mimicking the real-world two-reviewer process with a multi-agent system (Khan et al., 2025; Clark et al., 2025). The accuracy of automatic data extraction will be assessed on a sub-sample of studies, and the meta-meta-analyses will include probabilistic sensitivity analyses to account for possible misclassification of moderators (Fox et al., 2005). The resulting dataset will be made openly available on a collaborative platform as a community-augmented meta-analysis, allowing researchers worldwide to contribute by adding new studies, correcting automatically misclassified data, or re-running their own analyses (Kalmendal et al., 2025).
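
To make the modelling step concrete, one plausible formulation of the meta-meta-analysis (in the meta-epidemiological spirit of Sterne et al., 2002) is a three-level random-effects meta-regression with studies nested within meta-analyses; the notation below is illustrative, not the project's committed specification:

```latex
% Effect size y_{ij} of study i in meta-analysis j, with known sampling
% variance v_{ij} and a (binary) methodological moderator x_{ij}:
y_{ij} = \mu + \beta x_{ij} + u_j + w_{ij} + e_{ij},
\qquad
u_j \sim \mathcal{N}(0,\tau^2_{\mathrm{between}}),\quad
w_{ij} \sim \mathcal{N}(0,\tau^2_{\mathrm{within}}),\quad
e_{ij} \sim \mathcal{N}(0, v_{ij}).
```

Under this formulation, a moderator effect "persists across meta-analyses" when β differs from zero after both heterogeneity components are accounted for.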
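For the extraction pipeline, a minimal sketch of the two-reviewer, multi-agent workflow is given below. The `call_llm` wrapper, the prompts, and the `FIELDS` schema are illustrative assumptions, not the project's actual prompts or coding scheme:

```python
import json
from typing import Callable

# Illustrative extraction schema; the real coding scheme would follow the
# moderators defined in the included meta-analyses.
FIELDS = ["design", "sample_size", "outcome_type", "grade_level"]

def extract(call_llm: Callable[[str], str], article_text: str, role: str) -> dict:
    """One 'reviewer' agent: ask the model for a JSON record of study features."""
    prompt = (f"You are {role}. Extract the following study features as JSON "
              f"with keys {FIELDS}:\n\n{article_text}")
    return json.loads(call_llm(prompt))

def two_reviewer_extract(call_llm: Callable[[str], str], article_text: str):
    """Mimic dual coding: two independent passes, then field-level adjudication."""
    a = extract(call_llm, article_text, "reviewer A")
    b = extract(call_llm, article_text, "reviewer B")
    record, disagreements = {}, []
    for field in FIELDS:
        if a.get(field) == b.get(field):
            record[field] = a.get(field)   # agreement: accept the value
        else:
            record[field] = None           # conflict: leave blank and flag it
            disagreements.append(field)    # for a third agent or human adjudication
    return record, disagreements
```

The flagged disagreements are exactly the cases that would go to a third agent or to human review, mirroring how conflicts are resolved between two human coders.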
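Finally, a sketch of a probabilistic sensitivity analysis for a misclassified binary moderator, in the spirit of Fox et al. (2005): priors on extraction sensitivity and specificity are sampled, record-level predictive values are computed, and corrected moderator vectors are drawn for re-analysis. The Beta priors and the re-analysis step are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

def bias_adjusted_draws(x_obs: np.ndarray, n_draws: int = 2000):
    """Yield bias-corrected versions of an observed 0/1 moderator vector."""
    p_obs = x_obs.mean()      # observed proportion of studies coded 1
    for _ in range(n_draws):
        se = rng.beta(40, 4)  # assumed prior on extraction sensitivity (mean ~0.91)
        sp = rng.beta(40, 4)  # assumed prior on extraction specificity (mean ~0.91)
        if se + sp <= 1:      # non-identifiable draw; skip it
            continue
        p_true = (p_obs + sp - 1) / (se + sp - 1)    # back-corrected prevalence
        p_true = float(np.clip(p_true, 1e-6, 1 - 1e-6))
        ppv = se * p_true / (se * p_true + (1 - sp) * (1 - p_true))
        npv = sp * (1 - p_true) / (sp * (1 - p_true) + (1 - se) * p_true)
        # Reassign each study's moderator from its record-level predictive value
        probs = np.where(x_obs == 1, ppv, 1 - npv)
        yield rng.binomial(1, probs)

# Usage: refit the meta-regression once per corrected draw and summarise the
# spread of the moderator coefficient across draws.
x_obs = rng.binomial(1, 0.4, size=300)   # toy observed moderator codes
corrected = list(bias_adjusted_draws(x_obs))
```

Repeating the meta-regression over these corrected draws propagates the extraction uncertainty into the final moderator estimates.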

This project is conducted in collaboration with:

References:

  • Cheung, A. C. K., & Slavin, R. E. (2016). How Methodological Features Affect Effect Sizes in Education. Educational Researcher, 45(5), 283–292. https://doi.org/10.3102/0013189X16656615
  • Clark, J., Barton, B., Albarqouni, L., Byambasuren, O., Jowsey, T., Keogh, J., Liang, T., Moro, C., O’Neill, H., & Jones, M. (2025). Generative artificial intelligence use in evidence synthesis: A systematic review. Research Synthesis Methods, 1–19. https://doi.org/10.1017/rsm.2025.16
  • Fox, M. P., Lash, T. L., & Greenland, S. (2005). A method to automate probabilistic sensitivity analyses of misclassified binary variables. International Journal of Epidemiology, 34(6), 1370–1376. https://doi.org/10.1093/ije/dyi184
  • Kalmendal, A., Carlsson, R., & Batinovic, L. (2025). Living Review Platform for Educational Interventions: The Evidence in Learning Community Augmented Meta-analysis. PsyArXiv. https://doi.org/10.31234/osf.io/hjku5_v1
  • Khan, M. A., Ayub, U., Naqvi, S. A. A., Khakwani, K. Z. R., Sipra, Z. bin R., Raina, A., Zhou, S., He, H., Saeidi, A., Hasan, B., Rumble, R. B., Bitterman, D. S., Warner, J. L., Zou, J., Tevaarwerk, A. J., Leventakos, K., Kehl, K. L., Palmer, J. M., Murad, M. H., … Riaz, I. bin. (2025). Collaborative large language models for automated data extraction in living systematic reviews. Journal of the American Medical Informatics Association, ocae325. https://doi.org/10.1093/jamia/ocae325
  • Kraft, M. A. (2020). Interpreting Effect Sizes of Education Interventions. Educational Researcher, 49(4), 241–253. https://doi.org/10.3102/0013189X20912798
  • Pellegrini, M., Pigott, T., Chubb, C. S., Day, E., Pruitt, N., & Scarbrough, H. F. (2024). Protocol for a Meta-Review on Education Meta-Analyses: Exploring Methodological Quality and Potential Significance for Research Use in Practice. Nordic Journal of Systematic Reviews in Education, 2. https://doi.org/10.23865/njsre.v2.6169
  • Sterne, J. A. C., Jüni, P., Schulz, K. F., Altman, D. G., Bartlett, C., & Egger, M. (2002). Statistical methods for assessing the influence of study characteristics on treatment effects in “meta-epidemiological” research. Statistics in Medicine, 21(11), 1513–1524. https://doi.org/10.1002/sim.1184