For more than four decades, researchers in psychology and education have been encouraged to report effect sizes (ESs) to supplement the findings of null hypothesis significance testing (e.g., Cohen, 1965; Hays, 1963). In recent years, the calculation and interpretation of ESs have been required or strongly recommended by refereed journals and by professional organizations such as the American Psychological Association (APA) and the American Educational Research Association (AERA). For example, the report of the APA Task Force on Statistical Inference (hereafter the 1999 APA Task Force Report) stated:
Always present effect sizes [emphasis added] for primary outcomes. If the units of measurement are meaningful on a practical level (e.g., number of cigarettes smoked per day), then we usually prefer an unstandardized measure (regression coefficient or mean difference) to a standardized measure (r or d). It helps to add brief comments that place these effect sizes [emphasis added] in a practical and theoretical context.
…We must stress again that reporting and interpreting effect sizes in the context of previously reported effects is essential to good research [emphasis added]. It enables readers to evaluate the stability of results across samples, designs, and analyses. Reporting effect sizes also informs power analysis and meta-analyses needed in future research. (Wilkinson & the APA Task Force on Statistical Inference, 1999, p. 599)
These cogent words were echoed in the 5th edition of the APA Publication Manual (2001), in Table 1 of the Reporting Standards for Research in Psychology commissioned by APA (APA Publications and Communications Board Working Group on Journal Article Reporting Standards, 2008), and most recently in the 6th edition of the APA Publication Manual (2010, p. 34, and Table 1 of the appendix titled Journal Article Reporting Standards, JARS).
Given this long history, it is not surprising to observe an increase in ES reporting among quantitative studies published in psychology and education journals (Peng, Chen, Chiang, & Chiang, 2013). In most journals, the ES reporting rate exceeded 50% after 1999. The three most popular ES measures reported have continued to be the unadjusted R², Cohen’s d, and η² (e.g., Alhija & Levy, 2009; Anderson, McCullagh, & Wilson, 2007; Harrison et al., 2009; Keselman et al., 1998; Kirk, 1996; Matthews et al., 2008; Smith & Honoré, 2008). When ESs were interpreted, Cohen’s (1969) criteria for small, medium, and large effects were often cited without context, even after 1999 (Peng, Chen, Chiang, & Chiang, 2013).
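To make these indices concrete, the sketch below computes Cohen’s d and η² for a two-group comparison. The data are hypothetical and serve only as an illustration; the benchmarks noted in the comments (d of roughly 0.2, 0.5, and 0.8 for small, medium, and large effects) are Cohen’s conventional, context-free yardsticks mentioned above.

import numpy as np

# Hypothetical scores for two independent groups (illustration only).
group1 = np.array([5.1, 6.2, 5.8, 6.5, 5.9, 6.1])
group2 = np.array([4.8, 5.0, 5.4, 4.9, 5.2, 5.1])

n1, n2 = len(group1), len(group2)
m1, m2 = group1.mean(), group2.mean()
v1, v2 = group1.var(ddof=1), group2.var(ddof=1)

# Cohen's d: mean difference standardized by the pooled standard deviation.
# Cohen's conventional benchmarks: ~0.2 small, ~0.5 medium, ~0.8 large.
s_pooled = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
d = (m1 - m2) / s_pooled

# Eta-squared: proportion of total variance attributable to group membership
# (SS_between / SS_total); for a single factor it equals the unadjusted
# R-squared from regressing the outcome on dummy codes for group.
scores = np.concatenate([group1, group2])
grand_mean = scores.mean()
ss_between = n1 * (m1 - grand_mean) ** 2 + n2 * (m2 - grand_mean) ** 2
ss_total = ((scores - grand_mean) ** 2).sum()
eta_squared = ss_between / ss_total

print(f"Cohen's d = {d:.2f}, eta-squared = {eta_squared:.2f}")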
Are these popular ES indices good enough? Why do researchers report ESs? When should they be reported, and how? This talk addresses these methodological questions along with a discussion of the following:
• Alternative ES measures to Cohen’s d that conceptualize ES beyond standardized mean differences (one such alternative is sketched after this list)
• A contrast of these alternatives with Cohen’s d in purpose, usage, statistical assumptions, statistical properties, interpretability, and potential for meta-analysis
• A typology of ES for between-subjects designs
• ES for within-subjects designs
• Kelley and Preacher’s (2012) definition of ES
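As a preview of what an alternative to Cohen’s d can look like, the following sketch computes the probability of superiority, i.e., the chance that a randomly chosen score from one group exceeds a randomly chosen score from the other (also known as the common-language effect size or the A statistic). The data and the choice of this particular index are for illustration only; the sketch does not imply that this specific measure is the focus of the talk.

import numpy as np

# Hypothetical scores; invented solely to illustrate one nonparametric
# alternative to Cohen's d.
treatment = np.array([12, 15, 11, 14, 16, 13])
control = np.array([10, 12, 9, 11, 13, 10])

# Probability of superiority: over all treatment-control pairs, the proportion
# of pairs in which the treatment score exceeds the control score (ties count half).
diffs = treatment[:, None] - control[None, :]
prob_superiority = (np.sum(diffs > 0) + 0.5 * np.sum(diffs == 0)) / diffs.size

print(f"Probability of superiority = {prob_superiority:.2f}")
# 0.5 indicates no group difference; values approaching 1.0 indicate that a
# randomly chosen treatment score is almost always the higher of the pair.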
References
American Educational Research Association (2006). Standards for reporting on empirical social science research in AERA publications. Educational Researcher, 35(6), 33-40. doi:10.3102/0013189X035006033
American Psychological Association (2010). Publication manual of the American Psychological Association (6th ed.). Washington, DC: Author.
APA Publications and Communications Board Working Group on Journal Article Reporting Standards. (2008). Reporting standards for research in psychology: Why do we need them? What might they be? American Psychologist, 63, 839-851. doi:10.1037/0003-066X.63.9.839
Chen, L.-T., & Peng, C.-Y. J. (2014). The sensitivity of three methods to non-normality and unequal variances in interval estimation of effect sizes. Behavior Research Methods. doi:10.3758/s13428-014-0461-3
Chen, L.-T., Peng, C.-Y. J., & Chen, M. (2015). Computing tools for implementing standards for single-case designs. Behavior Modification. Advance online publication. doi:10.1177/0145445515603706
Kelley, K., & Preacher, K. J. (2012). On effect size. Psychological Methods, 17, 137-152. doi:10.1037/a0028086
Peng, C.-Y. J., & Chen, L.-T. (2014). Beyond Cohen’s d: Alternative effect size measures for between-subject designs. The Journal of Experimental Education, 82(1), 22-50. doi:10.1080/00220973.2012.745471
Peng, C.-Y. J., & Chen, L.-T. (2015). Algorithms for assessing intervention effects in single-case studies. Journal of Modern Applied Statistical Methods. (in press)
Peng, C.-Y. J., & Chen, L.-T. (2015). Examining intervention effects in single-case studies. Manuscript submitted to The Journal of Special Education, August 2015.
Peng, C.-Y. J., Chen, L.-T., Chiang, H.-M., & Chiang, Y.-C. (2013). The impact of APA and AERA guidelines on effect size reporting. Educational Psychology Review, 25, 157-209. doi:10.1007/s10648-013-9218-2 [Lead article]
Wilkinson, L., & the APA Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604. doi:10.1037/0003-066X.54.8.594