Rethinking psychological measurement validity

Authors

Wendy Higgins

Affiliation: University of Melbourne

Category: Philosophy

Keywords: construct validity, psychological measurement, causality, validity potential, realised validity

Schedule & Location

Date: Wednesday 3rd of September

Time: 18:30

Location: GSSR Plenary Hall (268)

Session: Methodology

Abstract

Valid measurement is critical for empirical research. Yet, a growing body of literature suggests that validation practices in the psychological sciences are highly inadequate (e.g., Alexandrova & Haybron, 2016; Flake & Fried, 2020; Fried et al., 2022; Higgins et al., 2024; Schimmack, 2021). A lack of clarity about the concept of validity itself has been proposed as a contributing factor to poor validation practices (e.g., Borsboom et al., 2004; Cizek, 2012). In this paper, I introduce a concept of measurement validity with three key features that, I argue, can help improve validation practices in the psychological sciences.

The first feature of my proposed concept of validity is that it is restricted to the question of what a test measures. This aligns with the intuitive and widespread view that a valid test is one that measures what it is intended to measure (Borsboom et al., 2004; Furr, 2022; Kelley, 1927). However, it contrasts with the concept of validity underpinning influential guidelines from the American Psychological Association (APA), which state that validity should be ascribed to interpretations of test scores for different test uses (American Educational Research Association et al., 2014; Appelbaum et al., 2018; Sireci & Sukin, 2013). Adopting a measurement-focused concept of validity has at least two benefits. First, under a measurement-focused concept of validity, ascriptions of validity always provide information about what a test measures. By contrast, under the APA’s broader concept of validity, validity can be ascribed irrespective of whether a test measures what it is intended to measure. Second, a measurement-focused concept of validity can facilitate clearer validation guidelines. As an illustration, I consider the relationship between internal reliability and validity. Higher internal reliability can increase the validity of test scores used for measurement while decreasing the validity of test scores used to predict an outcome. I argue that, under the APA’s broader concept of validity, this dual role has led to confusion about internal reliability’s role in valid measurement.

The second feature of my proposed concept of validity is a necessary and sufficient causal criterion for valid measurement. This criterion builds on Borsboom and colleagues’ (2004) argument that a causal relationship from the attribute being measured to the measurement outcomes (e.g., test scores) is necessary and sufficient for valid measurement. However, some authors have raised a concern (Eronen, 2024; Larroulet Philippi, 2021). A test should not be considered a valid measure of every factor that exerts some causal influence on its measurement outcomes, regardless of the strength of that influence. To address this concern, I propose an additional requirement: to satisfy the criterion for valid measurement, the causal relationship must be of a certain strength (which can vary across contexts). I briefly defend the causal criterion for valid measurement and discuss its key implications for psychological research practices.

The third feature of my proposed concept of validity is that it comprises two distinct components. The first is the validity potential of a measurement procedure (e.g., a personality test). The second is the realised validity of specific measurement outcomes from that procedure (e.g., personality test scores collected in a study). These two components map onto different stages of the research process. Validity potential is critical during the design phase of a study, when researchers select measures to use. Realised validity is critical when inferences are made from the specific test scores collected in a study. I argue that explicitly distinguishing these two components of measurement validity can facilitate clearer guidelines for psychological measurement validation and encourage more frequent reporting of validity evidence.

In summary, I argue that a concept of validity that (1) explicitly limits validity to measurement, (2) emphasises the necessity of a causal relationship from attributes to measurement outcomes, and (3) distinguishes validity potential from realised validity can facilitate better measurement validation practices and thus increase the credibility of psychological research.

References

Alexandrova, A., & Haybron, D. M. (2016). Is construct validation valid? Philosophy of Science, 83(5), 1098-1109. https://doi.org/10.1086/687941

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (Eds.). (2014). Standards for educational and psychological testing. American Educational Research Association.

Appelbaum, M., Cooper, H., Kline, R. B., Mayo-Wilson, E., Nezu, A. M., & Rao, S. M. (2018). Journal article reporting standards for quantitative research in psychology: The APA Publications and Communications Board task force report. American Psychologist, 73(1), 3-25.

Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111(4), 1061-1071. https://doi.org/10.1037/0033-295X.111.4.1061

Cizek, G. J. (2012). Defining and distinguishing validity: Interpretations of score meaning and justifications of test use. Psychological Methods, 17(1), 31-43. https://doi.org/10.1037/a0026975

Eronen, M. I. (2024). Causal complexity and psychological measurement. Philosophical Psychology, 1-16. https://doi.org/10.1080/09515089.2023.2300693

Flake, J. K., & Fried, E. I. (2020). Measurement schmeasurement: Questionable measurement practices and how to avoid them. Advances in Methods and Practices in Psychological Science, 3(4), 456-465. https://doi.org/10.1177/2515245920952393

Fried, E. I., Flake, J. K., & Robinaugh, D. J. (2022). Revisiting the theoretical and methodological foundations of depression measurement. Nature Reviews Psychology, 1(6), 358-368. https://doi.org/10.1038/s44159-022-00050-2

Furr, R. M. (2022). Psychometrics: An introduction (4th ed.). SAGE.

Higgins, W. C., Kaplan, D. M., Deschrijver, E., & Ross, R. M. (2024). Construct validity evidence reporting practices for the Reading the Mind in the Eyes Test: A systematic scoping review. Clinical Psychology Review, 108, 102378. https://doi.org/10.1016/j.cpr.2023.102378

Kelley, T. L. (1927). Interpretation of educational measurements. World Book Company.

Larroulet Philippi, C. (2021). Valid for what? On the very idea of unconditional validity. Philosophy of the Social Sciences, 51(2), 151-175. https://doi.org/10.1177/0048393120971169

Schimmack, U. (2021). The validation crisis in psychology. Meta-Psychology, 5.

Sireci, S. G., & Sukin, T. (2013). Test validity. In K. F. Geisinger (Ed.), APA handbook of testing and assessment in psychology: Vol. 1. Test theory and testing and assessment in industrial and organizational psychology (pp. 61-84). American Psychological Association.