Decline of Pearson’s r with categorization of variables: a large-scale simulation

Takahiro Onoshima, Kenpei Shiina, Takashi Ueda, Saori Kubo

Research output: Contribution to journal › Article

Abstract

It is often said that correlation coefficients computed from categorical variables are biased and thus should not be used. However, practitioners often ignore this longstanding caveat from statisticians. Although some studies have examined the bias, the true extent is still unknown. This study is an extensive attempt to determine the range and degree of the biases. In our simulation, continuous variables were categorized according to various thresholds and used to compute Pearson’s r. The results indicated that there were more serious biases than highlighted in previous studies. The results also revealed that increasing data size did not reduce the biases. Possible ways to cope with the biases are discussed.
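The procedure described in the abstract (categorize continuous variables at thresholds, then compute Pearson's r) can be illustrated with a minimal sketch. This is not the authors' simulation code: the bivariate normal population, the value rho = 0.6, the specific thresholds, the sample sizes, and the function name `categorized_r` are all assumptions chosen for illustration.

```python
import numpy as np

def categorized_r(rho, thresholds, n, rng):
    """Return (r on continuous data, r on categorized data).

    Hypothetical helper: draws n pairs from a standard bivariate normal
    with correlation rho (an assumed population), cuts both variables at
    the same thresholds, and computes Pearson's r before and after.
    """
    cov = [[1.0, rho], [rho, 1.0]]
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    # np.digitize maps each value to the index of the interval it falls in,
    # giving integer category codes 0..len(thresholds).
    x_cat = np.digitize(xy[:, 0], thresholds)
    y_cat = np.digitize(xy[:, 1], thresholds)
    r_cont = np.corrcoef(xy[:, 0], xy[:, 1])[0, 1]
    r_cat = np.corrcoef(x_cat, y_cat)[0, 1]
    return r_cont, r_cat

rng = np.random.default_rng(0)
rho = 0.6  # assumed population correlation
schemes = {
    "2 categories": [0.0],                    # median split
    "5 categories": [-1.5, -0.5, 0.5, 1.5],   # assumed equal-width cuts
}

results = {}
for n in (1_000, 100_000):
    for label, thresholds in schemes.items():
        results[(n, label)] = categorized_r(rho, thresholds, n, rng)
        r_cont, r_cat = results[(n, label)]
        print(f"n={n:>7}, {label}: continuous r={r_cont:.3f}, categorized r={r_cat:.3f}")
```

Under these assumptions the categorized r falls below the continuous r, and a larger n only makes the attenuated estimate more precise rather than moving it toward rho. For a median split of a bivariate normal, the population value of the categorized correlation is (2/π)·arcsin(ρ), roughly 0.41 when ρ = 0.6.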

Original language: English
Pages (from-to): 389-399
Number of pages: 11
Journal: Behaviormetrika
Volume: 46
Issue number: 2
Publication status: Published - 2019 Oct 1

Keywords

  • Categorization bias
  • Correlation coefficient
  • Likert scale
  • Number of categories

ASJC Scopus subject areas

  • Analysis
  • Applied Mathematics
  • Clinical Psychology
  • Experimental and Cognitive Psychology
