An improved method using k-means to determine the optimal number of clusters, considering the relations between several variables

Hideki Toyoda, Kazuya Ikehara

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

In this article, we propose a non-hierarchical clustering method that takes the relations between several variables into account and determines the optimal number of clusters. Standard k-means measures dissimilarity with the Euclidean distance; by using the Mahalanobis distance instead, the method accounts for the relations between variables and obtains better groupings. Assuming that the data are samples from a normal mixture distribution, Akaike's information criterion (AIC) and the Bayesian information criterion (BIC) can also be calculated and used to determine the number of clusters. We used simulated and real data examples to confirm the usefulness of the proposed method. The method thus allows the optimal number of clusters to be determined while considering the relations between several variables.
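The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a k-means-style loop in which each cluster keeps its own covariance matrix, points are assigned by squared Mahalanobis distance, and AIC and BIC are computed from a normal mixture likelihood with plug-in means, covariances, and cluster proportions. The function names (`sq_mahalanobis`, `mahalanobis_kmeans`, `information_criteria`), the `ridge` term, the initialisation, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def sq_mahalanobis(X, center, cov):
    """Squared Mahalanobis distance of every row of X to one center."""
    diff = X - center
    return np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)


def mahalanobis_kmeans(X, k, init_centers=None, n_iter=100, ridge=1e-6, seed=0):
    """k-means-style clustering with per-cluster Mahalanobis distances (assumed variant)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)
    if init_centers is None:
        # Assumption: start from k distinct observations chosen at random.
        centers = X[rng.choice(n, size=k, replace=False)].copy()
    else:
        centers = np.array(init_centers, dtype=float)
    # Every cluster starts with the covariance of the whole sample.
    start_cov = np.atleast_2d(np.cov(X, rowvar=False)) + ridge * np.eye(d)
    covs = np.repeat(start_cov[None, :, :], k, axis=0)
    labels = np.full(n, -1)
    for _ in range(n_iter):
        dist = np.stack(
            [sq_mahalanobis(X, centers[j], covs[j]) for j in range(k)], axis=1
        )
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments no longer change
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) > d:  # enough points for a usable covariance
                centers[j] = members.mean(axis=0)
                covs[j] = np.atleast_2d(np.cov(members, rowvar=False)) + ridge * np.eye(d)
    return labels, centers, covs


def information_criteria(X, labels, centers, covs):
    """AIC and BIC under a normal mixture with plug-in parameter estimates."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    k = len(centers)
    weights = np.bincount(labels, minlength=k) / n
    log_terms = []
    for j in range(k):
        if weights[j] == 0:
            continue  # an empty cluster contributes nothing to the density
        _, logdet = np.linalg.slogdet(covs[j])
        log_pdf = -0.5 * (
            d * np.log(2 * np.pi) + logdet + sq_mahalanobis(X, centers[j], covs[j])
        )
        log_terms.append(np.log(weights[j]) + log_pdf)
    log_terms = np.stack(log_terms, axis=1)
    # Log-sum-exp over components, then sum over observations.
    top = log_terms.max(axis=1, keepdims=True)
    loglik = float(np.sum(top[:, 0] + np.log(np.exp(log_terms - top).sum(axis=1))))
    # Free parameters: mixing proportions, means, and symmetric covariances.
    n_params = (k - 1) + k * d + k * d * (d + 1) // 2
    aic = -2 * loglik + 2 * n_params
    bic = -2 * loglik + n_params * np.log(n)
    return aic, bic


# Simulated example: two groups that share a strong positive correlation.
rng = np.random.default_rng(1)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
X = np.vstack([
    rng.multivariate_normal([0, 0], cov, 200),
    rng.multivariate_normal([4, -4], cov, 200),
])
for k in (1, 2, 3):
    labels, centers, covs = mahalanobis_kmeans(X, k)
    aic, bic = information_criteria(X, labels, centers, covs)
    print(k, round(aic, 1), round(bic, 1))
```

In this sketch the number of clusters would be chosen as the k with the smallest AIC or BIC. Because the result depends on the starting centers, several random starts per k would be advisable in practice.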

Original language: English
Pages (from-to): 32-40
Number of pages: 9
Journal: Shinrigaku Kenkyu
Volume: 82
Issue number: 1
Publication status: Published - Apr 2011

Fingerprint

Normal Distribution
Cluster Analysis

Keywords

  • Clustering
  • Mahalanobis distance
  • Mixture distribution
  • Number of clusters

ASJC Scopus subject areas

  • Psychology (all)

Cite this

An improved method using k-means to determine the optimal number of clusters, considering the relations between several variables. / Toyoda, Hideki; Ikehara, Kazuya.

In: Shinrigaku Kenkyu, Vol. 82, No. 1, 04.2011, p. 32-40.

@article{63563c00a7b042b49eb8c599f92c60fe,
title = "An improved method using k-means to determine the optimal number of clusters, considering the relations between several variables",
abstract = "In this article, we propose a non-hierarchical clustering method that can consider the relations between several variables and determine the optimal number of clusters. By utilizing the Mahalanobis distance instead of the Euclidean distance, which is calculated in k-means, we could consider the relations between several variables and obtain better groupings. Assuming that the data are samples from a mixture normal distribution, we could also calculate Akaike's information criterion (AIC) and the Bayesian information criterion (BIC) to determine the number of clusters. We used simulation and real data examples to confirm the usefulness of the proposed method. This method allows determination of the optimal number of clusters, considering the relations between several variables.",
keywords = "Clustering, Mahalanobis distance, Mixture distribution, Number of clusters",
author = "Hideki Toyoda and Kazuya Ikehara",
year = "2011",
month = "4",
language = "English",
volume = "82",
pages = "32--40",
journal = "Shinrigaku Kenkyu",
issn = "0021-5236",
publisher = "Japanese Psychological Association",
number = "1",
}

TY - JOUR

T1 - An improved method using k-means to determine the optimal number of clusters, considering the relations between several variables

AU - Toyoda, Hideki

AU - Ikehara, Kazuya

PY - 2011/4

Y1 - 2011/4

AB - In this article, we propose a non-hierarchical clustering method that can consider the relations between several variables and determine the optimal number of clusters. By utilizing the Mahalanobis distance instead of the Euclidean distance, which is calculated in k-means, we could consider the relations between several variables and obtain better groupings. Assuming that the data are samples from a mixture normal distribution, we could also calculate Akaike's information criterion (AIC) and the Bayesian information criterion (BIC) to determine the number of clusters. We used simulation and real data examples to confirm the usefulness of the proposed method. This method allows determination of the optimal number of clusters, considering the relations between several variables.

KW - Clustering

KW - Mahalanobis distance

KW - Mixture distribution

KW - Number of clusters

UR - http://www.scopus.com/inward/record.url?scp=79960998927&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79960998927&partnerID=8YFLogxK

M3 - Article

VL - 82

SP - 32

EP - 40

JO - Shinrigaku Kenkyu

JF - Shinrigaku Kenkyu

SN - 0021-5236

IS - 1

ER -