With the emergence and wide application of cyber technologies, medical informatization has progressed rapidly in recent years. The collection of gene expression data and cyber-enabled tumor risk analysis have matured and are becoming more common. In tumor risk analysis, identifying the distinct genes that contribute most to the occurrence of tumors has become an increasingly important issue. In this paper, an improved SSO (Simplified Swarm Optimization) algorithm based on gene selection is developed for data-driven tumor risk analysis; it obtains higher classification accuracy with fewer selected genes. The proposed algorithm, called iSSO-HF&LSS (improved SSO with a hybrid filter and local search strategy), uses information gain and the Pearson correlation coefficient as a hybrid filter method to select a small number of distinct and discriminative genes. Moreover, a new local search strategy is applied to select an optimal gene subset. This local search strategy uses correlation information to select genes that are informative but less correlated with one another. To evaluate the efficiency of the algorithm, a series of experiments is conducted on ten tumor gene expression datasets, and the performance of the proposed method is compared with that of nine well-known benchmark classification methods and with the methods used in six referenced studies. Several statistical analyses show that the proposed method significantly outperforms the existing methods and efficiently reduces the number of gene expression levels.
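The hybrid filter idea described in the abstract, ranking genes by information gain and then using the Pearson correlation coefficient to discard redundant ones, can be illustrated with a minimal sketch. This is not the authors' iSSO-HF&LSS implementation: the function names (`entropy`, `information_gain`, `hybrid_filter`), the equal-width discretization into `bins` intervals, and the `top_k` and `max_corr` thresholds are all assumptions made for illustration, and the swarm optimization and local search stages are not shown.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(expr, y, bins=3):
    """Information gain of one gene after equal-width discretization.

    The discretization scheme is an assumption of this sketch.
    """
    edges = np.histogram_bin_edges(expr, bins=bins)
    binned = np.digitize(expr, edges[1:-1])  # bin index in 0..bins-1
    conditional = 0.0
    for b in np.unique(binned):
        mask = binned == b
        # Weight each bin's label entropy by the fraction of samples in it.
        conditional += mask.mean() * entropy(y[mask])
    return entropy(y) - conditional


def hybrid_filter(X, y, top_k=50, max_corr=0.8, bins=3):
    """Rank genes by information gain, then drop highly correlated ones.

    X: (samples, genes) expression matrix; y: (samples,) class labels.
    top_k and max_corr are hypothetical thresholds, not values from the paper.
    Returns column indices of the retained genes, most informative first.
    """
    gains = np.array(
        [information_gain(X[:, j], y, bins) for j in range(X.shape[1])]
    )
    ranked = np.argsort(-gains)[:top_k]
    # Absolute Pearson correlation between the top-ranked genes.
    corr = np.abs(np.atleast_2d(np.corrcoef(X[:, ranked], rowvar=False)))
    kept = []  # positions within `ranked`
    for i in range(len(ranked)):
        # Keep a gene only if it is weakly correlated with every kept gene.
        if all(corr[i, k] < max_corr for k in kept):
            kept.append(i)
    return [int(ranked[i]) for i in kept]
```

Calling `hybrid_filter(X, y)` on a samples-by-genes matrix returns a reduced list of gene indices that a downstream search or classifier could then work with. The greedy pass keeps the more informative gene of any highly correlated pair, which is one simple way to favor genes that are informative but weakly correlated.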
Keywords
- Gene selection
- Information gain
- Pearson correlation coefficient
- Simplified swarm optimization
ASJC Scopus subject areas
- Hardware and Architecture
- Computer Networks and Communications