Genetic network programming with parallel processing for association rule mining in large and dense databases

Eloy Gonzales, Kaoru Shimada, Shingo Mabu, Kotaro Hirasawa, Takayuki Furuzuki

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Several methods for extracting association rules have been reported. Genetic Network Programming (GNP), a recently developed evolutionary computation method, has been shown to be effective on small datasets. However, it has not been tested on large datasets, particularly datasets with a large number of attributes. The aim of this paper is to extract association rules from large and dense datasets using GNP, considering a real-world database with a huge number of attributes. We propose a new method in which a large database is divided into many small datasets, and each GNP deals with one dataset containing an appropriately sized set of attributes, selected randomly from the large dataset and generated genetically. These GNPs are processed in parallel. We then propose some new genetic operations to improve both the number and the quality of the extracted rules. In simulations, the proposed method shows remarkable improvement.

Fig. 1 shows the architecture of the proposed method, which uses the CLIENT/SERVER model. The CLIENT side carries out preprocessing of the large database, assignment of files to each server, rule checking, and genetic operations on files. The SERVER side processes each file independently using the conventional GNP-based mining method.

The features and advantages of the proposed method are the following:

  • Rule extraction is done in parallel.
  • Each file generates its own local pool of rules.
  • Files (datasets) are treated as individuals so that new genetic operations can be applied to them to improve rule extraction.
  • Extracted rules are stored in a global pool.
  • The rules are checked for redundancy so that only new rules are stored.
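The CLIENT/SERVER flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the GNP miner or the genetic operations, so `mine_file` is a plain exhaustive pairwise rule check standing in for GNP, `mutate_file` is a hypothetical genetic operation, and all function names, thresholds, and the 0/1 row format are assumptions made for illustration. Threads stand in for the separate servers.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def split_attributes(attributes, n_files, file_size, rng):
    """CLIENT side: build n_files small datasets ("files"), each defined by
    a random subset of file_size attributes from the large database."""
    return [sorted(rng.sample(attributes, file_size)) for _ in range(n_files)]


def mine_file(rows, file_attrs, min_support, min_confidence):
    """SERVER side stand-in (NOT GNP): exhaustively test rules x -> y between
    pairs of attributes in one file and return that file's local rule pool."""
    local_pool = set()
    for a, b in combinations(file_attrs, 2):
        both = sum(1 for r in rows if r[a] and r[b])
        if not rows or both / len(rows) < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            count_x = sum(1 for r in rows if r[x])
            if count_x and both / count_x >= min_confidence:
                local_pool.add((x, y))
    return local_pool


def run_generation(rows, files, global_pool, min_support=0.5, min_confidence=0.8):
    """Mine every file in parallel, then merge the local pools into the
    global pool. Returns how many new rules each file contributed."""
    def mine(file_attrs):
        return mine_file(rows, file_attrs, min_support, min_confidence)

    with ThreadPoolExecutor() as executor:
        local_pools = list(executor.map(mine, files))

    new_counts = []
    for local_pool in local_pools:
        new_rules = local_pool - global_pool  # redundancy check
        global_pool.update(new_rules)         # only new rules are stored
        new_counts.append(len(new_rules))
    return new_counts


def mutate_file(file_attrs, attributes, rng):
    """Hypothetical genetic operation on a file treated as an individual:
    swap one of its attributes for an attribute it does not contain."""
    outside = [a for a in attributes if a not in file_attrs]
    if not outside:
        return sorted(file_attrs)
    child = list(file_attrs)
    child[rng.randrange(len(child))] = rng.choice(outside)
    return sorted(child)
```

In this sketch, the per-file counts returned by `run_generation` could serve as a fitness signal: files that contribute few new rules are natural candidates for `mutate_file` before the next generation. That selection policy is likewise an assumption, not something stated in the abstract.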

Original language: English
Title of host publication: Proceedings of GECCO 2007: Genetic and Evolutionary Computation Conference
Pages: 1512
Number of pages: 1
DOIs: https://doi.org/10.1145/1276958.1277241
Publication status: Published - 2007
Event: 9th Annual Genetic and Evolutionary Computation Conference, GECCO 2007 - London
Duration: 2007 Jul 7 to 2007 Jul 11

Other

Other: 9th Annual Genetic and Evolutionary Computation Conference, GECCO 2007
City: London
Period: 07/7/7 to 07/7/11

Keywords

  • Association rules
  • Genetic network programming
  • Parallel processing

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Theoretical Computer Science

Cite this

Gonzales, E., Shimada, K., Mabu, S., Hirasawa, K., & Furuzuki, T. (2007). Genetic network programming with parallel processing for association rule mining in large and dense databases. In Proceedings of GECCO 2007: Genetic and Evolutionary Computation Conference (pp. 1512) https://doi.org/10.1145/1276958.1277241

@inproceedings{cd1edb7fe31c4375b3c3e93512fb2c4e,
title = "Genetic network programming with parallel processing for association rule mining in large and dense databases",
abstract = "Several methods for extracting association rules have been reported. Genetic Network Programming (GNP), a recently developed evolutionary computation method, has been shown to be effective on small datasets. However, it has not been tested on large datasets, particularly datasets with a large number of attributes. The aim of this paper is to extract association rules from large and dense datasets using GNP, considering a real-world database with a huge number of attributes. We propose a new method in which a large database is divided into many small datasets, and each GNP deals with one dataset containing an appropriately sized set of attributes, selected randomly from the large dataset and generated genetically. These GNPs are processed in parallel. We then propose some new genetic operations to improve both the number and the quality of the extracted rules. In simulations, the proposed method shows remarkable improvement. Fig. 1 shows the architecture of the proposed method, which uses the CLIENT/SERVER model. The CLIENT side carries out preprocessing of the large database, assignment of files to each server, rule checking, and genetic operations on files. The SERVER side processes each file independently using the conventional GNP-based mining method. The features and advantages of the proposed method are the following: Rule extraction is done in parallel. Each file generates its own local pool of rules. Files (datasets) are treated as individuals so that new genetic operations can be applied to them to improve rule extraction. Extracted rules are stored in a global pool. The rules are checked for redundancy so that only new rules are stored.",
keywords = "Association rules, Genetic network programming, Parallel processing",
author = "Eloy Gonzales and Kaoru Shimada and Shingo Mabu and Kotaro Hirasawa and Takayuki Furuzuki",
year = "2007",
doi = "10.1145/1276958.1277241",
language = "English",
isbn = "1595936971",
pages = "1512",
booktitle = "Proceedings of GECCO 2007: Genetic and Evolutionary Computation Conference",

}

TY - GEN

T1 - Genetic network programming with parallel processing for association rule mining in large and dense databases

AU - Gonzales, Eloy

AU - Shimada, Kaoru

AU - Mabu, Shingo

AU - Hirasawa, Kotaro

AU - Furuzuki, Takayuki

PY - 2007

Y1 - 2007

N2 - Several methods for extracting association rules have been reported. Genetic Network Programming (GNP), a recently developed evolutionary computation method, has been shown to be effective on small datasets. However, it has not been tested on large datasets, particularly datasets with a large number of attributes. The aim of this paper is to extract association rules from large and dense datasets using GNP, considering a real-world database with a huge number of attributes. We propose a new method in which a large database is divided into many small datasets, and each GNP deals with one dataset containing an appropriately sized set of attributes, selected randomly from the large dataset and generated genetically. These GNPs are processed in parallel. We then propose some new genetic operations to improve both the number and the quality of the extracted rules. In simulations, the proposed method shows remarkable improvement. Fig. 1 shows the architecture of the proposed method, which uses the CLIENT/SERVER model. The CLIENT side carries out preprocessing of the large database, assignment of files to each server, rule checking, and genetic operations on files. The SERVER side processes each file independently using the conventional GNP-based mining method. The features and advantages of the proposed method are the following: Rule extraction is done in parallel. Each file generates its own local pool of rules. Files (datasets) are treated as individuals so that new genetic operations can be applied to them to improve rule extraction. Extracted rules are stored in a global pool. The rules are checked for redundancy so that only new rules are stored.

KW - Association rules

KW - Genetic network programming

KW - Parallel processing

UR - http://www.scopus.com/inward/record.url?scp=34548063771&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34548063771&partnerID=8YFLogxK

U2 - 10.1145/1276958.1277241

DO - 10.1145/1276958.1277241

M3 - Conference contribution

SN - 1595936971

SN - 9781595936974

SP - 1512

BT - Proceedings of GECCO 2007: Genetic and Evolutionary Computation Conference

ER -