Automated web data mining using semantic analysis

Wenxiang Dou, Takayuki Furuzuki

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

This paper presents an automated approach to extracting product data from commercial web pages. Our web mining method involves the following two phases. First, it analyzes the data located at the leaf nodes of the web page's DOM tree, generates a semantic information vector for each of the other nodes of the DOM tree, and finds the maximum repeated semantic vector pattern. Second, it identifies the product data region and data records, builds a product object template using a semantic tree matching technique, and uses the template to extract all product data from the web page. The main contribution of this study is a fully automated approach to extracting product data from commercial sites without any user assistance. Experimental results show that the proposed technique is highly effective.
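The first phase can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the semantic categories, the vector definition, or the pattern search, so the four categories, the price regular expression, the count-based vector, and the names `TreeBuilder`, `aggregate`, and `find_repeated_records` are all assumptions made for illustration. The sketch labels leaf content, sums the labels upward so that every node carries a vector, and reports the vector that repeats most often among sibling nodes.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical semantic categories; the abstract does not list the paper's own.
CATEGORIES = ("price", "image", "link", "text")
PRICE_RE = re.compile(r"[$€¥£]\s*\d")  # assumed heuristic for price strings


class Node:
    def __init__(self, tag, parent=None):
        self.tag, self.parent, self.children = tag, parent, []
        self.vector = Counter()  # category -> number of leaf items below this node


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM tree and labels leaf content with a category."""
    VOID = {"img", "br", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag == "img":
            node.vector["image"] += 1
        if tag not in self.VOID:  # void elements never receive children
            self.current = node

    def handle_endtag(self, tag):
        # Mismatched end tags are ignored; good enough for a sketch.
        if self.current.parent is not None and self.current.tag == tag:
            self.current = self.current.parent

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if PRICE_RE.search(text):
            kind = "price"
        elif self.current.tag == "a":
            kind = "link"
        else:
            kind = "text"
        self.current.vector[kind] += 1


def aggregate(node):
    """Sums child vectors into each parent, bottom-up."""
    for child in node.children:
        node.vector.update(aggregate(child))
    return node.vector


def find_repeated_records(html):
    """Returns (parent_tag, vector, count) for the most repeated sibling vector."""
    builder = TreeBuilder()
    builder.feed(html)
    aggregate(builder.root)
    best = (None, None, 0)
    stack = [builder.root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        # Assumption: a record holds at least two semantic items.
        counts = Counter(
            tuple(c.vector[k] for k in CATEGORIES)
            for c in node.children
            if sum(c.vector.values()) > 1
        )
        if counts:
            vec, n = counts.most_common(1)[0]
            if n > best[2]:
                best = (node.tag, vec, n)
    return best
```

Given a page whose product list is a `<ul>` of `<li>` items, each holding an image, a link, and a price, `find_repeated_records` would return the `ul` tag as the candidate data region together with the shared vector and the number of repetitions. The second phase described in the abstract (template building by semantic tree matching) is not covered by this sketch.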

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 539-551
Number of pages: 13
Volume: 7713 LNAI
DOIs: https://doi.org/10.1007/978-3-642-35527-1_45
Publication status: Published - 2012
Externally published: Yes
Event: 8th International Conference on Advanced Data Mining and Applications, ADMA 2012 - Nanjing
Duration: 2012 Dec 15 → 2012 Dec 18

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7713 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 8th International Conference on Advanced Data Mining and Applications, ADMA 2012
City: Nanjing
Period: 2012 Dec 15 → 2012 Dec 18

Keywords

  • Product data mining
  • Web data extraction
  • Web mining

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Dou, W., & Furuzuki, T. (2012). Automated web data mining using semantic analysis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7713 LNAI, pp. 539-551). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7713 LNAI). https://doi.org/10.1007/978-3-642-35527-1_45

@inproceedings{aee2ab3e89fb4461b2ca424d6fd86770,
title = "Automated web data mining using semantic analysis",
abstract = "This paper presents an automated approach to extracting product data from commercial web pages. Our web mining method involves the following two phases. First, it analyzes the data located at the leaf nodes of the web page's DOM tree, generates a semantic information vector for each of the other nodes of the DOM tree, and finds the maximum repeated semantic vector pattern. Second, it identifies the product data region and data records, builds a product object template using a semantic tree matching technique, and uses the template to extract all product data from the web page. The main contribution of this study is a fully automated approach to extracting product data from commercial sites without any user assistance. Experimental results show that the proposed technique is highly effective.",
keywords = "Product data mining, Web data extraction, Web mining",
author = "Wenxiang Dou and Takayuki Furuzuki",
year = "2012",
doi = "10.1007/978-3-642-35527-1_45",
language = "English",
isbn = "9783642355264",
volume = "7713 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "539--551",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN
T1 - Automated web data mining using semantic analysis
AU - Dou, Wenxiang
AU - Furuzuki, Takayuki
PY - 2012
Y1 - 2012
AB - This paper presents an automated approach to extracting product data from commercial web pages. Our web mining method involves the following two phases. First, it analyzes the data located at the leaf nodes of the web page's DOM tree, generates a semantic information vector for each of the other nodes of the DOM tree, and finds the maximum repeated semantic vector pattern. Second, it identifies the product data region and data records, builds a product object template using a semantic tree matching technique, and uses the template to extract all product data from the web page. The main contribution of this study is a fully automated approach to extracting product data from commercial sites without any user assistance. Experimental results show that the proposed technique is highly effective.
KW - Product data mining
KW - Web data extraction
KW - Web mining
UR - http://www.scopus.com/inward/record.url?scp=84872694899&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84872694899&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-35527-1_45
DO - 10.1007/978-3-642-35527-1_45
M3 - Conference contribution
AN - SCOPUS:84872694899
SN - 9783642355264
VL - 7713 LNAI
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 539
EP - 551
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ER -