Automated web data mining using semantic analysis

Wenxiang Dou, Takayuki Furuzuki

Research output: Conference contribution

2 Citations (Scopus)

Abstract

This paper presents an automated approach to extracting product data from commercial web pages. Our web mining method involves two phases. First, it analyzes the data located at the leaf nodes of the web page's DOM tree, generates a semantic information vector for each of the other nodes of the DOM tree, and finds the maximum repeated semantic vector pattern. Second, it identifies the product data region and data records, builds a product object template using a semantic tree matching technique, and uses the template to extract all product data from the web page. The main contribution of this study is a fully automated approach to extracting product data from commercial sites without any user assistance. Experimental results show that the proposed technique is highly effective.
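The first phase described in the abstract can be illustrated with a rough sketch. The abstract does not say how the semantic vectors are defined or compared, so everything specific here is an assumption made for illustration: the four leaf categories in `CATEGORIES`, the price regular expression, the use of per-category counts as the vector, and exact equality between sibling vectors as the repetition test. The names `Node`, `TreeBuilder`, `build_tree`, `semantic_vector`, and `most_repeated_pattern` are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical leaf categories; the abstract does not list the real ones.
CATEGORIES = ("price", "image", "link", "text")
PRICE_RE = re.compile(r"[$€¥£]\s*\d")  # assumed heuristic for price-like text


class Node:
    def __init__(self, tag, parent=None):
        self.tag, self.parent, self.children = tag, parent, []
        self.leaf_types = []  # categories of leaf content attached directly here


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM tree; assumes reasonably well-formed HTML."""

    VOID = {"img", "br", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.current.leaf_types.append("image")
        if tag in self.VOID:
            return  # void elements never get children
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node
        if tag == "a":
            node.leaf_types.append("link")

    def handle_endtag(self, tag):
        if tag not in self.VOID and self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        text = data.strip()
        if text:
            kind = "price" if PRICE_RE.search(text) else "text"
            self.current.leaf_types.append(kind)


def build_tree(html):
    parser = TreeBuilder()
    parser.feed(html)
    parser.close()
    return parser.root


def semantic_vector(node):
    """Counts leaf categories in the subtree rooted at `node`."""
    counts = Counter(node.leaf_types)
    for child in node.children:
        counts.update(dict(zip(CATEGORIES, semantic_vector(child))))
    return tuple(counts[c] for c in CATEGORIES)


def most_repeated_pattern(root):
    """Returns (parent, vector, count) for the sibling vector that repeats most."""
    best = (None, None, 0)
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        vectors = Counter(semantic_vector(c) for c in node.children)
        vectors.pop((0, 0, 0, 0), None)  # ignore subtrees with no leaf content
        if vectors:
            vec, count = vectors.most_common(1)[0]
            if count > best[2]:
                best = (node, vec, count)
    return best


SAMPLE_PAGE = """
<html><body>
<div id="nav"><a href="/">Home</a></div>
<ul>
 <li><img src="a.png"><a href="/a">Lamp</a><span>$10</span></li>
 <li><img src="b.png"><a href="/b">Desk</a><span>$90</span></li>
 <li><img src="c.png"><a href="/c">Chair</a><span>$45</span></li>
</ul>
</body></html>
"""

parent, vector, count = most_repeated_pattern(build_tree(SAMPLE_PAGE))
print(parent.tag, vector, count)  # prints: ul (1, 1, 1, 1) 3
```

On the sample page, the three `li` elements share the same vector, so their parent `ul` is reported as the candidate data region. A real page would likely need a tolerance-based comparison rather than exact equality, and the second phase (template building by tree matching) is not covered by this sketch.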

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 539-551
Number of pages: 13
Volume: 7713 LNAI
DOI: https://doi.org/10.1007/978-3-642-35527-1_45
Publication status: Published - 2012
Externally published: Yes
Event: 8th International Conference on Advanced Data Mining and Applications, ADMA 2012 - Nanjing
Duration: 2012-12-15 to 2012-12-18

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7713 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 8th International Conference on Advanced Data Mining and Applications, ADMA 2012
Nanjing
Period: 2012-12-15 to 2012-12-18

Fingerprint

Web Mining
Semantic Analysis
Data mining
Websites
Semantics
World Wide Web
Tree Structure
Vertex of a graph
Template
Leaves
Experiments

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Dou, W., & Furuzuki, T. (2012). Automated web data mining using semantic analysis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7713 LNAI, pp. 539-551). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7713 LNAI). https://doi.org/10.1007/978-3-642-35527-1_45

Automated web data mining using semantic analysis. / Dou, Wenxiang; Furuzuki, Takayuki.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 7713 LNAI 2012. p. 539-551 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7713 LNAI).

Dou, W & Furuzuki, T 2012, Automated web data mining using semantic analysis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 7713 LNAI, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 7713 LNAI, pp. 539-551, 8th International Conference on Advanced Data Mining and Applications, ADMA 2012, Nanjing, 15 December 2012. https://doi.org/10.1007/978-3-642-35527-1_45
Dou W, Furuzuki T. Automated web data mining using semantic analysis. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 7713 LNAI. 2012. p. 539-551. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-642-35527-1_45
Dou, Wenxiang ; Furuzuki, Takayuki. / Automated web data mining using semantic analysis. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 7713 LNAI 2012. pp. 539-551 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{aee2ab3e89fb4461b2ca424d6fd86770,
title = "Automated web data mining using semantic analysis",
abstract = "This paper presents an automated approach to extracting product data from commercial web pages. Our web mining method involves two phases. First, it analyzes the data located at the leaf nodes of the web page's DOM tree, generates a semantic information vector for each of the other nodes of the DOM tree, and finds the maximum repeated semantic vector pattern. Second, it identifies the product data region and data records, builds a product object template using a semantic tree matching technique, and uses the template to extract all product data from the web page. The main contribution of this study is a fully automated approach to extracting product data from commercial sites without any user assistance. Experimental results show that the proposed technique is highly effective.",
keywords = "Product data mining, Web data extraction, Web mining",
author = "Wenxiang Dou and Takayuki Furuzuki",
year = "2012",
doi = "10.1007/978-3-642-35527-1_45",
language = "English",
isbn = "9783642355264",
volume = "7713 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "539--551",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Automated web data mining using semantic analysis

AU - Dou, Wenxiang

AU - Furuzuki, Takayuki

PY - 2012

Y1 - 2012

N2 - This paper presents an automated approach to extracting product data from commercial web pages. Our web mining method involves two phases. First, it analyzes the data located at the leaf nodes of the web page's DOM tree, generates a semantic information vector for each of the other nodes of the DOM tree, and finds the maximum repeated semantic vector pattern. Second, it identifies the product data region and data records, builds a product object template using a semantic tree matching technique, and uses the template to extract all product data from the web page. The main contribution of this study is a fully automated approach to extracting product data from commercial sites without any user assistance. Experimental results show that the proposed technique is highly effective.

KW - Product data mining

KW - Web data extraction

KW - Web mining

UR - http://www.scopus.com/inward/record.url?scp=84872694899&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84872694899&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-35527-1_45

DO - 10.1007/978-3-642-35527-1_45

M3 - Conference contribution

AN - SCOPUS:84872694899

SN - 9783642355264

VL - 7713 LNAI

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 539

EP - 551

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -