Automated web data mining using semantic analysis

Wenxiang Dou, Jinglu Hu

Research output: Conference contribution

2 Citations (Scopus)

Abstract

This paper presents an automated approach to extracting product data from commercial web pages. Our web mining method involves the following two phases. First, it analyzes the data located at the leaf nodes of the web page's DOM tree, generates a semantic information vector for each of the other nodes of the DOM tree, and finds the maximally repeated semantic vector pattern. Second, it identifies the product data region and data records, builds a product object template using a semantic tree matching technique, and uses that template to extract all product data from the web page. The main contribution of this study is a fully automated approach to extracting product data from commercial sites without any user assistance. Experimental results show that the proposed technique is highly effective.
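
The first phase described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the leaf categories (`image`, `price`, `text`), the helper names, and the sample page are all hypothetical, and exact vector equality stands in for the paper's semantic tree matching. It also assumes well-formed markup so that the standard library's `xml.etree.ElementTree` can parse it; real pages would need an HTML parser.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample page (well-formed so ElementTree can parse it).
PAGE = """
<div>
  <h1>Shop</h1>
  <ul>
    <li><img src="a.png"/><span>Lamp</span><b>$10</b></li>
    <li><img src="b.png"/><span>Desk</span><b>$80</b></li>
    <li><img src="c.png"/><span>Chair</span><b>$35</b></li>
  </ul>
</div>
"""

def leaf_type(node):
    """Assign an assumed semantic category to a leaf node."""
    if node.tag == "img":
        return "image"
    text = (node.text or "").strip()
    if re.search(r"[$€¥]\s*\d", text):
        return "price"
    return "text" if text else "empty"

def semantic_vector(node):
    """Ordered tuple of the leaf categories found under a node."""
    if len(node) == 0:  # leaf node
        return (leaf_type(node),)
    vector = []
    for child in node:
        vector.extend(semantic_vector(child))
    return tuple(vector)

def most_repeated_pattern(root):
    """Return the vector shared by the most internal nodes, with its count."""
    counts = Counter(
        semantic_vector(node) for node in root.iter() if len(node) > 0
    )
    return counts.most_common(1)[0]

def extract_records(root, pattern):
    """Collect leaf values of every internal node whose vector equals pattern."""
    records = []
    for node in root.iter():
        if len(node) > 0 and semantic_vector(node) == pattern:
            records.append([
                (leaf.text or "").strip() or leaf.get("src", "")
                for leaf in node.iter() if len(leaf) == 0
            ])
    return records

root = ET.fromstring(PAGE.strip())
pattern, count = most_repeated_pattern(root)
records = extract_records(root, pattern)
print(pattern, count)
print(records)
```

On this sample, the three `li` elements share one vector, so they are selected as the candidate data records. Exact matching breaks as soon as records differ slightly (for example, an optional discount field), which is presumably the case that a tree matching step is meant to handle.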

Original language: English
Title of host publication: Advanced Data Mining and Applications - 8th International Conference, ADMA 2012, Proceedings
Pages: 539-551
Number of pages: 13
DOI: https://doi.org/10.1007/978-3-642-35527-1_45
Publication status: Published - 2012-12-01
Externally published: Yes
Event: 8th International Conference on Advanced Data Mining and Applications, ADMA 2012 - Nanjing, China
Duration: 2012-12-15 to 2012-12-18

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7713 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 8th International Conference on Advanced Data Mining and Applications, ADMA 2012
Country: China
City: Nanjing
Period: 2012-12-15 to 2012-12-18

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

  • Cite this

    Dou, W., & Hu, J. (2012). Automated web data mining using semantic analysis. In Advanced Data Mining and Applications - 8th International Conference, ADMA 2012, Proceedings (pp. 539-551). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7713 LNAI). https://doi.org/10.1007/978-3-642-35527-1_45