Automated web data mining using semantic analysis

Wenxiang Dou*, Jinglu Hu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

This paper presents an automated approach to extracting product data from commercial web pages. Our web mining method involves two phases. First, it analyzes the data located at the leaf nodes of the web page's DOM tree, generates a semantic information vector for each of the other nodes of the DOM tree, and finds the maximum repeated semantic vector pattern. Second, it identifies the product data region and data records, builds a product object template using a semantic tree matching technique, and uses the template to extract all product data from the web page. The main contribution of this study is a fully automated approach to extracting product data from commercial sites without any user assistance. Experimental results show that the proposed technique is highly effective.
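The first phase described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the three semantic categories, the price regular expression, the `Node` and `TreeBuilder` helpers, and the rule "pick the parent whose children share the most frequently repeated vector" are all assumptions made for illustration, and the sample HTML is invented. The sketch only shows the general idea of summing leaf-level semantic labels up the DOM tree and looking for a repeated vector among sibling subtrees.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical vector layout (not from the paper): (price, image, text) counts.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a minimal DOM-like tree; assumes well-formed HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never get children
            self.current = node

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text += data.strip()


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def own_vector(node):
    """Classify a node's own content into one assumed semantic category."""
    if node.tag == "img":
        return (0, 1, 0)
    if re.search(r"[$€£]\s*\d", node.text):
        return (1, 0, 0)
    if node.text:
        return (0, 0, 1)
    return (0, 0, 0)


def semantic_vector(node, vectors):
    """Store, for every node, the sum of the category counts beneath it."""
    total = own_vector(node)
    for child in node.children:
        child_vec = semantic_vector(child, vectors)
        total = tuple(a + b for a, b in zip(total, child_vec))
    vectors[node] = total
    return total


def find_data_region(root):
    """Return (parent, vector, count) for the most repeated child vector.

    Children holding a single item are ignored, and a pattern must repeat
    at least twice; both thresholds are assumptions of this sketch.
    """
    vectors = {}
    semantic_vector(root, vectors)
    best = (None, None, 1)
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        counts = Counter(vectors[c] for c in node.children if sum(vectors[c]) > 1)
        if counts:
            vec, n = counts.most_common(1)[0]
            if n > best[2]:
                best = (node, vec, n)
    return best


SAMPLE = """
<html><body>
<div><a>Home</a><a>About</a></div>
<ul>
<li><img src="a.jpg"><span>Widget</span><span>$9.99</span></li>
<li><img src="b.jpg"><span>Gadget</span><span>$19.50</span></li>
<li><img src="c.jpg"><span>Gizmo</span><span>$5.00</span></li>
</ul>
</body></html>
"""

region, pattern, count = find_data_region(parse(SAMPLE))
print(region.tag, pattern, count)
```

On the invented sample page, the three `li` elements each contain one price, one image, and one text item, so they share a vector and their parent is selected as the candidate data region. The second phase (template building by semantic tree matching) is not covered by this sketch.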

Original language: English
Title of host publication: Advanced Data Mining and Applications - 8th International Conference, ADMA 2012, Proceedings
Pages: 539-551
Number of pages: 13
Publication status: Published - 2012
Externally published: Yes
Event: 8th International Conference on Advanced Data Mining and Applications, ADMA 2012 - Nanjing, China
Duration: 2012 Dec 15 → 2012 Dec 18

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7713 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 8th International Conference on Advanced Data Mining and Applications, ADMA 2012
Country/Territory: China
City: Nanjing
Period: 12/12/15 → 12/12/18

Keywords

  • Product data mining
  • Web data extraction
  • Web mining

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
