Integrating multiple internet directories by instance-based learning

Ryutaro Ichise, Hideaki Takeda, Shinichi Honiden

Research output: Contribution to journal, Conference article

36 Citations (Scopus)

Abstract

Finding desired information on the Internet is becoming increasingly difficult. Internet directories such as Yahoo!, which organize web pages into hierarchical categories, provide one solution to this problem; however, such directories are of limited use because some bias is applied both in the collection and categorization of pages. We propose a method for integrating multiple Internet directories by instance-based learning. Our method provides the mapping of categories in order to transfer documents from one directory to another, instead of simply merging two directories into one. We present herein an effective algorithm for determining similar categories between two directories via a statistical method called the κ-statistic. To evaluate the proposed method, we conducted experiments using two actual Internet directories, Yahoo! and Google. The results show that the proposed method achieves extensive improvements relative to both the Naive Bayes and Enhanced Naive Bayes approaches, without any text analysis on documents.
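
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, assuming the standard two-rater (Cohen's) form of the κ-statistic and a flat, non-hierarchical set of categories. Pages listed in both directories serve as the shared instances, and each pair of categories is scored by how well their memberships agree on those instances. The names `kappa`, `category_kappa`, and `best_matches`, and the dict-of-sets directory representation, are hypothetical and not taken from the paper.

```python
def kappa(n11, n10, n01, n00):
    """Cohen's kappa from a 2x2 contingency table of shared instances.

    n11: in both categories, n10: only in the first,
    n01: only in the second, n00: in neither.
    """
    n = n11 + n10 + n01 + n00
    if n == 0:
        return 0.0
    observed = (n11 + n00) / n
    # Agreement expected by chance, from the marginal totals.
    expected = ((n11 + n10) * (n11 + n01) + (n01 + n00) * (n10 + n00)) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate table: kappa is undefined, treat as no evidence
    return (observed - expected) / (1.0 - expected)


def category_kappa(cat_a, cat_b, shared):
    """Score two categories (sets of URLs) using only the shared instances."""
    in_a = shared & cat_a
    in_b = shared & cat_b
    n11 = len(in_a & in_b)
    n10 = len(in_a - in_b)
    n01 = len(in_b - in_a)
    n00 = len(shared - in_a - in_b)
    return kappa(n11, n10, n01, n00)


def best_matches(dir_a, dir_b):
    """For each category of dir_a, return the dir_b category with highest kappa.

    Assumption: directories are dicts mapping category name -> set of URLs.
    Categories with no positively scored partner are left out.
    """
    shared = set().union(*dir_a.values()) & set().union(*dir_b.values())
    result = {}
    for name_a, cat_a in dir_a.items():
        scored = [(category_kappa(cat_a, cat_b, shared), name_b)
                  for name_b, cat_b in dir_b.items()]
        if scored:
            score, name_b = max(scored)
            if score > 0:
                result[name_a] = (name_b, score)
    return result


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    dir_a = {"Sports/Soccer": {"u1", "u2", "u3"}, "Arts/Music": {"u4", "u5", "u9"}}
    dir_b = {"Recreation/Football": {"u1", "u2", "u6"},
             "Entertainment/Music": {"u3", "u4", "u5"}}
    print(best_matches(dir_a, dir_b))
```

A mapping of this kind needs no text analysis of the pages themselves, only category membership, which is the property the abstract emphasizes. The sketch ignores the category hierarchy and any significance testing the full method may apply.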

Original language: English
Pages (from-to): 22-28
Number of pages: 7
Journal: IJCAI International Joint Conference on Artificial Intelligence
Publication status: Published - 2003 Dec 1
Externally published: Yes
Event: 18th International Joint Conference on Artificial Intelligence, IJCAI 2003 - Acapulco, Mexico
Duration: 2003 Aug 9 to 2003 Aug 15

Fingerprint

Internet
Merging
Websites
Statistical methods
Statistics
Experiments

ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Integrating multiple internet directories by instance-based learning. / Ichise, Ryutaro; Takeda, Hideaki; Honiden, Shinichi.

In: IJCAI International Joint Conference on Artificial Intelligence, 01.12.2003, p. 22-28.

@article{4468849d2c43477c80b0c0f27cae52b1,
title = "Integrating multiple internet directories by instance-based learning",
abstract = "Finding desired information on the Internet is becoming increasingly difficult. Internet directories such as Yahoo!, which organize web pages into hierarchical categories, provide one solution to this problem; however, such directories are of limited use because some bias is applied both in the collection and categorization of pages. We propose a method for integrating multiple Internet directories by instance-based learning. Our method provides the mapping of categories in order to transfer documents from one directory to another, instead of simply merging two directories into one. We present herein an effective algorithm for determining similar categories between two directories via a statistical method called the κ-statistic. To evaluate the proposed method, we conducted experiments using two actual Internet directories, Yahoo! and Google. The results show that the proposed method achieves extensive improvements relative to both the Naive Bayes and Enhanced Naive Bayes approaches, without any text analysis on documents.",
author = "Ryutaro Ichise and Hideaki Takeda and Shinichi Honiden",
year = "2003",
month = "12",
day = "1",
language = "English",
pages = "22--28",
journal = "IJCAI International Joint Conference on Artificial Intelligence",
issn = "1045-0823",

}

TY - JOUR

T1 - Integrating multiple internet directories by instance-based learning

AU - Ichise, Ryutaro

AU - Takeda, Hideaki

AU - Honiden, Shinichi

PY - 2003/12/1

Y1 - 2003/12/1

AB - Finding desired information on the Internet is becoming increasingly difficult. Internet directories such as Yahoo!, which organize web pages into hierarchical categories, provide one solution to this problem; however, such directories are of limited use because some bias is applied both in the collection and categorization of pages. We propose a method for integrating multiple Internet directories by instance-based learning. Our method provides the mapping of categories in order to transfer documents from one directory to another, instead of simply merging two directories into one. We present herein an effective algorithm for determining similar categories between two directories via a statistical method called the κ-statistic. To evaluate the proposed method, we conducted experiments using two actual Internet directories, Yahoo! and Google. The results show that the proposed method achieves extensive improvements relative to both the Naive Bayes and Enhanced Naive Bayes approaches, without any text analysis on documents.

UR - http://www.scopus.com/inward/record.url?scp=84880768438&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84880768438&partnerID=8YFLogxK

M3 - Conference article

SP - 22

EP - 28

JO - IJCAI International Joint Conference on Artificial Intelligence

JF - IJCAI International Joint Conference on Artificial Intelligence

SN - 1045-0823

ER -