A Machine Learning Approach to Knowledge Acquisitions from Text Databases

Yasubumi Sakakibara, Kazuo Misue, Takeshi Koshiba

Research output: Article

Abstract

The rapid growth of data in large databases, such as text databases and scientific databases, requires efficient computer methods for automating analyses of the data with the goal of acquiring knowledge or making discoveries. Because analyzing data is generally expensive, most of the data in databases remains raw, unanalyzed primary data. Machine learning (ML) technology offers efficient tools for intelligent data analysis through its ability to generalize. Generalization, an important ability specific to inductive learning, allows unseen data to be predicted with high accuracy from concepts learned from training examples. In this article, we apply ML to text-database analysis and knowledge acquisition from text databases. We propose a completely new approach to the problem of text classification and keyword extraction using ML techniques. We introduce a class of representations for classifying text data based on decision trees (i.e., decision trees over attributes on strings) and present an algorithm for learning them inductively. Our algorithm has the following features: it does not need any natural language processing technique, and it is robust to noisy data. We show that our learning algorithm can be used for automatic extraction of keywords for text retrieval and for automatic text categorization. We also present experimental results from applying our algorithm to classifying bibliographic data and extracting keywords, to show the effectiveness of our approach.
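The idea of a decision tree over attributes on strings can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it assumes, purely for illustration, that each attribute is the test "does the text contain this character 4-gram?", and it grows the tree with a standard information-gain (ID3-style) rule. The function names (`candidates`, `learn`, `classify`, `keywords`) and the toy training titles are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def candidates(texts, n=4):
    """Assumed attribute set: every character n-gram seen in the training texts."""
    return sorted({t[i:i + n] for t in texts for i in range(len(t) - n + 1)})

def learn(examples, n=4):
    """examples: list of (text, label) pairs.
    Returns a label (leaf) or a tuple (substring, yes_subtree, no_subtree)."""
    labels = [y for _, y in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:
        return majority
    base = entropy(labels)
    best, best_gain = None, 0.0
    for s in candidates([t for t, _ in examples], n):
        yes = [y for t, y in examples if s in t]
        no = [y for t, y in examples if s not in t]
        if not yes or not no:
            continue  # this substring does not split the examples
        gain = base - (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(labels)
        if gain > best_gain:
            best, best_gain = s, gain
    if best is None:
        return majority  # no informative substring: fall back to the majority label
    yes_ex = [(t, y) for t, y in examples if best in t]
    no_ex = [(t, y) for t, y in examples if best not in t]
    return (best, learn(yes_ex, n), learn(no_ex, n))

def classify(tree, text):
    """Walk the tree, branching on substring containment, until a leaf label is reached."""
    while isinstance(tree, tuple):
        s, yes, no = tree
        tree = yes if s in text else no
    return tree

def keywords(tree):
    """Substrings tested at internal nodes, read here as extracted 'keywords'."""
    if not isinstance(tree, tuple):
        return []
    s, yes, no = tree
    return [s] + keywords(yes) + keywords(no)

# Toy, made-up bibliographic titles used only to exercise the sketch.
train = [
    ("inductive learning of decision trees", "ML"),
    ("learning algorithms for noisy data", "ML"),
    ("rubber compound processing", "materials"),
    ("polymer composites and plastics", "materials"),
]
tree = learn(train)
print(keywords(tree))
print(classify(tree, "machine learning for text databases"))
```

Because the tests are plain substring checks, the sketch needs no tokenizer or parser, which mirrors the abstract's claim that no natural language processing technique is required. The substrings at the internal nodes double as candidate keywords.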

Original language: English
Pages (from-to): 309-324
Number of pages: 16
Journal: Plastics, Rubber and Composites Processing and Applications
Volume: 8
Issue number: 3
Publication status: Published - 1996
Externally published: Yes

Fingerprint

Knowledge acquisition
Learning systems
Decision trees
Learning algorithms
Processing

ASJC Scopus subject areas

  • Engineering(all)

Cite this

A Machine Learning Approach to Knowledge Acquisitions from Text Databases. / Sakakibara, Yasubumi; Misue, Kazuo; Koshiba, Takeshi.

In: Plastics, Rubber and Composites Processing and Applications, Vol. 8, No. 3, 1996, p. 309-324.

Research output: Article

@article{ea12abff1d3c43f3b5cc302340feef1f,
title = "A Machine Learning Approach to Knowledge Acquisitions from Text Databases",
abstract = "The rapid growth of data in large databases, such as text databases and scientific databases, requires efficient computer methods for automating analyses of the data with the goal of acquiring knowledge or making discoveries. Because analyzing data is generally expensive, most of the data in databases remains raw, unanalyzed primary data. Machine learning (ML) technology offers efficient tools for intelligent data analysis through its ability to generalize. Generalization, an important ability specific to inductive learning, allows unseen data to be predicted with high accuracy from concepts learned from training examples. In this article, we apply ML to text-database analysis and knowledge acquisition from text databases. We propose a completely new approach to the problem of text classification and keyword extraction using ML techniques. We introduce a class of representations for classifying text data based on decision trees (i.e., decision trees over attributes on strings) and present an algorithm for learning them inductively. Our algorithm has the following features: it does not need any natural language processing technique, and it is robust to noisy data. We show that our learning algorithm can be used for automatic extraction of keywords for text retrieval and for automatic text categorization. We also present experimental results from applying our algorithm to classifying bibliographic data and extracting keywords, to show the effectiveness of our approach.",
author = "Yasubumi Sakakibara and Kazuo Misue and Takeshi Koshiba",
year = "1996",
language = "English",
volume = "8",
pages = "309--324",
journal = "Plastics, Rubber and Composites",
issn = "1465-8011",
publisher = "Maney Publishing",
number = "3",

}

TY - JOUR

T1 - A Machine Learning Approach to Knowledge Acquisitions from Text Databases

AU - Sakakibara, Yasubumi

AU - Misue, Kazuo

AU - Koshiba, Takeshi

PY - 1996

Y1 - 1996

AB - The rapid growth of data in large databases, such as text databases and scientific databases, requires efficient computer methods for automating analyses of the data with the goal of acquiring knowledge or making discoveries. Because analyzing data is generally expensive, most of the data in databases remains raw, unanalyzed primary data. Machine learning (ML) technology offers efficient tools for intelligent data analysis through its ability to generalize. Generalization, an important ability specific to inductive learning, allows unseen data to be predicted with high accuracy from concepts learned from training examples. In this article, we apply ML to text-database analysis and knowledge acquisition from text databases. We propose a completely new approach to the problem of text classification and keyword extraction using ML techniques. We introduce a class of representations for classifying text data based on decision trees (i.e., decision trees over attributes on strings) and present an algorithm for learning them inductively. Our algorithm has the following features: it does not need any natural language processing technique, and it is robust to noisy data. We show that our learning algorithm can be used for automatic extraction of keywords for text retrieval and for automatic text categorization. We also present experimental results from applying our algorithm to classifying bibliographic data and extracting keywords, to show the effectiveness of our approach.

UR - http://www.scopus.com/inward/record.url?scp=0642343320&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0642343320&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:0642343320

VL - 8

SP - 309

EP - 324

JO - Plastics, Rubber and Composites

JF - Plastics, Rubber and Composites

SN - 1465-8011

IS - 3

ER -