A large-scale Web data collection as a natural language processing infrastructure

Keiji Shinzato, Daisuke Kawahara, Chikara Hashimoto, Sadao Kurohashi

Research output: Conference contribution

6 Citations (Scopus)

Abstract

In recent years, language resources acquired from the Web have been released, and these data have improved the performance of applications in several NLP tasks. Although language resources based on the web page unit are useful in NLP tasks and applications such as knowledge acquisition, document retrieval, and document summarization, no such resources have been released so far. In this paper, we propose a data format for the results of web page processing, and a search engine infrastructure that makes it possible to share data for approximately 100 million Japanese web pages. By obtaining this web data, NLP researchers can begin their own processing immediately, without having to analyze web pages themselves.

Original language: English
Title of host publication: Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008
Publisher: European Language Resources Association (ELRA)
Pages: 2236-2241
Number of pages: 6
ISBN (Electronic): 2951740840, 9782951740846
Publication status: Published - 2008
Externally published: Yes
Event: 6th International Conference on Language Resources and Evaluation, LREC 2008 - Marrakech, Morocco
Duration: 28 May 2008 to 30 May 2008

Publication series

Name: Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008

Other

Other: 6th International Conference on Language Resources and Evaluation, LREC 2008
Country/Territory: Morocco
City: Marrakech
Period: 28 May 2008 to 30 May 2008

ASJC Scopus subject areas

  • Library and Information Sciences
  • Linguistics and Language
  • Language and Linguistics
  • Education
