A framework for cross-language information access

Application to English and Japanese

Gareth Jones, Nigel Collier, Tetsuya Sakai, Kazuo Sumita, Hideki Hirakawa

Research output: Contribution to journal (Article)

Abstract

Internet search engines allow access to online information from all over the world. However, there is currently a general assumption that users are fluent in the languages of all documents that they might search for. This has for historical reasons usually been a choice between English and the locally supported language. Given the rapidly growing size of the Internet, it is likely that future users will need to access information in languages in which they are not fluent or have no knowledge of at all. This paper shows how information retrieval and machine translation can be combined in a cross-language information access framework to help overcome the language barrier. We present encouraging preliminary experimental results using English queries to retrieve documents from the standard Japanese language BMIR-J2 retrieval test collection. We outline the scope and purpose of cross-language information access and provide an example application to suggest that technology already exists to provide effective and potentially useful applications.

Original language: English
Pages (from-to): 371-388
Number of pages: 18
Journal: Computers and the Humanities
Volume: 35
Issue number: 4
Publication status: Published - Nov 2001
Externally published: Yes

Fingerprint

language
Internet
language barrier
information retrieval
search engine
Cross-language
World Wide Web
Japanese Language
Machine Translation

Keywords

  • Cross-language information retrieval
  • Information access
  • Japanese-English
  • Machine translation
  • Probabilistic retrieval

ASJC Scopus subject areas

  • Social Sciences (all)

Cite this

A framework for cross-language information access: Application to English and Japanese. / Jones, Gareth; Collier, Nigel; Sakai, Tetsuya; Sumita, Kazuo; Hirakawa, Hideki.

In: Computers and the Humanities, Vol. 35, No. 4, 11.2001, p. 371-388.

@article{b12ea7b4714046beb7accf06df8d8cac,
title = "A framework for cross-language information access: Application to English and Japanese",
keywords = "Cross-language information retrieval, Information access, Japanese-English, Machine translation, Probabilistic retrieval",
author = "Gareth Jones and Nigel Collier and Tetsuya Sakai and Kazuo Sumita and Hideki Hirakawa",
year = "2001",
month = "11",
language = "English",
volume = "35",
pages = "371--388",
journal = "Language Resources and Evaluation",
issn = "1574-020X",
publisher = "Springer Netherlands",
number = "4",

}

TY - JOUR

T1 - A framework for cross-language information access

T2 - Application to English and Japanese

AU - Jones, Gareth

AU - Collier, Nigel

AU - Sakai, Tetsuya

AU - Sumita, Kazuo

AU - Hirakawa, Hideki

PY - 2001/11

Y1 - 2001/11

KW - Cross-language information retrieval

KW - Information access

KW - Japanese-English

KW - Machine translation

KW - Probabilistic retrieval

UR - http://www.scopus.com/inward/record.url?scp=33750642500&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33750642500&partnerID=8YFLogxK

M3 - Article

VL - 35

SP - 371

EP - 388

JO - Language Resources and Evaluation

JF - Language Resources and Evaluation

SN - 1574-020X

IS - 4

ER -