Highly accurate retrieval of Japanese document images through a combination of morphological analysis and OCR

Yutaka Katsuyama*, Hiroaki Takebe, Koji Kurokawa, Takahiro Saitoh, Satoshi Naoi

*Corresponding author for this work

Research output: Conference article, peer-reviewed

7 citations (Scopus)

Abstract

We have developed a method that allows Japanese document images to be retrieved more accurately by using OCR character candidate information and a conventional plain text search engine. In this method, the document image is first recognized by normal OCR to produce text. Keyword areas are then estimated from the normal OCR produced text through morphological analysis. A lattice of candidate-character codes is extracted from these areas, and then character strings are extracted from the lattice using a word-matching method in noun areas and a K-th DP-matching method in undefined word areas. Finally, these extracted character strings are added to the normal OCR produced text to improve document retrieval accuracy when using a conventional plain text search engine. Experimental results from searches of 49 OHP sheet images revealed that our method has a high recall rate of 98.2%, compared to 90.3% with a conventional method using only normal OCR produced text, while requiring about the same processing time as normal OCR.
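The word-matching step in noun areas can be illustrated with a minimal sketch. It is not the authors' implementation: the lattice layout (one list of candidate characters per position, best candidate first), the toy dictionary, and the function names `match_words` and `augment_text` are all assumptions made for illustration. The K-th DP-matching step used for undefined word areas is not covered here.

```python
from typing import List, Set

# Assumed layout: lattice[i] holds the OCR candidate characters for
# character position i, ordered from most to least likely.
Lattice = List[List[str]]


def match_words(lattice: Lattice, dictionary: Set[str]) -> List[str]:
    """Return dictionary words that can be spelled along some lattice path."""
    found = []
    for word in sorted(dictionary):
        n = len(word)
        # Try every start position where the word would fit in the lattice.
        for start in range(len(lattice) - n + 1):
            if all(word[k] in lattice[start + k] for k in range(n)):
                found.append(word)
                break
    return found


def augment_text(ocr_text: str, lattice: Lattice, dictionary: Set[str]) -> str:
    """Append matched words missing from the plain OCR text."""
    extra = [w for w in match_words(lattice, dictionary) if w not in ocr_text]
    if not extra:
        return ocr_text
    return ocr_text + "\n" + " ".join(extra)


# Hypothetical keyword area: the top candidates read "検素" (a misrecognition),
# but the correct second character "索" is among the lower-ranked candidates.
lattice = [["検", "険", "験"], ["素", "索", "紊"]]
dictionary = {"検索", "文書"}
ocr_text = "".join(candidates[0] for candidates in lattice)

print(augment_text(ocr_text, lattice, dictionary))
```

Because the recovered string is appended to the ordinary OCR text, a plain text search engine can find the page under the query "検索" even though the top-ranked OCR output misses it, which is the mechanism the abstract credits for the higher recall.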

Original language: English
Pages (from-to): 57-67
Number of pages: 11
Journal: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 4670
Publication status: Published - 2002
Externally published: Yes
Event: Document Recognition and Retrieval IX - San Jose, CA, United States
Duration: 21 Jan 2002 to 22 Jan 2002

ASJC Scopus subject areas

  • Electronic, Optical and Magnetic Materials
  • Condensed Matter Physics
  • Computer Science Applications
  • Applied Mathematics
  • Electrical and Electronic Engineering
