Effective text extraction and recognition for WWW images

Jun Sun*, Zhulong Wang, Hao Yu, Fumihito Nishino, Yutaka Katsuyama, Satoshi Naoi

*Corresponding author for this work

Research output: Conference contribution

11 Citations (Scopus)

Abstract

Images play a very important role in web content delivery. Many WWW images contain text information that can be used for web indexing and searching. A new text extraction and recognition algorithm is proposed in this paper. The character strokes in the image are first extracted by color clustering and connected component analysis. A novel stroke verification algorithm is used to effectively remove non-character strokes. The verified strokes are then used to build the binary text line image, which is segmented and recognized by dynamic programming. Because text in a WWW image usually has a close relationship with the webpage content, approximate string matching is used to revise the recognition result by matching the content in the webpage with the content in the image. This effective post-processing not only improves recognition performance but can also be used in other applications, such as image-to-webpage paragraph correspondence.
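The post-processing step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which approximate string matching method the authors use, so the code below substitutes a standard edit-distance dynamic program that searches for the closest substring of the page text. The function names `best_approximate_match` and `revise`, the `max_error_rate` threshold, and the sample strings are all hypothetical and are not taken from the paper.

```python
def best_approximate_match(recognized: str, page_text: str) -> tuple[int, int, int]:
    """Return (distance, start, end) for the substring page_text[start:end]
    with the smallest edit distance to `recognized`.

    Assumption: unit-cost Levenshtein edits; the paper may use another measure.
    """
    n = len(page_text)
    # Row 0: the empty pattern matches at any position of the page with cost 0.
    prev_cost = [0] * (n + 1)
    prev_start = list(range(n + 1))
    for i in range(1, len(recognized) + 1):
        cur_cost = [i] + [0] * n    # i pattern chars against empty text
        cur_start = [0] + [0] * n
        for j in range(1, n + 1):
            sub = prev_cost[j - 1] + (recognized[i - 1] != page_text[j - 1])
            drop = prev_cost[j] + 1      # delete a recognized character
            skip = cur_cost[j - 1] + 1   # insert a page character
            best = min(sub, drop, skip)
            cur_cost[j] = best
            # Carry forward the start index of the alignment that was chosen.
            if best == sub:
                cur_start[j] = prev_start[j - 1]
            elif best == drop:
                cur_start[j] = prev_start[j]
            else:
                cur_start[j] = cur_start[j - 1]
        prev_cost, prev_start = cur_cost, cur_start
    end = min(range(n + 1), key=lambda j: prev_cost[j])
    return prev_cost[end], prev_start[end], end


def revise(recognized: str, page_text: str, max_error_rate: float = 0.3) -> str:
    """Replace an OCR result with the closest page substring if it is close enough.

    `max_error_rate` is an illustrative threshold, not a value from the paper.
    """
    if not recognized:
        return recognized
    distance, start, end = best_approximate_match(recognized, page_text)
    if distance <= max_error_rate * len(recognized):
        return page_text[start:end]
    return recognized


if __name__ == "__main__":
    page = "Latest Sports News from around the world"
    print(revise("Sp0rts Nevvs", page))
```

In this sketch an OCR string is only replaced when the best page substring lies within the error-rate threshold, so text that has no counterpart on the page is left unchanged. The returned `start` and `end` offsets could also serve the paragraph-correspondence use mentioned in the abstract, since they locate the matched span inside the page text.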

Original language: English
Title of host publication: Proceedings of the 2003 ACM Symposium on Document Engineering
Publisher: Association for Computing Machinery (ACM)
Pages: 115-117
Number of pages: 3
ISBN (Print): 1581137249, 9781581137248
DOI
Publication status: Published - 2003
Externally published: Yes
Event: Proceedings of the 2003 ACM Symposium on Document Engineering - Grenoble, France
Duration: 2003-11-20 to 2003-11-22

Publication series

Name: Proceedings of the 2003 ACM Symposium on Document Engineering

Conference

Conference: Proceedings of the 2003 ACM Symposium on Document Engineering
Country/Territory: France
City: Grenoble
Period: 2003-11-20 to 2003-11-22

ASJC Scopus subject areas

  • General Engineering
