TY - GEN
T1 - Using ASR methods for OCR
AU - Arora, Ashish
AU - Garcia, Paola
AU - Watanabe, Shinji
AU - Manohar, Vimal
AU - Shao, Yiwen
AU - Khudanpur, Sanjeev
AU - Chang, Chun Chieh
AU - Rekabdar, Babak
AU - Babaali, Bagher
AU - Povey, Daniel
AU - Etter, David
AU - Raj, Desh
AU - Hadian, Hossein
AU - Trmal, Jan
N1 - Funding Information:
The authors would like to thank Joan Puigcerver, Paul Voigtlaender and Stephens Rawls for providing additional details for their work. This work was supported by NSF CRI Award No. 1513128. Part of this research was conducted under the auspices of the SCALE 2018 workshop at the Johns Hopkins University HLTCOE.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/9
Y1 - 2019/9
N2 - Hybrid deep neural network hidden Markov models (DNN-HMM) have achieved impressive results on large vocabulary continuous speech recognition (LVCSR) tasks. However, recent DNN-HMM approaches have not been explored much for text recognition. Inspired by current work in automatic speech recognition (ASR) and machine translation, we present an open-vocabulary sub-word text recognition system. The sub-word lexicon and sub-word language model (LM) help overcome the challenge of recognizing out-of-vocabulary (OOV) words, and a time delay neural network (TDNN) and convolutional neural network (CNN) based DNN-HMM optical model (OM) efficiently models the sequence dependency in the line image. We present results on 12 datasets with training data ranging from 6k lines to 600k lines. The system is built for 8 languages, i.e., English, French, Arabic, Chinese, Farsi, Tamil, Russian, and Korean. We report competitive results on several commonly used handwritten and printed text datasets.
AB - Hybrid deep neural network hidden Markov models (DNN-HMM) have achieved impressive results on large vocabulary continuous speech recognition (LVCSR) tasks. However, recent DNN-HMM approaches have not been explored much for text recognition. Inspired by current work in automatic speech recognition (ASR) and machine translation, we present an open-vocabulary sub-word text recognition system. The sub-word lexicon and sub-word language model (LM) help overcome the challenge of recognizing out-of-vocabulary (OOV) words, and a time delay neural network (TDNN) and convolutional neural network (CNN) based DNN-HMM optical model (OM) efficiently models the sequence dependency in the line image. We present results on 12 datasets with training data ranging from 6k lines to 600k lines. The system is built for 8 languages, i.e., English, French, Arabic, Chinese, Farsi, Tamil, Russian, and Korean. We report competitive results on several commonly used handwritten and printed text datasets.
KW - ASR
KW - BPE
KW - LF-MMI
KW - OCR
KW - Open Vocabulary
UR - http://www.scopus.com/inward/record.url?scp=85079855287&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85079855287&partnerID=8YFLogxK
U2 - 10.1109/ICDAR.2019.00111
DO - 10.1109/ICDAR.2019.00111
M3 - Conference contribution
AN - SCOPUS:85079855287
T3 - Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
SP - 663
EP - 668
BT - Proceedings - 15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019
PB - IEEE Computer Society
T2 - 15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019
Y2 - 20 September 2019 through 25 September 2019
ER -