Document layout analysis and reading order determination for a reading robot

Yucun Pan, Qunfei Zhao, Seiichiro Kamata

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Citations (Scopus)

Abstract

In this paper, an efficient approach to document layout analysis and reading order determination is proposed for a reading robot. First, the input document images are preprocessed to remove noise, connect lines and domains, and reduce computation time. Second, a bottom-up, parameter-independent, two-step layout analysis algorithm based on morphology outlines the geometry of the maximum homogeneous regions and classifies them into text, tables, and pictures. Finally, the reading order is determined by a top-down recursive hierarchy algorithm derived from XY-cut, using a set of rules that depend on layout information. Important parameters are derived from statistical information about the given images so that the method adapts to different types of documents. The proposed algorithm is applied to a large number of document images, and the experimental results show that it enables the reading robot to read paper documents in different languages, even those with complex layout structures.
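The reading-order step can be illustrated with a minimal sketch of a plain recursive XY-cut over region bounding boxes. This is not the authors' algorithm: the abstract does not give their rule set or the statistics they use, so the names `Box`, `split_by_gaps`, and `reading_order`, the fixed `min_gap` threshold, and the choice to try horizontal cuts before vertical ones are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical region type: (x0, y0, x1, y1) with x1 > x0 and y1 > y0.
Box = Tuple[int, int, int, int]


def split_by_gaps(boxes: List[Box], axis: int, min_gap: int) -> List[List[Box]]:
    """Group boxes separated by empty bands of at least min_gap.

    axis=1 projects onto the y axis (horizontal cuts),
    axis=0 projects onto the x axis (vertical cuts).
    """
    ordered = sorted(boxes, key=lambda b: b[axis])
    groups = [[ordered[0]]]
    reach = ordered[0][axis + 2]  # farthest extent covered so far
    for box in ordered[1:]:
        if box[axis] - reach >= min_gap:
            groups.append([box])  # wide enough gap: start a new group
        else:
            groups[-1].append(box)
        reach = max(reach, box[axis + 2])
    return groups


def reading_order(boxes: List[Box], min_gap: int = 1) -> List[Box]:
    """Order boxes by recursive XY-cut (assumed: top-to-bottom, then left-to-right)."""
    if len(boxes) <= 1:
        return list(boxes)
    for axis in (1, 0):  # try horizontal cuts first, then vertical cuts
        groups = split_by_gaps(boxes, axis, min_gap)
        if len(groups) > 1:
            ordered: List[Box] = []
            for group in groups:
                ordered.extend(reading_order(group, min_gap))
            return ordered
    # No cut separates the boxes: fall back to a simple positional sort.
    return sorted(boxes, key=lambda b: (b[1], b[0]))


if __name__ == "__main__":
    title = (0, 0, 100, 10)
    left_1, left_2 = (0, 20, 45, 50), (0, 60, 45, 90)
    right_1, right_2 = (55, 20, 100, 40), (55, 50, 100, 90)
    page = [right_2, left_2, title, right_1, left_1]
    for box in reading_order(page):
        print(box)
```

On this two-column page with a full-width title, the sketch visits the title first, then the left column from top to bottom, then the right column. In an adaptive variant, `min_gap` would be derived from statistics of the page (for example, typical line spacing) rather than fixed, but how the paper does this is not stated in the abstract.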

Original language: English
Title of host publication: IEEE Region 10 Annual International Conference, Proceedings/TENCON
Pages: 1607-1612
Number of pages: 6
DOI: https://doi.org/10.1109/TENCON.2010.5686038
Publication status: Published - 2010
Event: 2010 IEEE Region 10 Conference, TENCON 2010 - Fukuoka
Duration: 2010 Nov 21 to 2010 Nov 24

Other

Other: 2010 IEEE Region 10 Conference, TENCON 2010
City: Fukuoka
Period: 10/11/21 to 10/11/24

Fingerprint

Robots
Statistics
Geometry

Keywords

  • A reading robot
  • Adaptive
  • Hierarchy
  • Layout analysis
  • Morphology based
  • Reading order determination

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Computer Science Applications

Cite this

Pan, Y., Zhao, Q., & Kamata, S. (2010). Document layout analysis and reading order determination for a reading robot. In IEEE Region 10 Annual International Conference, Proceedings/TENCON (pp. 1607-1612). [5686038] https://doi.org/10.1109/TENCON.2010.5686038

@inproceedings{5437c22337c345abbaf872f3449690c5,
title = "Document layout analysis and reading order determination for a reading robot",
keywords = "A reading robot, Adaptive, Hierarchy, Layout analysis, Morphology based, Reading order determination",
author = "Yucun Pan and Qunfei Zhao and Seiichiro Kamata",
year = "2010",
doi = "10.1109/TENCON.2010.5686038",
language = "English",
isbn = "9781424468904",
pages = "1607--1612",
booktitle = "IEEE Region 10 Annual International Conference, Proceedings/TENCON",

}

TY - GEN
T1 - Document layout analysis and reading order determination for a reading robot
AU - Pan, Yucun
AU - Zhao, Qunfei
AU - Kamata, Seiichiro
PY - 2010
Y1 - 2010
KW - A reading robot
KW - Adaptive
KW - Hierarchy
KW - Layout analysis
KW - Morphology based
KW - Reading order determination
UR - http://www.scopus.com/inward/record.url?scp=79951623521&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79951623521&partnerID=8YFLogxK
U2 - 10.1109/TENCON.2010.5686038
DO - 10.1109/TENCON.2010.5686038
M3 - Conference contribution
SN - 9781424468904
SP - 1607
EP - 1612
BT - IEEE Region 10 Annual International Conference, Proceedings/TENCON
ER -