Data augmentation for ancient characters via blend-font net

Xiaolu Ren, Sei Ichiro Kamata

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Historical documents preserve valuable information in ancient characters. However, problems such as unbalanced character samples and intra-class multi-modality within these documents limit the performance of existing character recognition technologies. We therefore propose a two-stage font generation model, Blend-Font Net, which uses readily available modern character datasets to augment an ancient character dataset and addresses these problems with a blend-font strategy. The model generates new samples by extracting and modifying the font information in a character image. In the first stage, a font generation model learns the mapping between different fonts; in the second stage, a slightly modified version of the model learns to generate samples that blend two different fonts. The proposed model is then used to generate extra samples that balance a historical document dataset. Experiments show that the generated samples are visually diverse and improve the accuracy of a text recognition network. Furthermore, because the method requires no font labels and handles the multi-modality problem, it shows broad application prospects for similar tasks.

Original language: English
Title of host publication: Thirteenth International Conference on Digital Image Processing, ICDIP 2021
Editors: Xudong Jiang, Hiroshi Fujita
Publisher: SPIE
ISBN (Electronic): 9781510646001
Publication status: Published - 2021
Event: 13th International Conference on Digital Image Processing, ICDIP 2021 - Singapore, Singapore
Duration: 2021 May 20 to 2021 May 23

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 11878
ISSN (Print): 0277-786X
ISSN (Electronic): 1996-756X

Conference

Conference: 13th International Conference on Digital Image Processing, ICDIP 2021
Country/Territory: Singapore
City: Singapore
Period: 21/5/20 to 21/5/23

Keywords

  • Data Augmentation
  • Font generation
  • GAN
  • Style Transfer
  • Unsupervised Learning

ASJC Scopus subject areas

  • Electronic, Optical and Magnetic Materials
  • Condensed Matter Physics
  • Computer Science Applications
  • Applied Mathematics
  • Electrical and Electronic Engineering
