ViT-GAN: Using Vision Transformer as Discriminator with Adaptive Data Augmentation

Shota Hirose, Naoki Wada, Jiro Katto, Heming Sun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Attention is increasingly regarded as an efficient mechanism for image recognition. Vision Transformer (ViT) applies a Transformer to images and achieves very high image recognition performance. ViT has fewer parameters than Big Transfer (BiT) and Noisy Student, which leads us to consider Self-Attention-based networks slimmer than convolution-based networks. We use a ViT as the Discriminator in a Generative Adversarial Network (GAN) to obtain the same performance with a smaller model, and we name the result ViT-GAN. We also find that parameter sharing is very useful for making ViT parameter-efficient. However, the performance of ViT depends heavily on the number of data samples, so we propose a new Data Augmentation method in which the augmentation strength varies adaptively. This method helps ViT converge faster and perform better. With our Data Augmentation, we show that a ViT-based discriminator can achieve almost the same FID as the original discriminator while using 35% fewer parameters.
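
The abstract's point about parameter sharing can be made concrete with a little arithmetic. The sketch below is an illustration only, not the authors' implementation: it assumes a standard Transformer encoder layer (four attention projections with biases, a two-layer MLP, two layer norms) and assumes that "sharing" means every layer reuses one set of weights. The function names and the sizes in the usage lines are hypothetical and are not taken from the paper, and the result is unrelated to the 35% figure reported above, which concerns the whole discriminator.

```python
def encoder_layer_params(d_model: int, d_mlp: int) -> int:
    """Parameter count of one standard Transformer encoder layer (assumed layout)."""
    # Query, key, value and output projections, each d_model x d_model plus a bias.
    attention = 4 * (d_model * d_model + d_model)
    # Two-layer MLP: d_model -> d_mlp -> d_model, both layers with biases.
    mlp = (d_model * d_mlp + d_mlp) + (d_mlp * d_model + d_model)
    # Two layer norms, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d_model)
    return attention + mlp + layer_norms


def encoder_params(depth: int, d_model: int, d_mlp: int, shared: bool) -> int:
    """Total encoder parameters; with sharing, all layers reuse one layer's weights."""
    per_layer = encoder_layer_params(d_model, d_mlp)
    return per_layer if shared else depth * per_layer


if __name__ == "__main__":
    # Hypothetical sizes chosen for illustration only.
    depth, d_model, d_mlp = 12, 768, 3072
    unshared = encoder_params(depth, d_model, d_mlp, shared=False)
    shared = encoder_params(depth, d_model, d_mlp, shared=True)
    print(f"unshared: {unshared:,} parameters")
    print(f"shared:   {shared:,} parameters")
```

Under these assumptions the encoder's parameter count stops growing with depth once weights are shared, while the amount of computation per forward pass stays the same. Patch embedding, position embeddings, and the output head are left out of the count.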

Original language: English
Title of host publication: 2021 3rd International Conference on Computer Communication and the Internet, ICCCI 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 185-189
Number of pages: 5
ISBN (Electronic): 9781728176185
Publication status: Published - 2021 Jun 25
Event: 3rd International Conference on Computer Communication and the Internet, ICCCI 2021 - Virtual, Nagoya, Japan
Duration: 2021 Jun 25 to 2021 Jun 27

Publication series

Name: 2021 3rd International Conference on Computer Communication and the Internet, ICCCI 2021

Conference

Conference: 3rd International Conference on Computer Communication and the Internet, ICCCI 2021
Country/Territory: Japan
City: Virtual, Nagoya
Period: 21/6/25 to 21/6/27

Keywords

  • Data Augmentation
  • Generative Adversarial Network
  • Vision Transformer

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Information Systems and Management
