A-A KD: Attention and Activation Knowledge Distillation

Aorui Gou*, Chao Liu, Heming Sun, Xiaoyang Zeng, Yibo Fan

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we propose a knowledge distillation method named attention and activation knowledge distillation (A-A KD). By jointly exploiting the attention mechanism as an inter-channel signal and activation information as an intra-channel signal, the student model overcomes insufficient feature extraction and effectively mimics the features of the teacher model. A-A KD outperforms state-of-the-art methods on a variety of tasks, including image classification, object detection, and semantic segmentation. It improves mAP by 1.8% on PASCAL VOC07 and mIoU by 1.5% on PASCAL VOC12 over conventional student models. Moreover, experimental results show that with A-A KD our student model (ResNet50) reaches a top-1 error of 21.42% on ImageNet, outperforming the corresponding teacher model.
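The abstract only outlines the method at a high level; the sketch below illustrates, in PyTorch, how a combined inter-channel (attention) and intra-channel (activation) feature-distillation loss could be structured. This is a minimal sketch under stated assumptions, not the paper's exact formulation: the function names (channel_attention, spatial_activation, aa_kd_loss), the use of squared-feature statistics, and the alpha/beta weights are illustrative choices borrowed from common attention-transfer practice.

```python
# Hypothetical sketch of an attention + activation distillation loss in the
# spirit of A-A KD. The map definitions are assumptions, not the published
# formulation. Assumes student and teacher feature maps have matching shapes;
# otherwise a 1x1 conv adapter on the student features would be needed.

import torch
import torch.nn.functional as F


def channel_attention(feat: torch.Tensor) -> torch.Tensor:
    """Inter-channel statistic: one weight per channel via global pooling."""
    # feat: (N, C, H, W) -> (N, C), L2-normalized across channels
    w = feat.pow(2).mean(dim=(2, 3))
    return F.normalize(w, dim=1)


def spatial_activation(feat: torch.Tensor) -> torch.Tensor:
    """Intra-channel statistic: a spatial activation map per sample."""
    # feat: (N, C, H, W) -> (N, H*W), L2-normalized across spatial positions
    a = feat.pow(2).mean(dim=1).flatten(1)
    return F.normalize(a, dim=1)


def aa_kd_loss(student_feat: torch.Tensor,
               teacher_feat: torch.Tensor,
               alpha: float = 1.0,
               beta: float = 1.0) -> torch.Tensor:
    """Student mimics both the teacher's channel and spatial statistics."""
    attn_loss = F.mse_loss(channel_attention(student_feat),
                           channel_attention(teacher_feat))
    act_loss = F.mse_loss(spatial_activation(student_feat),
                          spatial_activation(teacher_feat))
    return alpha * attn_loss + beta * act_loss


# Example: distilling one feature stage (random tensors stand in for
# real student/teacher feature maps).
s = torch.randn(8, 256, 28, 28)  # student feature map
t = torch.randn(8, 256, 28, 28)  # teacher feature map
loss = aa_kd_loss(s, t)
```

In practice such a term would be added to the student's task loss (e.g. cross-entropy) at one or more feature stages; the relative weights would be tuned per task.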

Original language: English
Title of host publication: Proceedings - 2021 IEEE 7th International Conference on Multimedia Big Data, BigMM 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 57-60
Number of pages: 4
ISBN (Electronic): 9781665434140
DOIs
Publication status: Published - 2021
Event: 7th IEEE International Conference on Multimedia Big Data, BigMM 2021 - Taichung, Taiwan, Province of China
Duration: 2021 Nov 15 - 2021 Nov 17

Publication series

Name: Proceedings - 2021 IEEE 7th International Conference on Multimedia Big Data, BigMM 2021

Conference

Conference: 7th IEEE International Conference on Multimedia Big Data, BigMM 2021
Country/Territory: Taiwan, Province of China
City: Taichung
Period: 21/11/15 - 21/11/17

Keywords

  • Image classification
  • Knowledge distillation
  • Object detection
  • Semantic segmentation

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Information Systems
  • Information Systems and Management
  • Media Technology
