TY - GEN
T1 - A-A KD
T2 - 7th IEEE International Conference on Multimedia Big Data, BigMM 2021
AU - Gou, Aorui
AU - Liu, Chao
AU - Sun, Heming
AU - Zeng, Xiaoyang
AU - Fan, Yibo
N1 - Funding Information:
This work was supported in part by the National Natural Science Foundation of China under Grant 62031009, in part by the Shanghai Science and Technology Committee (STCSM) under Grant 19511104300, in part by the Alibaba Innovative Research (AIR) Program, in part by the Innovation Program of Shanghai Municipal Education Commission, in part by the Fudan University-CIOMP Joint Fund (FC2019-001), and in part by JST, PRESTO under Grant JPMJPR19M5.
Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - We propose a knowledge distillation method named attention and activation knowledge distillation (A-A KD) in this paper. By jointly exploiting the attention mechanism as an inter-channel method and activation information as an intra-channel method, the student model can overcome its insufficient feature extraction and effectively mimic the features of the teacher model. A-A KD outperforms state-of-the-art methods in various tasks such as image classification, object detection, and semantic segmentation. It improves mAP by 1.8% on PASCAL VOC07 and mIoU by 1.5% on PASCAL VOC12 over conventional student models. Moreover, experimental results show that with A-A KD our student model (ResNet50) reaches a top-1 error of 21.42%, which is better than that of the corresponding teacher model on ImageNet.
AB - We propose a knowledge distillation method named attention and activation knowledge distillation (A-A KD) in this paper. By jointly exploiting the attention mechanism as an inter-channel method and activation information as an intra-channel method, the student model can overcome its insufficient feature extraction and effectively mimic the features of the teacher model. A-A KD outperforms state-of-the-art methods in various tasks such as image classification, object detection, and semantic segmentation. It improves mAP by 1.8% on PASCAL VOC07 and mIoU by 1.5% on PASCAL VOC12 over conventional student models. Moreover, experimental results show that with A-A KD our student model (ResNet50) reaches a top-1 error of 21.42%, which is better than that of the corresponding teacher model on ImageNet.
KW - Image classification
KW - Knowledge distillation
KW - Object detection
KW - Semantic segmentation
UR - http://www.scopus.com/inward/record.url?scp=85123989980&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123989980&partnerID=8YFLogxK
U2 - 10.1109/BigMM52142.2021.00016
DO - 10.1109/BigMM52142.2021.00016
M3 - Conference contribution
AN - SCOPUS:85123989980
T3 - Proceedings - 2021 IEEE 7th International Conference on Multimedia Big Data, BigMM 2021
SP - 57
EP - 60
BT - Proceedings - 2021 IEEE 7th International Conference on Multimedia Big Data, BigMM 2021
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 15 November 2021 through 17 November 2021
ER -