TY - GEN
T1 - Improving Multiple Machine Vision Tasks in the Compressed Domain
AU - Liu, Jinming
AU - Sun, Heming
AU - Katto, Jiro
N1 - Funding Information:
This paper is supported by Japan Science and Technology Agency (JST), under Grant JPMJPR19M5; Japan Society for the Promotion of Science (JSPS), under Grant 21K17770; Kenjiro Takayanagi Foundation; NICT, Grant Number 03801, Japan.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
AB - A growing number of images are analyzed by machines rather than only by humans. Currently, most machine vision tasks operate on decoded images, which requires an image compression (encoding/decoding) framework. However, using decoded images in the pixel domain has two drawbacks: 1) the decoder has high complexity, and 2) the accuracy of machine vision tasks (e.g., mIoU, mean absolute error, and average precision) is degraded, because decoded images are optimized only for human-perceived quality (e.g., PSNR), so information required for machine vision tasks is lost during the decoding process. In this paper, we improve machine vision tasks in the compressed domain in three ways: 1) a gate module is used to select compressed-domain features effectively; 2) knowledge distillation is introduced to improve accuracy; and 3) a training strategy is explored to support multiple tasks, including image compression. Experimental results show that our approach achieves better rate-accuracy/distortion performance and lower complexity than the state-of-the-art pixel-domain work that supports both machine and human vision tasks.
UR - http://www.scopus.com/inward/record.url?scp=85143631843&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85143631843&partnerID=8YFLogxK
U2 - 10.1109/ICPR56361.2022.9956532
DO - 10.1109/ICPR56361.2022.9956532
M3 - Conference contribution
AN - SCOPUS:85143631843
T3 - Proceedings - International Conference on Pattern Recognition
SP - 331
EP - 337
BT - 2022 26th International Conference on Pattern Recognition, ICPR 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 26th International Conference on Pattern Recognition, ICPR 2022
Y2 - 21 August 2022 through 25 August 2022
ER -