TY - GEN
T1 - Research and examination on implementation of super-resolution models using deep learning with INT8 precision
AU - Hirose, Shota
AU - Wada, Naoki
AU - Katto, Jiro
AU - Sun, Heming
N1 - Funding Information:
ACKNOWLEDGMENT This work was supported in part by NICT, Grant Number 03801, Japan.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Fixed-point arithmetic is a technique for treating weights and intermediate values as integers in deep learning. Since deep learning models generally store each weight as a 32-bit floating-point value, storing weights as 8-bit integers can reduce the size of the model. In addition, memory usage can be reduced, and inference can be much faster through hardware acceleration when dedicated hardware for INT8 inference is available. On the other hand, when inference is carried out with fixed-point weights, the accuracy of the model is reduced due to the loss of dynamic range of the weights and intermediate-layer values. For this reason, inference frameworks such as TensorRT and TensorFlow Lite provide a function called "calibration" to suppress the deterioration in accuracy caused by quantization, by measuring the distribution of the input data and of the numerical values in the intermediate layers when quantization is performed. In this paper, after quantizing a pre-trained model that performs super-resolution, speed and accuracy are measured using TensorRT. As a result, the trade-off between runtime and accuracy is confirmed. The effect of calibration is also confirmed.
AB - Fixed-point arithmetic is a technique for treating weights and intermediate values as integers in deep learning. Since deep learning models generally store each weight as a 32-bit floating-point value, storing weights as 8-bit integers can reduce the size of the model. In addition, memory usage can be reduced, and inference can be much faster through hardware acceleration when dedicated hardware for INT8 inference is available. On the other hand, when inference is carried out with fixed-point weights, the accuracy of the model is reduced due to the loss of dynamic range of the weights and intermediate-layer values. For this reason, inference frameworks such as TensorRT and TensorFlow Lite provide a function called "calibration" to suppress the deterioration in accuracy caused by quantization, by measuring the distribution of the input data and of the numerical values in the intermediate layers when quantization is performed. In this paper, after quantizing a pre-trained model that performs super-resolution, speed and accuracy are measured using TensorRT. As a result, the trade-off between runtime and accuracy is confirmed. The effect of calibration is also confirmed.
KW - Quantization
KW - Real-time inference
KW - Super resolution
KW - Tensor RT
UR - http://www.scopus.com/inward/record.url?scp=85127630787&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85127630787&partnerID=8YFLogxK
U2 - 10.1109/ICAIIC54071.2022.9722655
DO - 10.1109/ICAIIC54071.2022.9722655
M3 - Conference contribution
AN - SCOPUS:85127630787
T3 - 4th International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2022 - Proceedings
SP - 133
EP - 137
BT - 4th International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 4th International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2022
Y2 - 21 February 2022 through 24 February 2022
ER -