TY - GEN
T1 - Multiple Mask Enhanced Transformer for Robust Visual Tracking
AU - Wang, Ziyu
AU - Kamata, Sei Ichiro
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - In visual object tracking tasks, scenarios such as deformation and scale variation are still challenging. In this work, we propose a new tracking architecture with a transformer and multiple masks as its key components. The transformer structure models the spatial and temporal connections among frames. The transformer encoder learns the target via an attention mechanism, while the decoder utilizes the information of previous frames to better track the current frame. To make sure that the transformer pays attention to the exact target area, we propose multiple masks. Multiple masks suppress the background while leaving the target area unchanged. Multiple masks consist of spatial masks and temporal masks. Spatial masks focus on the current information while temporal masks make use of the historical information. Multiple masks further enhance the transformer, making it more focused on the target and more robust under extreme scenarios. With the transformer and multiple masks, our proposed tracker achieves state-of-the-art performance.
AB - In visual object tracking tasks, scenarios such as deformation and scale variation are still challenging. In this work, we propose a new tracking architecture with a transformer and multiple masks as its key components. The transformer structure models the spatial and temporal connections among frames. The transformer encoder learns the target via an attention mechanism, while the decoder utilizes the information of previous frames to better track the current frame. To make sure that the transformer pays attention to the exact target area, we propose multiple masks. Multiple masks suppress the background while leaving the target area unchanged. Multiple masks consist of spatial masks and temporal masks. Spatial masks focus on the current information while temporal masks make use of the historical information. Multiple masks further enhance the transformer, making it more focused on the target and more robust under extreme scenarios. With the transformer and multiple masks, our proposed tracker achieves state-of-the-art performance.
KW - multiple masks
KW - object tracking
KW - transformer
UR - http://www.scopus.com/inward/record.url?scp=85143643867&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85143643867&partnerID=8YFLogxK
U2 - 10.1109/ICRCV55858.2022.9953264
DO - 10.1109/ICRCV55858.2022.9953264
M3 - Conference contribution
AN - SCOPUS:85143643867
T3 - 2022 4th International Conference on Robotics and Computer Vision, ICRCV 2022
SP - 43
EP - 48
BT - 2022 4th International Conference on Robotics and Computer Vision, ICRCV 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 4th International Conference on Robotics and Computer Vision, ICRCV 2022
Y2 - 25 September 2022 through 27 September 2022
ER -