Multiple Mask Enhanced Transformer for Robust Visual Tracking

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In visual object tracking, scenarios such as deformation and scale variation remain challenging. In this work, we propose a new tracking architecture with a transformer and multiple masks as its key components. The transformer structure models the spatial and temporal connections among frames. The transformer encoder learns the target via an attention mechanism, while the decoder uses information from previous frames to better track the target in the current frame. To ensure that the transformer attends to the exact target area, we propose multiple masks, which suppress the background while leaving the target area unchanged. The multiple masks consist of spatial masks and temporal masks: spatial masks focus on the current frame's information, while temporal masks make use of historical information. The multiple masks further enhance the transformer, making it more focused on the target and more robust under extreme scenarios. With the transformer and multiple masks, our proposed tracker achieves state-of-the-art performance.
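
The masking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the masks are built or combined, so the binary box-shaped spatial mask, the element-wise maximum used for the temporal mask, and the names `spatial_mask`, `temporal_mask`, and `apply_mask` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def spatial_mask(height: int, width: int, box: Box) -> Grid:
    """Return 1.0 inside the target box and 0.0 elsewhere (assumed binary form)."""
    x0, y0, x1, y1 = box
    return [
        [1.0 if (x0 <= x < x1 and y0 <= y < y1) else 0.0 for x in range(width)]
        for y in range(height)
    ]


def temporal_mask(history: Sequence[Grid]) -> Grid:
    """Element-wise maximum of earlier masks (an assumed aggregation rule)."""
    if not history:
        raise ValueError("history must contain at least one mask")
    return [
        [max(values) for values in zip(*rows)]
        for rows in zip(*history)
    ]


def apply_mask(features: Grid, mask: Grid) -> Grid:
    """Multiply features by the mask: target values pass through, background is zeroed."""
    return [
        [f * m for f, m in zip(f_row, m_row)]
        for f_row, m_row in zip(features, mask)
    ]


if __name__ == "__main__":
    # Hypothetical 4x4 feature map with a constant response.
    features = [[2.0] * 4 for _ in range(4)]
    current = spatial_mask(4, 4, (1, 1, 3, 3))
    previous = [spatial_mask(4, 4, (0, 0, 2, 2)), current]
    for row in apply_mask(features, current):
        print(row)
    for row in apply_mask(features, temporal_mask(previous)):
        print(row)
```

In this sketch the spatial mask depends only on the current target box, while the temporal mask covers every location the target occupied in the stored history, which is one plausible way to carry information from previous frames forward.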

Original language: English
Title of host publication: 2022 4th International Conference on Robotics and Computer Vision, ICRCV 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 43-48
Number of pages: 6
ISBN (Electronic): 9781665481700
Publication status: Published - 2022
Event: 4th International Conference on Robotics and Computer Vision, ICRCV 2022 - Virtual, Online, China
Duration: 2022 Sep 25 → 2022 Sep 27

Publication series

Name: 2022 4th International Conference on Robotics and Computer Vision, ICRCV 2022

Conference

Conference: 4th International Conference on Robotics and Computer Vision, ICRCV 2022
Country/Territory: China
City: Virtual, Online
Period: 22/9/25 → 22/9/27

Keywords

  • multiple masks
  • object tracking
  • transformer

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Control and Optimization
