Multiple Mask Enhanced Transformer for Robust Visual Tracking

Research output

Abstract

Deformation and scale variation remain challenging in visual object tracking. In this work, we propose a new tracking architecture with a transformer and multiple masks as its key components. The transformer models the spatial and temporal connections among frames: the encoder learns the target via an attention mechanism, while the decoder uses information from previous frames to better track the target in the current frame. To ensure that the transformer attends to the exact target area, we propose multiple masks, which suppress the background while leaving the target area unchanged. They consist of spatial masks, which focus on information in the current frame, and temporal masks, which make use of historical information. Multiple masks further enhance the transformer, making it more focused on the target and more robust under extreme scenarios. With the transformer and multiple masks, the proposed tracker achieves state-of-the-art performance.
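The abstract does not say how the masks are computed or combined, so the following is only a minimal sketch of the general idea of suppressing the background while leaving the target area unchanged. Everything in it is an assumption made for illustration: the box-shaped masks, the `background` weight, averaging past masks to form the temporal mask, combining the two masks with an element-wise maximum, and the function names `spatial_mask`, `temporal_mask`, and `apply_masks`. It is not the authors' method.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def spatial_mask(height: int, width: int, box: Box, background: float = 0.1) -> Grid:
    """Return 1.0 inside the target box and `background` (< 1) elsewhere."""
    top, left, bottom, right = box
    return [
        [1.0 if top <= r < bottom and left <= c < right else background
         for c in range(width)]
        for r in range(height)
    ]


def temporal_mask(history: Sequence[Grid]) -> Grid:
    """Element-wise mean of masks from previous frames (an assumed aggregation)."""
    n = len(history)
    height, width = len(history[0]), len(history[0][0])
    return [
        [sum(m[r][c] for m in history) / n for c in range(width)]
        for r in range(height)
    ]


def apply_masks(features: Grid, spatial: Grid, temporal: Grid) -> Grid:
    """Scale features by the element-wise maximum of the two masks.

    The maximum keeps the weight at 1.0 wherever the current box covers,
    so features in the current target area pass through unchanged.
    """
    return [
        [f * max(s, t) for f, s, t in zip(f_row, s_row, t_row)]
        for f_row, s_row, t_row in zip(features, spatial, temporal)
    ]


# Toy 4x4 feature map; the hypothetical target moves from the top-left corner
# towards the centre over two previous frames.
features = [[2.0] * 4 for _ in range(4)]
previous = [spatial_mask(4, 4, (0, 0, 2, 2)), spatial_mask(4, 4, (1, 1, 3, 3))]
current = spatial_mask(4, 4, (1, 1, 3, 3))
masked = apply_masks(features, current, temporal_mask(previous))
```

In this toy setup, cells inside the current box keep their original value, cells the target occupied only in an earlier frame are partially attenuated, and cells never covered by the target are scaled down to the background weight.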

Original language: English
Title of host publication: 2022 4th International Conference on Robotics and Computer Vision, ICRCV 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 43-48
Number of pages: 6
ISBN (Electronic): 9781665481700
Publication status: Published - 2022
Event: 4th International Conference on Robotics and Computer Vision, ICRCV 2022 - Virtual, Online, China
Duration: 25 Sep 2022 to 27 Sep 2022

Publication series

Name: 2022 4th International Conference on Robotics and Computer Vision, ICRCV 2022

Conference

Conference: 4th International Conference on Robotics and Computer Vision, ICRCV 2022
Country/Territory: China
City: Virtual, Online
Period: 25 Sep 2022 to 27 Sep 2022

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Control and Optimization
