Video Alignment Using Bi-Directional Attention Flow in a Multi-Stage Learning Model

Reham Abobeah*, Amin Shoukry, Jiro Katto

*Corresponding author for this work

Research output: Article › peer-review

Abstract

Recently, deep learning techniques have contributed to solving a multitude of computer vision tasks. In this paper, we propose a deep-learning approach to video alignment, that is, finding the best correspondences between two overlapping videos. We formulate video alignment as a variant of the well-known machine comprehension (MC) task in natural language processing: whereas MC answers a question about a given paragraph, our technique determines the frame sequence in the context video that is most relevant to the query video. To do so, we first represent the individual frames of both videos by highly discriminative, compact descriptors. The descriptors are then fed into a multi-stage network that, with the help of the bidirectional attention flow mechanism, represents the context video at various levels of granularity and estimates the query-aware part of the context. The proposed model was trained on 10k video pairs collected from YouTube. The results show that our model outperforms all known state-of-the-art techniques by a considerable margin, confirming its efficacy.
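The abstract casts video alignment as machine comprehension. The query video plays the role of the question, and the context video plays the role of the paragraph. The sketch below illustrates a generic bidirectional attention flow layer over per-frame descriptors, in the style of BiDAF (Seo et al., 2017). It is followed by MC-style span decoding that picks the start and end frame of the best-matching context segment. This is a minimal illustration under stated assumptions, not the authors' multi-stage model. The trilinear similarity, the weight vectors `w`, `w_start` and `w_end`, the toy 2-D descriptors, and all function names are hypothetical. The real model learns its parameters and adds further modeling stages.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def similarity(h, u, w):
    # Hypothetical trilinear similarity (as in BiDAF): w . [h; u; h*u].
    feats = h + u + [a * b for a, b in zip(h, u)]
    return sum(wi * fi for wi, fi in zip(w, feats))

def bidaf_attention(context, query, w):
    """context: T frame descriptors; query: J frame descriptors (lists of d floats).
    Returns a query-aware representation G: T vectors of length 4d."""
    S = [[similarity(h, u, w) for u in query] for h in context]
    d = len(context[0])
    # Context-to-query: each context frame attends over all query frames.
    attended_query = []
    for row in S:
        a = softmax(row)
        attended_query.append(
            [sum(a[j] * query[j][k] for j in range(len(query))) for k in range(d)])
    # Query-to-context: weight context frames by their best match with any query frame.
    b = softmax([max(row) for row in S])
    h_tilde = [sum(b[t] * context[t][k] for t in range(len(context))) for k in range(d)]
    # Fuse: G_t = [h_t; u~_t; h_t*u~_t; h_t*h~].
    return [h + u_t + [x * y for x, y in zip(h, u_t)] + [x * y for x, y in zip(h, h_tilde)]
            for h, u_t in zip(context, attended_query)]

def best_span(p_start, p_end, max_len=None):
    # MC-style decoding: choose (i, j), i <= j, maximising p_start[i] * p_end[j].
    best, best_score = (0, 0), -1.0
    for i in range(len(p_start)):
        for j in range(i, len(p_end)):
            if max_len is not None and j - i + 1 > max_len:
                break
            score = p_start[i] * p_end[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best

if __name__ == "__main__":
    # Toy 2-D "frame descriptors"; real ones would be learned, compact frame features.
    context = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1], [0.0, 0.5], [0.2, 0.2]]
    query = [[1.0, 0.0], [0.8, 0.2]]
    G = bidaf_attention(context, query, [0.5] * 6)  # hypothetical weights, length 3d
    w_start, w_end = [1.0] * 8, [0.5] * 8            # hypothetical scorers, length 4d
    p_start = softmax([sum(a * b for a, b in zip(w_start, g)) for g in G])
    p_end = softmax([sum(a * b for a, b in zip(w_end, g)) for g in G])
    print("aligned context frames:", best_span(p_start, p_end))
```

Plain Python lists keep the sketch dependency-free. A practical version would use batched tensor operations, learned parameters, and additional modeling layers between the attention output and the span scorers.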

Original language: English
Article number: 8963636
Pages (from-to): 18097-18109
Number of pages: 13
Journal: IEEE Access
Volume: 8
Publication status: Published - 2020

ASJC Scopus subject areas

  • Computer Science (all)
  • Materials Science (all)
  • Engineering (all)
