Do We Need Sound for Sound Source Localization?

Takashi Oya*, Shohei Iwase, Ryota Natsume, Takahiro Itazuri, Shugo Yamaguchi, Shigeo Morishima

*Corresponding author for this work

Research output: Conference contribution

Abstract

In sound source localization that uses both visual and aural information, it remains unclear how much the image and sound modalities each contribute to the result, i.e., do we need both image and sound for sound source localization? To address this question, we develop an unsupervised learning system that solves sound source localization by decomposing this task into two steps: (i) “potential sound source localization”, a step that localizes possible sound sources using only visual information, and (ii) “object selection”, a step that identifies which objects are actually sounding using aural information. Our overall system achieves state-of-the-art performance in sound source localization, and more importantly, we find that despite the constraint on available information, the results of (i) achieve similar performance. From this observation and further experiments, we show that visual information is dominant in “sound” source localization when evaluated with the currently adopted benchmark dataset. Moreover, we show that the majority of sound-producing objects within the samples in this dataset can be inherently identified using only visual information, and thus that the dataset is inadequate to evaluate a system’s capability to leverage aural information. As an alternative, we present an evaluation protocol that requires both visual and aural information to be leveraged, and verify this property through several experiments.

Original language: English
Title of host publication: Computer Vision – ACCV 2020 - 15th Asian Conference on Computer Vision, 2020, Revised Selected Papers
Editors: Hiroshi Ishikawa, Cheng-Lin Liu, Tomas Pajdla, Jianbo Shi
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 119-136
Number of pages: 18
ISBN (Print): 9783030695439
DOI
Publication status: Published - 2021
Event: 15th Asian Conference on Computer Vision, ACCV 2020 - Virtual, Online
Duration: 2020-11-30 to 2020-12-04

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12627 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 15th Asian Conference on Computer Vision, ACCV 2020
City: Virtual, Online
Period: 20/11/30 to 20/12/4

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
