Exploring and exploiting the hierarchical structure of a scene for scene graph generation

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

The scene graph of an image is an explicit, concise representation of the image; hence, it can be used in various applications such as visual question answering or robot vision. We propose a novel neural network model for generating scene graphs that maintain global consistency, which prevents the generation of unrealistic scene graphs; the performance in the scene graph generation task is expected to improve. Our proposed model is used to construct a hierarchical structure whose leaf nodes correspond to objects depicted in the image, and a message is passed along the estimated structure on the fly. To this end, we aggregate features of all objects into the root node of the hierarchical structure, and the global context is back-propagated to the root node to maintain all the object nodes. The experimental results on the Visual Genome dataset indicate that the proposed model outperformed the existing models in scene graph generation tasks. We further qualitatively confirmed that the hierarchical structures captured by the proposed model seemed to be valid.

Original languageEnglish
Title of host publicationProceedings of ICPR 2020 - 25th International Conference on Pattern Recognition
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages1422-1429
Number of pages8
ISBN (Electronic)9781728188089
DOIs
Publication statusPublished - 2020
Event25th International Conference on Pattern Recognition, ICPR 2020 - Virtual, Milan, Italy
Duration: 2021 Jan 102021 Jan 15

Publication series

NameProceedings - International Conference on Pattern Recognition
ISSN (Print)1051-4651

Conference

Conference25th International Conference on Pattern Recognition, ICPR 2020
Country/TerritoryItaly
CityVirtual, Milan
Period21/1/1021/1/15

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition

Fingerprint

Dive into the research topics of 'Exploring and exploiting the hierarchical structure of a scene for scene graph generation'. Together they form a unique fingerprint.

Cite this