Using Scene Graphs for Detecting Visual Relationships

Abstract

In this paper we solve the problem of detecting relationships between pairs of objects in an image. We develop spatially aware word embeddings using scene graphs and use joint feature representations containing visual, spatial and semantic embed-dings from the input images to train a deep network on the task of relationship detection. Further, we propose to utilize context aligned scene graph embeddings from the train set, without requiring explicit availability of scene graphs at test time. We show that the proposed method outperforms the state-of-the-art methods for predicate detection and provides competing results on relationship detection. We also show the generalization ability of the proposed method by performing predictions under zero shot settings. Further, we also provide an exhaustive empirical evaluation on each component of the proposed network.

Publication
In International Conference on Pattern Recognition 2021
Siddharth Srivastava
Siddharth Srivastava
Principal Technical Officer

My research interests include 3D Computer Vision, Machine Learning and Image Processing including their applications to autonomous driving vehicles, health informatics and scene understanding.