After All, Only The Last Neuron Matters: Comparing Multi-modal Fusion Functions for Scene Graph Generation
From object segmentation to word vector representations, Scene Graph Generation (SGG) became a complex task built upon numerous research results. In this paper, we focus on the last module of this model: the fusion function. The role of this latter is to combine three hidden states. We perform an ablation test in order to compare different implementations. First, we reproduce the state-of-the-art results using SUM, and GATE functions. Then we expand the original solution by adding more model-agnostic functions: an adapted version of DIST and a mixture between MFB and GATE. On the basis of the state-of-the-art configuration, DIST performed the best Recall @ K, which makes it now part of the state-of-the-art.
READ FULL TEXT