Knowledge-driven Description Synthesis for Floor Plan Interpretation

03/15/2021
by Shreya Goyal et al.

Image captioning is a well-known problem in AI. Caption generation from floor plan images has applications in indoor path planning, real estate, and architectural design. Several methods have been explored in the literature for generating captions or semi-structured descriptions from floor plan images. Since a caption alone is insufficient to capture fine-grained details, researchers have also proposed generating descriptive paragraphs from images. However, these descriptions have a rigid structure and lack flexibility, making them difficult to use in real-time scenarios. This paper proposes two models, Description Synthesis from Image Cue (DSIC) and Transformer Based Description Generation (TBDG), for floor plan image-to-text generation, to fill the gaps in existing methods. Both models take advantage of modern deep neural networks for visual feature extraction and text generation. The difference between the two models lies in how they take input from the floor plan image. The DSIC model uses only visual features automatically extracted by a deep neural network, while the TBDG model additionally learns from textual captions extracted from input floor plan images, paired with descriptive paragraphs. The specific keywords generated by TBDG, and their grounding in the paired paragraphs, make it more robust to unseen floor plan images. Experiments were carried out on a large-scale publicly available dataset and compared with state-of-the-art techniques to demonstrate the proposed models' superiority.
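The abstract gives only the high-level architecture, but the encoder-decoder pattern it describes (a deep network extracting visual features that condition a text generator) can be sketched in code. Below is a minimal, hypothetical PyTorch sketch of such a CNN-encoder/transformer-decoder captioner; the class name, the ResNet-18 backbone, and all hyperparameters are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn as nn
from torchvision.models import resnet18

class FloorPlanCaptioner(nn.Module):
    """Hypothetical sketch: CNN visual features -> transformer decoder -> tokens."""
    def __init__(self, vocab_size=10000, d_model=256, n_heads=8, n_layers=4):
        super().__init__()
        # Visual feature extractor (assumption: ResNet-18, spatial map kept)
        backbone = resnet18(weights=None)
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        self.proj = nn.Linear(512, d_model)   # CNN channels -> decoder width
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, image, tokens):
        # image: (B, 3, H, W); tokens: (B, T) word ids generated so far
        feats = self.encoder(image)               # (B, 512, h, w)
        feats = feats.flatten(2).transpose(1, 2)  # (B, h*w, 512): one vector per region
        memory = self.proj(feats)                 # (B, h*w, d_model)
        tgt = self.embed(tokens)                  # (B, T, d_model)
        T = tokens.size(1)
        # Causal mask so each position attends only to earlier tokens
        causal = torch.triu(
            torch.full((T, T), float("-inf"), device=tokens.device), diagonal=1
        )
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.lm_head(out)                  # (B, T, vocab_size) next-token logits

model = FloorPlanCaptioner()
logits = model(torch.randn(1, 3, 224, 224), torch.randint(0, 10000, (1, 12)))
print(logits.shape)  # torch.Size([1, 12, 10000])

Per the abstract, DSIC would drive generation from the visual features alone, while TBDG would additionally condition on keywords learned from captions; the sketch above covers only the shared image-to-text backbone.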


Related research

11/29/2018

Automatic Rendering of Building Floor Plan Images from Textual Descriptions in English

Human beings understand natural language descriptions and are able to i...

06/01/2023

CapText: Large Language Model-based Caption Generation From Image Context and Description

While deep-learning models have been shown to perform well on image-to-t...

11/28/2018

Towards Task Understanding in Visual Settings

We consider the problem of understanding real world tasks depicted in vi...

11/14/2018

SUGAMAN: Describing Floor Plans for Visually Impaired by Annotation Learning and Proximity based Grammar

In this paper, we propose SUGAMAN (Supervised and Unified framework usin...

11/20/2014

Learning a Recurrent Visual Representation for Image Caption Generation

In this paper we explore the bi-directional mapping between images and t...

10/09/2019

Text-to-Image Synthesis Based on Machine Generated Captions

Text to Image Synthesis refers to the process of automatic generation of...

02/03/2023

DEVICE: DEpth and VIsual ConcEpts Aware Transformer for TextCaps

Text-based image captioning is an important but under-explored task, aim...
