Building Metadata Inference Using a Transducer Based Language Model

12/05/2022
by David Waterworth, et al.

Solving the challenges of automatic machine translation of Building Automation System text metadata is a crucial first step in efficiently deploying smart building applications. The vocabulary used to describe building metadata appears small compared to general natural languages, but each term has multiple commonly used abbreviations. Conventional machine learning techniques are inefficient because they must learn many different forms of the same word, which requires large amounts of training data. Standard techniques such as tokenisation are also difficult to apply, since they commonly result in multiple output tags being associated with a single input token, something traditional sequence labelling models do not allow. Finite State Transducers can model sequence-to-sequence tasks where the input and output sequences have different lengths, and they can be combined with language models to ensure that a valid output sequence is generated. We perform a preliminary analysis of the use of transducer-based language models to parse and normalise building point metadata.
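
To make the idea of a transducer with unequal input and output lengths concrete, the following is a minimal sketch and not the authors' model. It builds a small character-level finite state transducer from a hypothetical abbreviation lexicon (the entries, the delimiter set, and the names `LEXICON`, `build_transducer` and `transduce` are all assumptions made for illustration) and emits normalised tokens at word boundaries. It omits the language model component mentioned in the abstract.

```python
from typing import Dict, List, Tuple

# Hypothetical abbreviation lexicon, for illustration only.
# Several abbreviations may map to the same normalised term,
# and one abbreviation may expand to more than one output token.
LEXICON: Dict[str, List[str]] = {
    "sa": ["supply", "air"],
    "ra": ["return", "air"],
    "temp": ["temperature"],
    "tmp": ["temperature"],
    "sp": ["setpoint"],
    "stpt": ["setpoint"],
}

# Assumed set of characters that separate words in a point name.
DELIMITERS = set("_-. ")

Transitions = Dict[Tuple[int, str], int]
FinalOutput = Dict[int, List[str]]


def build_transducer(lexicon: Dict[str, List[str]]) -> Tuple[Transitions, FinalOutput]:
    """Build a trie-shaped transducer: state 0 is the start state,
    and each accepting state carries the tokens it emits."""
    transitions: Transitions = {}
    final_output: FinalOutput = {}
    next_state = 1
    for abbreviation, tokens in lexicon.items():
        state = 0
        for char in abbreviation:
            key = (state, char)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        final_output[state] = tokens
    return transitions, final_output


def transduce(text: str, transitions: Transitions, final_output: FinalOutput) -> List[str]:
    """Read characters one at a time and emit tokens at each word boundary."""
    output: List[str] = []
    state = 0
    # A trailing delimiter acts as a sentinel that flushes the last word.
    for char in text.lower() + "_":
        if char in DELIMITERS:
            if state != 0:
                if state not in final_output:
                    raise ValueError(f"incomplete abbreviation in {text!r}")
                output.extend(final_output[state])
            state = 0
        else:
            key = (state, char)
            if key not in transitions:
                raise ValueError(f"unknown abbreviation in {text!r}")
            state = transitions[key]
    return output


TRANSITIONS, FINAL_OUTPUT = build_transducer(LEXICON)
print(transduce("SA_TEMP_SP", TRANSITIONS, FINAL_OUTPUT))
```

In this sketch the ten-character input `SA_TEMP_SP` yields four output tokens, and the single abbreviation `SA` yields two of them, which is the case a one-tag-per-token sequence labeller cannot represent. A practical system would also need to handle ambiguous or unseen abbreviations, which this deterministic toy simply rejects.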


research
11/18/2022

Metadata Might Make Language Models Better

This paper discusses the benefits of including metadata when training la...
research
05/11/2023

How Good are Commercial Large Language Models on African Languages?

Recent advancements in Natural Language Processing (NLP) has led to the ...
research
07/28/2022

Sequence to sequence pretraining for a less-resourced Slovenian language

Large pretrained language models have recently conquered the area of nat...
research
02/27/2020

Data-Driven Metadata Tagging for Building Automation Systems: A Unified Architecture

This article presents a Unified Architecture for automated point tagging...
research
08/02/2023

Arithmetic with Language Models: from Memorization to Computation

A better understanding of the emergent computation and problem-solving c...
research
11/06/2022

Deliberation Networks and How to Train Them

Deliberation networks are a family of sequence-to-sequence models, which...
research
10/29/2020

How Many Pages? Paper Length Prediction from the Metadata

Being able to predict the length of a scientific paper may be helpful in...
