Learning Context-Aware Representations of Subtrees

11/08/2021
by Cedric Cook, et al.

This thesis tackles the problem of learning efficient representations of complex, structured data, with a natural application to web page and element classification. We hypothesise that the context around an element inside a web page is of high value to the problem and is currently underexploited. The thesis aims to classify web elements, treated as subtrees of a DOM tree, by also considering their context. To achieve this, we first discuss current expert knowledge systems that work on structures, such as Tree-LSTM. We then propose context-aware extensions to this model. We show that the new model achieves an average F1-score of 0.7973 on a multi-class web classification task. The model generates better representations for various subtrees and may be used for applications such as element classification, state estimation in reinforcement learning over the Web, and more.
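As an illustration of how an average F1-score on a multi-class task can be computed, the sketch below takes the unweighted (macro) mean of per-class F1 scores. The abstract does not say which averaging scheme the thesis uses, so macro-averaging is an assumption here, and the function name `macro_f1` and the element labels are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-averaging is assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0

    # Count true positives, false positives and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical web element labels, for illustration only.
y_true = ["button", "link", "input", "button", "link", "input"]
y_pred = ["button", "link", "button", "button", "input", "input"]
score = macro_f1(y_true, y_pred)
```

Macro-averaging weights every class equally, so rare element types count as much as common ones. A support-weighted average would give a different number on imbalanced data.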


