Feature extraction using Latent Dirichlet Allocation and Neural Networks: A case study on movie synopses

by   Despoina Christou, et al.

Feature extraction has gained increasing attention in the field of machine learning, as in order to detect patterns, extract information, or predict future observations from big data, the urge of informative features is crucial. The process of extracting features is highly linked to dimensionality reduction as it implies the transformation of the data from a sparse high-dimensional space, to higher level meaningful abstractions. This dissertation employs Neural Networks for distributed paragraph representations, and Latent Dirichlet Allocation to capture higher level features of paragraph vectors. Although Neural Networks for distributed paragraph representations are considered the state of the art for extracting paragraph vectors, we show that a quick topic analysis model such as Latent Dirichlet Allocation can provide meaningful features too. We evaluate the two methods on the CMU Movie Summary Corpus, a collection of 25,203 movie plot summaries extracted from Wikipedia. Finally, for both approaches, we use K-Nearest Neighbors to discover similar movies, and plot the projected representations using T-Distributed Stochastic Neighbor Embedding to depict the context similarities. These similarities, expressed as movie distances, can be used for movies recommendation. The recommended movies of this approach are compared with the recommended movies from IMDB, which use a collaborative filtering recommendation approach, to show that our two models could constitute either an alternative or a supplementary recommendation approach.


page 1

page 2

page 3

page 4


Competitive Analysis System for Theatrical Movie Releases Based on Movie Trailer Deep Video Representation

Audience discovery is an important activity at major movie studios. Deep...

Learning Topics using Semantic Locality

The topic modeling discovers the latent topic probability of the given t...

Making Use of Affective Features from Media Content Metadata for Better Movie Recommendation Making

Our goal in this paper aims to investigate the causality in the decision...

Document Embedding with Paragraph Vectors

Paragraph Vectors has been recently proposed as an unsupervised method f...

Enhanced Movie Content Similarity Based on Textual, Auditory and Visual Information

In this paper we examine the ability of low-level multimodal features to...

Topology Analysis of International Networks Based on Debates in the United Nations

In complex, high dimensional and unstructured data it is often difficult...

Movie Summarization via Sparse Graph Construction

We summarize full-length movies by creating shorter videos containing th...

Please sign up or login with your details

Forgot password? Click here to reset