Experiments on Paraphrase Identification Using Quora Question Pairs Dataset

06/04/2020
by Andreas Chandra, et al.

We modeled the Quora Question Pairs dataset to identify similar questions. The dataset is provided by Quora, and the task is binary classification. We tried several methods and algorithms, taking a different approach from previous work. For feature extraction, we used Bag of Words, including Count Vectorizer and Term Frequency-Inverse Document Frequency (TF-IDF) with unigrams, as input to XGBoost and CatBoost. We also experimented with the WordPiece tokenizer, which improved model performance significantly. We achieved up to 97 percent accuracy. Code and Dataset.
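The unigram Bag of Words and TF-IDF features mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: the function names (`tokenize`, `build_vocab`, `count_vector`, `tfidf_vector`, `pair_features`), the punctuation handling, the smoothed IDF formula, and the choice to concatenate the two question vectors into one feature row are all assumptions made for illustration. The abstract does not say how the two questions of a pair were combined, and the authors presumably used library vectorizers rather than hand-written ones.

```python
import math
from collections import Counter

PUNCT = "?.,!\"'()"

def tokenize(text):
    # Assumption: lowercase, whitespace-split unigrams with edge punctuation stripped.
    tokens = (tok.strip(PUNCT) for tok in text.lower().split())
    return [tok for tok in tokens if tok]

def build_vocab(questions):
    # Map each distinct unigram to a column index (sorted for determinism).
    vocab = sorted({tok for q in questions for tok in tokenize(q)})
    return {tok: i for i, tok in enumerate(vocab)}

def count_vector(text, vocab):
    # Bag of Words: raw term counts; out-of-vocabulary tokens are ignored.
    vec = [0.0] * len(vocab)
    for tok, n in Counter(tokenize(text)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(n)
    return vec

def idf_weights(questions, vocab):
    # Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1.
    n_docs = len(questions)
    df = Counter(tok for q in questions for tok in set(tokenize(q)))
    return [math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab]

def tfidf_vector(text, vocab, idf):
    # Term count times IDF, then L2-normalised.
    raw = [c * w for c, w in zip(count_vector(text, vocab), idf)]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw] if norm > 0 else raw

def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def pair_features(q1, q2, vocab, idf):
    # Assumption: one feature row per pair, built by concatenating both TF-IDF vectors.
    return tfidf_vector(q1, vocab, idf) + tfidf_vector(q2, vocab, idf)
```

Rows produced by `pair_features`, together with the binary duplicate label, would then be passed to a gradient-boosted classifier such as XGBoost or CatBoost. A similarity score such as `cosine` over the two vectors is one possible extra feature, again an assumption rather than something the abstract reports.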
