The goal is to predict which of the included question pairs consist of two questions with the same meaning. The ground truth is the set of labels supplied by human experts. These labels are inherently subjective, since the true intended meaning of a sentence can never be known with total certainty, and human labeling is itself a relatively 'noisy' process. The ground truth labels in the dataset should therefore be taken as 'informed' but not 100% accurate. On the whole, they should ideally represent a reasonable consensus.
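To make the task concrete, here is a minimal sketch of a duplicate-question predictor. It is an illustrative baseline only, not a method prescribed by the dataset. It treats each example as a pair of questions plus a binary human label and predicts "same meaning" when the word overlap (Jaccard similarity) between the two questions reaches a threshold. The toy pairs, the function names, and the threshold of 0.5 are all assumptions made for this example and are not taken from the dataset.

```python
import re


def tokens(text):
    """Lowercase a question and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(q1, q2):
    """Word-overlap similarity in [0, 1]; 1.0 means identical token sets."""
    a, b = tokens(q1), tokens(q2)
    if not a and not b:
        return 1.0  # two empty questions are trivially identical
    return len(a & b) / len(a | b)


def predict_duplicate(q1, q2, threshold=0.5):
    """Return 1 if the pair is predicted to share a meaning, else 0."""
    return int(jaccard(q1, q2) >= threshold)


# Hypothetical toy examples (NOT from the dataset):
# (first question, second question, human label)
pairs = [
    ("How do I learn Python?", "How do I learn Python quickly?", 1),
    ("What is the capital of France?", "How tall is Mount Everest?", 0),
    ("Why is the sky blue?", "What makes the sky look blue?", 1),
]

predictions = [predict_duplicate(q1, q2) for q1, q2, _ in pairs]
labels = [label for _, _, label in pairs]

# Reported as agreement with the human labels rather than accuracy,
# because the labels themselves are noisy.
agreement = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(predictions, agreement)
```

Because the labels are subjective, a score computed this way is best read as agreement with the annotators, not as a measure against a perfectly known truth. The third toy pair also shows why pure word overlap is a weak baseline: the two questions are paraphrases, yet they share too few words to pass the threshold.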
Quora Question Pairs Dataset