The JDDC Corpus: A Large-Scale Multi-Turn Chinese Dialogue Dataset for E-commerce Customer Service

11/22/2019
by   Meng Chen, et al.

Human conversations in real scenarios are complicated, and building a human-like dialogue agent is an extremely challenging task. With the rapid development of deep learning techniques, data-driven models, which need a huge amount of real conversation data, have become increasingly prevalent. In this paper, we construct JDDC, a large-scale, real-scenario Chinese E-commerce conversation corpus with more than 1 million multi-turn dialogues, 20 million utterances, and 150 million words. The dataset reflects several characteristics of human-human conversations, e.g., goal-driven behavior and long-term dependency among the context. It also covers various dialogue types, including task-oriented, chitchat, and question-answering. Extra intent information and three well-annotated challenge sets are also provided. We then evaluate several retrieval-based and generative models to provide basic benchmark performance on the JDDC corpus. We hope JDDC can serve as an effective testbed and benefit the development of fundamental research in dialogue tasks.

research
09/27/2021

The JDDC 2.0 Corpus: A Large-Scale Multimodal Multi-Turn Chinese Dialogue Dataset for E-commerce Customer Service

With the development of the Internet, more and more people get accustome...
research
10/16/2017

A retrieval-based dialogue system utilizing utterance and context embeddings

Finding semantically rich and computer-understandable representations fo...
research
06/30/2015

The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems

This paper introduces the Ubuntu Dialogue Corpus, a dataset containing a...
research
11/19/2018

Chat More If You Like: Dynamic Cue Words Planning to Flow Longer Conversations

To build an open-domain multi-turn conversation system is one of the mos...
research
05/04/2023

Re^3Dial: Retrieve, Reorganize and Rescale Dialogue Corpus for Long-Turn Open-Domain Dialogue Pre-training

Large-scale open-domain dialogue data crawled from public social media h...
research
04/18/2021

DCH-2: A Parallel Customer-Helpdesk Dialogue Corpus with Distributions of Annotators' Labels

We introduce a data set called DCH-2, which contains 4,390 real customer...
