The TechQA Dataset

11/08/2019
by Vittorio Castelli, et al.

We introduce TechQA, a domain-adaptation question answering dataset for the technical support domain. The TechQA corpus highlights two real-world issues from the automated customer support domain. First, it contains actual questions posed by users on a technical forum, rather than questions generated specifically for a competition or a task. Second, it has a real-world size – 600 training, 310 dev, and 490 evaluation question/answer pairs – thus reflecting the cost of creating large labeled datasets with actual data. Consequently, TechQA is meant to stimulate research in domain adaptation rather than being a resource to build QA systems from scratch. The dataset was obtained by crawling the IBM Developer and IBM DeveloperWorks forums for questions with accepted answers that appear in a published IBM Technote—a technical document that addresses a specific technical issue. We also release a collection of the 801,998 publicly available Technotes as of April 4, 2019 as a companion resource that might be used for pretraining, to learn representations of the IT domain language.

research
06/10/2018

Cross-Dataset Adaptation for Visual Question Answering

We investigate the problem of cross-dataset adaptation for visual questi...
research
01/10/2018

MilkQA: a Dataset of Consumer Questions for the Task of Answer Selection

We introduce MilkQA, a question answering dataset from the dairy domain ...
research
06/12/2017

Neural Domain Adaptation for Biomedical Question Answering

Factoid question answering (QA) has recently benefited from the developm...
research
05/24/2018

Mining Procedures from Technical Support Documents

Guided troubleshooting is an inherent task in the domain of technical su...
research
04/09/2023

FrenchMedMCQA: A French Multiple-Choice Question Answering Dataset for Medical domain

This paper introduces FrenchMedMCQA, the first publicly available Multip...
research
09/23/2022

Robust Domain Adaptation for Machine Reading Comprehension

Most domain adaptation methods for machine reading comprehension (MRC) u...
research
06/08/2023

Mapping the Challenges of HCI: An Application and Evaluation of ChatGPT and GPT-4 for Cost-Efficient Question Answering

Large language models (LLMs), such as ChatGPT and GPT-4, are gaining wid...
