DataChat: Prototyping a Conversational Agent for Dataset Search and Visualization

05/26/2023
by Lizhou Fan, et al.

Data users need relevant context and research expertise to search for and identify suitable datasets effectively. Leading data providers, such as the Inter-university Consortium for Political and Social Research (ICPSR), offer standardized metadata and search tools to support data search. Metadata standards emphasize the machine-readability of data and its documentation. There are opportunities to enhance dataset search by improving users' ability to learn about, and make sense of, information about data. Prior research has shown that a lack of context and a lack of expertise are two main barriers users face when searching for, evaluating, and deciding whether to reuse data. In this paper, we propose a chatbot-based search system, DataChat, that leverages a graph database and a large language model to provide novel ways for users to interact with and search for research data. DataChat complements data archives' and institutional repositories' ongoing efforts to curate, preserve, and share research data for reuse by making it easier for users to explore and learn about available research data.
