Tokenization Tractability for Human and Machine Learning Model: An Annotation Study

04/21/2023
by Tatsuya Hiraoka, et al.

Is tokenization that is tractable for humans also tractable for machine learning models? This study investigates the relationship between tokenization that is tractable for humans (e.g., in appropriateness and readability) and tokenization that is tractable for machine learning models (e.g., in performance on an NLP task). We compared six tokenization methods on the Japanese commonsense question-answering dataset (JCommonsenseQA in JGLUE). We tokenized the question texts of the QA dataset with different tokenizers and compared the performance of human annotators and machine learning models. In addition, we analyzed the relationships among performance, appropriateness of tokenization, and response time to questions. This paper provides quantitative results showing that the tokenizations tractable for humans and for machine learning models are not necessarily the same.


