PKUSEG: A Toolkit for Multi-Domain Chinese Word Segmentation

06/27/2019
by Ruixuan Luo, et al.
Peking University

Chinese word segmentation (CWS) is a fundamental step of Chinese natural language processing. In this paper, we present a new toolkit, named PKUSEG, for multi-domain word segmentation. Unlike existing single-model toolkits, PKUSEG targets multi-domain word segmentation and provides separate models for different domains, such as web, medicine, and tourism. The toolkit also supports POS tagging and model training to adapt to various application scenarios. Experiments show that PKUSEG achieves high performance on multiple domains. The toolkit is now freely and publicly available for research and industrial use.




Code Repositories

pkuseg-python: the PKUSEG toolkit for multi-domain Chinese word segmentation.

1 Introduction

Chinese word segmentation is a fundamental task of Chinese language processing. Since words are the basic semantic units of Chinese, the quality of segmentation directly influences the performance of downstream tasks. In recent years, Chinese word segmentation has developed rapidly. The best-performing systems are mostly based on conditional random fields (CRFs) Lafferty et al. (2001); Sun et al. (2012). However, despite the promising results, these approaches rely heavily on feature engineering. To tackle this problem, many studies Chen et al. (2015); Cai and Zhao (2016); Liu et al. (2016); Xu and Sun (2016) explore neural networks to learn better representations automatically.

Recently, several public segmentation toolkits have emerged, such as jieba and HanLP. For efficiency, they are built on traditional segmentation models, such as the perceptron Zhang and Clark (2007) or the CRF, rather than on time-consuming neural networks. These toolkits provide only a single coarse-grained segmentation model, mostly trained on news-domain data. In real-world applications, however, the domain of the text varies, and text from different domains follows different domain-specific segmentation conventions. This increases the difficulty of segmentation and degrades the performance of existing toolkits on text from various domains.

To address this challenge, we propose PKUSEG, a multi-domain segmentation toolkit based on the work of Sun et al. (2012): we adopt the CRF, a fast and accurate model, as the underlying implementation. PKUSEG includes multiple pre-trained domain-specific segmentation models. Since some domains are low-resource, we use a pre-training technique to improve segmentation quality. We first pre-train a coarse-grained model on a mixed corpus containing millions of examples from the news and web domains. Then, we fine-tune the coarse-grained model on domain-specific data to obtain fine-grained models. In addition to the provided segmentation models, PKUSEG allows users to train a new model on their own domain data. Furthermore, PKUSEG supports POS tagging to adapt to various scenarios. Experimental results show that PKUSEG achieves high performance on multi-domain datasets.

In summary, PKUSEG has the following characteristics:

  • Good out-of-the-box performance. The default word segmentation model provided by PKUSEG is trained on a large-scale, curated, multi-domain dataset, which shows stable and high performance across various domains.

  • Domain-specific pre-trained models. PKUSEG also comes with multiple pre-trained models fine-tuned on texts of different domains, which further improves domain-specific performance and makes them suitable for analyzing in-domain texts.

  • Easy transfer learning. For advanced users, PKUSEG supports transfer learning based on the default multi-domain model. Users can fine-tune the model on their own segmented texts.

  • POS tagging. PKUSEG also provides users with POS tagging interfaces for further lexical analysis.

2 Implementation

This section gives a detailed description of the implementation of the toolkit.

2.1 Conditional Random Field

Although neural networks generally achieve better performance, we do not adopt them because of their time-consuming training. Instead, considering the trade-off between time cost and accuracy, we use the CRF, a well-performing and fast-training model. We optimize the weights of the CRF by maximizing the log-likelihood of the reference tag sequences, which can be computed by a recursive (forward-backward) algorithm in linear time. During inference, the Viterbi algorithm Forney (1973) is adopted to find the best tag sequence by dynamic programming.
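To make the decoding step concrete, the following is a minimal sketch of Viterbi decoding for a linear-chain model; the score matrices and names are illustrative assumptions, not PKUSEG internals.

import numpy as np

def viterbi(emit, trans):
    # emit:  (n, k) per-character tag scores for a sentence of n characters
    # trans: (k, k) tag-transition scores
    n, k = emit.shape
    score = np.zeros((n, k))            # best score of any path ending in each tag
    back = np.zeros((n, k), dtype=int)  # backpointers to recover the best path
    score[0] = emit[0]
    for t in range(1, n):
        # cand[i, j]: extend the best path ending in tag i with tag j
        cand = score[t - 1][:, None] + trans + emit[t]
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0)
    # follow backpointers from the best final tag
    tags = [int(score[-1].argmax())]
    for t in range(n - 1, 0, -1):
        tags.append(int(back[t, tags[-1]]))
    return tags[::-1]  # tag indices, e.g., over a {B, M, E, S} tag set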

2.2 ADF Algorithm

For a CRF with many high-dimensional features, the number of parameters is very large, leading to a high training cost. To address this problem, we use adaptive online gradient descent based on feature frequency information (ADF) Sun et al. (2012) for training. Unlike stochastic gradient descent (SGD), which uses a single learning rate for all parameters, ADF turns the learning rate into a vector with the same dimension as the parameter vector. The learning rate of each parameter is adjusted automatically according to the frequency of the corresponding feature. The intuition is that a feature with higher frequency is trained more adequately, so its learning rate can decay faster.
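The following is a simplified sketch of this frequency-adaptive idea, not the exact ADF update rule of Sun et al. (2012); the decay schedule and names are assumptions for illustration.

import numpy as np

def adf_step(w, grad, counts, alpha0=0.1, decay=0.95):
    # w, grad: parameter vector and current (sparse) gradient
    # counts: how often each feature has been observed so far
    active = grad != 0
    counts[active] += 1
    # per-feature learning rate: frequently observed features decay faster
    rates = alpha0 * decay ** counts
    w -= rates * grad  # features with zero gradient are unchanged
    return w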

2.3 Pre-training

To handle the low-resource problem in some domains, we adopt a pre-training technique in PKUSEG following the work of Xu and Sun (2017). We mix news and web data together as pre-training data. The news data comes from the PKU dataset provided by the Second International Chinese Word Segmentation Bakeoff, and the web data comes from the Weibo dataset provided by the NLPCC-ICCPOL 2016 Shared Task Qiu et al. (2016). The hybrid-domain CTB dataset is also included in the pre-training data. During fine-tuning, models are initialized with the pre-trained model and trained on domain-specific data. So far, PKUSEG supports four fine-grained domains: news, medicine, tourism, and web. Since the covered domains are limited, we also release the pre-trained mixed-domain model for generalization.
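Fine-tuning on user data can be sketched with the training interface as below; we assume here that the init_model argument of pkuseg.train accepts a directory containing a pre-trained model, and all file paths are placeholders.

import pkuseg

# initialize from a pre-trained mixed-domain model, then continue
# training on domain-specific segmented data (paths are placeholders)
pkuseg.train('my_train.utf8', 'my_test.utf8', './fine_tuned_model',
             init_model='./pretrained_model')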

2.4 A Large-Scale Vocabulary

One major difficulty of multi-domain segmentation is sparse domain-specific words, which are hard to cover fully in the training set. Therefore, to increase the coverage of PKUSEG, we automatically build a large-scale domain vocabulary. The word resources are crawled from the Sogou lexicon website (https://pinyin.sogou.com/dict/) and extracted from the training data of PKU, MSRA, Weibo, and CTB. In total, we extract almost 850K words. The distribution of words is shown in Table 1.

Domain Vocabulary Size
Medicine 447K
Location 117K
Name 105K
Idiom 50K
Organization 31K
Training Words 100K
Total 850K
Table 1: The distribution of words in the extracted vocabulary.

3 Usage

PKUSEG provides high segmentation accuracy along with user-friendly interfaces. It is developed on top of standard Python 3 libraries and supports common platforms such as Windows, Linux, and macOS.

3.1 Installation

PKUSEG offers two user-friendly installation methods. Users can easily install it from PyPI; the corresponding models are downloaded at the same time. A typical command is:

pip3 install pkuseg

Users can also install PKUSEG from GitHub. After downloading the project code, users can run the following command to install it:

python setup.py build_ext -i

Note that the project downloaded from GitHub does not include pre-trained models; users need to additionally download them from GitHub or train a new model.

3.2 Segmentation

The following subsections introduce the segmentation interfaces in detail.

Domain-specific Segmentation

If users know the domain of the text to be segmented, they can use the corresponding domain-specific model. Example code for specifying a model is shown in Figure 1. For a toolkit-provided model, users can refer to it directly by its domain name, e.g., “medicine”, “tourism”, “web”, or “news”; the model is loaded automatically based on the “model_name” parameter. For a user-trained model, “model_name” refers to the model path.

Figure 1: An example code of specifying the model of “medicine” domain.
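The figure is not reproduced here; a minimal sketch consistent with the released pkuseg-python interface is below (the example sentence is arbitrary).

import pkuseg

# load the pre-trained model of the "medicine" domain
seg = pkuseg.pkuseg(model_name='medicine')
words = seg.cut('患者有高血压病史')  # "The patient has a history of hypertension"
print(words)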

Coarse-grained Segmentation

Although PKUSEG is designed for the situation where users know the domain of the text to be segmented, we also provide a coarse-grained model in case the user cannot determine the target domain. The coarse-grained model is used in the default mode. Figure 2 shows example code using the default mode.

Figure 2: An example code of segmentation with the default model.
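A minimal sketch of the default mode, following the same interface, is:

import pkuseg

# no model_name given: the default coarse-grained model is loaded
seg = pkuseg.pkuseg()
print(seg.cut('我爱北京天安门'))  # "I love Beijing Tiananmen"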

User-defined Dictionary

To better recognize new words, users can supply a dictionary covering words that are absent from the built-in dictionary of PKUSEG. The dictionary file must contain one word per line and be encoded in UTF-8. Figure 3 shows the usage of a user-defined dictionary.

Figure 3: An example code of using a user-defined dictionary.
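A sketch of this usage, with a placeholder dictionary path, is:

import pkuseg

# my_dict.txt: one word per line, UTF-8 encoded (path is a placeholder)
seg = pkuseg.pkuseg(user_dict='my_dict.txt')
print(seg.cut('奥司他韦是一种抗病毒药'))  # "Oseltamivir is an antiviral drug"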

Model Training

PKUSEG also allows users to train a new model from scratch on their own training data. Figure 4 shows example code for training a new model.

Figure 4: An example code of training a new model with user-provided data.
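A sketch of training from scratch, with placeholder file paths, is:

import pkuseg

# train on segmented data, evaluate on the test file, and save the
# resulting model to the given directory (all paths are placeholders)
pkuseg.train('my_train.utf8', 'my_test.utf8', './my_model')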
Figure 5: An example code of segmenting and POS tagging.

Segmentation with POS Tagging

In addition to segmentation, PKUSEG can also assign POS tags to the words of a sentence. The usage of the POS tagging interface is shown in Figure 5.
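A sketch of joint segmentation and POS tagging, following the released interface, is below; the printed tags are illustrative.

import pkuseg

# postag=True makes cut() return (word, POS tag) pairs
seg = pkuseg.pkuseg(postag=True)
print(seg.cut('我爱北京天安门'))
# e.g., [('我', 'r'), ('爱', 'v'), ('北京', 'ns'), ('天安门', 'ns')]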

4 Experiment

This section evaluates the performance of PKUSEG.

4.1 Dataset

MSRA & PKU.

MSRA and PKU are news-domain datasets provided by the Second International Chinese Word Segmentation Bakeoff.

CTB8.

The Chinese Treebank 8.0 is a hybrid-domain dataset (https://catalog.ldc.upenn.edu/LDC2013T21). It consists of approximately 1.5 million words from Chinese newswire, government documents, magazine articles, various broadcast news and broadcast conversation programs, web newsgroups, and weblogs.

Weibo.

This dataset comes from the NLPCC-ICCPOL 2016 Shared Task. Unlike the widely used newswire datasets, it consists of many informal micro-texts.

Medicine & News & Tourism.

These corpora were originally constructed by Qiu et al. (2015) by annotating multi-domain texts.

4.2 Out-of-domain Results

To show the effect of domain knowledge on segmentation performance, we train a model on the CTB8 dataset and report its performance on different datasets. We choose CTB8 here because it is a hybrid-domain dataset. The results, obtained without using the external vocabulary, are shown in Table 2. The performance drops noticeably on out-of-domain datasets. Since each domain has its own segmentation standard, a single model is not suitable for data from various domains. This result demonstrates the necessity of fine-grained segmentation toolkits.

Training: CTB8   Testing        F1
In-domain        CTB8           95.69
Out-of-domain    MSRA (News)    83.67
                 PKU (News)     89.67
                 Weibo (Web)    91.19
Average          All Average    90.06
                 OOD Average    88.18
Table 2: Results of PKUSEG with a model trained on CTB8. “All Average” is the average F1 score over all datasets. “OOD Average” (out-of-domain average) is the average over all datasets except CTB8.

4.3 Pre-training Results

We combine existing large-scale datasets, including PKU (news), Weibo (web), and CTB8 (hybrid), and use them as pre-training data to obtain a coarse-grained model. The coarse-grained model is then fine-tuned on domain-specific data to obtain domain-specific models. Table 3 shows the effect of pre-training. The pre-trained models perform better in terms of the average score, especially on lower-resource datasets (e.g., tourism).

           w/o Pre-train   w/ Pre-train
Medicine       95.61           95.10
Web            94.75           95.49
News           97.58           97.80
Tourism        96.36           97.10
Average        96.08           96.37
Table 3: F1 scores of PKUSEG with and without pre-training. These results are obtained without using dictionaries. The web data comes from the Weibo dataset.

4.4 Default Performance

Considering that many users tend to evaluate performance in the default mode, with the default model and vocabulary of PKUSEG, we also report experimental results for this mode. The results are shown in Table 4. As the results show, the default model performs worse than the domain-specific models. We therefore recommend that users choose a domain-specific model rather than the default one whenever they can determine the domain of the text.

Dataset  F1
MSRA   87.29
CTB8   91.77
PKU    92.68
Weibo  93.43
Table 4: Performance of PKUSEG on different domains in the default mode.

To illustrate the practical application of PKUSEG, we also show some segmentation examples randomly crawled from articles covering the domains of medicine, travel, web text, and news. The segmentation results are shown in Table 5. PKUSEG achieves high accuracy on words that require professional domain knowledge.

Medicine
医联  平台  : 包括  挂号  预约  查看  院内  信息  化验单  等  ,  目前  出现  与  微信  、  支付宝  结合的  趋势  。
Medical Association platform includes registration appointment, in-hospital information management, etc. There is a trend of integration with WeChat and Alipay.
Travel
在  这里  可以  俯瞰  维多利亚港  的  香港岛  ,  九龙  半岛  两岸  ,  美景  无敌  。
It overlooks Victoria Harbour and the two sides of the Kowloon Peninsula. The view is so beautiful.
Web
【  这是  我  的  世界  ,  你  还  未  见  过  】  欢迎  来  参加  我  的  演唱会  听点  音乐
This is my world that you have not seen before. Welcome to participate in my concert to listen to music.
News
他  不  忘  讽刺  加州  :  “  加州  已  在  失控  的  高铁  项目  上  浪费  了  数十亿美元  ,  完全  没有  完成  的  希望  。
He did not forget to satirize California, “California has been wasting billions of dollars on the uncontrolled high-speed rail projects, which is of no hope being completed at all”.
乌克兰  政府  正式  通过  最新  《  宪法  修正案  》  ,  正式  确定  乌克兰  将  加入  北约  作为  重要  国家  方针  ,  该  法  强调  ,  ”  这项  法律  将  于  发布  次日  起  生效  ”  。
The Ukrainian government officially adopted the latest Constitutional Amendment, confirming that Ukraine will regard joining the NATO as an important national policy. The law emphasizes that it will take effect from the next day.
Table 5: Examples of segmentation on various domains with the domain-specific models.

5 Conclusion and Future Work

In this paper, we present PKUSEG, a new toolkit for multi-domain Chinese word segmentation. PKUSEG provides simple and user-friendly interfaces. Experiments on widely used datasets demonstrate that PKUSEG achieves high accuracy across domains. So far, PKUSEG supports the medicine, tourism, web, and news domains. In the future, we plan to release more domain-specific models and further improve the efficiency of PKUSEG.

References

  • Cai and Zhao (2016) Deng Cai and Hai Zhao. 2016. Neural word segmentation learning for Chinese. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers.
  • Chen et al. (2015) Xinchi Chen, Xipeng Qiu, Chenxi Zhu, Pengfei Liu, and Xuanjing Huang. 2015. Long short-term memory neural networks for Chinese word segmentation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, September 17-21, 2015, pages 1197–1206.
  • Forney (1973) G. D. Forney. 1973. The Viterbi algorithm. Proceedings of the IEEE, 61:268–278.
  • Lafferty et al. (2001) John Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning, ICML 2001, pages 282–289. Morgan Kaufmann.
  • Qiu et al. (2015) Likun Qiu, Linlin Shi, and Houfeng Wang. 2015. Construction of multi-domain Chinese dependency treebanks and analysis of influencing factors on dependency parsing. Journal of Chinese Information Processing, 29(5):69.
  • Liu et al. (2016) Yijia Liu, Wanxiang Che, Jiang Guo, Bing Qin, and Ting Liu. 2016. Exploring segment representations for neural segmentation models. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 9-15 July 2016, pages 2880–2886.
  • Qiu et al. (2016) Xipeng Qiu, Peng Qian, and Zhan Shi. 2016. Overview of the NLPCC-ICCPOL 2016 shared task: Chinese word segmentation for micro-blog texts. In NLPCC/ICCPOL, volume 10102 of Lecture Notes in Computer Science, pages 901–906. Springer.
  • Sun et al. (2012) Xu Sun, Houfeng Wang, and Wenjie Li. 2012. Fast online training with frequency-adaptive learning rates for Chinese word segmentation and new word detection. In The 50th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, July 8-14, 2012, Jeju Island, Korea - Volume 1: Long Papers, pages 253–262. The Association for Computational Linguistics.
  • Xu and Sun (2016) Jingjing Xu and Xu Sun. 2016. Dependency-based gated recursive neural network for Chinese word segmentation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 567–572.
  • Xu and Sun (2017) Jingjing Xu and Xu Sun. 2017. Transfer learning for low-resource Chinese word segmentation with a novel neural network. CoRR, abs/1702.04488.
  • Zhang and Clark (2007) Yue Zhang and Stephen Clark. 2007. Chinese segmentation with a word-based perceptron algorithm. In ACL. The Association for Computational Linguistics.