Design Challenges for Low-resource Cross-lingual Entity Linking

05/02/2020
by   XingYu Fu, et al.
0

Cross-lingual Entity Linking (XEL) grounds mentions of entities that appear in a foreign (source) language text into an English (target) knowledge base (KB) such as Wikipedia. XEL consists of two steps: candidate generation, which retrieves a list of candidate entities for each mention, followed by candidate ranking. XEL methods have been successful on high-resource languages, but generally perform poorly on low-resource languages due to lack of supervision. In this paper, we show a thorough analysis on existing low-resource XEL methods, especially on their candidate generation methods and limitations. We observed several interesting findings: 1. They are heavily limited by the Wikipedia bilingual resource coverage. 2. They perform better on Wikipedia text than on real-world text such as news or twitter. In this paper, we claim that, under the low-resource language setting, outside-Wikipedia cross-lingual resources are essential. To prove this argument, we propose a simple but effective zero-shot framework, CogCompXEL, that complements current methods by utilizing query log mapping files from online search engines. CogCompXEL outperforms current state-of-the-art models on almost all 25 languages of the LORELEI dataset, achieving an absolute average increase of 25 candidate recall.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset