Design Challenges for Low-resource Cross-lingual Entity Linking
Cross-lingual Entity Linking (XEL) grounds mentions of entities that appear in a foreign (source) language text into an English (target) knowledge base (KB) such as Wikipedia. XEL consists of two steps: candidate generation, which retrieves a list of candidate entities for each mention, followed by candidate ranking. XEL methods have been successful on high-resource languages, but generally perform poorly on low-resource languages due to lack of supervision. In this paper, we show a thorough analysis on existing low-resource XEL methods, especially on their candidate generation methods and limitations. We observed several interesting findings: 1. They are heavily limited by the Wikipedia bilingual resource coverage. 2. They perform better on Wikipedia text than on real-world text such as news or twitter. In this paper, we claim that, under the low-resource language setting, outside-Wikipedia cross-lingual resources are essential. To prove this argument, we propose a simple but effective zero-shot framework, CogCompXEL, that complements current methods by utilizing query log mapping files from online search engines. CogCompXEL outperforms current state-of-the-art models on almost all 25 languages of the LORELEI dataset, achieving an absolute average increase of 25 candidate recall.
READ FULL TEXT