Cross-Lingual BERT Contextual Embedding Space Mapping with Isotropic and Isometric Conditions
Typically, a linearly orthogonal transformation mapping is learned by aligning static type-level embeddings to build a shared semantic space. In view of the analysis that contextual embeddings contain richer semantic features, we investigate a context-aware and dictionary-free mapping approach by leveraging parallel corpora. We illustrate that our contextual embedding space mapping significantly outperforms previous multilingual word embedding methods on the bilingual dictionary induction (BDI) task by providing a higher degree of isomorphism. To improve the quality of mapping, we also explore sense-level embeddings that are split from type-level representations, which can align spaces in a finer resolution and yield more precise mapping. Moreover, we reveal that contextual embedding spaces suffer from their natural properties – anisotropy and anisometry. To mitigate these two problems, we introduce the iterative normalization algorithm as an imperative preprocessing step. Our findings unfold the tight relationship between isotropy, isometry, and isomorphism in normalized contextual embedding spaces.
READ FULL TEXT