Building Legal Case Retrieval Systems with Lexical Matching and Summarization using A Pre-Trained Phrase Scoring Model
We present our method for tackling the legal case retrieval task of the Competition on Legal Information Extraction/Entailment 2019. Our approach is based on the idea that summarization is important for retrieval. On one hand, we adopt a summarization based model called encoded summarization which encodes a given document into continuous vector space which embeds the summary properties of the document. We utilize the resource of COLIEE 2018 on which we train the document representation model. On the other hand, we extract lexical features on different parts of a given query and its candidates. We observe that by comparing different parts of the query and its candidates, we can achieve better performance. Furthermore, the combination of the lexical features with latent features by the summarization-based method achieves even better performance. We have achieved the state-of-the-art result for the task on the benchmark of the competition.
READ FULL TEXT