In commercial map applications such as Baidu Maps, rich and timely updated POI information (such as the POI address, coordinates, and accessibility reminder) plays an important role in enabling users to enjoy location-based services. Among these, the accessibility reminder is of vital importance, since users frequently rely on it when deciding whether to visit a POI. Figure 1 shows an example of the POI multidimensional information page at Baidu Maps, where the closing status of the POI is prominently displayed in three different places. We hope that users are fully informed that a POI has closed before they decide to visit it, so as to spare them the disappointment of traveling tens of kilometers only to find it shut. Therefore, to ensure that users suffer as little inconvenience as possible when finding places or making visiting decisions with a map application, it is important to provide timely accessibility reminders.
However, it is difficult to keep the POI database in sync with its real-world counterparts due to the dynamic nature of business changes and innovations. Statistics show that a substantial proportion of the POIs at Baidu Maps were updated in 2020. It is extremely time-consuming and expensive to handle such a large number of updates by relying heavily on human effort. To reduce labor costs and increase productivity, several recent works have attempted to develop new ways to maintain a POI database. Revaud et al. (2019) proposed to use street-view images to automatically detect changes of POIs. Although feasible, the acquisition of geo-tagged street-view images at different times is time-consuming and expensive, which limits its applicability to updating large-scale POIs. In addition, several researchers proposed to extract POI names from text (Rae et al., 2012; Chuang et al., 2018; Xu et al., 2019). Although extracting POI names is the first and an important step toward maintaining POI information, there are indispensable attributes that must also be extracted and correlated with the corresponding POIs. Moreover, new POIs emerge endlessly and their names are often newly-coined words, while existing POIs are subject to change over time, resulting in high uncertainty about the task's frequency and cost. Therefore, it is critical to explore more effective ways to jointly detect POIs and extract their associated attributes from text.
After meticulous analysis, we find that many business entities publish business change information on their official websites or in Internet news in a timely fashion. This demonstrates that massive numbers of Web pages are valuable data sources for large-scale extraction of POI change information. As POI accessibility is vital information to users, we present a practical solution that jointly extracts POI mentions and identifies their coupled accessibility labels from unstructured text (hereafter referred to as joint POI and accessibility extraction). We frame this task as a sequence tagging problem and consider the following four mainstream accessibility labels: one for emerging POIs, NEW, and three for updating the accessibility of existing POIs: RENAME, RELOC (the abbreviation of "relocation"), and CLOSE. Figure 2 shows four representative examples of this task. The task is challenging because of the following two main issues.
(1) Rare or unknown words: POI names are often newly-coined words, chosen to successfully register new entities or brands. As a result, POI names are typically out-of-vocabulary (OOV) words whose semantic meaning can hardly be captured by neural models. As illustrated by the first example in Figure 2, KFC is a well-known chain brand that widely exists in our POI database. However, Staten Island is absent from our POI database and will be treated as an OOV word.
(2) Many-to-one or one-to-many mapping: There may exist multiple ⟨POI name, accessibility label⟩ pairs in the text, which necessitates dealing with one-to-many or many-to-one mapping so that each POI is coupled with its matching accessibility label. For example, the first sentence in Figure 2 mentions two POI names (KFC and Staten Island), but there is only one accessibility label (CLOSE). Therefore, the following two pairs should be extracted from it: ⟨KFC, CLOSE⟩ and ⟨Staten Island, NONE⟩.
To this end, we propose a Geographic-Enhanced and Dependency-guIded Tagger (GEDIT) to address the two challenges concurrently. GEDIT casts the POI accessibility recognition task as a sequence tagging problem by giving each token a joint mention-accessibility label. As a result, GEDIT jointly extracts POI mentions and identifies their accessibility labels, and can produce an arbitrary number of ⟨POI name, accessibility label⟩ pairs simultaneously.
To alleviate challenge #1, GEDIT adopts a geographic-enhanced pre-trained language model (Devlin et al., 2019), which is able to significantly relieve the problem of newly-coined POI names. For example, by taking advantage of the geographic knowledge in the addresses of existing POIs, the pre-trained model is able to learn the patterns of coining new POI names. As a result, new POIs could be better handled.
To mitigate challenge #2, we apply a relational graph convolutional network (RGCN) (Schlichtkrull et al., 2018) to learn tree node representations from the parsed dependency tree, which enables us to establish a correlation between a POI and its accessibility label. As a result, GEDIT is able to avoid the distraction from auxiliary POIs that do not have any accessibility changes. Take the first sentence in Figure 2 as an example: with the aid of the rhetorical relation between the words "closed" and "KFC", it is easy to determine that the closed POI is "KFC" rather than "Staten Island".
Finally, we construct a neural sequence tagging model by integrating and feeding the previously pre-learned representations into a CRF (Lafferty et al., 2001) layer.
Given the lack of an appropriate benchmark, we construct and release a large-scale real-world dataset named WebPOIs (publicly available at https://github.com/PaddlePaddle/Research/tree/master/ST_DM/CIKM2021-GEDIT/). Extensive experiments conducted on the WebPOIs dataset demonstrate that GEDIT significantly outperforms several strong sequence tagging baselines by a large margin. Statistics show that the proposed solution saves significant human effort and labor costs when dealing with the same amount of documents, which confirms that it is a practical way to maintain POI accessibility.
Our contributions can be summarized as follows:
Potential impact: The geographic-enhanced and dependency-guided sequence tagging model (GEDIT) is our first attempt at a neural model that processes large-scale text to maintain the accessibility of hundreds of millions of POIs at Baidu Maps. GEDIT has been successfully deployed in production at Baidu Maps, where it inspects hundreds of thousands of documents every week, saving significant labor costs in practice.
Novelty: The design and implementation of GEDIT are driven by the novel idea that takes advantage of a geographic-enhanced pre-trained model and dependency relations to guide the sequence tagging model, which is able to produce more accurate results from text.
Technical quality: Offline experiments demonstrate that GEDIT consistently achieves significant improvements in F1 score in comparison with several strong baselines. After we deployed GEDIT in production, the efficiency of manual verification increased by 17.8%, which dramatically reduces the maintenance costs at Baidu Maps.
A new and challenging dataset: The WebPOIs dataset is composed of 19,333 documents and 99,139 POIs, which is expected to bring this substantial but challenging task to the attention of researchers both in academia and in industry.
2. Task Formulation and Dataset
2.1. Task Formulation
Given a document of n words, denoted by D = (w_1, w_2, …, w_n), the outputs of the proposed task are all continuous sub-sequence chunks representing POIs, each with an accessibility label from {NEW, RELOC, RENAME, CLOSE}.
We first break the task into two sub-tasks, including POI term extraction (PTE) for extracting POIs from the document and POI accessibility identification (PAI) for identifying the accessibility of each POI term. Instead of performing PTE and PAI step by step in a pipeline paradigm that does not fully exploit the joint information between them, our proposed framework learns the two sub-tasks jointly. Since PTE is a sequence tagging task and PAI is a classification task, they cannot be directly trained together. Thus, we convert PAI to a sequence tagging task by giving each POI token an accessibility label.
With the help of the sequence decision in PAI, which models the relationship between the accessibility labels of different POIs, this joint training paradigm can efficiently learn to extract an arbitrary number of ⟨POI name, accessibility label⟩ pairs from text. We use the BIO schema (Ratinov and Roth, 2009), where the prefixes B, I, and O indicate the Beginning, the Inside, and the Outside of a chunk, respectively. To explore the effectiveness of different labeling schemas, we design two labeling settings in this work: a joint setting and a separate setting. As shown in Figure 3, the joint setting represents the information for a POI and its accessibility simultaneously in one label set. By contrast, the separate setting uses two kinds of labels as two sub-tasks. Formally, for each word w_i, in the separate setting we assign a tag from {B-POI, I-POI, O} in PTE and a tag from {NEW, RELOC, RENAME, CLOSE, NONE, O} in PAI. In the joint setting, we integrate the two tag sets into one: {B-NEW, I-NEW, B-RELOC, I-RELOC, B-RENAME, I-RENAME, B-CLOSE, I-CLOSE, B-NONE, I-NONE, O}.
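To make the two labeling settings concrete, the following minimal sketch converts one annotated sentence into separate- and joint-setting tag sequences. The tokenization, span format, and function name are illustrative assumptions, not the paper's implementation.

```python
def to_separate_and_joint(tokens, spans):
    """Convert POI span annotations into separate- and joint-setting BIO tags.

    tokens: list of words; spans: list of (start, end, accessibility_label)
    with end exclusive. Unannotated positions receive the "O" tag.
    """
    pte = ["O"] * len(tokens)    # PTE tags: B-POI / I-POI / O
    pai = ["O"] * len(tokens)    # PAI tags: accessibility label per POI token
    joint = ["O"] * len(tokens)  # joint tags: B-<LABEL> / I-<LABEL> / O
    for start, end, label in spans:
        for i in range(start, end):
            prefix = "B" if i == start else "I"
            pte[i] = f"{prefix}-POI"
            pai[i] = label
            joint[i] = f"{prefix}-{label}"
    return pte, pai, joint

# The first example of Figure 2: KFC closes, Staten Island is only a location.
tokens = ["KFC", "on", "Staten", "Island", "has", "closed"]
spans = [(0, 1, "CLOSE"), (2, 4, "NONE")]
pte, pai, joint = to_separate_and_joint(tokens, spans)
```

Under the separate setting both `pte` and `pai` are predicted; under the joint setting only `joint` is, so one CRF decision captures mention boundaries and accessibility at once.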
2.2. Benchmark Dataset
As there is no public dataset available for this task, we construct the WebPOIs dataset in three steps.
To retrieve high-quality documents containing POI information, we utilize multiple data sources, including general Web documents and official websites. We manually construct queries for searching the public Web documents and keep the top-ranked results returned by a search engine. For the websites, we crawl the documents based on a list of keywords related to POI accessibility.
After document collection, we use a two-step pruning operation to ensure that the obtained documents contain high-quality POI accessibility information. First, we prune the documents based on the number of POIs recognized by a pre-trained POI recognizer. We only keep those documents that contain two or more POIs detected by the POI recognizer. The POI recognizer is a high-performance sequence tagging model that has been deployed in Baidu Maps. We conduct this step to make sure that the documents convey information related to POIs. Second, we prune the documents based on a dictionary containing words that express the meaning of POI accessibility, such as “move” and “open”. We keep those documents that contain at least one word in the dictionary. This step is conducted to make sure that the documents contain descriptions of POI accessibility.
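The two-step pruning described above can be sketched as a simple predicate. The keyword list and function names here are illustrative assumptions; the actual POI recognizer is a deployed sequence tagging model whose output we only consume.

```python
# Illustrative accessibility-related keywords (the real dictionary is larger).
ACCESSIBILITY_WORDS = {"move", "open", "close", "relocate", "rename"}

def keep_document(doc_text, recognized_pois, keywords=ACCESSIBILITY_WORDS):
    """Two-step pruning: keep a document only if (1) the pre-trained POI
    recognizer found two or more POIs in it, and (2) it contains at least
    one accessibility-related keyword."""
    if len(recognized_pois) < 2:
        return False  # step 1: not enough POI signal
    words = doc_text.lower().split()
    return any(w in keywords for w in words)  # step 2: accessibility signal
```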
To annotate a high-quality dataset, we hire a team of full-time annotators and select qualified ones using the following process. First, the annotators are asked to study a carefully crafted annotation guideline that includes examples of excellent and poor POI accessibility change annotations and explains why they were categorized as such. Then, they practice and examine themselves on two small sets of documents with correct labels. This train-practice-examine process is iterated three times within a month.
After selecting the qualified annotators, we separate them into an annotation team and a quality assurance (QA) team. We first ask the annotation team to mark all POIs and their accessibility labels, including NEW, RENAME, RELOC, CLOSE, and NONE. Then, we ask the QA team, which consists of the annotators with the top examination scores, to randomly inspect over 20% of the labeled data. Finally, the researchers randomly inspect 5% of the labeled data and merge it into WebPOIs if it is clean enough. At each inspection stage, if the accuracy is below 90%, the whole batch is sent back for re-annotation. Each annotation-inspection round processes 1,000 documents. All the procedures mentioned above are conducted in an in-house CMS system. To motivate high-quality annotations, the higher an annotator's accuracy, the more they are paid.
The WebPOIs dataset comprises 19,333 documents and 99,139 POIs. Table 1 and Table 2 show the detailed statistics of WebPOIs. This new dataset enables us to analyze POIs that appear within complex linguistic contexts.
| # of documents          | 19,333 |
| # of POIs               | 99,139 |
| # of unique POIs        | 44,167 |
| Avg. POIs per document  | 5.1    |
| Avg. words per document | 195.7  |
| Avg. words per POI      | 7.1    |
3. GEDIT
In this section, we detail the proposed model, GEDIT. As shown in Figure 4, GEDIT contains three major components: (1) geographic-enhanced text representation learning, (2) dependency relation learning, and (3) joint POI and accessibility extraction. For an input document D, we first use component #1 to learn the geographic-enhanced text representations of D. Simultaneously, we use component #2 to learn the dependency tree node representations of D. Finally, we use component #3 to obtain D's fused representations, and then jointly extract POIs and accessibility labels from them.
3.1. Geographic-Enhanced Text Representation Learning
To relieve the problem of newly-coined POI names, we incorporate prior geographic knowledge into the pre-trained language model ERNIE (Sun et al., 2019). The geographic knowledge comes from the massive POI database and the POI search logs at Baidu Maps. Specifically, we inject it by continuing to train a masked language model (MLM) task starting from the parameters of ERNIE. In the MLM task, we organize each training document as the concatenation of four types of text: (1) the most frequent query used when searching for a POI, (2) the full POI name, (3) the POI address, and (4) the POI type. We separate the query from the other information with a [SEP] token. We use the whole word masking (WWM) strategy to make predictions for the phrases in each document, and use a query component analysis module deployed at Baidu Maps to split each document at the granularity of geographic entities. During training, each geographic entity in a document has a 15% probability of being masked and predicted by the language model. For each word in a selected entity, we replace the word with a "[MASK]" token with 70% probability, with a misspelled word with 10% probability, with a random word with 10% probability, and leave it unchanged with 10% probability. The words in a query that do not match any words in the target POI name are treated as misspelled words. With this training procedure, the MLM task learns four types of geographic knowledge: (1) the natural language description of POI names and addresses; (2) the relationship between POI name, address, and type; (3) the relationship between query, POI name, and address; and (4) the possible misspellings of POI names and addresses.
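The entity-level corruption scheme above can be sketched as follows. This is a minimal illustration of the stated probabilities (15% entity selection; then 70% [MASK] / 10% misspelled / 10% random / 10% unchanged per word); the function name, the `misspell` dictionary interface, and the vocabulary are assumptions, not GERNIE's actual implementation.

```python
import random

def mask_entities(entities, vocab, misspell, rng=None, p_select=0.15):
    """Whole-word masking over geographic entities (a sketch of the GERNIE
    MLM corruption). Each entity is selected with probability p_select;
    every word of a selected entity is then replaced by [MASK] (70%),
    a misspelled variant (10%), a random vocabulary word (10%), or kept
    unchanged (10%). `misspell` maps a word to a misspelled form."""
    rng = rng or random.Random()
    corrupted, targets = [], []
    for entity in entities:  # entity: list of words
        if rng.random() < p_select:
            out = []
            for w in entity:
                r = rng.random()
                if r < 0.70:
                    out.append("[MASK]")
                elif r < 0.80:
                    out.append(misspell.get(w, w))
                elif r < 0.90:
                    out.append(rng.choice(vocab))
                else:
                    out.append(w)
            corrupted.append(out)
            targets.append(entity)  # the model must recover these words
        else:
            corrupted.append(list(entity))
            targets.append(None)    # not predicted
    return corrupted, targets
```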
Formally, given a document D, we first tokenize D into a sub-word sequence S = (s_1, s_2, …, s_m), where s_i denotes the i-th sub-word and m is the length of the sequence. Then we use the above-mentioned geographic-enhanced ERNIE (GERNIE) to obtain the sequence of latent vectors X = (x_1, x_2, …, x_m). These latent vectors are shared between PTE and PAI.
3.2. Dependency Relation Learning
Not every POI in a document is accompanied by an accessibility change, and such POIs can significantly confuse the model when it identifies which POIs have accessibility changes. We observe that certain words indicate accessibility changes, which inspires us to link such indicator words to the target POI to facilitate determining its accessibility label. The dependency tree of a document reflects the rhetorical relations between nodes, which helps link these indicator words to the target POI nodes. Thus, we encode the dependency tree into the text representations.
3.2.1. Dependency Tree Construction
Specifically, we first segment the document into sub-words. Then, we use a dependency parsing tool (https://github.com/baidu/DDParser) to construct the dependency tree T = (V, E) of the given document, where V is the node set of the dependency tree and E is the set of dependency relations. In this paper, we use 14 pre-defined rhetorical relations as the types of tree edges. Finally, we record the mapping M between the sub-words of the original document and the tree nodes.
3.2.2. Dependency Relation Learning
The dependency tree contains different types of relations, and various relations may play different roles in generating node representations. For example, the subject-verb relation and the verb-object relation may be more important than other kinds of relations. To thoroughly learn this difference, we apply a relational graph convolutional network (RGCN) (Schlichtkrull et al., 2018) to encode the dependency tree and learn the node representations.
First, we use the average of the sub-word representations as the initial node features of the RGCN input. For each node v_i, which is composed of sub-words s_{i_1}, …, s_{i_k}, the initial representation is computed as

h_i^{(0)} = (1/k) Σ_{j=1}^{k} x_{i_j},

where x_{i_j} is the GERNIE representation of sub-word s_{i_j}.
Then, we utilize the RGCN to encode the structural information into the node representations. Unlike regular GCNs, the RGCN introduces relation-specific transformations that depend on the type and direction of an edge, accumulating the transformed feature vectors of neighboring nodes according to the edge type. This process is formulated as

h_i^{(l+1)} = σ( Σ_{r∈R} Σ_{j∈N_i^r} (1/c_{i,r}) W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)} ),

where W_r^{(l)} and W_0^{(l)} are trainable matrices, l denotes the l-th RGCN layer, N_i^r denotes the set of neighbor indices of node v_i under relation r ∈ R, and c_{i,r} is a problem-specific normalization constant. We use two RGCN layers to learn the tree structure and obtain the node representations G of the dependency tree.
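A single RGCN layer of this form can be sketched densely as below, assuming ReLU as σ and c_{i,r} = |N_i^r| (the mean normalization from Schlichtkrull et al., 2018); the edge encoding and function name are illustrative.

```python
import numpy as np

def rgcn_layer(H, edges, W_rel, W_self):
    """One RGCN layer: a minimal dense sketch.

    H: (n, d) node features; edges: dict relation -> list of (src, dst);
    W_rel: dict relation -> (d, d_out) matrix per relation; W_self: (d, d_out).
    Messages from relation-r neighbours of a node are averaged (normalised
    by the neighbour count c_{i,r}), summed over relations with the
    self-loop term, then passed through a ReLU."""
    out = H @ W_self  # self-loop term W_0 h_i
    for rel, pairs in edges.items():
        # collect the neighbour set of each destination node under relation rel
        neigh = {}
        for src, dst in pairs:
            neigh.setdefault(dst, []).append(src)
        for dst, srcs in neigh.items():
            msg = sum(H[s] @ W_rel[rel] for s in srcs) / len(srcs)
            out[dst] += msg
    return np.maximum(out, 0.0)  # ReLU
```

Stacking two such layers, as in GEDIT, lets information flow across two dependency hops.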
3.3. Joint POI and Accessibility Extraction
After obtaining document D's sub-word representations X with GERNIE and its dependency tree node representations G with the RGCN, we use an attention mechanism to obtain D's fused representations, which are used to jointly extract POIs and accessibility labels.
3.3.1. Representation Fusion
To establish an efficient connection between the words in each document and their corresponding nodes in the dependency tree, we explore two kinds of fusion strategies to produce the sub-word representations as follows.
Hard Attention Fusion
Every sub-word in a given document has a corresponding node in the dependency tree, recorded in the mapping table M. We fetch that node's representation and concatenate it with the sub-word representation. Formally, given a sub-word index i, we look up the node index j = M(i), fetch the node representation g_j from G, and concatenate it with the sub-word representation x_i:

u_i = [x_i ; g_j],

where u_i is the final sub-word representation.
Soft Attention Fusion
The hard attention fusion strategy uses only the node representation belonging to the sub-word, and thus cannot fully utilize node representations with similar semantic information. We therefore design a soft attention module that fuses all node representations into the final sub-word representations.
Specifically, given a sub-word index i, we look up its sub-word representation x_i from the output of GERNIE and take all node representations G = (g_1, …, g_{|V|}). The soft attention is defined as

α_{ij} = softmax_j(x_i^T W g_j),   ĝ_i = Σ_j α_{ij} g_j,

where W is a trainable matrix, α_{ij} is the attention weight, and ĝ_i is the aggregated node representation of sub-word s_i. Then, we concatenate x_i and ĝ_i to obtain the final sub-word representation:

u_i = [x_i ; ĝ_i].
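The two fusion strategies can be sketched side by side as below. The function names are illustrative; this is a minimal dense version, assuming a row-wise softmax and a bilinear score for the soft variant.

```python
import numpy as np

def hard_fusion(X, G, mapping):
    """Hard attention: concatenate each sub-word vector x_i with the
    representation of its own tree node, looked up via the mapping table."""
    return np.concatenate([X, G[mapping]], axis=1)

def soft_fusion(X, G, W):
    """Soft attention: attend from each sub-word over ALL node vectors,
    then concatenate the attention-weighted node summary with x_i."""
    scores = X @ W @ G.T                          # (m, |V|) attention logits
    scores -= scores.max(axis=1, keepdims=True)   # numerically stable softmax
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=1, keepdims=True)
    return np.concatenate([X, alpha @ G], axis=1)
```

Hard fusion keeps only the sub-word's own node (less noise); soft fusion mixes in semantically related nodes at the cost of possible distraction, matching the trade-off discussed in the experiments.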
3.3.2. POI and Accessibility Tagging
After obtaining the fused sub-word representations, we explore two settings: (1) separate setting and (2) joint setting, to predict the chunks of the POIs and their accessibility labels.
For the separate setting, we treat POI chunking and accessibility prediction as two independent sub-tasks. Specifically, the sub-word representations U are fed into two different CRF models to generate the labels for PTE and PAI, respectively:

y^p = CRF_p(U; θ_p),   y^a = CRF_a(U; θ_a),

where y^p denotes the POI sequence labels, y^a denotes the accessibility labels, and θ_p and θ_a denote the parameters of the two sub-tasks.
After tagging, the remaining step is to pair the POI terms with their accessibility labels. The POI terms of a given sentence can be read directly from the elements of the POI label sequence. To determine the accessibility of each POI term, we regard the POI term as the boundary of the accessibility labels and count the occurrences of each accessibility label within that boundary. We adopt a voting mechanism and take the accessibility label with the maximum count as the final label for the POI term. In case of a tie, we take the first label as the final result. For example, the final label of "RELOC, RELOC, RENAME, RENAME" is "RELOC", the final label of "NEW, NEW" is "NEW", and the final label of "NEW, RELOC, RELOC" is "RELOC". This method is simple but effective.
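The voting rule, including the first-wins tie-break, can be written in a few lines (the function name is illustrative):

```python
def vote_accessibility(labels):
    """Majority vote over the per-token accessibility labels inside one POI
    chunk; on a tie, the label that appears first in the chunk wins."""
    best, best_count = None, 0
    for label in labels:
        count = labels.count(label)
        if count > best_count:  # strict '>' keeps the earliest tie winner
            best, best_count = label, count
    return best
```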
(Excerpt from Table 3, ablation rows; the last two columns are Macro/Micro F1:)
| w/o Geographic Knowledge | 0.738 | 0.764 | 0.774 | 0.765 | 0.751 | 0.751 |
| w/o Dependency Relations | 0.731 | 0.778 | 0.772 | 0.782 | 0.752 | 0.752 |
Although the separate setting is effective, its independent classification decision does not consider the dependencies across output labels. This may result in limiting performance over the task that has strong label dependencies. Thus, we propose a joint setting for this task.
For the joint setting, as illustrated in Figure 3 and defined in Section 2.1, we collapse the labels of PTE and PAI into one set for tagging. As in the separate setting, we use a CRF model to make the tagging decisions jointly. During training, we maximize the log-likelihood

L = Σ log p(y | U; W_c),

where p(y | U; W_c) is the probability of label sequence y under the CRF and W_c denotes the CRF weights. During decoding, we predict the output sequence using the Viterbi algorithm (Forney, 1973).
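Viterbi decoding for a linear-chain CRF can be sketched as follows. This is a generic illustration (per-token emission scores plus a label-transition matrix), not GEDIT's exact parameterization.

```python
import numpy as np

def viterbi(emissions, transitions):
    """Return the highest-scoring label sequence for a linear-chain CRF.

    emissions: (T, L) per-token label scores; transitions: (L, L) matrix
    where transitions[i, j] scores moving from label i to label j."""
    T, L = emissions.shape
    score = emissions[0].copy()        # best score of each label at step 0
    back = np.zeros((T, L), dtype=int) # backpointers
    for t in range(1, T):
        # cand[i, j]: best path ending in i at t-1, then transitioning to j
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # trace back from the best final label
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With BIO-style labels, the transition matrix is where the CRF learns constraints such as "I-CLOSE may not follow O", which independent per-token classifiers cannot express.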
4. Experiments
In this section, we describe the experimental settings and evaluation metrics, and report empirical results on the WebPOIs dataset.
4.1. Settings and Evaluation Metrics
We optimize our model with a learning rate of 0.001 and a batch size of 32. The maximum number of training epochs is set to 10.
For evaluation, we employ the F1 score of each accessibility label and use Macro/Micro F1 as the overall evaluation metrics. In the separate setting, an accessibility label is regarded as correct when its accessibility type and the corresponding POI mention are both correct.
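For clarity, the two aggregate metrics can be computed from per-label counts as below: Macro F1 averages per-label F1 scores, while Micro F1 pools the counts before computing one F1. The function name and count-dict interface are illustrative.

```python
def f1_scores(tp, fp, fn):
    """Per-label F1 plus Macro-F1 (mean of per-label F1) and Micro-F1
    (F1 over pooled counts). tp/fp/fn: dicts of counts keyed by label."""
    def f1(t, p, n):
        prec = t / (t + p) if t + p else 0.0
        rec = t / (t + n) if t + n else 0.0
        return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    per_label = {k: f1(tp[k], fp[k], fn[k]) for k in tp}
    macro = sum(per_label.values()) / len(per_label)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return per_label, macro, micro
```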
4.2. Comparison Methods
We evaluate GEDIT against the following mainstream methods used in general sequence tagging tasks.
BiLSTM+CRF (Huang et al., 2015) combines a bidirectional LSTM, which learns word representations from a document, with a CRF layer for the NER task.
CNN+BiLSTM+CRF (Wu et al., 2019) is a CNN-LSTM-CRF neural architecture to capture both local and long-distance contexts for named entity recognition. The model jointly trains NER and word segmentation models to enhance the ability of the NER model in identifying entity boundaries.
ERNIE+CRF (Sun et al., 2019) uses pre-trained ERNIE model to learn the word representations from a document and uses CRF for POI and accessibility decoding.
4.3. Results and Analysis
Table 3 shows the performance of different models. Results show that GEDIT significantly outperforms all baselines. Specifically, we have the following observations.
(1) The models with joint setting perform better than those with separate setting. It suggests that jointly training PTE and PAI can better utilize the shared information between the two tasks. It also shows the advantages of considering the interactions between the two relevant tasks of PTE and PAI.
(2) LSTM-based models (BiLSTM+CRF and CNN+BiLSTM+CRF) perform better than CNN-based models (CNN+CRF and LR-CNN), which indicates that this task is susceptible to the sequence order. CNN is good at extracting local n-gram features from the document. However, it cannot model the word order information well. Furthermore, we can observe that CNN+BiLSTM+CRF performs better than both CNN-based and LSTM-based models, which shows that combining the advantages of CNN and LSTM is able to better model this task.
(3) Compared with both CNN-based and LSTM-based models, the models using ERNIE as an encoding module significantly outperform them in both separate and joint settings. It suggests that pre-trained language models have a stronger ability in modeling the semantic representations of sentences than CNN-based and LSTM-based models.
(4) GEDIT significantly outperforms all baselines. Compared with the ERNIE-based model, GEDIT further introduces geographic-enhanced text representations. Moreover, it considers the dependency relations of different text nodes, and applies a relational graph convolutional network to encode this kind of relations to overcome the influence of POIs that do not have accessibility changes. As a result, it is able to make more accurate accessibility predictions.
(5) GEDIT with hard attention (GEDIT_hard) performs better than GEDIT with soft attention (GEDIT_soft) in both the separate and joint settings. Hard attention directly uses the node representation that belongs to the sub-word. By contrast, soft attention fuses all graph nodes via attention weights, which may introduce more noise and make predictions less accurate than hard attention. However, GEDIT_soft still works better than ERNIE+CRF, which further shows that encoding the dependency relations between nodes into the sub-word representations helps predict a POI's accessibility.
Overall, our model (GEDIT) achieves the best performance of 76.3% in terms of both Macro and Micro F1.
4.3.1. Ablation Studies
We also perform extensive ablation experiments over the two components of GEDIT to figure out their relative importance. Specifically, the following three variations of GEDIT are implemented for comparison.
w/o Geographic Knowledge: Replace the GERNIE component with the vanilla ERNIE model.
w/o Dependency Relations: Remove the dependency relation learning component and only use GERNIE.
ERNIE+CRF: Remove both GERNIE and dependency relation learning components, and only utilize the ERNIE model to tag POIs and their accessibility labels.
The results of ablation studies are presented at the bottom of Table 3. From the results, we observe that:
(1) Compared with GEDIT, both the Macro and Micro F1 of the "GEDIT w/o Geographic Knowledge" model decline by 1.2% absolute. This shows that replacing the vanilla ERNIE model with the geographic-knowledge-enhanced GERNIE brings significant improvements to this task. The main reason is that GERNIE relieves the problem of newly-coined and OOV words, which facilitates accurately extracting POI names.
(2) Compared with GEDIT, both the Macro and Micro F1 of the "GEDIT w/o Dependency Relations" model decline by 1.1% absolute. This shows that introducing dependency relations also brings significant improvements to this task. The main reason is that dependency relations help avoid the distraction from auxiliary POIs that do not have any accessibility changes.
(3) GEDIT significantly outperforms ERNIE+CRF by a large margin on all metrics. In addition, GEDIT also significantly outperforms both "GEDIT w/o Geographic Knowledge" and "GEDIT w/o Dependency Relations" in terms of Macro and Micro F1. This demonstrates that incorporating both geographic knowledge and dependency relations into a sequence tagging framework for joint POI and accessibility extraction leads to greater improvements than using either of them individually or neither of them.
4.3.2. Performance of POI Term Extraction
To investigate the effectiveness of different models on POI term extraction, we conduct experiments and report the performance in Table 4. All models are evaluated under the joint setting.
From the results in Table 4, we observe that:
(1) ERNIE-based models have an obvious advantage over CNN-based and LSTM-based models. The main reason is that the POI terms are usually rare words, making the POI tagging more difficult than the vanilla named entity recognition task.
(2) The proposed model (GEDIT) shows better performance than ERNIE+CRF. The main reason is that combining both sub-word and tree node representations can guide the model to better find the boundaries of POIs.
To obtain a deeper understanding of the effect of POI name length on the performance of our model, we further compare different models on POIs of various lengths. To this end, we separate the POI set into three groups: (1) POIs with 3 or fewer Chinese characters (short); (2) POIs with 4-5 Chinese characters (medium); and (3) POIs with more than 5 Chinese characters (long). We report the performance of GEDIT and ERNIE+CRF on extracting short, medium, and long POIs.
Table 5 shows the results. We observe that GEDIT achieves 6.6% (short), 4.6% (medium), and 3.4% (long) absolute improvements over ERNIE+CRF in terms of F1, which demonstrates that GEDIT achieves greater improvements for shorter POIs. The main reason is that shorter POIs convey less information than longer ones, making them more difficult for a model to learn from. Therefore, the shorter a POI is, the more difficult it is to make an accurate prediction. GEDIT is able to leverage external geographic knowledge to mitigate this issue and consequently performs better on shorter POIs.
5. Practical Applicability
We describe how we deploy GEDIT in the POI data maintenance process at Baidu Maps. Each week, we first extract millions of documents containing POIs from multiple data sources, including general Web documents and official websites. Next, we filter these documents by a list of keywords that indicate POI accessibility changes, which yields hundreds of thousands of documents. Then, we feed these documents into GEDIT. Once we obtain the extracted ⟨POI name, accessibility label⟩ pairs from GEDIT, we use heuristic rules to remove inappropriate pairs. The remaining pairs are then sent to a linking process that decides whether to add the extracted POI to the POI database. For pairs with the accessibility label NEW, the linked ones are abandoned; for pairs with other accessibility labels, the un-linked ones are abandoned. After all the procedures described above, we obtain about 4,000 to 10,000 ⟨POI name, accessibility label⟩ pairs per week. These pairs are finally sent to operators for manual verification to ensure that the data are accurate and compliant at Baidu Maps.
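The linking-based filtering rule above can be sketched as a simple predicate. Here, "linked" is approximated by set membership of the POI name; the real linking process matches against the full POI database, and the function name is illustrative.

```python
def filter_by_linking(pairs, in_database):
    """Post-extraction filtering sketch: for NEW pairs, keep only POIs NOT
    already linked to the database (a linked NEW POI adds nothing); for
    RENAME/RELOC/CLOSE pairs, keep only POIs that ARE linked (an un-linked
    POI cannot be updated). in_database: set of known POI names."""
    kept = []
    for name, label in pairs:
        linked = name in in_database
        if label == "NEW" and not linked:
            kept.append((name, label))
        elif label != "NEW" and linked:
            kept.append((name, label))
    return kept
```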
A key indicator of the effectiveness of POI data maintenance is the success rate of manual verification (SRMV): of all the ⟨POI name, accessibility label⟩ pairs sent for verification, the fraction that are eventually confirmed true and accepted for publication in the POI database. The quality of the extracted ⟨POI name, accessibility label⟩ pairs directly affects SRMV. After we deployed the GEDIT model in the POI data maintenance process, the SRMV increased by 17.8% compared to the previously deployed extractor. This demonstrates that GEDIT saves significant human effort and labor costs, confirming that it is a practical solution for POI accessibility maintenance.
6. Discussions
We first discuss an alternative way to accomplish the task. A natural idea is to directly train a binary classifier that checks whether a document contains POI accessibility information, and then send the document, with its accessibility labels, to an operator for manual verification. Although straightforward, this method usually requires an operator to spend a lot of time identifying the boundaries of a POI, and thus can hardly be applied to large-scale POI data maintenance. This became evident while building the WebPOIs dataset, where we found that the most time-consuming stage of manual annotation was annotating POI boundaries. The problem becomes even worse when operators are not familiar with the POI names: they inevitably spend considerable time determining POI boundaries by repeatedly verifying and searching for additional information on the Internet, which greatly reduces work efficiency.
Moreover, the accessibility of a POI strongly correlates with the time-dependent variation of individual business activities, which necessitates extracting ⟨POI name, accessibility label, time of taking effect⟩ triplets rather than only producing ⟨POI name, accessibility label⟩ pairs from unstructured text. In practice, the deployment of GEDIT is accompanied by a post-processing step, which applies a heuristic method to extract time information. Specifically, the accessibility time of a POI is obtained by: (1) extracting the time entity in the document with a NER model and (2) falling back to the document creation time if the previous method fails. However, it remains challenging to identify the accurate accessibility time (e.g., “March 31, 2020”) of a POI due to loosely described dates and times (e.g., “Mother’s Day” and “September 9th in lunar calendar”), fuzzy dates and times (e.g., “opening soon” and “around the Mid-Autumn Festival in 2021”), and relative time spans (e.g., “the first day of the Dragon Boat Festival holiday” and “three days later”). In addition, there may exist multiple ⟨POI name, accessibility label, time of taking effect⟩ triplets in one document, which makes it challenging to map each time chunk to its corresponding POI chunk. To address these challenges, we are developing an end-to-end, time-aware extension of GEDIT, which we will introduce in future work.
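The two-step time heuristic above can be sketched as follows. The `extract_time_entity` callable stands in for the NER model and is an assumption made for the example; the real post-processing step at Baidu Maps is more involved.

```python
from datetime import date
from typing import Callable, Optional


def accessibility_time(document: str,
                       creation_time: date,
                       extract_time_entity: Callable[[str], Optional[date]]) -> date:
    """Heuristic described above: (1) prefer a time entity extracted by a
    NER model; (2) fall back to the document creation time if none is found."""
    ner_time = extract_time_entity(document)
    return ner_time if ner_time is not None else creation_time
```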
7. Related Work
Here we briefly review the closely related work in the fields of POI maintenance and named entity recognition.
7.1. POI Maintenance
POI database maintenance, which includes discovering emerging POIs and updating existing POIs, is an essential and crucial task for commercial map applications. In the past, this procedure has generally relied on manual input, which is tedious and expensive (Mummidi and John, 2008; Ruta et al., 2012). With the emergence of massive user-generated content and the development of machine learning methods, several methods have been proposed to effectively discover or update POI information.
Some early studies focus on discovering POIs from images by recognizing the logos or brand symbols that make the POIs identifiable. Early work proposed to detect logos from images with hand-crafted features (Revaud et al., 2012; Romberg et al., 2011), and later deep learning-based methods (Su et al., 2017a, b) outperformed the previous hand-crafted approaches. For updating existing POIs, Revaud et al. (2019) proposed to detect changes of POIs by comparing two image sets of the same venue at different times.
Several studies have attempted to leverage text data such as Web snippets (Rae et al., 2012; Chuang et al., 2016), yellow pages (Ahlers, 2013), and tweets (Xu et al., 2019) to discover emerging POIs. For updating existing POIs, Zhou et al. (2013) proposed a method for updating POIs based on Sina Weibo check-in data; they detect new POIs by analyzing the check-in data that emerges over time. Chuang et al. (2018) proposed a feature-based method for detecting outdated POIs using crawled Web snippets.
Our work is significantly different in the following aspects. (1) We consider the complete task of POI database maintenance, including discovering new POIs and updating existing POIs, while other studies only focus on a portion of the task. (2) We take advantage of both pre-trained language models and dependency parsing to guide a sequence tagging model, which is able to jointly extract POI mentions and identify their coupled accessibility labels from unstructured text, saving substantial labor costs in practice.
7.2. Named Entity Recognition
Since POI terms can be regarded as a kind of named entity, one of our sub-tasks, POI term extraction, is closely related to named entity recognition (NER), a traditional natural language processing task. NER is typically cast as a sequence tagging problem. Early studies used feature-based classifiers (Lafferty et al., 2001; Florian et al., 2003) to build a tagger model. With the development of neural networks, studies based on CNNs (Collobert et al., 2011) and LSTM-CRFs (Chiu and Nichols, 2016; Lample et al., 2016; Ma and Hovy, 2016) achieved promising results. Recently, fine-tuned models based on pre-trained language models such as ELMo (Peters et al., 2018) and BERT (Devlin et al., 2019) have achieved impressive performance.
The difference between our proposed task and NER lies in the main evidence used to predict the output tag. In NER, the output tag of an entity’s type, such as ORG or PER, is mainly determined by its name, while in our study the output tag of accessibility is determined by the context.
8. Conclusions and Future Work
It is of vital importance to provide timely accessibility reminders of POIs to users of commercial map applications. In this paper, we present a novel task that jointly extracts POI mentions and identifies their coupled accessibility labels from unstructured text. We formulate it as a sequence tagging problem, where the goal is to produce ⟨POI name, accessibility label⟩ pairs from unstructured text. To address the two challenges of (1) rare or unknown words and (2) many-to-one or one-to-many ⟨POI name, accessibility label⟩ mappings, we propose a Geographic-Enhanced and Dependency-guIded sequence Tagging (GEDIT) model. GEDIT not only adopts a geographic-enhanced pre-trained model to learn the text representations, but also applies a relational graph convolutional network to learn the tree node representations from the parsed dependency tree. Extensive experiments conducted on a real-world dataset demonstrate the superiority and effectiveness of GEDIT. In addition, statistics show that the proposed solution can save significant human effort and labor costs when processing the same amount of documents.
In the future, we plan to extend the proposed solution and further address the following open problems.
(1) Other attributes, such as the exact time that indicates when a POI changes its accessibility label, also deserve to be extracted from Web text. We plan to explore ways to identify and extract such attributes in the future.
(2) The users’ search (Huang et al., 2016, 2017, 2020b, 2020a; Fan et al., 2021; Huang et al., 2021) and navigation (Fang et al., 2020, 2021) behaviors on visiting opened POIs differ from those on visiting closed POIs, which can be leveraged as valuable evidence to detect changes of POIs. As future work, we plan to investigate whether it is practical to identify accessibility changes of POIs from the search and navigation logs of map applications.
(3) The accessibility of POIs obtained from Web text needs to be further verified by human annotators. We plan to develop solutions to automatically perform the verification and validation steps, which would significantly reduce labor costs.
- Agarap (2018) Abien Fred Agarap. 2018. Deep Learning using Rectified Linear Units (ReLU). arXiv preprint arXiv:1803.08375 (2018).
- Ahlers (2013) Dirk Ahlers. 2013. Business Entity Retrieval and Data Provision for Yellow Pages by Local Search. In IRPS Workshop @ ECIR2013.
- Chiu and Nichols (2016) Jason P. C. Chiu and Eric Nichols. 2016. Named Entity Recognition with Bidirectional LSTM-CNNs. Trans. Assoc. Comput. Linguistics 4 (2016), 357–370.
- Chuang et al. (2016) Hsiu-Min Chuang, Chia-Hui Chang, Ting-Yao Kao, Chung-Ting Cheng, Ya-Yun Huang, and Kuo-Pin Cheong. 2016. Enabling Maps/Location Searches on Mobile Devices: Constructing a POI Database via Focused Crawling and Information Extraction. Int. J. Geogr. Inf. Sci. 30, 7 (2016), 1405–1425.
- Chuang et al. (2018) Hsiu-Min Chuang, Chia-Hui Chang, and Wang-Chien Lee. 2018. Detecting Outdated POI Relations via Web-Derived Features. Trans. GIS 22, 5 (2018), 1238–1256.
- Collobert et al. (2011) Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel P. Kuksa. 2011. Natural Language Processing (Almost) from Scratch. J. Mach. Learn. Res. 12 (2011), 2493–2537.
- Devlin et al. (2019) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 4171–4186.
- Fan et al. (2021) Miao Fan, Yibo Sun, Jizhou Huang, Haifeng Wang, and Ying Li. 2021. Meta-Learned Spatial-Temporal POI Auto-Completion for the Search Engine at Baidu Maps. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2822–2830.
- Fang et al. (2021) Xiaomin Fang, Jizhou Huang, Fan Wang, Lihang Liu, Yibo Sun, and Haifeng Wang. 2021. SSML: Self-Supervised Meta-Learner for En Route Travel Time Estimation at Baidu Maps. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2840–2848.
- Fang et al. (2020) Xiaomin Fang, Jizhou Huang, Fan Wang, Lingke Zeng, Haijin Liang, and Haifeng Wang. 2020. ConSTGAT: Contextual Spatial-Temporal Graph Attention Network for Travel Time Estimation at Baidu Maps. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2697–2705.
- Florian et al. (2003) Radu Florian, Abraham Ittycheriah, Hongyan Jing, and Tong Zhang. 2003. Named Entity Recognition through Classifier Combination. In Proceedings of the Seventh Conference on Natural Language Learning. 168–171.
- Forney (1973) G David Forney. 1973. The Viterbi Algorithm. Proc. IEEE 61, 3 (1973), 268–278.
- Gui et al. (2019) Tao Gui, Ruotian Ma, Qi Zhang, Lujun Zhao, Yu-Gang Jiang, and Xuanjing Huang. 2019. CNN-Based Chinese NER with Lexicon Rethinking. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. 4982–4988.
- Huang et al. (2020a) Jizhou Huang, Haifeng Wang, Miao Fan, An Zhuo, and Ying Li. 2020a. Personalized Prefix Embedding for POI Auto-Completion in the Search Engine of Baidu Maps. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2677–2685.
- Huang et al. (2021) Jizhou Huang, Haifeng Wang, Yibo Sun, Miao Fan, Zhengjie Huang, Chunyuan Yuan, and Yawen Li. 2021. HGAMN: Heterogeneous Graph Attention Matching Network for Multilingual POI Retrieval at Baidu Maps. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 3032–3040.
- Huang et al. (2020b) Jizhou Huang, Haifeng Wang, Wei Zhang, and Ting Liu. 2020b. Multi-Task Learning for Entity Recommendation and Document Ranking in Web Search. ACM Transactions on Intelligent Systems and Technology (TIST) 11, 5 (2020), 1–24.
- Huang et al. (2017) Jizhou Huang, Wei Zhang, Shiqi Zhao, Shiqiang Ding, and Haifeng Wang. 2017. Learning to Explain Entity Relationships by Pairwise Ranking with Convolutional Neural Networks. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence. 4018–4025.
- Huang et al. (2016) Jizhou Huang, Shiqi Zhao, Shiqiang Ding, Haiyang Wu, Mingming Sun, and Haifeng Wang. 2016. Generating Recommendation Evidence Using Translation Model. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence. 2810–2816.
- Huang et al. (2015) Zhiheng Huang, Wei Xu, and Kai Yu. 2015. Bidirectional LSTM-CRF Models for Sequence Tagging. arXiv preprint arXiv:1508.01991 (2015).
- Kim (2014) Yoon Kim. 2014. Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 1746–1751.
- Kingma and Ba (2015) Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations.
- Lafferty et al. (2001) John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proceedings of the Eighteenth International Conference on Machine Learning. 282–289.
- Lample et al. (2016) Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural Architectures for Named Entity Recognition. In The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 260–270.
- Ma and Hovy (2016) Xuezhe Ma and Eduard H. Hovy. 2016. End-to-End Sequence Labeling via Bi-directional LSTM-CNNs-CRF. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. 1064–1074.
- Mummidi and Krumm (2008) Lakshmi Narayana Mummidi and John Krumm. 2008. Discovering Points of Interest From Users’ Map Annotations. GeoJournal 72, 3 (August 2008), 215–227.
- Pennington et al. (2014) Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. Glove: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 1532–1543.
- Peters et al. (2018) Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep Contextualized Word Representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2227–2237.
- Rae et al. (2012) Adam Rae, Vanessa Murdock, Adrian Popescu, and Hugues Bouchard. 2012. Mining the Web for Points of Interest. In The 35th International ACM SIGIR conference on research and development in Information Retrieval. 711–720.
- Ratinov and Roth (2009) Lev-Arie Ratinov and Dan Roth. 2009. Design Challenges and Misconceptions in Named Entity Recognition. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning. 147–155.
- Revaud et al. (2012) Jérôme Revaud, Matthijs Douze, and Cordelia Schmid. 2012. Correlation-based Burstiness for Logo Retrieval. In Proceedings of the 20th ACM Multimedia Conference. 965–968.
- Revaud et al. (2019) Jérôme Revaud, Minhyeok Heo, Rafael S. Rezende, Chanmi You, and Seong-Gyun Jeong. 2019. Did It Change? Learning to Detect Point-Of-Interest Changes for Proactive Map Updates. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- Romberg et al. (2011) Stefan Romberg, Lluis Garcia Pueyo, Rainer Lienhart, and Roelof van Zwol. 2011. Scalable Logo Recognition in Real-world Images. In Proceedings of the 1st International Conference on Multimedia Retrieval. 25.
- Ruta et al. (2012) Michele Ruta, Floriano Scioscia, Saverio Ieva, Giuseppe Loseto, and Eugenio Di Sciascio. 2012. Semantic Annotation of OpenStreetMap Points of Interest for Mobile Discovery and Navigation. In 2012 IEEE First International Conference on Mobile Services. 33–39.
- Schlichtkrull et al. (2018) Michael Sejr Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. 2018. Modeling Relational Data with Graph Convolutional Networks. In The Semantic Web - 15th International Conference (Lecture Notes in Computer Science), Vol. 10843. 593–607.
- Su et al. (2017a) Hang Su, Shaogang Gong, and Xiatian Zhu. 2017a. WebLogo-2M: Scalable Logo Detection by Deep Learning from the Web. In 2017 IEEE International Conference on Computer Vision Workshops. 270–279.
- Su et al. (2017b) Hang Su, Xiatian Zhu, and Shaogang Gong. 2017b. Deep Learning Logo Detection with Data Expansion by Synthesising Context. In 2017 IEEE Winter Conference on Applications of Computer Vision. 530–539.
- Sun et al. (2019) Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, and Hua Wu. 2019. ERNIE: Enhanced Representation through Knowledge Integration. ArXiv abs/1904.09223 (2019).
- Wu et al. (2019) Fangzhao Wu, Junxin Liu, Chuhan Wu, Yongfeng Huang, and Xing Xie. 2019. Neural Chinese Named Entity Recognition via CNN-LSTM-CRF and Joint Training with Word Segmentation. In The World Wide Web Conference. 3342–3348.
- Xu et al. (2019) Canwen Xu, Jing Li, Xiangyang Luo, Jiaxin Pei, Chenliang Li, and Donghong Ji. 2019. DLocRL: A Deep Learning Pipeline for Fine-Grained Location Recognition and Linking in Tweets. In The World Wide Web Conference. 3391–3397.
- Zhou et al. (2013) Meng Zhou, Ming Wang, and Qingwu Hu. 2013. A POI Data Update Approach based on Weibo Check-in Data. In 21st International Conference on Geoinformatics. 1–4.