Semantically Driven Auto-completion

06/22/2019 · Konstantine Arkoudas, et al. · Bloomberg

The Bloomberg Terminal has been a leading source of financial data and analytics for over 30 years. Through its thousands of functions, the Terminal allows its users to query and run analytics over a large array of data sources, including structured, semi-structured, and unstructured data; as well as plot charts, set up event-driven alerts and triggers, create interactive maps, exchange information via instant and email-style messages, and so on. To improve user experience, we have been building question answering systems that can understand a wide range of natural language constructions for various domains that are of fundamental interest to our users. Such natural language interfaces, while exceedingly helpful to users, introduce a number of usability challenges of their own. We tackle some of these challenges through auto-completion for query formulation. A distinguishing mark of our auto-complete systems is that they are based on and guided by corresponding semantic parsing systems. We describe the auto-complete problem as it arises in this setting, present the novel algorithms that we use to solve it, and report on the quality of the results and the efficiency of our approach.

1 Introduction

The Bloomberg Professional Service, popularly known as the Terminal, has been a leading source of financial data, analytics, and insights for over 30 years. Customers use it to query a wide variety of structured, semi-structured, and unstructured sources, create alerts, plot charts, draw maps, compute statistics, etc. For most of its history, queries to the Terminal have been built via dedicated GUIs. For example, if users wanted to find bonds that satisfied certain criteria, they would first need to navigate to a bond-search function, and then specify the conditions of interest by interacting with a variety of GUI widgets. Long-time power users of the Terminal are typically comfortable with their usual workflows. However, the large number of available functions and the complex GUI interactions they require may present challenges to those who need to step outside their usual workflows, and can impose a learning curve on newcomers [12, 17].

To mitigate these challenges, we have undertaken work aimed at allowing users to interact with the Terminal in natural language, and specifically to formulate queries directly in natural language. These range from simple factoid questions to structurally complex queries. The following are representative examples from a number of different domains:

  • What are the top 5 European auto companies with eps at least 3?

  • What was Apple’s market cap in the second quarter?

  • Show me investment grade bonds in the emerging markets with yield 4%

  • News about brexit from the New York Times between September and now

  • Chart a histogram of Netflix vs AT&T cable subscribers over the last 5 years

  • Tech CEOs in California who graduated from Harvard Business School

Our QA (question answering [3, 20, 25]) systems use semantic parsing to compute a formal representation of the meaning of such a query. These representations are then translated into executable query languages (such as SQL or SPARQL). Those queries are finally executed against the back end and the results are presented to the user.

However, natural language interfaces present usability challenges of their own [9]. In short, it is not clear to users what a QA system can and cannot do. The first part pertains to discovery, and specifically to discovering what the system can do—what class of questions or commands it can understand. The second part pertains to expectation management: We want to steer the users away from the (inevitable) limitations of the QA system. Such limitations include lack of support for specific kinds of functionality, incompleteness of the underlying data, and limitations of semantic parsing technology. We use auto-completion as a tool that can help to tackle both the discovery problem, by suggesting queries which we know to be fully parsable and answerable; and expectation management, whereby we stop offering suggestions as a signal indicating that we are not able to understand and/or answer what is being typed. Figure 1 showcases some inputs and outputs from our auto-complete system for news (a domain we will discuss further in the sequel).

Building AC (auto-complete) systems for new QA systems introduces a set of unique challenges, the main one being the cold-start problem. Since AC systems aim to address fundamental usability issues with QA systems, we aim to release QA and AC systems in tandem. This means that we don’t have the luxury of large query logs that can be used to bootstrap the AC systems. However, on the positive side, because we have access to the grammatical structure encoded in the semantic parser, it is possible, with the aid of appropriate lexicons and certain statistics, to generate large sets of queries synthetically, which can then be used as if they were user queries. Synthetically generated queries will never be as good as the genuine article, but when carefully prepared they can be very helpful.

In this paper we report on our experience building AC systems for natural language interfaces. Specifically, the paper makes the following contributions:

  • we introduce the problem of auto-completion for QA systems that are based on semantic parsing, and identify a set of properties that systems tackling this problem should satisfy (Section 3);

  • we outline our approach and a number of algorithms that we use to tackle this problem (Section 4);

  • we report experimental results on the effectiveness and efficiency of the AC systems we have been building at Bloomberg (Section 5).

The following section provides some brief background on semantic parsing (Section 2), and the last section discusses related work (Section 6).

2 Background: Semantic Parsing

Query: “chinese non-tech bonds maturing in three years”

Interpretation: (COUNTRY_OF_RISK = CHINA) AND NOT(SECTOR = SEC_TECH) AND (MATURITY_DATE = RELATIVE_TIME(3,YEAR,NOW))

Derivation:
[ (COUNTRY_OF_RISK = CHINA)
    [CHINA, tier=preterminal [chinese, tier=terminal]] ]
[ NOT(SECTOR = SEC_TECH)
    [NOT, tier=preterminal [non, tier=terminal]]
    [(SECTOR = SEC_TECH) [SEC_TECH, tier=preterminal [tech, tier=terminal]]] ]
[ (MATURITY_DATE = RELATIVE_TIME(3,YEAR,NOW))
    [MATURITY_DATE, tier=preterminal [maturing, tier=terminal] [in, tier=terminal]]
    [3, tier=preterminal [three, tier=terminal]]
    [YEAR, tier=preterminal [years, tier=terminal]] ]

Figure 2: A bonds query with its interpretation and corresponding derivation.

Semantic parsers map natural language utterances into logical forms that capture their meaning [14]. This is done in two conceptual stages. The first is a parsing analysis, whereby a sentence is mapped to all interpretations that can be derived from it, reflecting the lexical and syntactic ambiguity of natural language. The second is a ranking stage aimed at ordering these interpretations in accordance with their plausibility. The top interpretation is then executed by the back end in order to return results, plot a chart, set up an alert, etc.

We provide QA systems based on semantic parsing for multiple domains, each of which might have its own domain-specific executable query language in the back end, reflecting differences in the underlying data models and supported operations. It is not feasible to tailor each QA system to each desired target query language. Therefore, we use a generic intermediate representation language (IR) based on a fragment of first-order logic. Our semantic parsers map natural language to this IR, and IR formulas are then readily translated to whatever executable query language is used by the domain’s back end.

An IR formula is typically either an atomic formula (or atom for short); or else a complex sentential combination of formulas, namely, negations, conjunctions, or disjunctions. Atoms are usually equalities between variables (or fields, also known as attributes) and values, where a variable has a primitive type: usually numeric (integer or real); or an enumeration (enum for short), in which case the value must be one of the finitely many values of that variable; or string; or boolean. Examples of numeric fields are stock price and salary; examples of enumeration fields include credit ratings and country of domicile; and examples of boolean fields are actively traded (for tickers), convertible (for bonds), privately held (for companies), etc. An atom might also be an inequality such as f < v or f > v, where f is a numeric field and v is a numeric value (possibly with units or other additional information attached). In the general case, an atom can be any n-ary relation between values of certain types, for n ≥ 1. Here we will be mostly concerned with formulas that are conjunctions of one or more atoms: A_1 AND … AND A_n, where n ≥ 1. Informally, a derivation of a query q is a tree whose terminal nodes are the tokens in q and whose non-terminal nodes correspond to logical and non-logical symbols and formulas. A derivation is shown in Figure 2.

We do not have space here to say much about our semantic parsing technology, but it is important to note that our discussion in this paper is agnostic on that point. The semantic parsers could be based on CCGs and machine learning, or on PCFGs and first-order or higher-order logic (with or without machine learning), or on parser combinators, or even on a purely deep learning pipeline. The only requirement is that there should be some notion of a structured (tree-based) meaning representation.

3 Problem Statement

We now outline a set of properties that should be satisfied by AC systems designed to improve the usability of semantics-based QA technology. The problem we address in this paper is building AC systems that satisfy these properties. As the QA and corresponding AC systems should ideally be released together, a major challenge that we deal with is satisfying these properties despite the scarcity of query-log data that results from a cold start.

Two minimal requirements that characterize the AC problem in our setting are soundness and completeness. Soundness itself is split into two properties, syntactic and semantic soundness. An AC system is syntactically sound provided that every completion c that it returns for a given partial query q is a syntactic extension of q, meaning that every token of q is a prefix of a unique token of c; the tokens of c can thus be partitioned into those that extend tokens of q and those that are new. It is semantically sound if every such c is semantically parsable and answerable. Both properties are conditional statements and thus would be easy to attain if the system never provided any completions. We also need completeness: The system should provide at least some completions whenever q is in fact extensible to some semantically parsable and answerable query.
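To make the definition concrete, here is a minimal sketch of the syntactic-extension check in Python; the function name and the order-insensitive matching policy are our own rendering of the reconstructed definition, not code from the paper:

def is_syntactic_extension(partial_query, completion):
    # Every token the user typed must be the prefix of a distinct token
    # of the completion; matching is order-insensitive, so "guai" ->
    # "juan guaido" counts as a valid extension.
    typed = partial_query.lower().split()
    produced = completion.lower().split()
    used = [False] * len(produced)
    for t in typed:
        for i, c in enumerate(produced):
            if not used[i] and c.startswith(t):
                used[i] = True
                break
        else:
            return False  # some typed token is extended by no completion token
    return True

assert is_syntactic_extension("bullet bonds mat", "bullet bonds maturing in 2020")
assert is_syntactic_extension("guai", "juan guaido")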

But we need a number of additional properties above and beyond soundness and completeness:

  1. The completions should be predictive of user intent; in particular, the user’s intended query should be as high up on the list of completions as possible.

  2. The completion list should be diverse: It should contain entries of different types. In the case of a QA system for news, for example, if the partial query is the letter i, the results should not be limited to companies whose names start with an i, such as IBM; it should include people (such as Icahn), sectors (such as insurance), regions (Ireland), and so on. Diversity is very important for discoverability.

  3. The completions should be propositional, meaning that they should have full sentential semantics: The semantic parser must fully map the completion to a formula in the underlying logic, which could be a sentential atom or a more complex formula. For example, if the partial query is investment grade bonds i, then investment grade bonds in the emerging markets is an acceptable propositional completion, but investment grade bonds in the is not. All completions in Figure 1 are propositional.

  4. The completions should be as grammatical as possible, modulo what the user has already typed. The QA system should be able to understand telegraphically formulated queries [13], but nevertheless we should strive to offer completions that are as linguistically well-formed as possible. There is tension between this requirement and completeness, which is why we formulate this as a soft constraint.

The above properties are our main focus, but there are other desiderata as well, such as personalization (the history and profile of a user should affect the completions they receive) and popularity (popular queries posed by other users should be more likely to appear as completions) [26].

4 Approach

We now outline the high-level approach we take to solve the auto-completion problem introduced and motivated above. The approach relies on a number of different completion algorithms, each of which takes a prefix string provided by the user (q in the notation of Section 3), potentially along with additional domain-dependent configuration parameters, and returns an ordered sequence of Completion objects, each of which contains at least four pieces of information:

  • completion: the completion string to be shown to the user (c in Section 3).

  • interpretation: the interpretation (semantics AST) of the completion. The interpretation is used for deduplicating completions. In some instances, a simplified form of the interpretation may be shown to the user to help disambiguate the meaning of the completion, or introduce the user to a domain’s vocabulary.

  • type: one of a finite number of identifiers designating the different types of completions, used for maximizing diversity.

  • grade: a qualitative score comparable across different completion algorithms and used for weaving and ranking the final set of completions.

When an AC system receives an input prefix, it passes it to a top-level coordinating algorithm that runs a number of available completion algorithms in parallel (we describe the main algorithms in the following sections). The coordinating algorithm will wait until all algorithms return their completions. Each individual algorithm is expected to return a diversified ranked list of completions. The coordinating algorithm then takes these lists and weaves them in a way that again ensures diversity and respects the grades. The top-level algorithm will also ensure that semantic and lexical duplicates are eliminated, though the individual (lower-level) algorithms may also perform their own deduplication. Note that the availability of semantics allows us to detect duplicate completions in a much stronger sense than would be allowed by simple morphology. For instance, ibm and big blue will be conflated assuming that their semantic representation is identical (the ticker IBM). Coordinating algorithms can be customized on a per-domain basis. For example, in some domains it might make sense to run some of the algorithms only if all other algorithms fail to return results. In other domains, completions with low grades may be eliminated if there are completions with high or medium grades. We now proceed to describe the main completion algorithms that are typically used in most domains.
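By way of illustration, a coordinating algorithm along these lines might be sketched as follows; the Completion record mirrors the four fields above, while the grade-major ordering and the adjacent-type rule are deliberate simplifications of the weaving and diversification policies described in the text:

from typing import NamedTuple

class Completion(NamedTuple):
    completion: str      # the completion string shown to the user
    interpretation: str  # canonicalized semantics (AST), used for dedup
    type: str            # diversification type identifier
    grade: int           # qualitative grade, comparable across algorithms

def weave(result_lists, k=10):
    # Merge per-algorithm ranked lists: respect grades, drop semantic
    # duplicates (e.g. "ibm" vs "big blue" with identical semantics),
    # and crudely diversify by never placing two completions of the
    # same type next to each other.
    candidates = sorted((c for lst in result_lists for c in lst),
                        key=lambda c: -c.grade)
    seen, woven = set(), []
    for cand in candidates:
        if cand.interpretation in seen:
            continue
        if woven and woven[-1].type == cand.type:
            continue
        seen.add(cand.interpretation)
        woven.append(cand)
        if len(woven) == k:
            break
    return woven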

4.1 Most Popular Completion (mpc)

A natural starting point for auto-completion is utilizing available query logs (following a number of careful steps intended to safeguard user privacy). As discussed, in our setting we need to deploy a QA and a corresponding AC system in tandem, which means we don’t have any user query logs initially. However, we can often use queries collected internally or from annotators to provide a core initial set of queries for log-based AC, and we may also generate queries synthetically.

For fast matching against a user’s partial query q, this algorithm uses standard mpc (“most popular completion”) implementation techniques, albeit augmented to address aspects of the problem that are peculiar to our setting, as discussed in Section 3. The query log is stored in a trie where each query is paired with its frequency. We are typically interested in the top-k matches with respect to the log frequency, where k is typically around 50. These queries are then re-ranked using a domain-specific re-ranker.
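For concreteness, a minimal version of such a frequency-annotated trie is sketched below; a production mpc implementation would typically precompute and cache the top-k completions at each trie node rather than walking the subtree at query time, as this sketch does:

import heapq
from collections import defaultdict

class TrieNode:
    def __init__(self):
        self.children = defaultdict(TrieNode)
        self.freq = 0  # > 0 iff a logged query ends at this node

class QueryTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, query, freq):
        node = self.root
        for ch in query:
            node = node.children[ch]
        node.freq += freq

    def top_k(self, prefix, k=50):
        # Most frequent logged queries extending `prefix`.
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        matches = []
        stack = [(prefix, node)]
        while stack:
            text, cur = stack.pop()
            if cur.freq:
                matches.append((cur.freq, text))
            stack.extend((text + c, n) for c, n in cur.children.items())
        return [(q, f) for f, q in heapq.nlargest(k, matches)]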

As discussed earlier, each completion needs to be annotated with its interpretation, type identifier (for diversification), and grade. All three pieces of information are pre-computed offline when query logs are ingested. Pre-computation allows us to keep the amount of computation done online to a minimum. Diversification information is computed offline for all atomic constraints in the interpretation of a query. Online, at completion time, the constraint whose type is used as the diversification identifier of the completion is determined dynamically based on the user’s prefix with respect to the full completion. Given a prefix q and a completion c, where q is a prefix of c and the interpretation of c has atomic constraints A_1, …, A_n, the diversification class is a function of the constraint A_i with a path to the last token of q in the derivation tree of c. This simply means that diversification is performed based on the constraint that is currently being typed by the user. For example, the user prefix “chinese non-te” can be completed to “chinese non-tech bonds maturing in three years”, whereby the last token of the prefix is “te”. If we look at the derivation of the interpretation of this completion in Figure 2, the completion of that token corresponds to the SECTOR constraint, which means that SECTOR is used as the diversification type for this completion.

In some cases, queries from the logs cannot be suggested in their raw form, requiring some type of reformulation. For example, a query like “list all chinese tech bonds that will mature between April 1, 2018 and May 30, 2020” is meaningful as a completion up until April 1, 2018, but not after. The issue stems from the combination of tense and explicit time expressions. We will describe how we deal with this issue as an example of the general problem of query log reformulation. We deal with issues caused by explicit time expressions in the query log by a time-shifting reformulation. In this setting, we have a logged query observed at time t_0 that contains explicit references to times T_1, …, T_n, and the current time at which the time-shift reformulation is happening is t. The shifted times are obtained as follows: T_i' = T_i + (t − t_0). The result of performing this shift (for suitable values of t_0 and t) is “list all chinese tech bonds that will mature between April 1, 2021 and May 30, 2022”. The shift is based on the observation that users are typically interested in information related to a relative time in the future or past, but expressed using absolute times (rather than relative ones like “yesterday” or “in two months”).
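A minimal sketch of the time-shifting reformulation under the formula above; shifting raw datetime values is illustrative only, since an actual system would also re-render the shifted times into the query’s surface form and might snap them to calendar units:

from datetime import datetime

def time_shift(explicit_times, logged_at, now):
    # T_i' = T_i + (t - t_0): shift every explicit time in the logged
    # query by the time elapsed since the query was observed, preserving
    # the relative offset its author expressed with absolute dates.
    delta = now - logged_at
    return [t + delta for t in explicit_times]

# A logged query's dates, shifted to completion time:
shifted = time_shift([datetime(2018, 4, 1), datetime(2020, 5, 30)],
                     logged_at=datetime(2018, 6, 15),
                     now=datetime(2021, 6, 15))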

4.2 Atomic Completions (atomic)

The log-based completion scheme described above suffers from a major shortcoming that results in underutilization of query logs. In particular, a completion provided by this algorithm must be a log query that contains all tokens in the user’s partial query, i.e., a log query that extends whatever the user has typed. This is an exceedingly strong condition. As a very simple example, suppose that our log contains only the following two queries:

ibm bonds maturing in 2020
bullet bonds with yield > 2 pct

And suppose now that the user types the partial query

bullet bonds mat (1)

The log-based algorithm that we have described is incapable of offering any completions for input (1), because there is no single query in the log that extends this input. The first query in our log does not contain bullet, while the second one does not contain any tokens that extend mat.

Note that this inability persists even if we loosen up the notion of matching used by the log-completion algorithm. For instance, we might not insist on left-to-right matching, so that an input like in 2020 might still be completed to the first query, ibm bonds maturing in 2020. But even with such restrictions lifted, the algorithm would be unable to complete (1), because the gist of the limitation is that no one single query in the log matches (1). The fundamental units returned by this algorithm are entire queries in the log, which is not just unduly limiting, as just indicated, but also often inappropriate. For example, if the user types ib, the log-based algorithm will return ibm bonds maturing in 2020 as a completion, which is unnecessarily long and specific. Arguably, a more appropriate completion might simply be ibm bonds. This is a more general completion (it carries less information than ibm bonds maturing in 2020) while still satisfying the propositionality requirement.

The atomic completion algorithm described in this section addresses these issues by completing not to entire queries in the log, but rather to atoms found in the log, or more precisely, to atom surface forms. (To simplify presentation, we will continue to use the term “atom” to denote either a logical atom, i.e., a piece of semantics represented as an AST, or a phrase, that is, a sequence of tokens, that has a logical atom as its semantics; the context will always disambiguate the use.) Typically, each query in the log contains multiple atoms, so this not only gives us a larger pool of potential results to return as completions, but the results themselves are smaller, more general, and most importantly, they can be more flexibly stitched onto the end of user inputs. In our simple running example, there are four atoms total:

ibm bonds
maturing in 2020
bullet bonds
with yield > 2 pct

The top two derive from the first query in the logs, and the bottom two from the second query. These four atoms now become our major completion candidates. The atoms we extract from a query log are then organized into a trie T, the atom trie. (We will have more to say on how atoms are extracted from query logs shortly.)

Then, online, when we are given a partial query q = w_1 … w_k α to complete, where each w_i is a token and α is a sequence of characters that is a prefix of some token in the vocabulary (we ignore spelling correction in this paper), the atomic algorithm proceeds as follows: First, we use the domain’s semantic parser to parse as much of q as possible. This means that our semantic parser must tolerate disfluencies and noise, at least at the tail end of the input. This decomposition analysis splits the input into two parts:

  1. an initial segment s that is understood by the semantic parser and results in some semantics φ, where s might be equal to q; and

  2. the remainder of the input, r, which constitutes an unrecognized segment.

Assuming that the remainder r is non-empty, we match it against the atom trie T, and this returns a list L of atoms as potential completions. We then assign a score to each atom a in L, relative to the initial segment s. This score can be understood as a numeric measure of the goodness of the fit between s and a, i.e., the degree to which a is an appropriate completion of the unrecognized segment r given, or conditioned upon, the existence of s to the left of r. This is obtained by a scoring function that takes s, r, a, and an atom model M as inputs (to be described below). We then do a selection sort of L, based on the scores, for the desired number of completions (typically 5 or 10). For each of those top atoms, its untokenized form is appended to the end of s, and the result becomes the corresponding final atomic completion. The semantics of that final completion are typically obtained by conjoining φ with the (pre-cached) semantics of the atom. (In rare cases, if the first word of r is a logical connective such as “or”, we might produce a disjunction instead of a conjunction.)

As a quick illustration, suppose again that the input is bullet bonds mat. During the decomposition analysis, the semantic parser will recognize bullet bonds as the maximal initial segment that is parsable, with atomic semantics

MATURITY_TYPE = BULLET (2)

and will identify the segment mat as unrecognized. That segment will then be matched against the atom trie T and will return the singleton list [maturing in 2020] as the only candidate completion (recall that T has only four atoms in the running example). So in this trivial example there is no ranking to be done, and by concatenating s with this atom we obtain the one and only atomic completion:

bullet bonds maturing in 2020
Its semantics are given by the conjunction of (2) and the semantics of the atom itself, namely

MATURITY_DATE = ExactDate(-1,-1,2020) (3)

(A date expression of the form ExactDate(m,d,y), for integers m, d, and y, denotes the date corresponding to the respective coordinates, with a −1 indicating that no value was specified for the corresponding dimension. An expression of the form Relative_Time(n,u,t), for a numeric value n, a temporal unit u, and an anchor time expression t, indicates the time obtained by adding n units u to the time denoted by t.)

How well this algorithm works hinges on the quality of the similarity metric. To describe how the score is computed, we first need to discuss the contents and offline generation of the atom model M. This model is essentially a map, computed offline and loaded upon initialization, from each atom a to a record of information about a, such as its count (the number of times it occurs in the corpus), its tokenized and untokenized representation, its semantics (encoded as an AST), and most importantly, its context vector space. The context vector space of an atom a, denoted by V(a), is defined as a lexical vector space (a sparse map from vocabulary words to integer counts), obtained as follows:

  • Set V(a) := the empty map.

  • For every query q in the log:

    • For every occurrence of a in q:

      • Let W be all and only the words in q to the left of that occurrence of a. For each such w ∈ W, set V(a)[w] := V(a)[w] + 1.

While this algorithm is parameterized over a single atom a, it is possible to build V(a) for all atoms at the same time with just one linear scan over all queries in the log.
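In Python, the one-pass construction might look as follows, where extract_atoms is a hypothetical stand-in for the semantic parser’s atom-extraction interface:

from collections import Counter, defaultdict

def build_context_model(log, extract_atoms):
    # One linear scan over the log builds V(a) for every atom a.
    # extract_atoms(query) is assumed to return, for each atom in the
    # query, a (start_token_index, atom_tokens) pair.
    counts = Counter()               # atom surface form -> corpus count
    contexts = defaultdict(Counter)  # atom surface form -> V(a)
    for query in log:
        tokens = query.split()
        for start, atom_tokens in extract_atoms(query):
            atom = " ".join(atom_tokens)
            counts[atom] += 1
            for w in tokens[:start]:  # all and only the words to the left
                contexts[atom][w] += 1
    return counts, contexts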

To continue the running example, the model computed here would be of the following form (expressed in pseudo-JSON notation):

{"ibm bonds" :=  {
   semantics := "COMPANY_NAME = IBM",
   count := 1,
   context := {},
   ...
 },
 "maturing in 2020" := {
   semantics := "MATURITY_DATE = ExactDate(-1,-1,2020)",
   count := 1,
   context := {"ibm" := 1,
               "bonds" := 1},
   ...
 },
 "bullet bonds" := {
   semantics := "MATURITY_TYPE = BULLET",
   count := 1,
   context := {},
   ...
 },
 "with yield > 2 pct" := {
   semantics := "FLD_YLD > 2(PERCENT)"
   count := 1,
   context := {"bullet" := 1,
               "bonds"  := 1},
   ...
 }
}

That is, every atom contains its semantics, a context that encodes the vector space above, and potentially a wealth of additional information (such as linguistic features ranging from morphology to syntax and additional semantic signals), represented above by ellipses.

We can now outline the scoring function as follows:

  • If s and a are incompatible, penalize a (in proportion to the degree of incompatibility). This is typically determined by additional statistics that are computed offline from the corpus and stored in the model M.

  • Otherwise, compute the similarity by looping through the words in s, using the context V(a) as a grader:

    • Set score := 0;

    • For each word w in s: set score := score + g(V(a)[w]);

    • Return h(score), where h is a scaling function (or the identity if no scaling is needed).

    The function g may be the identity function or some other layer of processing on top of the raw counts. Accordingly, words in s that have been seen before to the left of a in the corpus are rewarded in proportion to how often they have been seen. Words in s that have never been seen before to the left of a may be accordingly penalized.
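Putting the pieces together, a bare-bones version of the scoring function might look like this, with g and h taken to be identities and incompatible standing in for the offline-computed compatibility statistics mentioned above:

def score(atom, segment_tokens, contexts, incompatible, penalty=-10.0):
    # Sketch of the atomic-completion scoring function.
    # incompatible(s, a) is a stand-in for the offline statistics that
    # detect incompatibility between the initial segment and the atom.
    if incompatible(segment_tokens, atom):
        return penalty
    ctx = contexts.get(atom, {})
    # Reward words previously seen to the left of the atom, in proportion
    # to how often they were seen; penalize words never seen there.
    return sum(ctx[w] if w in ctx else -1 for w in segment_tokens)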

To simplify presentation, we stated earlier that the remainder r is matched against the atom trie T, and that this operation returns a list of atoms which are then ranked on the basis of the scoring function. The real picture is only marginally more complicated, in order to ensure semantic diversity. Specifically, the matching operation returns the list of trie matches partitioned into a set of buckets, where all atoms in the same bucket have the same type of atomic semantics. That type is usually determined by, and can be identified with, the field f that occurs on the left-hand side of the operator op, assuming that the atom is of the form f op v. For atoms of a different form, some other unique type identifier must be specified as the atom’s “type.”

These types play a dually useful role. First and foremost, they allow us to diversify the results by ensuring that we don’t get atomic completions of one type only (or two types only), say only completions of the form COMPANY_NAME = c. This is particularly important for short inputs, because with hundreds of thousands of companies in total, there will be thousands of company names completing any one-letter prefix, and it might well be that several of those will be popular. In general, we want to mix up the set of results to the greatest possible extent while still ensuring that the provided completions are plausible and reasonably predictive of user intent. In our case this is accomplished by ranking each bucket separately and then weaving the resulting ranked lists.

The second major use of atom types is in avoiding completions of a type that has already been encountered in the initial segment s. For instance, consider a partial query whose initial segment s is maturing in 2020, with semantics (3), and whose unrecognized segment is the single letter m. As discussed, the type of that atom can be identified with the field MATURITY_DATE. The letter m will match very many atoms in T, and quite a few of them might be of the form maturing in y. We do not want to give a completion of the form

maturing in 2020 maturing in 2023 (4)

or, even worse,

(5)

Much more appropriate completions might be

maturing in 2020 issued by microsoft

or

maturing in 2020 mining sector

and so on. While completions such as (4) and (5) are naturally likely to have lower scores due to the context analysis, it is much safer to weed them out of consideration altogether by realizing that the type of the completion atom is identical to the rightmost type of the initial segment (more precisely, the type of the rightmost atom in the initial segment). Depending on the domain, there are some fields and some constructions for which this type of juxtaposition is sensible and should not result in the elimination of the corresponding atom. For instance, a construction like “french, german or italian bonds” necessitates the juxtaposition of three atoms of the same type, the first two of which are conjoined for prefixes prior to the “or” particle. These situations receive special treatment in our system on a configurable basis.

Note that the list of candidate atoms obtained by matching r against T is already sorted by a statically known measure, such as the popularity of each atom (which may be defined simply as the number of occurrences of the atom in the corpus). This is important for the following reason: In a domain with a large log of queries (which may be user queries or synthetically generated), there may be millions of atoms in the model, and a short unrecognized segment (e.g., only one character long) may return hundreds of thousands of atoms as candidates, or potentially even millions. The scoring function is fairly computationally expensive, so having to select the top 10 or so atoms based on this function from a list of that length can be prohibitively expensive. A lot of work can be saved here by realizing that in such cases (very short prefixes that match a very large number of candidate atoms) we are dealing with an embarrassment of riches: If the candidate list is already pre-sorted by sheer atom count, then we can simply take the top 100K or so atoms at the front of the list and drop the remainder. The most popular 100K atoms are guaranteed to give us the results we want for that kind of input: more than enough popular atoms with more than enough semantic diversity. Atoms in the tail of the list are much less likely as completions at that point; if needed, they will surface subsequently as the user types additional characters.

Occasionally, the initial split produced by the decomposition analysis is not optimal. In particular, the initial segment may be overly long, and to get the right split we need to backtrack, by shifting one or more tokens from the tail end of s to the front end of r. This might happen when r is initially empty (and thus s is identical to the entire input), or it might happen when both s and r are nonempty. As an example of the first case, consider a partial query like ibm b. Because b is a credit rating and our semantic parser is by necessity flexible in order to understand queries that are telegraphically or elliptically expressed, this entire query is fully parsed, which would mean that there is no unrecognized segment to match against T and therefore no completions offered by the atomic algorithm. A better decomposition analysis, however, is to treat the entire partial query as unrecognized, setting s to the empty sequence. This would result in a wealth of atomic completions like ibm bullet, ibm bonds, ibm b or better, and so on. We have heuristics in place for determining when and how far to backtrack in such a case. In the second case, when both s and r are nonempty, a simple but effective heuristic for evaluating the goodness of the split is the number of results returned by the trie match. If we get a small number of results from the initial split, but a much larger number when we backtrack by a certain number of tokens, then the latter split should be preferred. Note that backtracking cannot proceed on a token-by-token basis, because we need to maintain the invariant that s is fully recognized and its semantics consist of a number of complete constraints. Therefore, on each backtracking attempt, we need to backtrack by the exact number of tokens that correspond to the rightmost constraint left in s at that point. Ideally, of course, we would consider all possible decompositions, obtain completions for each of them, and then merge in accordance with the scores. However, such an approach would be time-inefficient. Our results indicate that the present approach of picking one single decomposition to work with, but using informed heuristics in its selection, provides high-quality results and is efficient.
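A sketch of this backtracking heuristic is shown below; parse_maximal, rightmost_constraint_len, and trie_match_count are assumed interfaces to the semantic parser and the atom trie, and the threshold deciding when a match count is “much larger” is purely illustrative:

def decompose(tokens, parse_maximal, rightmost_constraint_len,
              trie_match_count, gain=5):
    # Start from the parser's maximal (s, r) split; repeatedly shift the
    # rightmost whole constraint from s to r, keeping the coarser split
    # only if it yields a much larger number of trie matches.
    s, r = parse_maximal(tokens)
    best, best_n = (s, r), trie_match_count(r)
    while s:
        k = rightmost_constraint_len(s)  # always a whole constraint (k >= 1)
        s, r = s[:-k], s[-k:] + r
        n = trie_match_count(r)
        if n > gain * max(best_n, 1):    # "much larger": prefer this split
            best, best_n = (s, r), n
    return best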

We close this section by pointing out that while the notion of an atom is typically fixed by the semantics of the domain at hand, with sufficient imagination it is possible to extend that notion to include altogether different types of atoms, thereby rendering the atomic completion algorithm applicable in novel ways. Indeed, it is possible to have multiple instances of the atomic algorithm running in parallel (see Section 4), one that is based on the conventional notion of atom, as determined by the usual semantics of the domain at hand, and others based on alternative conceptions of atoms. As an example of the latter, consider auto-completion for news searches. News queries in Bloomberg are based on (a) a closed ontology of topics (such as oil, brexit, inflation, elections, etc.), tickers (unique identifiers of companies), persons, and wires (news sources); and (b) arbitrary keywords (free text). News queries may also specify time periods of interest, in natural language. For instance, a query like news about oil prices from the Financial Times last month must be understood (by the news semantic parser) as the conjunction of TOPIC:OIL, KEYWORDS:"prices", WIRE:FT, and time=TimePeriod(m1/d1/y1 -- m2/d2/y2), where the time period is whatever corresponds to last month. A good source of completions for news queries are news headlines, and more specifically, noun phrases that occur in such headlines. These noun phrases, which can be extracted with any standard NLP tool, can be viewed as a sort of atom. If one views the headlines as a query log and the noun phrases as atoms, then one can extract an atom model as discussed above, with entries like:

"china tariffs" :=  {
  semantics := "TOPIC:CHINA & TOPIC:TARIFF",
  count := 235,
  context := {"trump": "83",
              "u.s.": "72",
              "wto": "9",
              ...
             },
  ...
 }

Then, given a partial query like trump c, the decomposition analysis using the news semantic parser would understand trump as a person entity in our ontology, leaving c as an unrecognized segment. Matching c against the atom trie would pick up china tariffs as a candidate completion, and then the similarity metric we outlined above would reward trump china tariffs with a high score. Our news auto-complete system includes an instance of the atomic algorithm based on this approach.
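Using an off-the-shelf NLP pipeline such as spaCy, noun-phrase atoms and their left-context vectors could be extracted from headlines roughly as follows (a sketch of the idea, not the paper’s actual pipeline):

import spacy
from collections import Counter, defaultdict

nlp = spacy.load("en_core_web_sm")

def headline_atoms(headlines):
    # Treat headlines as a "query log" and their noun phrases as atoms,
    # building the same counts/contexts model as for conventional atoms.
    counts, contexts = Counter(), defaultdict(Counter)
    for doc in nlp.pipe(headlines):
        for chunk in doc.noun_chunks:
            atom = chunk.text.lower()
            counts[atom] += 1
            for tok in doc[:chunk.start]:  # words to the left of the phrase
                contexts[atom][tok.text.lower()] += 1
    return counts, contexts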

4.3 Template Completions (template)

The mpc and atomic completion schemes above both rely on access to query collections, either from users or artificially synthesized. Even in a best-case scenario where the QA system has been deployed for a reasonably long time, resulting in a large query log, it is practically impossible to observe the entire vocabulary of a given domain in the log. This holds for both the semantic vocabulary (e.g., COMPANY_IBM) and the natural-language one (e.g., "IBM", "international business machines", "big blue"). The requirement that an AC system should be complete (Section 3) suggests that we also need a way to generate completions based on what the QA system can understand, regardless of whether it has been asked before. This is where template-based completions come into play.

Templates are an interpretable and controllable way of performing natural language generation [27]. A template provides a schema for a natural language utterance, which can be instantiated at completion time using suitable lexicons. Templates for natural language generation have been used before in the context of verbalizing database queries [16, 23]. Our setting is different in that we are not only verbalizing a logical form; we also have to generate the underlying semantics first, with only a prefix to work with. Figure 3 shows an example of a template for completion in the equities domain.

Generating fluent, well-formed natural language is an open research problem that gets harder as utterances become longer and more compositional, due to issues like agreement (on number, gender, tense) and coherence. We have to work in this difficult setting, as our QA systems support complex and highly compositional questions, which means that we need our AC systems to support the same. In fact, we want to use AC to encourage users to formulate such questions in order to utilize the full power of the Bloomberg Terminal. There are two ways in which we can use templates for completion. The first is to use atomic templates that capture atoms, which are matched in a fashion similar to atomic completions (Section 4.2), but against a template rather than a trie of atoms. The second approach is to use full-query templates, where a template specifies how complete multi-atom queries are formulated. Multi-atom completions generated from atomic templates would need to rely on some form of scoring to handle fluency issues such as well-formedness, agreement, and coherence. However, in contrast to the atomic completion algorithm of Section 4.2, which is based on data gleaned from logs, most template instances have very likely never been observed in any logs. Additionally, even if we could somehow perform the scoring, we would need to score an impractically large number of candidates that are only materialized at query time, which can have an unacceptable performance overhead. For these reasons we rely on full-query templates for template-based completions. While crafting these is somewhat more time-consuming than crafting atomic templates, we have found that the speed and completion quality justify the additional complexity.

In principle, a template can be instantiated offline and the resulting queries can be added to the query log to be used by the mpc and atomic completion algorithms outlined above. However, the desire to complete compositional multi-atom queries means that we would run into a combinatorial explosion if we consider a domain’s semantic vocabulary and the corresponding lexicalizations. Because of this, we resort to instantiating templates online at completion time in response to a user’s input. Doing so gives us the additional benefit of being able to meaningfully deal with completions for infinite sets like numbers and dates as we detail below.

Figure 3 shows a fragment of a full-query template used for completing queries related to equities. Templates are encoded using the same formalism we use to encode the grammar used for semantic parsing in our QA systems, a cross between an algebraic formalization of recursive augmented transition networks (ATNs) [29] and parser combinators [11], which allows us to reuse a great deal of the infrastructure already in place for query understanding. Language generation in templates is primarily grounded in lexicons. The atomic template ⟨enum-present⟩, for example, utilizes two lexicon lookups: one for verb-phrase aliases of fields in the present tense (such as “trade in”) and the other for noun aliases of values for these fields (e.g., “nyse”, “china”, but not “chinese”). Lexicons are stored as tries for fast prefix matching. The two lookups are connected by compatible-value-constraint(), a special construct that enforces type compatibility between the fields resulting from a lookup against the first trie and values from the second (“trade in”, for example, can be paired with the exchange “nyse” but not the rating “A+”). To speed up trie lookups for the subset of values that are type-compatible with the field, compatible-value-constraint() automatically constructs sub-lexicons, one per semantic type (i.e., for locations, exchanges, ratings, etc.), at initialization time. Atomic templates are terminated with the mark construct, used to mark the boundaries of atoms. This allows the completion algorithm to complete atom-by-atom to the next full atom, as shown in Figure 1.

Lexicon Derivation. The two lexicons used for AC in the atomic template are derived from larger lexicons used by the corresponding QA system. Such derived lexicons are typically logical views of the original ones and are not materialized, but are instead derived dynamically when the AC system is launched. This ensures that the QA and AC data remain in sync. It also minimizes the amount of data that has to be maintained, allowing updates to the QA system to be directly reflected in the corresponding AC system.

Primitive Templates

⟨logical-connectives⟩ ::= and | , | or

⟨firms⟩ ::= firms | companies | equities

⟨numeric-pattern⟩ ::= (< | = | > | … | greater than | …) completable(parser = numeric-parser, sub = “…”)

Atomic Templates

⟨enum-present⟩ ::= …                                 (example: trade in nyse)

⟨numeric-atom⟩ ::= …                                 (example: market cap 2… usd)

Full-query Templates

⟨adjectives⟩ ::= …                                   (example: german tech)

⟨display-fields⟩ ::= …                               (example: ipo date, ipo price and fitch rating)

⟨selection-query⟩ ::= (⟨adjectives⟩ ⟨firms⟩ with kleene-plus-with-separator(⟨numeric-atom⟩, ⟨logical-connectives⟩)) | …
                      (example: german tech companies with market cap 2… usd)

⟨projection-query⟩ ::= (…) | …
                      (example: ipo date, ipo price and fitch rating of equities that trade in nyse)

⟨query⟩ ::= ⟨selection-query⟩ | ⟨projection-query⟩ | …

Figure 3: Example of a full-query template. The root of the template is ⟨query⟩.

Completing quantities with templates. The use of templates that are instantiated online and have full access to the expressiveness of our semantic parsing formalism results in a flexible AC framework. We will demonstrate this by describing how we use templates to complete infinite sets like numbers. The problem we are tackling here is the following: As the user is typing, it is easy to complete fields (e.g., ‘listed on’, ‘market cap’, ‘maturity date’) and entities (e.g., ‘nyse’, ‘bill gates’, ‘siemens’), as these come from finite sets, but how do we go about completing numbers, of which there are infinitely many? One solution would be to hardcode a few templates with some specific numbers, along the lines of:

⟨numeric-atom⟩ ::= market cap (1,000,000 | …) usd

The above template can complete a prefix like “market cap 1” to “market cap 1,000,000 usd”, but would fail on the prefix “market cap 2”. This failure violates the completeness requirement. The lack of completions might (incorrectly, but understandably) give the user the impression that their query cannot be extended to something the system can understand. This shortcoming when dealing with numbers is particularly unacceptable in the financial domain. While it is generally impossible to anticipate the exact number the user will type, we are mostly interested in exploiting AC to communicate to the user that their input is a prefix of a query that we can potentially understand. A completion of the form “market cap 2… usd” for the input “market cap 2”, where the number is completed by ellipses, would communicate to the user that (i) our system understands that they are typing a number, and (ii) that the value being typed is for the market cap field, where usd is an appropriate unit.

We address this problem using the completable construct (see the Completability algorithm in Section 4.4 below). This construct does the following: It is initialized with a parser p (the numeric-parser in the template of Figure 3) and a substitution string (the ellipses “…” in the same example). At completion time, when this construct is passed a string u, it checks which one of the following cases holds:

  1. a prefix of u can be parsed by p,

  2. u is a prefix of a string that can be parsed by p, or

  3. none of the above, indicating that no prefix of u can be understood by the provided parser, and therefore this template cannot complete the given user input.

The following examples demonstrate what happens in each of the respective three cases for three different prefixes:

  1. “market cap 2” → “market cap 2… usd”

  2. “market cap 2M u” → “market cap 2M usd”

  3. “market cap ibm’s market c” → template failure.

Matching starts against the atomic template ⟨numeric-atom⟩. In all three cases “market cap” is a literal match against the first part of that template, and control moves to the completable construct with the remaining text. We are particularly interested in the first case (i). Here “2” is passed to the completable construct, and “2” is a prefix of a string that can be parsed by the numeric-parser. In this case, the completable construct produces a completion with ellipses (…) to convey to the user that the system is aware that they are currently typing a number. The completion goes past the number to a currency unit (“usd”) that is compatible with the field (“market cap”) in the prefix, indicating to the user that the system understands that the number being typed is a market capitalization. The unit comes from the tail end of ⟨numeric-atom⟩. In (ii), the remaining text is “2M u”: the completable construct returns that the prefix “2M” can be parsed as a number, and the remaining suffix “u” is passed to the subsequent template fragment, which completes it to “usd”. In (iii), the construct returns failure, owing to the inability of the numeric-parser to cope with any prefix of “ibm’s market c”.
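A schematic rendering of the construct’s three cases follows; prefix_parse is an assumed parser interface returning the longest parsable prefix of its input together with a flag indicating whether the whole input is a prefix of something parsable, and the order of the two checks reflects the arbitration implied by examples (i) and (ii), since an input like “2” satisfies both:

def make_completable(prefix_parse, sub="..."):
    # prefix_parse(u) -> (longest parsable prefix of u, or "", and a flag
    # telling whether u itself extends to a parsable string).
    def match(u):
        parsed, extensible = prefix_parse(u)
        if extensible:                 # case 2: user mid-number, e.g. "2"
            return ("ELLIPSIS", sub, "")
        if parsed:                     # case 1: e.g. "2M u" -> "2M", rest "u"
            return ("PARSED", parsed, u[len(parsed):].lstrip())
        return ("FAIL", None, None)    # case 3: template cannot complete
    return match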

4.4 Completability Analysis

For various reasons, the above algorithms might fail to produce completions for a user’s input. This is often by design. As explained in Section 3, our QA systems are capable of understanding very elliptically phrased questions and commands that are, strictly speaking, ungrammatical. We need to understand such language because it is common, as users have become conditioned to interacting with Web search engines and other query interfaces using free-style keyword queries. However, producing such language in completions is typically undesirable, as (i) the interpretation of such phrases may not be obvious to a user reading the completion, especially to new users; and (ii) we want to use the AC system to steer our users towards more expressive and well-structured questions.

When none of the previously introduced algorithms return completions, we still want to use the AC system to inform the user whether their input is extensible to a query that the system can understand. We achieve this with a special algorithm that essentially checks whether the underlying grammar used for semantic parsing can parse the given prefix and go just a bit beyond it, not necessarily all the way to an atom but simply to a complete phrase (according to the domain’s vocabulary). No extension of the user’s prefix is performed here; it is simply a completability analysis. The result is communicated back to the user through the UI, informing them whether it pays off to continue with the provided prefix. If the check fails, a user will typically go back a few characters to a point where the system had indicated that the input is completable, and reformulate their query from there. A completability check will typically fail because the user is about to refer to fields, entities, or functionality not available in a given domain. In Section 4.3 we showed how completability analysis is used to meaningfully complete inputs involving infinite sets such as numbers.

5 Experimental Results

In this section we report on quantitative experiments intended to evaluate our approach to auto-completion in domains with different characteristics. Our experiments focus on predictiveness and efficiency. Some of the other desiderata mentioned in Section 3 are guaranteed by the manner in which we compute completions: soundness, diversity, and propositionality. Others, like grammaticality and completeness (which needs to be restricted by grammaticality), are better evaluated qualitatively through user studies. We omit such qualitative results.

5.1 Experimental Setup

We present experiments in the bonds (BNDS) and news (NEWS) domains. The two domains are markedly different, which motivates the use of different completion algorithms in each, and allows us to observe different patterns in the results. In the BNDS domain, queries are issued against a large but relatively static database (terminologically). The information needs are typically complex, as reflected by long multi-atom queries, such as:

  • European non-gbp high yield bonds maturing between 2020 and 2030; and

  • USD hybrid subordinate bonds issued in the last 6 months.

By contrast, NEWS queries are issued against news documents with various annotations, such as publisher, publication date, named entities within an article (such as companies or persons), etc. A NEWS query is mapped by our semantic parser to a combination of structured conditions matched against these annotations and keyword conditions matched against the body of a document (see Section 4.2). NEWS queries are typically shorter than BNDS queries. Examples of NEWS queries include:

  • “Negative news about Guaido”
    CONTAINS:PERSON_JUAN_GUAIDO AND
    SENTIMENT:NEGATIVE

  • “World’s longest flight news from nyt”
    KEYWORDS:"world's longest flight" AND
    SOURCE:NY_TIMES

The distinguishing feature of NEWS is its dynamism. New people, topics, and keyword phrases emerge in the news every day, and the interests of users change accordingly.

The above characteristics motivate the algorithms used in each domain. Because of the relatively stable state of BNDS data, the mpc and atomic algorithms trained on reasonably sized query sets are generally sufficient. To accommodate the dynamism of NEWS, the AC system needs to capture any semantic entities and topics that the QA system recognizes. It also needs to be aware of the latest unstructured topics (e.g., “world’s longest flight”) that emerge in the news, resulting in sudden interest from users. The first requirement is tackled by using the template algorithm, which allows the AC system to stay in sync with any entities known to the QA system, as well as the atomic-log algorithm, which generates atomic-chunk completions extracted offline from queries. The second requirement, related to unstructured topics, is tackled by using an instance of the atomic algorithm, atomic-headline, that is trained on noun phrases extracted from news documents (Section 4.2).

Our experimental setup is intended to test how robustly predictive our AC systems are. To do so, we simulate users interacting with our systems to ask questions that have never been asked before. We do so by taking an existing set of queries Q from a time interval, specifying a cutoff time t_c in that interval, and partitioning Q into two sets Q_1 and Q_2, respectively comprising the queries that appear before and after t_c. Q_1 will be used to train those algorithms that need training, as described below. Now, to simulate users asking questions that the system has not seen before, we use Q_2 \ Q_1 as our set of test queries (where \ is the set difference operator). Because training and testing queries are disjoint, mpc is effectively useless and is taken out of consideration for both BNDS and NEWS. For BNDS, this leaves the atomic algorithm trained on Q_1. For NEWS, Q_1 is used to train atomic-log. Additionally, atomic-headline is trained on news headlines from articles published up until t_c, and template uses the data available to the corresponding NEWS QA system.

Following previous work, we restrict ourselves to prefixes of length at least 3 characters (when a word starts to emerge), as the completion of very short prefixes like ‘c’ and ‘ci’ is somewhat ill-defined. The AC systems are configured to return 10 completions for a given prefix. Individual algorithms will typically generate many more completion candidates, allowing the top-level coordinating algorithm (Section 4) to choose the best 10 (accounting for grades, the need for diversification, and so on). Completion algorithms run concurrently, and pass their results to the top-level algorithm to be merged.

5.2 Predictiveness

A predictive AC system is one that can suggest a user’s intended query closer to the top of the completion list. In this section we present experiments that demonstrate the predictive abilities of our AC approach.

Predictiveness Measures. We evaluate predictiveness using the standard approach in the literature: For each test query q = c_1 c_2 … c_n (where each c_i is a character), we generate its prefixes, each one of the form c_1 … c_i, i ≤ n. For each prefix, we would like a completion that matches q, in some appropriate sense, to appear as high as possible in the completion list produced by the AC system. Let (s_1, …, s_m) be the ordered list of completions returned by the AC system, and let match(s, q) be a predicate that returns 1 if s and q have the same intent and 0 otherwise. We will present several possible instantiations of this predicate below. The reciprocal rank (RR) is defined as follows:

RR = 1 / min { j : match(s_j, q) = 1 }, with RR = 0 if no s_j matches q.

For an evaluation query collection, we report the Mean Reciprocal Rank (MRR), which is the average of all RR scores over all prefixes of all queries.
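For concreteness, the evaluation loop can be expressed as follows, where complete stands for the AC system under test and the example predicates correspond to the STR, PSTR, and BOW instantiations discussed below:

def reciprocal_rank(completions, q, match):
    # RR = 1 / (rank of the first completion with the same intent as q),
    # or 0 if no completion in the list matches.
    for rank, c in enumerate(completions, start=1):
        if match(c, q):
            return 1.0 / rank
    return 0.0

def mrr(test_queries, complete, match, min_prefix=3):
    # Average RR over all prefixes (length >= 3) of all test queries.
    scores = [reciprocal_rank(complete(q[:i]), q, match)
              for q in test_queries
              for i in range(min_prefix, len(q) + 1)]
    return sum(scores) / len(scores)

str_match  = lambda c, q: c == q                            # STR
pstr_match = lambda c, q: q.startswith(c)                   # PSTR
bow_match  = lambda c, q: set(c.split()) == set(q.split())  # BOW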

instantiation    BNDS MRR    BNDS partial MRR    NEWS MRR    NEWS partial MRR
STR              0.028       0.374               0.226       0.355
BOW              0.031       0.442               0.243       0.457
SEM              0.081       0.589               0.256       0.491

Table 1: Predictiveness.

We now present various instantiations of the match predicate above. The simplest one, STR, looks for exact string matches: match(s, q) = 1 iff s = q. A more appropriate measure for settings like ours, where queries can be long, has been introduced by Park and Chiba [24], and is known as the partial match criterion (PSTR). Here the completion can be the same as q or a prefix of it. Partial matching is an important notion in our setting due to the propositionality of our completions (Section 3). Outside of mpc, our core AC algorithms are designed to complete to the next atom, so for a prefix like european non-gbp h originating from the reference query european non-gbp high yield bonds maturing between 2020 and 2030, our systems would generate european non-gbp high yield bonds, which is a match under PSTR but not under STR.

Next, if we look at the definition of syntactic soundness in Section 3, a valid completion can reorder the individual tokens in the user’s input prefix. For example, a valid completion for “guai” is “juan guaido”. This easily extends to longer multi-atom questions. To capture this, we generalize STR matching to bag-of-words (BOW) matching, where a match occurs if the sets of words in q and in the completion are the same. This can also be relaxed in the same way that PSTR is a relaxation of STR, through the notion of a bag-of-words subset (PBOW).

Finally, given that we are in a setting where we have access to both completions and their semantic interpretations, we can perform matching based on the semantics rather than the surface form of the completion, resulting in the SEM and PSEM matching predicates. For example, consider an input query ibm in the BNDS domain. The semantics of this query are captured by a formula along the lines of

$$\lambda x.\; \mathit{bond}(x) \wedge \mathit{issuer}(x, \mathrm{IBM}) \qquad (6)$$

Given a prefix like ib, the completion ibm bonds is, strictly speaking, neither a prefix of the intended query (ibm), nor the same bag of words. Accordingly, neither of the partial measures described above would count this completion as successful. However, the semantics of the completion are in fact identical to the semantics of the intended query, as given by (6), and by that important measure, the completion is in fact a perfect match, even if its lexical form is different.
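To make the semantic predicates concrete, here is a sketch under the assumption that a semantic parser parse maps a query or completion to its logical form, and that normalize reduces that form to a canonical set of atomic constraints (so that set equality implies semantic equivalence for this fragment); both functions are illustrative assumptions, not our production representation:

    def sem_match(q, c, parse, normalize):
        # SEM: identical semantics, regardless of surface form
        # (e.g., "ibm" vs. "ibm bonds" in the BNDS domain).
        return normalize(parse(q)) == normalize(parse(c))

    def psem_match(q, c, parse, normalize):
        # PSEM: the completion's constraints are a subset of the query's,
        # mirroring how PSTR relaxes STR.
        return set(normalize(parse(c))) <= set(normalize(parse(q)))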

Table 1 shows the results of the predictiveness experiment. It is important to keep in mind that in our setting we are completing to queries that have never been observed before (by restricting evaluation to $Q_2 \setminus Q_1$). With this in mind, we start by analyzing the results for BNDS. The contrast between (full match) MRR and partial match MRR is striking. The low MRR numbers are expected given the highly compositional nature of BNDS queries and the fact that the algorithm we are testing is designed to complete to the next full atom, which makes a complete match possible only when the prefix already contains characters of the very last atom in the reference query. Focusing on the partial MRR results for BNDS, the numbers increase as we move down the table, reflecting the fact that our AC systems are semantically driven: even when they do not reproduce the exact reference query, they produce a semantic equivalent, possibly with reordered words. The semantic (SEM) partial MRR indicates that the desired completion appears, on average, at rank 1 or 2.

The MRR numbers for NEWS reflect the fact that NEWS queries are shorter and less compositional than BNDS queries. As in BNDS, completion in NEWS is done to the next atom; but since the queries are shorter, exact matches are easier to come by. The partial MRR numbers for NEWS give a fuller picture of the quality of the results. They are in line with the numbers observed for BNDS, with the desired completion appearing at rank 2 on average, a strong result given that none of the target queries had been seen in its entirety by the system before.

5.3 Efficiency

AC systems need to be highly responsive: we impose a hard upper bound of 100ms between a user’s keystroke and the presentation of the corresponding completions on the screen, in order to ensure interactivity and prevent the user from noticing any lag [21]. This time span must cover not only the time it takes to compute completions, but also the time needed to transmit them over the network and render them in the UI. We have been targeting response times well below 100ms, and our experiments demonstrate that we consistently achieve them.

We report the mean completion time, as well as the $p$th percentile for several values of $p$. Table 2 shows the results we obtain. All numbers indicate that our AC systems are fast: even at the 99th percentile, we are well below the 100ms maximum allotted for the end-to-end completion of a prefix.

System   Mean    P90     P95     P99
NEWS     11.37   22.21   27.46   42.51
BNDS      6.17    9.31   14.35   45.93

Table 2: Response times in ms.

The template algorithm is generally the most time-consuming, because, unlike the other algorithms, it generates its completions exclusively online and therefore cannot cache completion metadata the way the others do. As a result, template must semantically analyze the candidates it produces (typically between 20 and 100) on the fly, at completion time. The atomic algorithm needs only a single parse of the user’s input prefix to determine the boundary of the atom it should complete; all of its other operations rely on metadata that is generated offline and cached for speed. mpc, which is not part of our experiments here, is an order of magnitude faster than the above two algorithms (sub-millisecond), as it relies entirely on pre-computed metadata. As the above numbers demonstrate, all algorithms meet the responsiveness constraints.
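For reference, the statistics reported in Table 2 can be computed from raw per-prefix latency samples along the following lines; the nearest-rank percentile method is one illustrative choice among several:

    import statistics

    def latency_report(samples):
        """Mean and nearest-rank percentiles of completion times in ms."""
        ordered = sorted(samples)
        def pct(p):
            # Smallest sample with at least p% of the data at or below it.
            rank = -(-p * len(ordered) // 100)  # ceil(p * n / 100)
            return ordered[max(rank - 1, 0)]
        return {"mean": statistics.mean(ordered),
                "p90": pct(90), "p95": pct(95), "p99": pct(99)}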

6 Related Work

We developed the AC framework described in this work, and the corresponding QA framework, in the context of the larger problem of improving the usability of information systems. These usability issues have long been recognized [12, 17]. Our focus here is on usable query interfaces. A wide array of solutions have been proposed, from visual query interfaces [6], to textual ones based on keyword queries [1, 13, 30], to interfaces based fully on natural language [19, 20, 30], like the ones we have been developing. Semantic parsing for question answering has a long history [28, 29]. It has seen a revival in recent years [18], brought on in part by business needs, whereby non-technical users require access to expressive query interfaces, as well as by the proliferation of smartphones and digital personal assistants, for which natural language is a convenient mode of interaction.

We tackle the AC problem from a unique angle, heavily informed by the semantics of the underlying query and with the aim of supporting semantic query interfaces. Auto-completion, however, has a long history in the information retrieval community for search interfaces over text documents [7]. Most work in that setting is based on completion from query logs using various flavors of mpc, with the main focus on appropriate ranking of completions [26]. An important aspect in ranking completions is time-sensitivity [8], reflecting dynamic user needs; our algorithms also attend to this in time-sensitive domains like news. Work has also been done on providing completions in the absence of query logs in relatively small-scale search settings such as enterprise, intranet, and email search [4]. In those settings, completions are generated by means of phrase extraction and scoring over the underlying corpora [5, 10]. In Section 4.2 we described how we use a similar approach in the atomic algorithm for the NEWS domain, relying on phrases extracted from news article headlines. More recent work has looked at completing previously unseen prefixes by generating synthetic completions based on $n$-grams extracted from query logs, relying on neural ranking methods for their ability to generalize [22]. This is somewhat similar to our atomic algorithm, the main distinction being that instead of arbitrary $n$-grams we rely on semantics to complete to full atoms.

Auto-completion has also been explored for formal query languages like SQL [15] and SPARQL [2]. The goal here is to help users formulate correct formal queries in cases where they might be unfamiliar with language constructs or the vocabulary of the underlying data. Like ours, these systems are guided by semantics and strive to make contextually relevant suggestions. AC for formal query languages can be very helpful for technically proficient users. Our QA and corresponding AC systems target a wider user base, where such proficiency cannot be assumed.

Another line of related work is the verbalization of formal queries [16, 23]. The primary goal there is to allow users to confirm that the formal queries they typed, or those produced by a form-filling interface, capture their information needs. Verbalization systems typically rely on templates, which we also use. However, these systems take a complete formal query and produce a verbalization. The problem we tackle is somewhat more challenging: we are given only a natural language prefix, and we must effectively predict the semantic intent (the formal query) and then verbalize it in a manner consistent with the input prefix.

References

  • [1] B. Aditya, G. Bhalotia, S. Chakrabarti, A. Hulgeri, C. Nakhe, Parag, and S. Sudarshan. BANKS: Browsing and Keyword Searching in Relational Databases. In VLDB, pages 1083–1086, 2002.
  • [2] H. Bast and B. Buchhold. QLever: A Query Engine for Efficient SPARQL+Text Search. In CIKM, pages 647–656, 2017.
  • [3] H. Bast and E. Haussmann. More Accurate Question Answering on Freebase. In CIKM, pages 299–304, 2015.
  • [4] H. Bast and I. Weber. Type less, find more: fast autocompletion search with a succinct index. In SIGIR, pages 364–371, 2006.
  • [5] S. Bhatia, D. Majumdar, and P. Mitra. Query suggestions in the absence of query logs. In SIGIR, pages 795–804, 2011.
  • [6] S. S. Bhowmick, B. Choi, and C. E. Dyreson. Data-driven Visual Graph Query Interface Construction and Maintenance: Challenges and Opportunities. PVLDB, 9(12):984–992, 2016.
  • [7] F. Cai and M. de Rijke. A Survey of Query Auto Completion in Information Retrieval. Foundations and Trends in Information Retrieval, 10(4):273–363, 2016.
  • [8] F. Cai, S. Liang, and M. de Rijke. Time-sensitive Personalized Query Auto-Completion. In CIKM, pages 1599–1608, 2014.
  • [9] M. G. Helander, T. K. Landauer, and P. V. Prabhu, editors. Handbook of Human-Computer Interaction. Elsevier Science Inc., New York, NY, USA, 2nd edition, 1997.
  • [10] M. Horovitz, L. Lewin-Eytan, A. Libov, Y. Maarek, and A. Raviv. Mailbox-based vs. log-based query completion for mail search. In SIGIR, 2017.
  • [11] G. Hutton. Higher-Order Functions for Parsing. Journal of Functional Programming, 2(3):323–343, 1992.
  • [12] H. V. Jagadish, A. Chapman, A. Elkiss, M. Jayapandian, Y. Li, A. Nandi, and C. Yu. Making Database Systems Usable. In SIGMOD, pages 13–24, 2007.
  • [13] M. Joshi, U. Sawant, and S. Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP, pages 1104–1114, 2014.
  • [14] A. Kamath and R. Das. A survey on semantic parsing. CoRR, abs/1812.00978, 2018.
  • [15] N. Khoussainova, Y. Kwon, M. Balazinska, and D. Suciu. SnipSuggest: Context-Aware Autocompletion for SQL. PVLDB, 4(1):22–33, 2010.
  • [16] G. Koutrika, A. Simitsis, and Y. E. Ioannidis. Explaining Structured Queries in Natural Language. In ICDE, pages 333–344, 2010.
  • [17] F. Li and H. V. Jagadish. Usability, Databases, and HCI. IEEE Data Engineering Bulletin, 35(3):37–45, 2012.
  • [18] F. Li and H. V. Jagadish. Constructing an Interactive Natural Language Interface for Relational Databases. PVLDB, 8(1):73–84, 2014.
  • [19] F. Li and H. V. Jagadish. NaLIR: an Interactive Natural Language Interface for Querying Relational Databases. In SIGMOD, pages 709–712, 2014.
  • [20] F. Li and H. V. Jagadish. Understanding Natural Language Queries over Relational Databases. SIGMOD Record, 45(1):6–13, 2016.
  • [21] R. B. Miller. Response Time in Man-Computer Conversational Transactions. In AFIPS, pages 267–277, 1968.
  • [22] B. Mitra and N. Craswell. Query Auto-Completion for Rare Prefixes. In CIKM, pages 1755–1758, 2015.
  • [23] A. N. Ngomo, L. Bühmann, C. Unger, J. Lehmann, and D. Gerber. Sorry, I Don’t Speak SPARQL: Translating SPARQL Queries into Natural Language. In WWW, pages 977–988, 2013.
  • [24] D. H. Park and R. Chiba. A Neural Language Model for Query Auto-Completion. In SIGIR, pages 1189–1192, 2017.
  • [25] D. Savenkov and E. Agichtein. EviNets: Neural Networks for Combining Evidence Signals for Factoid Question Answering. In ACL, pages 299–304, 2017.
  • [26] M. Shokouhi. Learning to Personalize Query Auto-Completion. In SIGIR, pages 103–112, 2013.
  • [27] S. Wiseman, S. M. Shieber, and A. M. Rush. Learning Neural Templates for Text Generation. In EMNLP, pages 3174–3187, 2018.
  • [28] W. Woods, R. Kaplan, and B. Nash-Webber. The Lunar Sciences Natural Language Information System. Technical report, BBN Inc., 1974.
  • [29] W. A. Woods. Transition Network Grammars for Natural Language Analysis. Communications of the ACM, 13(10):591–606, 1970.
  • [30] C. Yu and H. V. Jagadish. Querying Complex Structured Databases. In VLDB, pages 1010–1021, 2007.