Learning Lexical Entries for Robotic Commands using Crowdsourcing

09/08/2016 ∙ by Junjie Hu, et al. ∙ Carnegie Mellon University

Robotic commands in natural language usually contain various spatial descriptions that are semantically similar but syntactically different. Mapping such syntactic variants into semantic concepts that can be understood by robots is challenging due to the high flexibility of natural language expressions. To tackle this problem, we collect robotic commands for navigation and manipulation tasks using crowdsourcing. We further define a robot language and use a generative machine translation model to translate robotic commands from natural language to robot language. The main purpose of this paper is to simulate the interaction process between humans and robots using crowdsourcing platforms, and to investigate the possibility of translating natural language to robot language with paraphrases.





Natural language provides an efficient way for untrained humans to instruct a robot to perform collaborative tasks, e.g., navigation and manipulation. However, learning to interpret the meaning of natural language commands is a challenging task [Dukes2014, Perera and Allen2013, Chen and Mooney2011], especially when the robot has little or no prior knowledge of the phrasal expressions in natural language. Due to the high flexibility of natural language, it is non-trivial for a robot to cover all the phrasal expressions in natural language when its interpretation module is initially built.

Popular crowdsourcing platforms such as Amazon Mechanical Turk provide a fast and cheap way to collect interactive data from participants in a wide range of communities. Hence, simulating the human-machine interaction process for information extraction on crowdsourcing platforms has attracted considerable research interest [Nguyen, Wallace, and Lease2015, Hladká, Hana, and Luksová2014, Goldberg, Wang, and Kraska2013]. To encourage diversity in the robotic commands, we simulate the interactive process between a robot and various untrained users on Amazon Mechanical Turk, and collect robotic commands during the process. We further apply a phrase-based machine translation model to map natural language commands to a robot language that can be understood by a robot.

Phrase-based Machine Translation Model

To tackle the problem of translating natural language commands to language that can be understood by robots, we first define a robot language that consists of predefined key concepts in the robotic task domains. For example, in the navigation task domain, we define the following key concepts.


  • Action := navigate

  • Object := traffic barrel | building | car

  • Relation := left | right | front | back
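As a minimal sketch of how commands might be assembled from these key concepts, the Python snippet below fills a single template. The template wording follows the navigation examples given in this document's figure caption; the function names and the rule that the target and the landmark must differ are assumptions made for illustration, not part of the original robot language definition.

```python
from itertools import product

# Key concepts of the navigation domain, as listed above.
ACTIONS = ["navigate"]
OBJECTS = ["traffic barrel", "building", "car"]
RELATIONS = ["left", "right", "front", "back"]


def build_command(action, target, relation, landmark):
    """Fill one fixed template with key concepts (hypothetical helper)."""
    checks = ((action, ACTIONS), (target, OBJECTS),
              (relation, RELATIONS), (landmark, OBJECTS))
    for value, allowed in checks:
        if value not in allowed:
            raise ValueError(f"unknown concept: {value!r}")
    return f"{action} to the {target} that is on the {relation} of the {landmark}"


def all_commands():
    """Enumerate every command the template can produce.

    Assumption: the target and the landmark are different objects.
    """
    return [build_command(a, t, r, l)
            for a, t, r, l in product(ACTIONS, OBJECTS, RELATIONS, OBJECTS)
            if t != l]


if __name__ == "__main__":
    print(build_command("navigate", "traffic barrel", "right", "building"))
    print(len(all_commands()), "commands in total")
```

Because the concept sets are small and closed, the set of well-formed commands is finite and can be enumerated, which is what makes the construction deterministic.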

Each robot language command can be deterministically constructed from a combination of key concepts in the task domain. See the navigation examples figure for an illustration. We then adapt a phrase-based machine translation model to translate robotic commands from natural language to the robot language. The key component of this model is the extracted phrase table, which stores the lexical entries. For a particular input (source-language) sentence $x = x_1 \ldots x_n$, each lexical entry is defined as a tuple $p = (s, t, e)$, specifying that the span $x_s \ldots x_t$ in the source-language sentence can be translated as the target-language string $e$. For each lexical entry $p$, we estimate a score $g(p)$ that measures the likelihood of translating the span to the target-language string, computed by relative frequency under the translation model. For a given lexical entry $p$, $s(p)$, $t(p)$ and $e(p)$ denote its three components respectively. A derivation $y$ of a source-language sentence is defined as a finite sequence of phrases, $y = p_1 p_2 \ldots p_L$. For any derivation $y$, $e(y)$ refers to the translation sentence constructed by concatenating the strings $e(p_1), e(p_2), \ldots, e(p_L)$. For a source-language sentence $x$, we denote by $\mathcal{Y}(x)$ the set of possible derivations of $x$.

Based on the above notation, we aim to extract lexical entries from a parallel text corpus collected on crowdsourcing platforms, and use beam search to seek the optimal derivation, i.e., the one with the maximum derivation score among all possible derivations under a phrase-based translation model.

In Equation 1, the score $f(y)$ of a derivation consists of three parts: (1) $h(e(y))$ is the log-probability of the target string $e(y)$ under a smoothed trigram language model; (2) $g(p_k)$ is the score of $p_k$ under a translation model; (3) $-|t(p_k) + 1 - s(p_{k+1})|$ is the distortion penalty for reordering word alignments between source and target languages.

$$f(y) = \lambda_{LM}\, h(e(y)) + \lambda_{TM} \sum_{k=1}^{L} g(p_k) + \lambda_{D} \sum_{k=1}^{L-1} \bigl(-|t(p_k) + 1 - s(p_{k+1})|\bigr) \qquad (1)$$

where $\lambda_{LM}$, $\lambda_{TM}$ and $\lambda_{D}$ are the weights of the scores given by the language model, the translation model and the distortion penalty respectively. Hence the optimal derivation of a source-language sentence $x$ can be obtained by $y^{*} = \arg\max_{y \in \mathcal{Y}(x)} f(y)$.
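The derivation score described above is a weighted sum of a language-model term, a translation-model term and a distortion penalty. The following Python sketch shows one way such a score could be computed for a handful of candidate derivations. The class and function names, the default weights, the toy language model, and the specific distortion form (the absolute jump between the end of one source span and the start of the next) are assumptions for illustration; it scores an explicit candidate list rather than running beam search.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Phrase:
    s: int      # start of the source span (1-indexed, inclusive)
    t: int      # end of the source span (inclusive)
    e: str      # target-language string
    g: float    # translation-model score, e.g. a log relative frequency


def target_string(y: Sequence[Phrase]) -> str:
    """Concatenate the target strings of a derivation."""
    return " ".join(p.e for p in y)


def derivation_score(y: Sequence[Phrase],
                     lm_logprob: Callable[[str], float],
                     w_lm: float = 1.0, w_tm: float = 1.0,
                     w_d: float = 1.0) -> float:
    """Weighted sum of LM score, TM scores and distortion penalty."""
    lm = lm_logprob(target_string(y))
    tm = sum(p.g for p in y)
    # Assumed distortion form: penalise jumps between consecutive source spans.
    dist = -sum(abs(y[k].t + 1 - y[k + 1].s) for k in range(len(y) - 1))
    return w_lm * lm + w_tm * tm + w_d * dist


def best_derivation(candidates: List[Sequence[Phrase]],
                    lm_logprob: Callable[[str], float], **weights):
    """Pick the highest-scoring derivation from an explicit candidate list."""
    return max(candidates,
               key=lambda y: derivation_score(y, lm_logprob, **weights))


if __name__ == "__main__":
    # Hypothetical stand-in for a trigram LM: one log-unit per target word.
    toy_lm = lambda text: -float(len(text.split()))
    monotone = [Phrase(1, 2, "navigate to", -0.5), Phrase(3, 4, "the car", -0.25)]
    reordered = [Phrase(3, 4, "the car", -0.25), Phrase(1, 2, "navigate to", -0.5)]
    print(derivation_score(monotone, toy_lm))
    print(derivation_score(reordered, toy_lm))
    print(target_string(best_derivation([monotone, reordered], toy_lm)))
```

In this toy setting both derivations share the same language-model and translation-model terms, so the distortion penalty alone decides in favour of the monotone ordering. A real decoder would explore the derivation space with beam search instead of enumerating candidates.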


We present the process of collecting experimental data on Amazon Mechanical Turk, a popular crowdsourcing platform, and of extracting parallel lexical entries using Moses [Koehn et al.2007], a machine translation toolkit.

Simulation and Data Collection

A turker is first shown an image that depicts the behaviour of a robot and asked to give a command in English (the natural language command) that clearly indicates the spatial relations between objects in the environment for a robot. Next, the turker is shown robotic concepts in several drop-down lists and asked to select the correct concepts to construct a robotic command (the robot language command) for the same image. Finally, we simulate the scenario where the robot actively asks for a paraphrase sentence (the paraphrase) of the robotic command in order to help it understand the command. In total, we collect 88 tuples of (natural language command, robot language command, paraphrase) for the navigation task and 120 such tuples for the manipulation task.

Phrasal Lexicon Extraction and Translation

To investigate the possibility of using paraphrase sentences to enhance phrase-based machine translation, we first use Moses to extract parallel phrases between the natural language commands and the robot language commands. We then use Moses to extract parallel phrases between the paraphrases and the robot language commands. Table 1 shows the total number of extracted lexical entries when we translate from natural language commands to robot language commands and from paraphrases to robot language commands. Comparing the second column with the third one in Table 1, we observe that more lexical entries are extracted from parallel sentences between paraphrases and robot language commands than from those between natural language commands and robot language commands. This supports our hypothesis that, once the robotic concepts have been shown to them, turkers tend to produce paraphrases that are semantically closer to the robot language commands. Table 2 shows some lexical entries extracted from natural language commands paired with robot language commands. We observe that the extracted lexical entries capture the similarity between source-language phrases and target-language phrases, thus enabling a many-to-one mapping from syntactic variants in natural language to unique robotic concepts.
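To make the relative-frequency scoring of lexical entries concrete, here is a small Python sketch that turns a list of extracted phrase pairs into log relative frequencies. It is not the Moses implementation: the function name is hypothetical, the direction of the estimate (count of the pair divided by the count of the source phrase) is an assumption, and the counts in the usage example are invented toy data, even though some of the phrases are borrowed from Table 2.

```python
from collections import Counter
from math import log


def relative_frequency_scores(phrase_pairs):
    """Score each (source, target) pair by log(count(pair) / count(source)).

    phrase_pairs: iterable of (source_phrase, target_phrase) tuples, one per
    extracted occurrence. The direction of the estimate is an assumption.
    """
    pair_counts = Counter(phrase_pairs)
    source_counts = Counter()
    for (src, _tgt), count in pair_counts.items():
        source_counts[src] += count
    return {(src, tgt): log(count / source_counts[src])
            for (src, tgt), count in pair_counts.items()}


if __name__ == "__main__":
    # Toy occurrences; the counts are invented for illustration only.
    toy_pairs = (
        [("move forward to", "navigate to")] * 3
        + [("go to", "navigate to")] * 2
        + [("find the car", "to the car")]
        + [("find the car", "navigate to the car")]
    )
    for pair, score in sorted(relative_frequency_scores(toy_pairs).items()):
        print(pair, round(score, 3))
```

In the toy data, two different source phrases map to the same robot-language phrase "navigate to", which mirrors the many-to-one mapping from syntactic variants to a single robotic concept.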

| Task | #phrases (natural language to robot language) | #phrases (paraphrase to robot language) |
|---|---|---|
| Navigation | 160 | 748 |
| Manipulation | 128 | 298 |

Table 1: Number of extracted lexical entries

Navigation Task

| Natural Language | Robot Language |
|---|---|
| go straight until you reach a car | navigate to the car |
| backyard of the building | behind the building |
| find the car | to the car |
| which stands before | that is in front |
| move forward to | navigate to |
| located at the right hand side of | is on the right of |

Table 2: Examples of extracted lexical entries

Navigation Task

| Natural Language | Translated Robot Language |
|---|---|
| go to the traffic barrel that is located on the right hand side of the building | navigate (Action) to the traffic barrel (Object) that is on the right (Relation) of the building (Object) |
| go straight forward until you reach the building. go to the car behind the building. | navigate (Action) to the building (Object) that is navigate (Action) to the car (Object) that is behind (Relation) the building |

Table 3: Examples of phrase-based translation
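To illustrate how extracted lexical entries can drive a translation, the sketch below applies the six entries from Table 2 with a greedy, left-to-right, longest-match lookup. This is a deliberately simplified stand-in and not the beam-search decoder used with the phrase-based model: it performs no reordering, uses no scores, and copies through any word that has no entry. The function name and the input sentence are assumptions for illustration.

```python
# Lexical entries taken from Table 2 (natural language -> robot language).
PHRASE_TABLE = {
    "go straight until you reach a car": "navigate to the car",
    "backyard of the building": "behind the building",
    "find the car": "to the car",
    "which stands before": "that is in front",
    "move forward to": "navigate to",
    "located at the right hand side of": "is on the right of",
}


def greedy_translate(sentence, table=PHRASE_TABLE):
    """Monotone longest-match translation (simplified, no scoring)."""
    words = sentence.lower().split()
    max_len = max(len(src.split()) for src in table)
    out, i = [], 0
    while i < len(words):
        # Try the longest source span first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            src = " ".join(words[i:i + n])
            if src in table:
                out.append(table[src])
                i += n
                break
        else:
            out.append(words[i])  # no entry: copy the word through unchanged
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    print(greedy_translate(
        "move forward to the car located at the right hand side of the building"))
    # prints: navigate to the car is on the right of the building
```

The output shows two syntactic variants being mapped to robot-language phrases while the object names pass through unchanged. A scored decoder would additionally choose among competing entries and weigh fluency under the language model.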
Figure: Navigation examples. (a) navigate (Action) to the traffic barrel (Object) that is on the right (Relation) of the building (Object); (b) navigate (Action) to the car (Object) that is on the back (Relation) of the building (Object)