Summarizing a virtual robot's past actions in natural language
We propose and demonstrate the task of giving natural language summaries of the actions of a robotic agent in a virtual environment. We explain why such a task is important, what makes it difficult, and discuss how it might be addressed. To encourage others to work on this, we show how a popular existing dataset that matches robot actions with natural language descriptions designed for an instruction following task can be repurposed to serve as a training ground for robot action summarization work. We propose and test several methods of learning to generate such summaries, starting from either egocentric video frames of the robot taking actions or intermediate text representations of the actions used by an automatic planner. We provide quantitative and qualitative evaluations of our results, which can serve as a baseline for future work.
READ FULL TEXT