We follow the idea of formulating vision as inverse graphics and propose a new type of element for this task, a neural-symbolic capsule. It is capable of de-rendering a scene into semantic information feed-forward, as well as rendering it feed-backward. An initial set of capsules for graphical primitives is obtained from a generative grammar and connected into a full capsule network. Lifelong meta-learning continuously improves this network's detection capabilities by adding capsules for new and more complex objects it detects in a scene using few-shot learning. Preliminary results demonstrate the potential of our novel approach.
05/22/2019 ∙ by Michael Kissner, et al. ∙ 0 ∙ share
Many current methods to learn intuitive physics are based on interaction networks and similar approaches. However, they rely on information that has proven difficult to estimate directly from image data in the past. We aim to narrow this gap by inferring all the semantic information needed from raw pixel data in the form of a scene-graph. Our approach is based on neural-symbolic capsules, which identify which objects in the scene are static, dynamic, elastic or rigid, possible joints between them, as well as their collision information. By integrating all this with interaction networks, we demonstrate how our method is able to learn intuitive physics directly from image sequences and apply its knowledge to new scenes and objects, resulting in an inverse-simulation pipeline.
05/23/2019 ∙ by Michael Kissner, et al. ∙ 0 ∙ share
Michael Kissneris this you? claim profile