We describe an efficient hierarchical method to compute attention in the...
Learning specific hands-on skills such as cooking, car maintenance, and ...
Pretraining from unlabelled web videos has quickly become the de facto m...
Instructional videos get high traffic on video sharing platforms, and pr...
Current image captioning methods are usually trained via (penalized) max...