Unsupervised Real-to-Virtual Domain Unification for End-to-End Highway Driving
In the spectrum of vision-based autonomous driving, vanilla end-to-end models are not interpretable and suboptimal in performance, while mediated perception models require additional intermediate representations such as segmentation masks or detection bounding boxes, whose annotation can be prohibitively expensive as we move to a larger scale. Raw images and existing intermediate representations are also loaded with nuisance details that are irrelevant to the prediction of vehicle commands, e.g. the style of the car in front or the view beyond the road boundaries. More critically, all prior works fail to deal with the notorious domain shift if we were to merge data collected from different sources, which greatly hinders the model generalization ability. In this work, we address the above limitations by taking advantage of virtual data collected from driving simulators, and present DU-drive, an unsupervised real to virtual domain unification framework for end-to-end driving. It transforms real driving data to its canonical representation in the virtual domain, from which vehicle control commands are predicted. Our framework has several advantages: 1) it maps driving data collected from different source distributions into a unified domain, 2) it takes advantage of annotated virtual data which is free to obtain, 3) it learns an interpretable, canonical representation of driving image that is specialized for vehicle command prediction. Extensive experiments on two public highway driving datasets clearly demonstrate the performance superiority and interpretive capability of DU-drive.
READ FULL TEXT