This paper presents a Generative RegIon-to-Text transformer, GRiT, for o...
The video instance segmentation (VIS) task requires classifying, segmenting,...
Video Instance Segmentation (VIS) aims to simultaneously classify, segme...
Most online multi-object trackers perform object detection stand-alone i...
Despite the previous success of object analysis, detecting and segmentin...