Transformer networks have proven extremely powerful for a wide variety of tasks since they were introduced. Computer vision is not an exception, as the use of transformers has become very popular in the vision community in recent years. Despite this wave, multiple-object tracking (MOT) exhibits for now some sort of incompatibility with transformers. We argue that the standard representation – bounding boxes – is not adapted to learning transformers for MOT. Inspired by recent research, we propose TransCenter, the first transformer-based architecture for tracking the centers of multiple targets. Methodologically, we propose the use of dense queries in a double-decoder network, to be able to robustly infer the heatmap of targets' centers and associate them through time. TransCenter outperforms the current state of the art in multiple-object tracking, on both MOT17 and MOT20. Our ablation study demonstrates the advantage of the proposed architecture compared to more naive alternatives. The code will be made publicly available.
Center-based trackers can, however, suffer from similar problems when the centers are estimated independently of each other. TransCenter is designed to mitigate these two adverse effects by using dense (pixel-level) multi-scale queries, enabling heatmap-based inference, and by exploiting the attention mechanisms to introduce co-dependency between center predictions.
The task of tracking multiple objects, usually understood as the simultaneous inference of the position and identity of various persons in a visual scene recorded by one or more cameras, has become a core problem in computer vision in recent years. Undoubtedly, the various multiple-object tracking (MOT) challenges and associated datasets [40, 10] helped foster research on this topic and provided a standard way to evaluate and monitor the performance of the methods proposed by many research teams worldwide.
Following the success of transformers in vision tasks such as object detection or image super-resolution, we are interested in investigating the use of transformer-based architectures for multiple-object tracking, as recent evidence [50, 39] demonstrated the interest of such architectures for this task. However, we argue that the pedestrian representation used so far is not appropriate for learning transformer-based architectures for MOT. Indeed, TransTrack  and TrackFormer  use bounding boxes to represent pedestrians, which is very intuitive, since bounding boxes are a widespread representation for MOT, for instance in combination with probabilistic methods [45, 2] or deep convolutional architectures [3, 59, 44, 42, 55, 18, 62, 48]. One of the prominent drawbacks of using bounding boxes for tracking multiple objects manifests in very crowded scenes, where occlusions are very difficult to handle since ground-truth bounding boxes often overlap. This is problematic because these bounding boxes are used during training, not only to regress the position, width, and height of each person, but also to discriminate the visual appearance associated to each track. In this context, overlapping bounding boxes mean training a visual appearance representation that combines the visual content of two or even more people [23, 22]. Certainly, jointly addressing the person tracking and segmentation tasks  can partially solve the occlusion problem. However, this requires extra annotations – segmentation masks – which are very tedious and costly to obtain. In addition, such annotations are not available in standard benchmark datasets [40, 10].
In this paper, we draw inspiration from very recent research in MOT [66, 63] and devise a transformer-based architecture that can be trained to track the center of each person, which we name TransCenter. The main difference with respect to TransTrack  and TrackFormer , developed directly from the object detection transformers of  and  respectively, is that TransCenter is conceived to mitigate the occlusion problem inherent to bounding-box tracking without requiring extra ground-truth annotations such as segmentation masks. While this intuition is very straightforward, designing an efficient transformer-based architecture that implements it is far from trivial.
Indeed, the first challenge is to infer dense representations (i.e. center heatmaps). To do so, we propose the use of dense (pixel-level) multi-scale queries. In addition to allowing heatmap-based MOT, the use of dense queries overcomes the limitations [7, 67] associated with querying the decoder with a small number of queries. Inspired by , TransCenter has two different decoders: one for person detection, and another for person tracking. Both decoders are given queries that depend on the current image, but extracted with different learnable layers. However, while the memory (i.e. the output of the transformer encoder) of the current frame is given to the detection decoder, the memory of the previous frame is given to the tracking decoder.
Overall, this paper has the following contributions:
We propose the use of transformers for multiple-object center tracking and term this architecture TransCenter.
To infer position heatmaps, we propose the use of dense multi-scale queries that are computed from the encoding of the current image using learnable layers.
TransCenter sets a new state-of-the-art baseline among online MOT methods on MOT17  (+10.1% multiple-object tracking accuracy, MOTA) as well as on MOT20  (+5% MOTA), leading both MOT competitions. Moreover, to our knowledge, TransCenter sets the first transformer-based state-of-the-art baseline on MOT20 (TrackFormer  is tested on MOT20S, which are sequences from MOT17 containing far less crowded scenes than MOT20), thanks to its ability to track in crowded scenes.
 first formulates the problem as an end-to-end learning task with recurrent neural networks. Moreover,  models the dynamics of objects with a recurrent network and further combines the dynamics with an interaction and an appearance branch. [3] employs object detection methods for MOT by modeling the problem as a regression task. A person re-identification network [53, 3] can be added as a second stage to boost the performance. However, it is still not optimal to treat person re-identification as a secondary task.  further proposes a framework that treats the person detection and re-identification tasks equally.
The performance of these methods is further boosted by the recent rise of Graph Neural Networks (GNNs): hand-designed graphs are replaced by learnable GNNs [56, 57, 55, 43, 6] to model the complex interactions between objects.
In most of the methods above, bounding boxes are used as the object representation. However, this is not a fully satisfying solution, because it creates ambiguity when objects occlude each other or when noisy background information is included. CenterTrack  and FairMOT  instead represent objects as center heatmaps, then reason about all objects jointly and associate the heatmaps across frames.
The transformer was first proposed by  for machine translation, and has shown its ability to handle long-term, complex dependencies between sequences using the multi-head attention mechanism. Following its great success in natural language processing, works in computer vision started to investigate transformers for various tasks, such as image recognition, person re-identification , realistic image generation , super resolution , and audio-visual learning [17, 16].
Object detection with transformers (DETR)  can be seen as an exploration and correlation task. It is an encoder-decoder structure where the encoder extracts the image information and the decoder finds the best correlation between the object queries and the encoded image features through an attention module. However, the attention computation suffers from heavy computational and memory complexity w.r.t. the input size; the feature map extracted from a ResNet  backbone is used as input to alleviate the problem. Deformable DETR  tackles the issue by proposing a deformable attention inspired by , drastically speeding up convergence (roughly tenfold) and reducing the complexity. This allows capturing finer details by using multi-scale features, yielding better detection performance.
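To make the complexity argument concrete, below is a minimal, single-scale, single-head sketch of deformable attention in the spirit of Deformable DETR. All class and variable names are ours, and a faithful implementation is multi-scale and multi-head; this is only meant to show why cost grows with the number of sampled points rather than with the full feature-map size.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableAttentionSketch(nn.Module):
    """Each query attends to a few sampled points instead of all H*W positions."""
    def __init__(self, dim: int, n_points: int = 4):
        super().__init__()
        self.n_points = n_points
        self.offsets = nn.Linear(dim, 2 * n_points)   # per-query sampling offsets
        self.weights = nn.Linear(dim, n_points)       # per-query attention weights
        self.proj = nn.Linear(dim, dim)

    def forward(self, queries, ref_points, memory):
        # queries: (B, N, C); ref_points: (B, N, 2) in [0, 1]; memory: (B, C, H, W)
        B, N, C = queries.shape
        offsets = self.offsets(queries).view(B, N, self.n_points, 2)
        weights = self.weights(queries).softmax(-1)              # (B, N, P)
        # Sampling locations mapped to [-1, 1] for grid_sample.
        loc = (ref_points.unsqueeze(2) + offsets).clamp(0, 1) * 2 - 1
        sampled = F.grid_sample(memory, loc, align_corners=False)  # (B, C, N, P)
        out = (sampled * weights.unsqueeze(1)).sum(-1)             # (B, C, N)
        return self.proj(out.transpose(1, 2))                      # (B, N, C)

attn = DeformableAttentionSketch(dim=64)
q = torch.randn(2, 100, 64)            # 100 queries
ref = torch.rand(2, 100, 2)            # normalized reference points
mem = torch.randn(2, 64, 32, 32)
out = attn(q, ref, mem)                # (2, 100, 64)
```

Because each query samples only `n_points` locations, the cost is linear in the number of queries rather than quadratic in the image size, which is what later makes dense queries affordable.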
Following the success of transformers in detection, two concurrent works directly apply transformers to MOT based on the DETR framework. First, TrackFormer  builds directly on DETR  and is trained to propagate queries through time. Second, TransTrack  extends  to MOT by adding a decoder that processes the features at time $t$ to refine previous detection positions. Importantly, both methods stay within the detection framework and use it for tracking, a strategy that has proven successful in previous works [59, 3]. However, recent literature [66, 63] also suggests that bounding boxes may not be the best representation for MOT, and this paper investigates the use of transformers for center tracking, thus introducing TransCenter.
We are motivated to investigate the use of transformers for multiple-object tracking. As described in the introduction, previous works in this direction attempted to learn to infer bounding boxes. We question this choice, and explore an alternative representation that has been very popular in the recent past: center heatmaps. However, differently from bounding boxes, heatmaps are dense rather than sparse representations. Consequently, while [50, 39] used sparse object queries, we introduce the use of dense multi-scale queries for transformers in computer vision. Indeed, to our knowledge, we are the first to propose the use of a dense query feature map that scales with the input image size; in our experiments, the decoders are queried with thousands of queries. One downside of using dense queries is the associated memory consumption. To mitigate this undesirable effect, we use deformable decoders, inspired by deformable convolutions.
More precisely, we cast the multiple-object tracking problem into two separate subtasks: the detection of objects at time $t$, and their association with the objects detected at time $t-1$. Differently from previous studies following the same rationale [3, 59], TransCenter addresses these two tasks in parallel, using a fully deformable dual-decoder architecture. The output of the detection decoder is used to estimate the object centers and sizes, while it is combined with the output of the tracking decoder to estimate the displacement of each object w.r.t. the previous image. An important consequence of combining center heatmaps with a dual-decoder architecture is that the object association through time depends not only on geometric features (e.g. IOU) but also on the visual features from the decoder.
The overall architecture of TransCenter is shown in Figure 2. The RGB images at times $t$ and $t-1$ are fed to a CNN backbone to produce multi-scale features capturing finer details in the image, as done in , and then to a deformable self-attention encoder, thus obtaining multi-scale memory feature maps associated to the two images, $M_t$ and $M_{t-1}$ respectively. Then, $M_t$ is given to a query learning network (QLN), consisting of fully connected layers operating pixel-wise, that outputs a feature map of dense multi-scale detection queries, $Q_D$. These go through another QLN to produce a feature map of dense multi-scale tracking queries, $Q_T$. A fully deformable dual-decoder architecture then processes them: the deformable detection decoder compares the detection queries $Q_D$ to the memory $M_t$ to output multi-scale detection features $F_D$, and the deformable tracking decoder does the same with the tracking queries $Q_T$ and the memory $M_{t-1}$ to output multi-scale tracking features $F_T$. The detection features $F_D$ are used to estimate the bounding-box size $S_t$ and the center heatmap $C_t$. Together with the tracking features $F_T$ and the previous center heatmap $C_{t-1}$, the detection features are also used to estimate the tracking displacement $T_t$.
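The wiring just described can be summarized in code. The following is a deliberately simplified, runnable, single-scale stand-in: plain convolutions replace the backbone, encoder, and deformable decoders, and all module names are ours, not the released implementation. Only the data flow between components is meant to be faithful to Figure 2.

```python
import torch
import torch.nn as nn

class QLN(nn.Module):
    """Pixel-wise feed-forward network producing dense queries."""
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
    def forward(self, x):                          # x: (B, C, H, W)
        return self.fc(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

class ToyDecoder(nn.Module):
    """Stand-in for a deformable decoder: fuses queries with a memory map."""
    def __init__(self, dim):
        super().__init__()
        self.fuse = nn.Conv2d(2 * dim, dim, 3, padding=1)
    def forward(self, queries, memory):
        return self.fuse(torch.cat([queries, memory], dim=1)).relu()

class ToyTransCenter(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.backbone = nn.Conv2d(3, dim, 7, stride=4, padding=3)  # toy encoder
        self.qln_det, self.qln_track = QLN(dim), QLN(dim)
        self.det_dec, self.track_dec = ToyDecoder(dim), ToyDecoder(dim)
        self.center = nn.Conv2d(dim, 1, 1)          # center heatmap C_t
        self.size = nn.Conv2d(dim, 2, 1)            # box width/height S_t
        self.track = nn.Conv2d(2 * dim + 1, 2, 1)   # displacements T_t

    def forward(self, img_t, img_tm1, heatmap_tm1):
        M_t, M_tm1 = self.backbone(img_t), self.backbone(img_tm1)
        Q_D = self.qln_det(M_t)                     # dense detection queries
        Q_T = self.qln_track(Q_D)                   # dense tracking queries
        F_D = self.det_dec(Q_D, M_t)                # detection features vs. M_t
        F_T = self.track_dec(Q_T, M_tm1)            # tracking features vs. M_{t-1}
        C_t = self.center(F_D).sigmoid()
        S_t = self.size(F_D)
        T_t = self.track(torch.cat([F_D, F_T, heatmap_tm1], dim=1))
        return C_t, S_t, T_t

x_t = x_tm1 = torch.randn(1, 3, 128, 128)
C, S, T = ToyTransCenter()(x_t, x_tm1, torch.zeros(1, 1, 32, 32))
```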
In the following we first explain the design of the dense multi-scale queries, then the architecture of the fully deformable dual decoder, the three main branches – center heatmap, object size, and tracking – and finally the training losses.
Traditional transformer architectures output as many elements as there are queries fed to the decoder, and more importantly, these outputs correspond to the sought entities (e.g. pedestrian bounding boxes). When inferring center heatmaps, the probability of having a person's center at a given pixel becomes one of these sought entities, thus requiring the transformer decoder to be fed with dense queries. Such queries are obtained from the multi-scale encoder memory via a first query learning network (QLN), which is a feed-forward network operating pixel-wise, obtaining $Q_D$. We use two different sets of queries for the dual decoder: a second QLN processes $Q_D$ to obtain $Q_T$. Both are fed to the fully deformable dual decoder, see Sec. 3.3.
The fact that the dense query feature map resolution is proportional to the input image resolution has two prominent advantages. First, the queries can be multi-scale and exploit the multi-resolution structure of the encoder, allowing very small targets to be captured. Second, dense queries make the network more flexible, since it can adapt to arbitrary image sizes. More generally, the use of the QLN avoids manually sizing the queries and selecting the maximum number of detections beforehand, as done in previous transformer architectures for computer vision.
To successfully recover object trajectories, an MOT method should not only detect objects but also associate them across frames. To do so, TransCenter uses a fully deformable dual decoder: two fully deformable decoders deal in parallel with the two subtasks, detection and tracking. While the detection decoder correlates $Q_D$ and $M_t$ with the attention modules to detect objects in the image at time $t$, the tracking decoder correlates $Q_T$ and $M_{t-1}$ to associate the detected objects with their position in the previous image. Specifically, the detection decoder searches for objects in the multi-scale $Q_D$ through attention correlated with the multi-scale $M_t$, and outputs the multi-scale detection features $F_D$, used to find the object centers and box sizes. Differently, the deformable tracking decoder finds the objects at time $t$ and associates them with the objects at time $t-1$. To do this, the multi-head deformable attention in the tracking decoder performs a temporal cross-correlation between the multi-scale $Q_T$ and $M_{t-1}$, and outputs the multi-scale tracking features $F_T$, containing the temporal information used in the tracking branch to estimate the displacements from time $t$ back to $t-1$.
Both the detection and tracking decoders input a dense query feature map so as to output dense information as well. However, using the multi-head attention modules of traditional transformers  in TransCenter would imply memory and complexity that grow quadratically with the input image size. This is undesirable and would limit the scalability and usability of the method, especially when processing multi-scale features. We therefore resort to deformable multi-head attention, leading to a fully deformable dual-decoder architecture.
The outputs of the two fully deformable decoders are two sets of multi-scale features, referred to as the detection features $F_D$ and the tracking features $F_T$. More precisely, these multi-scale features contain four feature maps at different fractions of the input image resolution. For the center heatmap and object size branches, the feature maps at the different resolutions are combined using deformable convolutions and bilinear interpolation, following the architecture shown in Figure 3, into a single feature map at a fraction of the input resolution, and finally into $C_t$ and $S_t$ (the two channels of $S_t$ encode the width and the height). Regarding the tracking branch, the two multi-scale features follow the same up-scaling as in the other two branches (but with different parameters), obtaining two feature maps at the same resolution. These two feature maps are concatenated with the previous center heatmap $C_{t-1}$, downscaled to the resolution of the feature maps. As in the other branches, a block of convolutional layers computes the final output, i.e. the displacement map $T_t$, whose two channels encode the horizontal and vertical displacements respectively.
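As an illustration, a minimal fusion head in this spirit could look as follows, with plain convolutions standing in for the paper's deformable ones, and with illustrative channel sizes and resolutions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionHead(nn.Module):
    """Fuses a coarse-to-fine pyramid of decoder features into one dense map."""
    def __init__(self, dim=64, out_ch=1):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(dim, dim, 3, padding=1) for _ in range(4))
        self.out = nn.Conv2d(dim, out_ch, 1)

    def forward(self, feats):          # feats: coarse-to-fine list of (B, C, H_i, W_i)
        x = self.lateral[0](feats[0])
        for conv, skip in zip(self.lateral[1:], feats[1:]):
            # Bilinear upsampling to the next (finer) resolution, then merge.
            x = F.interpolate(x, size=skip.shape[-2:], mode="bilinear",
                              align_corners=False)
            x = conv(skip) + x
        return self.out(x)

feats = [torch.randn(1, 64, 8 * 2**i, 8 * 2**i) for i in range(4)]
heatmap = FusionHead(out_ch=1)(feats).sigmoid()   # center heatmap, (1, 1, 64, 64)
size_map = FusionHead(out_ch=2)(feats)            # width/height channels
```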
Training TransCenter is achieved by jointly learning a classification task for the object center heatmap and a regression task for the object size and tracking displacements, covering the three branches of TransCenter. For the sake of clarity, in this section we drop the time index $t$. Center Focal Loss In order to train the center branch, we first need to build the ground-truth heatmap response $C^{gt}$. As done in , we construct $C^{gt}$ by considering the maximum response of a set of Gaussian kernels centered at each of the ground-truth object centers. More formally, for every pixel position $(x, y)$, the ground-truth heatmap response is computed as:

$$C^{gt}_{(x,y)} = \max_{k} \exp\left( -\frac{(x - x_k)^2 + (y - y_k)^2}{2\sigma_k^2} \right),$$

where $(x_k, y_k)$ is the $k$-th ground-truth object center, and $\sigma_k$ is the spread of the corresponding Gaussian kernel. In our case, $\sigma_k$ is proportional to the object's size, as described in . Given the ground-truth and the inferred center heatmaps, the center focal loss $L_C$ is formulated as:

$$L_C = \frac{-1}{N} \sum_{x,y} \begin{cases} \big(1 - C_{(x,y)}\big)^{\alpha} \log C_{(x,y)} & \text{if } C^{gt}_{(x,y)} = 1, \\ \big(1 - C^{gt}_{(x,y)}\big)^{\beta}\, C_{(x,y)}^{\alpha} \log\big(1 - C_{(x,y)}\big) & \text{otherwise}, \end{cases}$$

where $N$ is the number of objects and $\alpha$, $\beta$ are scaling factors, see .
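A direct implementation of the two formulas above, in the CenterNet style the section describes, could be the following; the default values of `alpha` and `beta` are the usual ones for this loss family and should be treated as an assumption here.

```python
import torch

def gt_heatmap(centers, sigmas, H, W):
    """Max over per-object Gaussians; centers: (N, 2) as (x, y), sigmas: (N,)."""
    ys = torch.arange(H).view(H, 1).float()
    xs = torch.arange(W).view(1, W).float()
    heat = torch.zeros(H, W)
    for (cx, cy), s in zip(centers, sigmas):
        g = torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * s ** 2))
        heat = torch.maximum(heat, g)          # max response across kernels
    return heat

def center_focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-6):
    """Penalty-reduced focal loss over the dense heatmap, normalized by N."""
    pred = pred.clamp(eps, 1 - eps)
    pos = gt.eq(1).float()                     # exact center locations
    pos_loss = ((1 - pred) ** alpha * pred.log() * pos).sum()
    neg_loss = ((1 - gt) ** beta * pred ** alpha * (1 - pred).log() * (1 - pos)).sum()
    n = pos.sum().clamp(min=1)                 # number of objects N
    return -(pos_loss + neg_loss) / n
```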
Sparse Regression Loss The size map $S$ and the displacement map $T$ are supervised only at the locations where object centers are present, i.e. using an $L_1$ loss:

$$L_S = \frac{1}{N} \sum_{k=1}^{N} \big\| S_{(x_k, y_k)} - S^{gt}_k \big\|_1 .$$

The formulation of $L_T$ is analogous to $L_S$, but using the tracking output and ground truth instead of the object size. To complete the sparse supervision of $L_S$ and $L_T$, we add an extra regression loss, denoted $L_{reg}$, on the bounding boxes computed from $S$ and the ground-truth centers. The impact of this additional loss is marginal, as shown in Section 4.4.
In summary, the overall loss is formulated as the weighted sum of all the losses, where the weights $\lambda$ are chosen according to the numeric scale of each loss:

$$L = \lambda_C L_C + \lambda_S L_S + \lambda_T L_T + \lambda_{reg} L_{reg}.$$
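A sketch of the sparse supervision and of the weighted sum follows; the `lambdas` values are placeholders, since the exact weights are not recoverable from this text.

```python
import torch
import torch.nn.functional as F

def sparse_l1(pred_map, gt_values, centers):
    """pred_map: (C, H, W); centers: (N, 2) long (x, y); gt_values: (N, C).
    Supervises the dense map only at the N object-center locations."""
    x, y = centers[:, 0], centers[:, 1]
    pred = pred_map[:, y, x].t()               # gather (N, C) at the centers
    return F.l1_loss(pred, gt_values, reduction="mean")

def total_loss(L_C, L_S, L_T, L_reg, lambdas=(1.0, 0.1, 1.0, 0.1)):
    lc, ls, lt, lr = lambdas                   # placeholder weights
    return lc * L_C + ls * L_S + lt * L_T + lr * L_reg
```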
[Table 1 (excerpt): TransCenter results on MOT17, under public and private detections. Columns follow the metrics listed in Sec. 4.2.]

| Detections | Extra data | MOTA | MOTP | IDF1 | MT (%) | ML (%) | FP | FN | IDS |
|---|---|---|---|---|---|---|---|---|---|
| Public | no | 68.8 | 79.9 | 61.4 | 36.8 | 23.9 | 22,860 | 149,188 | 4,102 |
| Private | no | 70.0 | 79.6 | 62.1 | 38.9 | 20.4 | 28,119 | 136,722 | 4,647 |
Inference with TransCenter Once the method is trained, we detect objects by filtering the output center heatmap $C_t$. Since the datasets are annotated with bounding boxes, we need to convert our estimates into this representation. In detail, we apply a threshold to the heatmap, thus producing a list of center positions, and extract the object size associated to each position. The set of detections produced by TransCenter follows directly. Once the detection step is performed, we can estimate the position of each object in the previous image by extracting the estimated displacement from the tracking branch output $T_t$ at the detected center positions. Indeed, we can construct a set of detections tracked back to the previous image. Finally, we use the Hungarian algorithm to match the detections at the previous time step with the tracked-back detections, so as to associate the tracks through time. The birth and death processes are naturally integrated in TransCenter: detections not associated to previous detections give birth to new tracks, while unmatched previous detections are put to sleep for a maximum number of frames before being discarded. New tracks are compared to sleeping tracks by means of an external re-identification network from , trained only on MOT17 , whose impact is ablated in the experiments.
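One plausible realization of this decoding and association step is sketched below, using max-pooling as local-maximum filtering (common in CenterNet-style decoders) and SciPy's Hungarian solver. The threshold, the matching cost, and the sign convention of the displacement are illustrative choices, not the paper's exact ones.

```python
import torch
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment

def decode_centers(heatmap, size_map, disp_map, thresh=0.3):
    # heatmap: (1, 1, H, W); size_map/disp_map: (1, 2, H, W)
    keep = (F.max_pool2d(heatmap, 3, stride=1, padding=1) == heatmap)  # local maxima
    ys, xs = torch.where((heatmap * keep)[0, 0] > thresh)
    scores = heatmap[0, 0, ys, xs]
    wh = size_map[0, :, ys, xs].t()            # (N, 2) width/height per center
    disp = disp_map[0, :, ys, xs].t()          # (N, 2) displacement toward t-1
    centers = torch.stack([xs, ys], dim=1).float()
    return centers, wh, disp, scores

def associate(prev_centers, centers, disp, max_cost=50.0):
    """Hungarian matching of tracked-back detections against previous ones."""
    tracked_back = centers - disp              # sign convention is an assumption
    cost = torch.cdist(prev_centers, tracked_back)      # (M, N) distances
    rows, cols = linear_sum_assignment(cost.numpy())
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_cost]
```

Unmatched current detections would then start new tracks (birth), and unmatched previous detections would be put to sleep, as described above.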
Network and Training Parameters The input images are resized to a fixed resolution. Both the encoder and the decoder have six layers, with eight attention heads. The query learning networks consist of two fully connected layers with ReLU activation. Our CNN backbone is ResNet-50. TransCenter is trained with the loss weights of Sec. 3.5, using the AdamW optimizer  with separate learning rates for the CNN backbone and for the rest of the network. The training lasts 50 epochs, applying a learning-rate decay at the 40th epoch. The entire network is pre-trained on the pedestrian class of COCO and then fine-tuned on the respective MOT dataset [40, 10]. Overall, with 2 RTX Titan GPUs and batch size 2, training takes around 1h30 and 1h per epoch on MOT20 and MOT17 respectively. We also present results fine-tuned with extra data, namely the CrowdHuman dataset . See the results and discussion for details.
Datasets and Detections We use the standard splits of the MOT17  and MOT20  datasets; the evaluation is obtained by submitting the results to the MOTChallenge website. The MOT17 test set contains 2,355 trajectories distributed over 17,757 frames. The MOT20 test set contains 1,501 trajectories within only 4,479 frames, which makes for a much more challenging setting. We evaluate TransCenter under both public and private detections. When using public detections, we limit the maximum number of birth candidates at each frame to the number of public detections per frame, as in [66, 39]. The selected birth candidates are those closest to the public detections, with an IOU larger than 0. When using private detections, there are no such constraints, and the detections depend only on the network capacity, the use of external detectors, and, more importantly, the use of extra training data. For this reason, we regroup the results by the use of extra training datasets, as detailed in the following.
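The birth-candidate rule for public detections could be sketched as follows; the helper name and the ranking-by-overlap interpretation of "closest" are our own assumptions.

```python
import torch
from torchvision.ops import box_iou

def filter_births(candidates, public_dets):
    """Keep at most len(public_dets) birth candidates, requiring non-zero IOU
    with some public detection and preferring the highest-overlap candidates.
    candidates, public_dets: (N, 4) / (M, 4) boxes as (x1, y1, x2, y2)."""
    if len(public_dets) == 0 or len(candidates) == 0:
        return candidates[:0]
    iou = box_iou(candidates, public_dets)     # (N, M) pairwise overlaps
    best = iou.max(dim=1).values               # best overlap per candidate
    keep = best > 0                            # IOU must be larger than 0
    order = best[keep].argsort(descending=True)
    idx = torch.where(keep)[0][order][: len(public_dets)]
    return candidates[idx]
```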
[Table 2: results on MOT20, under public and private detections.]
Extra Training Data To fairly compare with the state-of-the-art methods, we clearly denote the extra data used to train each method (including several pre-prints listed in the MOTChallenge leaderboard, which are marked with * in our result tables). Note that COCO  and ImageNet are not considered extra data according to the MOTChallenge [40, 10]. We write ch for CrowdHuman , pt for PathTrack , re for the combination of the Market1501 , CUHK01, and CUHK03  person re-identification datasets, 5d1 for the use of 5 extra datasets (CrowdHuman , Caltech Pedestrian [13, 12], CityPersons , CUHK-SYSU , and PRW ), 5d2 for the same as 5d1 with CrowdHuman replaced by ETH , (5d1) for the use of the tracking/detection results of FairMOT  (trained in the 5d1 setting), and no for the use of no extra dataset. Metrics Standard MOT metrics such as MOTA (Multiple-Object Tracking Accuracy) and MOTP (Multiple-Object Tracking Precision)  are used. MOTA is the most commonly used, since it reflects the average tracking performance, accounting for FP (false positives: predicted bounding boxes not enclosing any object), FN (false negatives: missing ground-truth objects), and IDS (identity switches of predicted trajectories through time). MOTP evaluates the quality of the bounding boxes of successfully tracked objects. Moreover, we also evaluate IDF1  (the ratio of correctly identified detections over the average number of ground-truth objects and predicted tracks), MT (the ratio of ground-truth trajectories covered by a track hypothesis for more than 80% of their life span), and ML (for less than 20% of their life span).
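For reference, MOTA aggregates these three error counts over all frames $t$ against the total number of ground-truth objects $\mathrm{GT}_t$; this is the standard CLEAR MOT definition, not restated in the excerpt above:

$$\text{MOTA} = 1 - \frac{\sum_t \left( \text{FP}_t + \text{FN}_t + \text{IDS}_t \right)}{\sum_t \mathrm{GT}_t}.$$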
MOT17 Table 1 presents the results obtained on the MOT17  dataset. The first global remark is that most state-of-the-art methods do not evaluate under both public and private detections and under different extra-training-data settings, while we do. Secondly, TransCenter systematically outperforms all other methods in terms of MOTA under similar training-data conditions, both for public and private detections. Indeed, MOTA increases w.r.t. the best-performing published method (+10.1% when taking unpublished methods into account) for public detections under both extra and no-extra training data, and likewise for private detections. If we consider only published methods, the superiority of TransCenter is remarkable in most of the metrics. We can also observe that TransCenter trained with no extra training data outperforms not only the methods trained with no extra data, but also the methods trained with one extra dataset (in terms of MOTA, for both public and private detections). In the same line, TransCenter trained on ch performs better than two of the methods trained with five extra datasets. Overall, these results confirm our hypothesis that the heatmap representation, combined with the proposed TransCenter architecture, is a better option for MOT using transformers.
MOT20 Table 2 reports the results obtained on MOT20. With public detections, TransCenter leads the competition in terms of MOTA, both in the extra and no-extra training-data settings. Another remarkable achievement of TransCenter is the significant decrease in FP compared to existing methods. Very importantly, to the best of our knowledge, our study is the first to report results on MOT20 with a transformer-based architecture, demonstrating the tracking capacity of TransCenter even in densely crowded scenarios. For the sake of completeness, we provide the results on MOT20 for private detections, and set a new baseline for future research for methods trained under ch and no extra data.
Attention Visualization We show in Fig. 4 the attention from different attention heads of both the detection and tracking decoders. For the detection attention, different heads focus on different areas of the image: (a) the people; (b), (c) the background; (d) both the background and the people. For the tracking attention, we interestingly observe that the object information at time $t$ does correlate with the previous image: in (f)-(h), the tracking decoder looks for objects at time $t-1$ in the surroundings of the object positions at time $t$. In addition, it also focuses on the objects in the previous image, as shown within the orange box in (e).
Qualitative Results We report in Fig. 5 qualitative results on the MOT20 test set, to assess the ability of TransCenter to detect and track targets in the context of crowded scenes and highly overlapping bounding boxes. Figs. 5(a) and 5(b) are extracted from MOT20-07, and Figs. 5(c) and 5(d) from MOT20-08. We observe that TransCenter maintains a high recall, even in the context of drastic mutual occlusions, and reliably associates detections across time.
To summarize, TransCenter exhibits outstanding results on both the MOT17 and MOT20 datasets, for both public and private detections, and both with and without extra training data, which indicates that multiple-object center tracking using transformers is a promising research direction.
In this section, we experimentally evaluate different configurations of TransCenter. For the ablation, we further divide the training sets into a train-validation split: we take the first 50% of the frames (2,664 and 4,468 frames for MOT17 and MOT20, respectively) as training data and test on the last 25% (1,332 and 2,234 frames for MOT17 and MOT20, respectively). The remaining 25% of the frames, in the middle of the sequences, are discarded to prevent over-fitting.
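Concretely, the split boundaries can be computed as follows (a trivial sketch; the example frame count reproduces the MOT17 numbers quoted above):

```python
def split_frames(n_frames: int):
    """First 50% for training, last 25% for testing, middle 25% discarded."""
    train = range(0, n_frames // 2)                   # first 50%
    test = range(n_frames - n_frames // 4, n_frames)  # last 25%
    return list(train), list(test)                    # middle 25% unused

train, test = split_frames(5328)   # MOT17: 2,664 training / 1,332 test frames
```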
Single Decoder Is Not Enough We study the possibility of using one single decoder and one set of dense multi-scale queries to perform tracking. Using a single decoder leads to very poor results, as shown in Figure 6 (Single Decoder). This is because the network switches its attention between image $t$ and image $t-1$ during training, and eventually fails to track objects correctly at times $t$ and $t-1$ (low MOTA). More details can be found in the supplementary materials. Using a single decoder admittedly brings memory efficiency, but this is not crucial in TransCenter thanks to the deformable modules : the overall memory consumption remains affordable for a normal GPU setting (see details in Sec. 4.1).
Lost Person Re-identification We use an external re-identification network to recover the identities that are temporarily suspended by the tracker. The re-identification network is the one in , pre-trained on the MOT17  training set. Similarly, a light-weight optical-flow estimation network, LiteFlowNet , pre-trained on KITTI , is used to help recover the lost identities. This process helps reduce IDS, but the overall tracking performance does not come from these external networks, since FP and FN are not improved by them; see Tab. 3. We even observe a performance drop in FP and FN, since the external networks were not fine-tuned on MOT20.
Beyond Detection We also ablate D.Detr+IOU matching, i.e. using a bounding-box object detector and a handcrafted geometric IOU matching to perform tracking. From Figure 6, we observe that the bounding-box object detector can better enclose correctly detected objects (i.e. higher MOTP). However, it lacks prior information from the past, which leads to higher IDS and FN. Without $L_{reg}$ We evaluate the impact of the additional bounding-box regression loss that completes the sparse object-size loss, as discussed in Section 3.5. We observe only a slight performance drop without it (-0.7% MOTA for MOT17 and -0.3% for MOT20), indicating that the two sparse regression losses and the dense center focal loss are sufficient to train TransCenter.
In this paper, we introduced TransCenter, a novel transformer-based architecture for multiple-object tracking. TransCenter proposes the use of dense multi-scale queries in combination with a fully deformable dual decoder, able to output dense representations for the objects' centers, sizes, and temporal displacements. The deformable decoder allows processing thousands of queries while keeping the overall memory usage within reasonable bounds. Under the same training conditions, TransCenter outperforms all its competitors on MOT17 and MOT20, and even exhibits comparable performance to some methods trained with much more data.
T.-W. Hui, X. Tang, and C. C. Loy. LiteFlowNet: A lightweight convolutional neural network for optical flow estimation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 8981–8989, 2018.
[Table 4: sparse vs. dense multi-scale queries on MOT20, without fine-tuning. The last three columns are FP, FN, and IDS; the headers of the first three columns are not recoverable from the source.]

| Queries | – | – | – | FP | FN | IDS |
|---|---|---|---|---|---|---|
| Sparse Queries | 1,086 | 7,526 | 190 | 13,989 | 190,689 | 2,496 |
| Dense Queries (ours) | 1,202 | 6,951 | 203 | 12,337 | 145,546 | 2,889 |
Both models are pre-trained on CrowdHuman  and fine-tuned on the first half of the sequences of the MOT17  dataset. From Fig. 7, we see that TransCenter outperforms the method  using sparse queries (+2% MOTA, +0.9% IDF1) on MOT17 . Without fine-tuning on MOT20 , we observe a large discrepancy between the performance of the methods using dense and sparse queries (+15.2% MOTA and +6.2% IDF1).
The discrepancy is also reflected in Tab. 4: compared to  on MOT20 , TransCenter, without training on MOT20, detects many more objects (-45,143 FN) while producing fewer false positives (-1,652 FP). The rise in IDS is due to the fact that more detected objects lead to more severe occlusions.
The reason lies in the use of pixel-level queries correlated to the input image: independently of the number of objects in the image, we do not need to re-parameterize the number of queries according to the number of objects, as models using image-independent sparse queries do. TransCenter thus generalizes better to more crowded scenes.
[Table 3: overall results on the MOT17 and MOT20 validation splits for two configurations; the per-row configuration labels are not recoverable from the source.]

| Dataset | MOTA | MOTP | IDF1 | MT | ML | FP | FN | IDS |
|---|---|---|---|---|---|---|---|---|
| MOT17-all | 71.9 | 81.4 | 62.3 | 894 (38.0%) | 534 (22.7%) | 17,378 | 137,008 | 4,046 |
| MOT17-all | 73.2 | 81.1 | 62.2 | 960 (40.8%) | 435 (18.5%) | 23,112 | 123,738 | 4,614 |
| MOT20-all | 58.6 | 79.8 | 46.7 | 441 (35.5%) | 232 (18.7%) | 33,691 | 175,841 | 4,850 |
| MOT20-all | 58.3 | 79.7 | 46.8 | 444 (35.7%) | 231 (18.6%) | 35,959 | 174,893 | 4,947 |