1 Introduction
Semantic scene parsing is a foundational image understanding problem in the vision community [zheng2021rethinking, zhao2018icnet, li2020improving, yu2018bisenet, yang2018denseaspp, zhang2018exfuse, yuan2020object]. Typically, the goal is to segment objects and “stuff” regions (e.g. road, background) in the scene. Multi-object multi-part parsing is a significantly more challenging variant which requires part-level segmentation of each scene object [bsanet, gmnet, co-rank]. Compared to traditional object-level segmentation, semantic representations infused with fine-grained part-level knowledge can provide richer information for downstream reasoning tasks including visual question answering [hong2021ptr], perceptual concept learning [DBLP:journals/corr/abs-2111-05251], shape modelling [achlioptas2019shapeglot, dubrovina2019composite] and many others [dong2014humanparsing, chen2014detect, 10.1007/978-3-642-33718-5_60, DBLP:journals/corr/ZhangDGD14, sun2013learning, krause2015fine].
For part-based object segmentation, some existing approaches tackle the simpler problem of single-object part parsing [gong2018instance, fang2018weakly, wang2015joint, wang2015semantic, haggag2016semantic]. Although a few recent approaches have addressed multi-object multi-part parsing [bsanet, gmnet, co-rank], they treat part labels as independent and do not exploit intra/inter ontological relationships among objects and parts at the label level. They also tend to perform poorly on small and infrequent parts/categories. To address these shortcomings, we propose FLOAT, a novel factorized label space framework for scalable multi-object multi-part parsing. Our approach is motivated by the following observations:
Observation #1: Object part names in datasets typically consist of a root component and side component(s). Many object categories contain parts with the same root component. For example, ‘left front leg’ (found in horse, cow, etc.) and ‘right leg’ (found in person) share the root component leg. Therefore, parts can be grouped based on their root component.
The example also suggests that object categories whose instances share category-level attributes (e.g. “living things that move”) are likely to share root components (such as leg). Using this criterion, some object categories (e.g. cow, person, bird) can be grouped as ‘animate’. Similarly, categories sharing the attribute “rigid bodied” can be grouped as ‘inanimate’. As with the ‘animate’ group, categories in the ‘inanimate’ group also share many root part components (e.g. ‘wheel’ in aeroplane, bicycle, car).
Observation #2: Similar to Observation #1, parts can also be grouped by side component, e.g. ‘front’ is a side component of both ‘front wheel’ in bike and ‘left front leg’ in horse.
Factoring the object/part label space in terms of these groups (‘animate’, ‘inanimate’, ‘side’) greatly reduces the effective number of output labels (e.g. from 201 to 80 output heads on Pascal-Part-201; see Table 3). In turn, this improves scalability with respect to the number of object categories and the number of parts per object. Factoring also enables efficient data sharing when learning semantic representations for grouped parts and improves performance for infrequent classes (see Fig. 1).
A second key feature of our framework is Inference-time Zoom Refinement (IZR), a segmentation refinement technique. IZR transforms ‘zoomed in’ versions of preliminary per-object label maps into refined counterparts, which are then composited back onto the segmentation canvas. Besides not requiring additional training, IZR is empirically superior to alternative inference-time schemes and significantly improves segmentation quality, especially for smaller objects/parts.
Existing works report results on simplified, label-merged versions of the original dataset (Pascal-Part [chen2014detect]). In our work, we incorporate previously excluded part attributes and other minor parts to create Pascal-Part-201, the most comprehensive and challenging version of Pascal-Part [chen2014detect]. Along with the standard mean IOU (mIOU) and mAvg scores, we report sqIOU [kirillov2019panoptic] and sqAvg, normalized segmentation quality measures that are less affected by the spatial scale of objects and parts.
In summary, our contributions are the following:
2 Related Work
Semantic segmentation is a broad area with intensive research. Rather than attempting to summarize all approaches, we focus on the most directly relevant works. A common design pattern for semantic segmentation is the encoder-decoder setup [7803544, zhao2017pyramid, chen2017deeplab, article_123]. In particular, the baselines, existing approaches and our proposed approach all adopt the popular DeepLab architecture [chen2017deeplab] for various components of the segmentation pipeline.
Single-Object Multi-Part Parsing has been extensively explored. Existing approaches typically consider object category subsets such as persons [fang2018weakly, liang2018look, liang2016semantic, nie2018mutual, xia2016zoom, xia2017joint, xia2015pose, zhao2017self, gong2018instance, liang2015human, luo2018macro, liu2020hybrid], animals [haggag2016semantic, wang2015semantic, wang2015joint] and vehicles [liang2016semantic, nie2018mutual, song2017embedding, liu2021cgpart]. However, in this setting, most works assume a single object of interest per image.

Multi-object multi-part parsing is a relatively new and understudied problem [bsanet, gmnet, co-rank]. The approaches of Zhao et al. [bsanet] and Michieli et al. [gmnet] tackle multi-object multi-part parsing by providing object-level feature guidance to the part segmentation network during optimization. Zhao et al. [bsanet] additionally provide boundary-level awareness to features. Tan et al. [co-rank] propose a semantic co-ranking loss that models intra- and inter-part relationships. Xiao et al. [xiao2018unified] introduce a composite dataset and an approach for predicting perceptual visual concepts in scenes. However, in contrast to our framework, these approaches report results on simplified (label-merged) versions of standard datasets and empirically perform worse on smaller parts.
Factorization: In machine vision applications, early works such as Zheng et al. [DenseObjAtt_CVPR2014] used factorial Conditional Random Field models to separately predict object category, coarse object labels and object attributes such as shape, material and surface type. Other works jointly learn object and attribute-related information as a separable latent representation [nagarajan2018attributes] or using graph networks [naeem2021learning]. Misra et al. [misra2017red] propose a factorization over global object attributes and object classifiers to enable compositionality. Other works extend this idea to inter-object relationships, e.g. noun-preposition-noun triplets [malinowski2014pooling, lan2012image, hong2021ptr]. In all these works, a simple global property of the object (e.g. material, texture, color, size, shape) is learnt jointly with the object category information. In their work on panoptic part segmentation, de Geus et al. [de2021part] conduct experiments involving two categories from Pascal-Part-58 with some parts grouped by semantic similarity. Graphonomy, a framework by Lin et al. [lin2020graphonomy], can span multiple datasets with a flat label structure but requires a manually specified graph per category. Such rigid connectivity relationships are unsuitable for modelling the highly articulated objects (e.g. animals) found in our setting. To the best of our knowledge, we are the first to show that object parts can be factorized across diverse object categories at scale, and that such factorization significantly improves segmentation performance, in resonance with theories of visual recognition [biederman1987recognition, HOFFMAN198465].
Zooming in on image regions using bounding boxes generated by attention maps [wang2017zoom] and reinforcement learning policies [dong2018reinforced, xu2021adazoom] has been found to improve detection and segmentation. Other works use the technique on object instances for video interpolation [yuan2019zoom-in-to-check] and on part instances for object parsing [xia2016zoom]. Porzi et al. [porzi2021improving] use zoomed-in crops based on object classes to improve panoptic segmentation of high-resolution images. Similar to the latter set of approaches, FLOAT also zooms in on object regions. However, our zoom-based refinement does not require any extra training and can be used directly during inference for improved performance.
3 Our framework (FLOAT)
As mentioned earlier, FLOAT’s design leverages the shared-attribute groups that naturally exist within object categories (‘animate’, ‘inanimate’) and part attributes (‘left’, ‘right’, ‘front’, ‘back’); see Fig. 2. The sections that follow describe how we operationalize this idea. Although our approach is general, we use object categories and part names from the Pascal-Part dataset [chen2014detect] for ease of understanding.

Figure 3: Inference-time Zoom Refinement (IZR). Object-level predictions are used to obtain padded bounding boxes for scene objects. The corresponding object crops are processed by the factorized network (Sec. 3). The resulting label maps are composited to generate the final refined part segmentation map. Notice the improvement in segmentation quality relative to the part label map without IZR (included for comparison).
3.1 Relabeling images with factored labels
The original Pascal-Part dataset contains object and part level label maps. We re-label or partition these maps to obtain five new label groups as described below.
object: The label set for this group comprises unique object category labels. For example, the object label map in Fig. 2 contains person and bicycle objects.
animate: For this group, the label set comprises root components of part labels from the object categories bird, cat, cow, dog, horse, person, sheep. The part labels are pooled across all object categories. For example, a single label leg covers all corresponding part instances from all objects in the ‘animate’ group. This can also be seen in the ‘animate’ label map in Fig. 2: the left foot and right foot of the person are color-coded the same (orange) and assigned the common label foot.
inanimate: The label set comprises root components of part labels from aeroplane, bicycle, bottle, bus, car, motorbike, pottedplant, train, tv. Note that (i) these categories are disjoint from the ‘animate’ group (see Fig. 2) and (ii) the part label pooling described for ‘animate’ applies here as well.
side: In this case, two disjoint label groups exist. One group comprises all part labels that contain the word ‘left’ or ‘right’ (e.g. left hand, right wing). Label map regions whose part labels contain ‘left’/‘right’ are treated as seed pixels for a flood-fill style procedure that produces the corresponding ‘left’/‘right’ label maps (see Fig. 2). The same procedure is used for the group of part labels containing the word ‘front’ or ‘back’ (see Fig. 2). Appendix A.2 explains the flood-fill algorithm in detail.
Broadly, object parts from living things that move are in the ‘animate’ group, while other parts, typically from rigidly shaped non-living things, are in the ‘inanimate’ group. As mentioned before, such grouping enables data-efficient representation learning for common parts (e.g. torso in the ‘animate’ group). The same reasoning holds for the directional ‘side’ groups ({‘left’, ‘right’}, {‘front’, ‘back’}).
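The relabeling described in this section can be sketched with per-group lookup tables. This is an illustration under stated assumptions, not the authors' code: it assumes each flat class has a whitespace-separated name of the form "object [side words] root" (raw Pascal-Part part names differ), uses the animate categories listed above, and builds only the ‘left/right’ seed map that precedes the flood fill; a ‘front/back’ table would be built the same way. The names `build_luts` and `relabel` are hypothetical.

```python
# Hypothetical sketch: relabel a flat part map into factored label maps via lookup tables.
SIDE_WORDS = {"left", "right", "front", "back"}
ANIMATE = {"bird", "cat", "cow", "dog", "horse", "person", "sheep"}  # Sec. 3.1


def build_luts(flat_labels):
    """Return per-group vocabularies and flat-id -> factored-id lookup tables.

    flat_labels[0] is background; other names are assumed to look like
    "horse left front leg" (object, optional side words, root component).
    Id 0 of every factored group means background / not applicable.
    """
    vocab = {"animate": ["bg"], "inanimate": ["bg"], "left/right": ["bg", "left", "right"]}
    luts = {group: [0] for group in vocab}
    for name in flat_labels[1:]:
        obj, *words = name.split()
        root = " ".join(w for w in words if w not in SIDE_WORDS)
        group = "animate" if obj in ANIMATE else "inanimate"
        other = "inanimate" if group == "animate" else "animate"
        if root not in vocab[group]:
            vocab[group].append(root)  # roots are pooled across object categories
        luts[group].append(vocab[group].index(root))
        luts[other].append(0)
        # Seed label for the left/right map (before any flood fill)
        side = next((w for w in words if w in ("left", "right")), None)
        luts["left/right"].append(vocab["left/right"].index(side) if side else 0)
    return vocab, luts


def relabel(part_map, lut):
    """Apply a lookup table to every pixel of a 2D map of flat class ids."""
    return [[lut[c] for c in row] for row in part_map]


flat = ["background", "horse left front leg", "horse right front leg",
        "person left leg", "bicycle front wheel"]
vocab, luts = build_luts(flat)
part_map = [[0, 1, 2],
            [3, 4, 0]]
print(vocab["animate"], relabel(part_map, luts["animate"]))
# ['bg', 'leg'] [[0, 1, 1], [1, 0, 0]]
```

Because roots are pooled per group, ‘horse left front leg’, ‘horse right front leg’ and ‘person left leg’ all map to the single ‘animate’ id for leg, which is the data sharing the grouping is meant to provide.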
3.2 Factorized semantic segmentation architecture
We configure the segmentation architecture to output the factorized label maps described in the previous section. As Fig. 2 shows, we employ two semantic segmentation networks, one for object-level and the other for part-level label maps. The object-level network outputs the object prediction map. The part-level network consists of a shared encoder and three decoders: the ‘animate’ decoder, which outputs the ‘animate’ label map; the ‘inanimate’ decoder, which outputs the ‘inanimate’ label map; and the ‘side’ decoder, which outputs the ‘left/right’ and ‘front/back’ label maps. The outputs of the object-level and part-level networks are merged at inference time, as described next.
3.3 Top-Down Merge
To combine the factorized label maps output by the segmentation architecture (see Fig. 2), we adopt a top-down merging strategy. For each object (e.g. bicycle) in the object prediction map, we examine the labels at the corresponding pixel locations in the part-level label maps. Depending on the object type (‘animate’ or ‘inanimate’), the labels from the corresponding part map are copied to the scene-level prediction canvas (e.g. for bicycle, the labels considered in the ‘inanimate’ map would be wheel, chainwheel, handlebar, headlight, saddle). Similarly, side components at the object’s pixel locations are read from the ‘side’ label maps ({‘left’, ‘right’} and {‘front’, ‘back’}) and copied to the scene prediction canvas. In case of conflicts, the prediction defaults to background. Appendix A.1 describes top-down merging in detail.
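A minimal per-pixel sketch of the object-level step of top-down merging follows. It assumes label maps are 2D lists of strings, that a hypothetical `valid_roots` table lists the root components each object category can have, and that a "conflict" means the selected part map predicts background or a root that does not belong to the object. Side components are left to the Appendix A.1 procedure.

```python
ANIMATE = {"bird", "cat", "cow", "dog", "horse", "person", "sheep"}  # Sec. 3.1


def top_down_merge(object_map, animate_map, inanimate_map, valid_roots):
    """Merge factored maps into 'object root' labels, pixel by pixel.

    object_map: 2D list of object names ("bg" for background).
    animate_map / inanimate_map: 2D lists of predicted root names ("bg" allowed).
    valid_roots: object name -> set of root components it can have (assumed input).
    """
    merged = []
    for obj_row, an_row, inan_row in zip(object_map, animate_map, inanimate_map):
        out_row = []
        for obj, an, inan in zip(obj_row, an_row, inan_row):
            if obj == "bg":
                out_row.append("bg")
            elif obj not in valid_roots:
                out_row.append(obj)  # category without parts (e.g. boat): keep object label
            else:
                # Read the part map that matches the object's group
                root = an if obj in ANIMATE else inan
                # Assumed conflict rule: background or foreign root -> background
                out_row.append(f"{obj} {root}" if root in valid_roots[obj] else "bg")
        merged.append(out_row)
    return merged


object_map = [["person", "person", "person"], ["bicycle", "bicycle", "boat"]]
animate_map = [["bg", "head", "leg"], ["bg", "leg", "bg"]]
inanimate_map = [["bg", "bg", "wheel"], ["wheel", "saddle", "bg"]]
valid_roots = {"person": {"head", "torso", "leg"},
               "bicycle": {"wheel", "chainwheel", "handlebar", "headlight", "saddle"}}
print(top_down_merge(object_map, animate_map, inanimate_map, valid_roots))
# [['bg', 'person head', 'person leg'], ['bicycle wheel', 'bicycle saddle', 'boat']]
```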
In the next section, we describe how the resulting prediction map is refined using a per-object ‘zooming’ technique.
3.4 Inference-time Zoom Refinement (IZR)
The Inference-time Zoom Refinement (IZR) technique improves segmentation quality by ‘zooming’ into each scene object. First, the input image is processed by the object-level network to obtain the object-level map (see Fig. 3). The bounding box of each object component is then padded so that the object is centered and the aspect ratio is preserved (Fig. 3). Image crops corresponding to the padded bounding boxes are then extracted. The padding includes scene context for each cropped object and also helps account for inaccuracies in the object map prediction. The cropped object images are then processed by FLOAT’s factorized network to obtain the corresponding part-level label maps. These label maps are then composited to generate the final refined segmentation map. In the next two sections, we describe the optimization objective for the networks in FLOAT and the implementation details.
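The padding step can be sketched as scaling each object's bounding box about its centre by one factor on both axes, which keeps the object centred and the aspect ratio unchanged before clipping to the image. The scale factor is a hypothetical value (the exact padding amount is an assumption here), integer rounding can perturb the ratio slightly, and the sketch covers only box computation and cropping, not the compositing rule.

```python
def mask_bbox(mask):
    """Half-open bounding box (x0, y0, x1, y1) of the non-zero pixels of a non-empty 2D 0/1 mask."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1


def padded_box(box, img_w, img_h, scale=1.5):
    """Scale a box about its centre by `scale` (hypothetical default) on both axes,
    keeping centre and aspect ratio, then clip to the image. Clipping at the image
    border can break exact centring."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w, half_h = (x1 - x0) * scale / 2, (y1 - y0) * scale / 2
    return (max(0, round(cx - half_w)), max(0, round(cy - half_h)),
            min(img_w, round(cx + half_w)), min(img_h, round(cy + half_h)))


def crop(image, box):
    """Crop a 2D list (H x W) to a half-open box."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


# A 2-wide, 4-tall object in a 10x10 image; doubling keeps the 1:2 aspect ratio.
mask = [[1 if 2 <= y < 6 and 3 <= x < 5 else 0 for x in range(10)] for y in range(10)]
print(padded_box(mask_bbox(mask), 10, 10, scale=2.0))  # (2, 0, 6, 8)
```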
3.5 Optimization
We train the object-level network (Sec. 3.2) using the standard per-pixel cross-entropy loss. For training the part-level network, we use a combination of cross-entropy loss and graph matching loss [gmnet]. The cross-entropy loss is applied to each of the four output part-level maps, i.e. the ‘animate’, ‘inanimate’, ‘left/right’ and ‘front/back’ maps (see Fig. 2).
The graph matching loss [gmnet] captures proximity relationships between part pairs within a label map and scores how well these pairs match between the ground truth and the predicted map. The degree of proximity between a part pair is represented by the number of pixels in one part situated $d$ pixels or less from the other part, where $d$ is an empirically set threshold. For efficiency, the pairwise proximity is approximated by dilating each part mask by $d$ and computing the intersecting region. The ground truth proximity matrix $A^{GT}$ (and similarly the predicted matrix $A^{P}$) is formally defined as:
$$A^{GT}_{jk} = \big|\{\, p : p \in \delta_d(G_j) \cap \delta_d(G_k) \,\}\big|,$$
where $A^{GT}_{jk}$ is the proximity between the $j$th and $k$th parts, $G_j, G_k$ are the respective part masks, $p$ is a generic pixel, $\delta_d(\cdot)$ is the morphological 2D dilation operator (by $d$ pixels) and $|\cdot|$ is the cardinality of the given set. A row-wise normalization is applied to the proximity matrix: $\hat{A}_{jk} = A_{jk} / \sum_{k'} A_{jk'}$. The graph matching loss is computed as the Frobenius norm between the two normalized adjacency matrices: $\mathcal{L}_{GM} = \|\hat{A}^{GT} - \hat{A}^{P}\|_F$.
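The proximity matrix and loss can be checked on a tiny example. This pure-Python sketch rests on assumptions: masks are sets of (row, col) pixels, dilation uses a square structuring element of radius d, self-pairs are kept on the diagonal, and predicted masks are hard (a trainable version would need soft, differentiable masks). It is not the GMNet implementation.

```python
import math


def dilate(mask, d, h, w):
    """Binary dilation of a pixel set by a (2d+1) x (2d+1) square, clipped to an h x w image."""
    out = set()
    for y, x in mask:
        for yy in range(max(0, y - d), min(h, y + d + 1)):
            for xx in range(max(0, x - d), min(w, x + d + 1)):
                out.add((yy, xx))
    return out


def proximity(masks, d, h, w):
    """Row-normalised proximity matrix with A[j][k] = |dilate(M_j) & dilate(M_k)|."""
    dil = [dilate(m, d, h, w) for m in masks]
    raw = [[len(dj & dk) for dk in dil] for dj in dil]
    norm = []
    for row in raw:
        total = sum(row)
        norm.append([a / total if total else 0.0 for a in row])
    return norm


def graph_matching_loss(gt_masks, pred_masks, d, h, w):
    """Frobenius norm between ground-truth and predicted normalised proximity matrices."""
    gt, pr = proximity(gt_masks, d, h, w), proximity(pred_masks, d, h, w)
    return math.sqrt(sum((g - p) ** 2 for g_row, p_row in zip(gt, pr)
                         for g, p in zip(g_row, p_row)))


# Two parts that touch after dilation in the ground truth, but are far apart in the prediction.
gt = [{(0, 0), (0, 1)}, {(0, 3), (0, 4)}]
far = [{(0, 0), (0, 1)}, {(4, 3), (4, 4)}]
print(proximity(gt, 1, 5, 5))                 # [[0.75, 0.25], [0.25, 0.75]]
print(graph_matching_loss(gt, far, 1, 5, 5))  # 0.5
```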
Additionally, for the ‘animate’ and ‘inanimate’ branches, a composite foreground-background binary cross-entropy loss serves as extra guidance. The loss for the part-level network is a weighted combination of the losses for all part branches.
3.6 Implementation and Training Details
For fair comparison with previous works [bsanet, gmnet, co-rank], we employ the DeepLab-v3 [chen2017deeplab] architecture with an ImageNet pre-trained ResNet-101 [he2016deep] as the encoder (backbone) and follow the same training scheme and augmentations. During training, images are randomly left-right flipped and scaled to times the original resolution with bilinear interpolation. Results at the testing stage are reported at the original image resolution. The threshold employed for the proximity matrix (Sec. 3.5) is empirically set to . The model is trained for 40K steps with the base learning rate set to which is decreased with a polynomial decay rule with power . We employ weight decay regularization of . We use a batch size of images and use for weighting the graph matching loss relative to the cross-entropy loss. We use 2 NVIDIA A100 GPUs, each with 40GB of memory, to train our models and run experiments. Full computational and memory requirements can be found in Appendix C.
4 Datasets and Evaluation Metrics

Pascal-Part: For experiments, we use the Pascal-Part dataset [chen2014detect], currently the largest multi-object multi-part parsing dataset. It contains variable-sized images with pixel-level part annotations on the Pascal VOC2010 [everingham2010pascal] semantic object classes (plus the background class). We use the original split from Pascal-Part with images for training and images in the publicly provided validation set for testing.
Pascal-Part-58/108: For comparison with previous work, we use the datasets Pascal-Part-58 [bsanet] and Pascal-Part-108 [gmnet], which contain 58 and 108 part classes, respectively. Both variants simplify the original semantic classes by grouping some parts together. Pascal-Part-58 mostly contains large object parts such as head, torso, leg etc. for animals and body, wheel etc. for non-living objects. Pascal-Part-108 is more challenging and additionally contains relatively smaller parts (e.g. eye, neck, foot etc. for animals and roof, door etc. for non-living objects).
Pascal-Part-201: We incorporate the part attributes (‘left’, ‘right’, ‘front’, ‘back’, ‘upper’, ‘lower’) and other minor parts (e.g. eyebrow) excluded from both of the above variants (58/108) to create the most comprehensive and challenging version of the dataset, containing 201 parts, which we dub Pascal-Part-201. We observed that the original part labelling scheme in Pascal-Part leaves large portions of an object’s pixels unlabelled for the bike, motorbike and tv categories, which leads to disconnected objects. To address this, we add a body part annotation for bike and motorbike, and a frame part for tv. Fig. 4 illustrates the differences in part labelling and granularity across the Pascal-Part variants.
Table 1: Category-wise and overall performance on Pascal-Part-201 (top: mIOU and mAvg; bottom: sqIOU and sqAvg).
Model | bgr | aero | bike | bird | boat | bottle | bus | car | cat | chair | cow | table | dog | horse | mbike | person | plant | sheep | sofa | train | TV | mIOU | mAvg |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Baseline | 91.0 | 31.6 | 47.7 | 24.3 | 56.7 | 46.4 | 31.0 | 36.7 | 24.2 | 35.6 | 17.5 | 38.6 | 27.3 | 20.7 | 38.0 | 26.9 | 50.8 | 13.3 | 42.1 | 14.7 | 57.6 | 26.3 | 36.8 |
GMNet[gmnet] | 90.8 | 26.6 | 33.1 | 21.2 | 55.0 | 43.5 | 24.6 | 27.5 | 21.7 | 35.5 | 15.1 | 40.3 | 25.0 | 17.5 | 31.9 | 21.9 | 44.2 | 11.9 | 43.3 | 14.0 | 53.2 | 22.5 | 33.2 |
BSANet[bsanet] | 91.2 | 34.6 | 41.7 | 27.9 | 61.2 | 51.7 | 34.1 | 38.1 | 26.1 | 35.4 | 24.0 | 43.6 | 28.4 | 23.0 | 37.4 | 27.7 | 54.7 | 14.3 | 40.4 | 17.8 | 59.4 | 28.5 | 38.7 |
FLOAT | 92.5 | 36.7 | 49.7 | 34.4 | 75.3 | 51.4 | 35.8 | 42.0 | 37.8 | 59.6 | 35.5 | 58.2 | 41.0 | 34.0 | 40.2 | 40.8 | 52.2 | 28.5 | 69.0 | 15.1 | 56.1 | 37.1 | 46.9 |

Model | bgr | aero | bike | bird | boat | bottle | bus | car | cat | chair | cow | table | dog | horse | mbike | person | plant | sheep | sofa | train | TV | sqIOU | sqAvg |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Baseline | 89.6 | 28.9 | 39.3 | 17.1 | 57.4 | 32.3 | 27.1 | 26.0 | 20.5 | 39.8 | 14.8 | 34.7 | 22.7 | 17.2 | 31.5 | 19.2 | 34.9 | 10.8 | 52.6 | 14.4 | 53.8 | 21.5 | 32.6 |
GMNet[gmnet] | 89.4 | 20.7 | 23.5 | 12.6 | 53.1 | 25.8 | 19.3 | 17.2 | 18.1 | 38.2 | 11.2 | 35.2 | 15.9 | 14.2 | 25.4 | 13.8 | 26.9 | 8.5 | 52.0 | 13.8 | 46.9 | 16.9 | 27.7 |
BSANet[bsanet] | 89.9 | 30.7 | 33.5 | 18.6 | 60.2 | 31.2 | 29.2 | 26.4 | 21.2 | 37.8 | 17.5 | 38.0 | 22.3 | 17.8 | 31.2 | 18.2 | 33.6 | 10.8 | 47.2 | 17.5 | 55.4 | 22.1 | 32.8 |
FLOAT | 90.8 | 32.5 | 41.8 | 24.5 | 63.9 | 36.1 | 30.4 | 29.9 | 33.0 | 50.8 | 28.1 | 47.6 | 35.6 | 26.1 | 33.6 | 29.9 | 34.5 | 20.6 | 69.0 | 13.6 | 56.8 | 29.6 | 39.5 |
4.1 Evaluation Metrics
For performance evaluation, we use two versions of Intersection over Union (IOU) metric. We first describe mIOU and mAvg, the standard segmentation quality metrics reported for the problem setting. We then describe balanced variants of these metrics – sqIOU and sqAvg.
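mIOU: Let $P_{ij}$ and $G_{ij}$ be the prediction and ground truth, respectively, for the $j$th part in the $i$th image $I_i$. Suppose the dataset contains $N$ images. The mIOU for part $j$ ($\text{mIOU}_j$) is calculated as:
$$\text{mIOU}_j = \frac{\sum_{i=1}^{N} \mathbb{1}(G_{ij} \neq \emptyset)\, |P_{ij} \cap G_{ij}|}{\sum_{i=1}^{N} \mathbb{1}(G_{ij} \neq \emptyset)\, |P_{ij} \cup G_{ij}|} \qquad (1)$$
where $\mathbb{1}(\cdot)$ is the indicator function (i.e. the summation is performed only over images in which part $j$ is present). The mIOU for the dataset is then calculated as $\text{mIOU} = \frac{1}{C} \sum_{j=1}^{C} \text{mIOU}_j$, where $C$ is the number of part categories (classes) in the dataset (58/108/201).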
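mAvg: The mIOU score for object category $k$ is the average of the per-part scores $\text{mIOU}_j$ of its parts, i.e. $\text{mIOU}^{(k)} = \frac{1}{C_k} \sum_{j \in \mathcal{P}_k} \text{mIOU}_j$, where $\mathcal{P}_k$ is the set of part labels of object category $k$ and $C_k$ is their number. Finally, mAvg is calculated as $\text{mAvg} = \frac{1}{K} \sum_{k=1}^{K} \text{mIOU}^{(k)}$, where $K$ is the number of object categories in the Pascal-Part datasets.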
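The two averaging steps can be written directly. As a sanity check, averaging all 21 per-category values in the Baseline row of Table 1 (background column included) reproduces the reported Baseline mAvg of 36.8. The function names are illustrative, and the part scores in the second call are hypothetical.

```python
from statistics import mean


def category_score(part_ious, parts):
    """Per-category score: mean of the per-part mIOU values of that category's parts."""
    return mean(part_ious[p] for p in parts)


def m_avg(category_scores):
    """mAvg: unweighted mean of per-category scores."""
    return mean(category_scores)


# Per-category mIOU of the Baseline row in Table 1 (bgr, aero, ..., TV).
baseline = [91.0, 31.6, 47.7, 24.3, 56.7, 46.4, 31.0, 36.7, 24.2, 35.6, 17.5,
            38.6, 27.3, 20.7, 38.0, 26.9, 50.8, 13.3, 42.1, 14.7, 57.6]
print(round(m_avg(baseline), 1))  # 36.8

# Hypothetical per-part scores for one category with two parts.
print(category_score({"head": 0.6, "wing": 0.4, "leg": 0.9}, ["head", "wing"]))
```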
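sqIOU: This is a modified version of the Segmentation Quality (SQ) metric [kirillov2019panoptic] tailored for semantic segmentation. With $P_{ij}$ and $G_{ij}$ denoting the prediction and ground truth for the $j$th part in the $i$th of $N$ images, the sqIOU for part $j$ is calculated as:
$$\text{sqIOU}_j = \frac{1}{\sum_{i=1}^{N} \mathbb{1}(G_{ij} \neq \emptyset)} \sum_{i=1}^{N} \mathbb{1}(G_{ij} \neq \emptyset)\, \frac{|P_{ij} \cap G_{ij}|}{|P_{ij} \cup G_{ij}|} \qquad (2)$$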

sqIOU and sqAvg are then aggregated over parts and object categories in the same way as mIOU and mAvg. Due to their formulation, mIOU and mAvg [gmnet, bsanet] tend to be dominated by contributions from bigger instances (informally, an instance is deemed “big” if it is among the largest instances of an object part category by area). In contrast, sqIOU and sqAvg weight parts of all sizes equally; compare Eqns. 1 and 2 and see the toy example in Fig. 5. Therefore, sqIOU and sqAvg can be considered a fairer measure of segmentation quality.
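A toy computation shows why the two families of metrics diverge. The sketch assumes one reading of Eqns. 1 and 2: mIOU pools intersections and unions over all images that contain the part, while sqIOU averages the per-image IOUs over those same images. The pixel counts are hypothetical.

```python
def miou_part(stats):
    """Eq. (1)-style pooled IOU for one part: sum of intersections / sum of unions.

    stats: one (intersection, union, gt_area) tuple of pixel counts per image.
    Images where the part is absent from the ground truth (gt_area == 0) are skipped.
    """
    present = [(i, u) for i, u, g in stats if g > 0]
    return sum(i for i, _ in present) / sum(u for _, u in present)


def sqiou_part(stats):
    """Eq. (2)-style score for one part: mean of per-image IOUs over images containing it."""
    present = [(i, u) for i, u, g in stats if g > 0]
    return sum(i / u for i, u in present) / len(present)


# Toy part: one large, well-segmented instance and one small, poorly segmented one.
stats = [(900, 1000, 950),  # IOU 0.9
         (10, 100, 60),     # IOU 0.1
         (0, 40, 0)]        # part absent from ground truth: ignored by the indicator
print(round(miou_part(stats), 3), round(sqiou_part(stats), 3))  # 0.827 0.5
```

The large instance dominates the pooled score (0.827), while the per-image average (0.5) gives the small instance equal weight.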
5 Experimental Results
We compare the performance of FLOAT with BSANet [bsanet], GMNet [gmnet] and CO-Rank [co-rank]. As a baseline, we train a DeepLab-v3 [chen2017deeplab] model whose labels independently pair each object category with its associated part names (e.g. cow left eye, cow right ear). BSANet and CO-Rank report results on Pascal-Part-58, while GMNet additionally reports results on Pascal-Part-108. We report results on all variants of the Pascal-Part dataset, including our newly introduced Pascal-Part-201. To enable comparison, we train GMNet and BSANet on Pascal-Part-201. For evaluation, we employ the mIOU, mAvg, sqIOU and sqAvg metrics described previously (Sec. 4.1). In addition, we analyze the relative contribution of the various components of FLOAT via ablation studies. Full per-part results can be found in Appendix F.
5.1 Pascal-Part-201
Table 1 shows the category-wise and overall performance on Pascal-Part-201. Overall, FLOAT outperforms the baseline and existing approaches by a large margin. We obtain gains of 10.8 points in mIOU and 8.1 points in sqIOU relative to the baseline, and outperform the next best method, BSANet [bsanet], by 8.6 points in mIOU and 7.5 points in sqIOU.
Empirically, we obtain significant sqIOU gains of 10%-30% on small parts, e.g. left/right eye, left/right ear, left/right horn etc. of ‘animate’ categories such as bird, cat, cow. For ‘inanimate’ categories (e.g. bus, car, aeroplane), we obtain sqIOU improvements in the range of 5%-11% on small parts such as front/back plate and left/right wing. The improvement is similarly substantial for most parts containing side components (‘left/right’ or ‘front/back’).
5.2 Pascal-Part-58 and Pascal-Part-108
Table 2: Results on Pascal-Part-58 and Pascal-Part-108.
Method | Dataset | mIOU | mAvg | sqIOU | sqAvg |
---|---|---|---|---|---|
Baseline | 58 | 54.3 | 55.4 | 46.0 | 48.4 |
BSANet[bsanet] | 58 | 58.2 | 58.9 | 49.3 | 51.5 |
GMNet[gmnet] | 58 | 59.0 | 61.8 | 49.4 | 54.3 |
CO-Rank[co-rank] | 58 | 60.7 | 60.6 | - | - |
FLOAT | 58 | 61.0 | 64.2 | 54.2 | 57.1 |
Baseline | 108 | 41.3 | 43.6 | 32.2 | 36.1 |
BSANet[bsanet] | 108 | 45.9 | 48.4 | 36.6 | 41.0 |
GMNet[gmnet] | 108 | 45.8 | 50.5 | 35.8 | 41.9 |
FLOAT | 108 | 48.0 | 53.0 | 40.5 | 45.6 |

Method |
Dataset |
Output Heads |
No Factorization |
Object |
Part |
Anim/Inanim |
Side |
Inference Augmentation |
mIOU | sqIOU |
Baseline | 58 | 58 | ✓ | - | 54.3 | 46.0 | ||||
45 | ✓ | ✓ | - | 60.7 | 51.5 | |||||
45 | ✓ | ✓ | - | 60.9 | 51.7 | |||||
FLOAT | 45 | ✓ | ✓ | - | IZR | 61.0 | 54.2 | |||
Baseline | 108 | 108 | ✓ | - | 41.3 | 32.2 | ||||
68 | ✓ | ✓ | - | 46.1 | 36.7 | |||||
68 | ✓ | ✓ | - | 47.8 | 38.4 | |||||
FLOAT | 68 | ✓ | ✓ | - | IZR | 48.0 | 40.5 | |||
Baseline | 201 | 201 | ✓ | 26.3 | 21.5 | |||||
119 | ✓ | ✓ | 29.1 | 22.8 | ||||||
119 | ✓ | ✓ | 31.3 | 24.1 | ||||||
80 | ✓ | ✓ | ✓ | 36.9 | 27.8 | |||||
* | 80 | ✓ | ✓ | ✓* | 36.9 | 27.6 | ||||
+ RCZ | 80 | ✓ | ✓ | ✓ | RCZ | 36.6 | 28.0 | |||
FLOAT | 80 | ✓ | ✓ | ✓ | IZR | 37.1 | 29.6 |
We also show results on the previously proposed datasets Pascal-Part-58 [bsanet] and Pascal-Part-108 [gmnet]. As shown in Table 2, the FLOAT framework achieves the best performance on both datasets. On Pascal-Part-58, we outperform CO-Rank [co-rank] by 0.3 points and GMNet [gmnet] by 2.0 points in mIOU. In terms of sqIOU, we outperform other methods by large margins as well: 4.8 points over GMNet and 4.9 points over BSANet. A similar trend is seen for Pascal-Part-108, with large improvements of 2.1 points in mIOU and 3.9 points in sqIOU over the next best method, BSANet [bsanet].
Overall, the results across existing and challenging new variants of the Pascal-Part dataset demonstrate the strengths of our factorized label space setup. In particular, the gains grow with dataset complexity (mIOU gains over the next best method rise from 0.3 points on Pascal-Part-58 to 2.1 on Pascal-Part-108 and 8.6 on Pascal-Part-201), demonstrating the superior scaling capacity of the FLOAT framework.
5.3 Ablation Studies
We perform multiple experiments with ablated variants of FLOAT to verify the effectiveness of our design choices. The results in Table 3 show that, starting from the baseline (first row for each dataset variant), systematically adding components of the FLOAT pipeline noticeably improves segmentation quality. The gains are most apparent on Pascal-Part-201, particularly when factorized components are included. The last two rows also show that IZR is superior to Random Crop Zoom (RCZ), a variant that uses random crops whose number matches the number of objects in the scene. Some part names in the original Pascal-Part dataset [chen2014detect] contain the side component ‘upper/lower’. We attempted to train a FLOAT variant with these components as additional decoder outputs; however, the model failed to converge. We hypothesize this is due to the much smaller amount of training data compared to the other side attributes, i.e. ‘left/right’ and ‘front/back’.
5.4 Qualitative Analysis
Fig. 6 shows qualitative comparisons of our framework with existing approaches on Pascal-Part-201, reflecting the gains we observe in mIOU and sqIOU (Table 1). FLOAT is visually superior at segmenting smaller object parts; notice the significantly improved segmentation of parts for the person (first row) and cat (second row) categories. The examples also show that FLOAT is better at learning directionality (‘left/right’, ‘front/back’). Similar improvements are evident in the examples in Figure 1 (Appendix E contains additional examples). Limitations of FLOAT include missing predictions for the smallest parts (e.g. the eyes of people far from the camera) and partial predictions for thin parts, leading to disconnections.
6 Conclusion
FLOAT is a simple but effective framework for improving semantic segmentation performance in multi-object multi-part parsing. Our factorized label space is a key contribution that takes advantage of label-level intra/inter ontological relationships among objects and parts. The factorization not only enables scalability in terms of both object categories and part labels, but also improves segmentation performance substantially. Another key contribution is Inference-time Zoom Refinement (IZR). By focusing only on object-centric regions of interest, IZR efficiently enhances segmentation quality without requiring explicit object feature guidance or other modifications to the part network. We also introduce Pascal-Part-201, a new variant of Pascal-Part that constitutes the most challenging benchmark dataset for the problem. Our experimental evaluation, using fairer versions of existing measures, shows that FLOAT clearly outperforms existing state-of-the-art approaches on both existing and newly introduced Pascal-Part variants. The gains from our framework increase with part and object dataset complexity, empirically supporting our assertion of FLOAT’s scalability. Although presented in a 2D scene parsing setting, we expect ideas from FLOAT to be useful for 3D scene parsing and, in general, for scenarios with appropriately factorizable attributes.
References
Appendix A Algorithm details
a.1 Top Down Merge
The flowchart in Figure 7 describes the per-pixel “Top Down Merge” algorithm that obtains the final label for each pixel (aggregating across the image gives the final prediction). As described in the paper, each label consists of an object, a root part component and side component(s). For FLOAT, these are determined separately and merged to obtain the final label at each pixel. For each pixel:
- We obtain the predicted object category. We now have an “object” label.
- We choose the part from the animate part map or the inanimate part map, depending on the object category. We now have an “object part” label.
- We now add side components:
  - Animate:
    - For animate categories, a part can have both left/right and front/back labels.
    - The side components that the “object part” needs in order to match the original label space are added from the left/right and front/back side maps.
    - To ensure each pixel has a left/right and a front/back label, we ignore the background category prediction when taking the softmax.
  - Inanimate:
    - For inanimate categories, a part can have only one of the left/right/front/back labels.
    - We compute the combined Left-Right-Front-Back (LRFB) map by combining the Left-Right and Front-Back maps using their confidence (softmax) values.
    - If the “object part” needs a side component, it is added from the LRFB map.
Hence, we obtain all the components required to predict the final label for each pixel: “Object L/R F/B Part” for animate and “Object L/R/F/B Part” for inanimate objects.
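The side-component step above can be sketched per pixel. Assumptions that are not from the original flowchart: side scores arrive as softmax vectors ordered (background, left, right) and (background, front, back), and a hypothetical `needs` set records which side groups a given object part requires in the original label space. For inanimate objects, the LRFB combination is implemented as picking whichever of the two non-background predictions is more confident.

```python
LR_NAMES = ("bg", "left", "right")
FB_NAMES = ("bg", "front", "back")


def best_side(probs, names):
    """Most confident non-background side; probs[0] (background) is ignored."""
    k = max(range(1, len(probs)), key=lambda i: probs[i])
    return names[k], probs[k]


def compose_label(obj, root, animate, needs, lr_probs, fb_probs):
    """Build the final label for one pixel.

    needs: side groups the "object part" requires, a subset of {"lr", "fb"}
    (hypothetical input, e.g. from a lookup table over the original label space).
    lr_probs / fb_probs: softmax scores ordered as LR_NAMES / FB_NAMES.
    """
    lr, lr_conf = best_side(lr_probs, LR_NAMES)
    fb, fb_conf = best_side(fb_probs, FB_NAMES)
    if animate:  # both side components may be needed: "Object L/R F/B Part"
        sides = [s for group, s in (("lr", lr), ("fb", fb)) if group in needs]
    else:        # combined LRFB map keeps the more confident side: "Object L/R/F/B Part"
        sides = [lr if lr_conf >= fb_conf else fb] if needs else []
    return " ".join([obj, *sides, root])


print(compose_label("horse", "leg", True, {"lr", "fb"}, [0.5, 0.3, 0.2], [0.1, 0.2, 0.7]))
# horse left back leg
```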
a.2 Flood Fill for Side Component Ground Truths
As an approximation to a breadth-first-search style flood fill for generating ground truths, we compute the side component label for each pixel by assigning it the label of the closest labelled pixel in the map without flood fill.
Assume that, for an object, the original left-right map is LR_org and the map we want to compute is LR_fill.
The 0-1 mask of the object under consideration is obj_mask. The Python snippet for computing LR_fill
given LR_org and obj_mask is given in Figure 8 (FB_fill can be computed from FB_org in the same way).
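A minimal pure-Python sketch of the same idea follows; it is not the snippet in Figure 8. Instead of a nearest-pixel search over the whole map, it runs a multi-source breadth-first search from the labelled seed pixels inside `obj_mask`, so each object pixel receives the label of the seed closest in 4-connected path distance within the object. Ties go to whichever seed is dequeued first, and object regions unreachable from any seed stay unlabelled (0). The function name `flood_fill_sides` is hypothetical.

```python
from collections import deque


def flood_fill_sides(lr_org, obj_mask):
    """Propagate side labels from seed pixels to the rest of the object.

    lr_org: 2D list, 0 = unlabelled, other ints = side labels (e.g. 1 = left, 2 = right).
    obj_mask: 2D 0/1 list marking the object under consideration.
    Returns LR_fill as a new 2D list; pixels outside the mask stay 0.
    """
    h, w = len(obj_mask), len(obj_mask[0])
    lr_fill = [[0] * w for _ in range(h)]
    queue = deque()
    # Seed the search with every labelled pixel that lies inside the object.
    for y in range(h):
        for x in range(w):
            if obj_mask[y][x] and lr_org[y][x]:
                lr_fill[y][x] = lr_org[y][x]
                queue.append((y, x))
    # Multi-source BFS: each unlabelled object pixel takes the label of the seed reaching it first.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and obj_mask[ny][nx] and not lr_fill[ny][nx]:
                lr_fill[ny][nx] = lr_fill[y][x]
                queue.append((ny, nx))
    return lr_fill


print(flood_fill_sides([[1, 0, 0, 0, 2]], [[1, 1, 1, 1, 1]]))  # [[1, 1, 1, 2, 2]]
```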
a.3 Illustration of Factorization described in Introduction
Objects are split into animate and inanimate groups. The parts in each group share root components which are merged to form the label set for part prediction for each set of objects. See Figure 9 for pictorial illustration.

Appendix B Animate/Inanimate object group split
There are a total of 7 animate and 10 inanimate object categories with parts. See Figure 10 for the group split.

Appendix C Memory and Compute:
Table 4 summarizes the compute requirements for various model and dataset configurations. Despite having somewhat more parameters than the other models, FLOAT trains faster per epoch and provides significant segmentation performance gains; inference with IZR is slower (e.g. 1.02 s vs. 0.55 s without IZR on Pascal-Part-58).
Method | Dataset | Params (M) | Train Time (mins) | Test Time (secs) |
---|---|---|---|---|
BSANet | Pascal-Part-58 | 63.9 | 40.2 | 0.45 |
GMNet | Pascal-Part-58 | 124.9 | 33.6 | 0.49 |
FLOAT (45) | Pascal-Part-58 | 135.4 | 30.1 | 1.02 (0.55) |
BSANet | Pascal-Part-108 | 63.9 | 43.7 | 0.72 |
GMNet | Pascal-Part-108 | 124.9 | 37.2 | 0.75 |
FLOAT (68) | Pascal-Part-108 | 135.4 | 34.8 | 1.38 (0.80) |
BSANet | Pascal-Part-201 | 64.0 | 47.1 | 1.30 |
GMNet | Pascal-Part-201 | 124.9 | 40.3 | 1.35 |
FLOAT (80) | Pascal-Part-201 | 153.6 | 38.6 | 2.14 (1.43) |
Table 4: Compute comparison of FLOAT with previous methods. Train time is per epoch and test time is per instance (batch size 5). The total number of output heads for FLOAT is given in brackets next to the method name. Test times in brackets for FLOAT are measured without IZR.
Appendix D Limitations
- Partial predictions for objects of which only a few parts are visible in the scene.
- Poor predictions around complicated boundaries, e.g., a rider on a bicycle.
- Missing some very small or obscure objects in an image.
- Missing some predictions for objects under poor lighting or with extremely varied shapes.
Appendix E Results:
e.1 Part-58


e.2 Part-108


e.3 Part-201


Appendix F Pascal-Part Results
f.1 Pascal-Part-58 mIOU comparison
Part | Baseline | BSANet | GMNet | FLOAT |
---|---|---|---|---|
Background | 90.1 | 91.6 | 92.7 | 92.9 |
Aeroplane Body | 65.3 | 69.5 | 69.6 | 70.2 |
Aeroplane Engine | 24.9 | 28.9 | 25.7 | 29.7 |
Aeroplane Wing | 33.9 | 37.3 | 34.2 | 37.6 |
Aeroplane Stern | 56.3 | 57.2 | 57.2 | 58.3 |
Aeroplane Wheel | 43.8 | 51.8 | 46.8 | 45.7 |
Bicycle Wheel | 77.8 | 76.9 | 81.3 | 81.2 |
Bicycle Body | 48.4 | 51.1 | 51.5 | 53.0 |
Bird Head | 64.6 | 69.9 | 71.1 | 73.7 |
Bird Wing | 34.1 | 40.6 | 38.6 | 41.6 |
Bird Leg | 28.9 | 34.7 | 28.7 | 30.1 |
Bird Torso | 65.5 | 71.4 | 69.5 | 69.2 |
Boat | 54.4 | 60.4 | 70.0 | 75.3 |
Bottle Cap | 32.7 | 31.5 | 33.9 | 31.9 |
Bottle Body | 68.8 | 73.7 | 77.6 | 73.8 |
Bus Window | 72.7 | 74.9 | 75.4 | 76.7 |
Bus Wheel | 55.3 | 56.1 | 58.1 | 61.0 |
Bus Body | 74.8 | 77.5 | 79.9 | 80.3 |
Car Window | 63.6 | 68.5 | 64.8 | 70.7 |
Car Wheel | 64.8 | 69.1 | 70.3 | 73.7 |
Car Light | 46.2 | 54.0 | 48.4 | 54.6 |
Car Plate | 0.0 | 0.0 | 0.0 | 0.0 |
Car Body | 72.1 | 77.4 | 77.6 | 79.0 |
Cat Head | 80.2 | 84.0 | 83.8 | 85.7 |
Cat Leg | 48.6 | 49.8 | 49.4 | 51.7 |
Cat Tail | 41.2 | 45.8 | 46.0 | 45.5 |
Cat Torso | 70.3 | 72.3 | 73.8 | 73.6 |
Chair | 35.4 | 35.6 | 51.4 | 59.6 |
Cow Head | 74.3 | 78.7 | 80.7 | 80.9 |
Cow Tail | 0.0 | 0.6 | 8.1 | 18.3 |
Cow Leg | 46.1 | 54.2 | 53.5 | 57.0 |
Cow Torso | 67.9 | 76.4 | 77.1 | 76.7 |
Dining Table | 43.0 | 43.1 | 51.3 | 58.2 |
Dog Head | 78.7 | 85.1 | 85.0 | 84.3 |
Dog Leg | 48.1 | 54.4 | 53.8 | 53.8 |
Dog Tail | 27.1 | 33.6 | 31.4 | 37.3 |
Dog Torso | 63.6 | 67.3 | 68.0 | 67.3 |
Horse Head | 74.7 | 77.2 | 73.9 | 81.6 |
Horse Tail | 47.0 | 52.0 | 50.4 | 52.2 |
Horse Leg | 55.2 | 60.9 | 59.3 | 60.3 |
Horse Torso | 71.3 | 74.6 | 73.9 | 77.2 |
Motorbike Wheel | 72.9 | 72.3 | 73.5 | 76.3 |
Motorbike Body | 64.1 | 73.2 | 74.3 | 75.0 |
Person Head | 82.5 | 84.9 | 84.7 | 84.2 |
Person Torso | 65.3 | 67.5 | 67.0 | 68.6 |
Person Lower Arm | 46.9 | 51.1 | 48.6 | 51.8 |
Person Upper Arm | 51.5 | 52.7 | 52.4 | 54.6 |
Person Lower Leg | 38.6 | 42.0 | 40.2 | 42.4 |
Person Upper Leg | 43.8 | 46.3 | 44.5 | 47.4 |
Potted Plant Pot | 47.3 | 51.3 | 56.0 | 50.8 |
Potted Plant Plant | 52.4 | 55.5 | 56.4 | 58.9 |
Sheep Head | 60.9 | 63.6 | 70.8 | 70.6 |
Sheep Leg | 8.6 | 19.4 | 14.3 | 24.4 |
Sheep Torso | 68.3 | 71.7 | 75.6 | 76.0 |
Sofa | 43.2 | 42.6 | 56.1 | 69.1 |
Train | 76.6 | 80.9 | 85.0 | 86.0 |
TV Screen | 69.5 | 72.3 | 77.0 | 72.0 |
TV Frame | 44.4 | 49.0 | 54.1 | 47.3 |
f.2 Pascal-Part-58 sqIOU comparison
Part | Baseline | BSANet | GMNet | FLOAT |
---|---|---|---|---|
Background | 89.6 | 90.2 | 91.0 | 91.2 |
Aeroplane Body | 60.9 | 63.9 | 62.2 | 64.2 |
Aeroplane Engine | 22.9 | 39.4 | 34.1 | 38.6 |
Aeroplane Wing | 30.8 | 35.5 | 30.2 | 35.1 |
Aeroplane Stern | 48.2 | 49.2 | 48.9 | 50.5 |
Aeroplane Wheel | 30.5 | 32.7 | 27.7 | 39.2 |
Bicycle Wheel | 68.7 | 66.8 | 68.4 | 72.9 |
Bicycle Body | 38.6 | 40.5 | 38.4 | 43.9 |
Bird Head | 48.1 | 54.6 | 52.1 | 58.2 |
Bird Wing | 28.9 | 32.0 | 35.0 | 39.1 |
Bird Leg | 15.7 | 19.4 | 15.1 | 21.2 |
Bird Torso | 54.8 | 59.1 | 57.7 | 59.5 |
Boat | 53.3 | 59.4 | 60.7 | 64.3 |
Bottle Cap | 16.0 | 17.9 | 18.7 | 24.7 |
Bottle Body | 43.1 | 45.5 | 48.6 | 50.1 |
Bus Window | 68.1 | 70.3 | 69.5 | 72.2 |
Bus Wheel | 48.7 | 46.6 | 49.6 | 55.0 |
Bus Body | 71.4 | 73.0 | 74.1 | 75.5 |
Car Window | 45.1 | 51.7 | 46.8 | 60.5 |
Car Wheel | 45.9 | 47.6 | 47.1 | 58.1 |
Car Light | 24.0 | 28.7 | 23.7 | 32.4 |
Car Plate | 0.0 | 0.0 | 0.0 | 0.0 |
Car Body | 59.2 | 61.5 | 61.7 | 67.7 |
Cat Head | 76.3 | 78.7 | 78.0 | 81.4 |
Cat Leg | 45.0 | 47.1 | 46.4 | 49.5 |
Cat Tail | 31.3 | 36.4 | 36.4 | 37.5 |
Cat Torso | 66.9 | 68.6 | 69.9 | 70.8 |
Chair | 34.4 | 37.5 | 48.3 | 50.8 |
Cow Head | 58.7 | 65.3 | 65.2 | 69.7 |
Cow Tail | 0.0 | 0.5 | 3.4 | 16.1 |
Cow Leg | 38.1 | 42.7 | 42.6 | 52.1 |
Cow Torso | 63.2 | 72.7 | 73.1 | 74.1 |
Dining Table | 38.4 | 36.8 | 43.2 | 47.6 |
Dog Head | 71.0 | 76.2 | 74.9 | 78.8 |
Dog Leg | 42.5 | 50.3 | 47.5 | 52.9 |
Dog Tail | 22.7 | 25.5 | 27.7 | 36.2 |
Dog Torso | 58.3 | 62.3 | 62.2 | 63.9 |
Horse Head | 61.3 | 63.7 | 63.9 | 70.7 |
Horse Tail | 30.8 | 33.6 | 34.1 | 39.7 |
Horse Leg | 47.7 | 52.4 | 50.7 | 56.3 |
Horse Torso | 68.5 | 70.0 | 71.9 | 74.2 |
Motorbike Wheel | 60.0 | 63.0 | 62.3 | 68.0 |
Motorbike Body | 60.2 | 64.6 | 63.6 | 66.3 |
Person Head | 69.0 | 69.7 | 69.8 | 74.1 |
Person Torso | 55.2 | 57.2 | 56.0 | 61.1 |
Person Lower Arm | 33.9 | 39.5 | 35.5 | 45.2 |
Person Upper Arm | 42.0 | 43.7 | 42.2 | 49.8 |
Person Lower Leg | 31.6 | 33.0 | 32.2 | 37.5 |
Person Upper Leg | 38.2 | 39.9 | 38.7 | 43.9 |
Potted Plant Pot | 29.2 | 28.5 | 32.6 | 32.8 |
Potted Plant Plant | 33.2 | 33.8 | 37.3 | 39.3 |
Sheep Head | 48.1 | 55.3 | 56.4 | 61.5 |
Sheep Leg | 6.0 | 10.0 | 8.1 | 21.1 |
Sheep Torso | 67.2 | 71.2 | 75.4 | 74.9 |
Sofa | 42.6 | 49.6 | 58.1 | 69.2 |
Train | 75.8 | 80.0 | 83.4 | 82.1 |
TV Screen | 65.4 | 68.9 | 70.4 | 70.8 |
TV Frame | 41.0 | 44.6 | 44.6 | 46.8 |
f.3 Pascal-Part-108 mIOU comparison
Part | Baseline | BSANet | GMNet | FLOAT |
---|---|---|---|---|
Background | 90.2 | 91.8 | 92.7 | 92.9 |
Aeroplane Body | 60.9 | 69.6 | 61.9 | 70.0 |
Aeroplane Engine | 53.2 | 56.2 | 27.2 | 57.6 |
Aeroplane Wing | 27.9 | 37.2 | 34.3 | 34.3 |
Aeroplane Stern | 24.7 | 29.4 | 57.4 | 27.6 |
Aeroplane Wheel | 40.9 | 51.0 | 51.5 | 45.5 |
Bicycle Wheel | 76.4 | 77.1 | 80.2 | 81.2 |
Bicycle Saddle | 34.1 | 38.3 | 38.0 | 36.9 |
Bicycle Handlebar | 23.3 | 25.2 | 22.4 | 18.8 |
Bicycle Chainwheel | 42.3 | 41.6 | 44.1 | 53.6 |
Bird Head | 51.5 | 66.6 | 65.3 | 70.0 |
Bird Beak | 40.4 | 51.3 | 44.3 | 56.0 |
Bird Torso | 61.7 | 67.3 | 64.8 | 63.4 |
Bird Neck | 27.5 | 34.5 | 28.4 | 34.3 |
Bird Wing | 35.9 | 41.3 | 37.2 | 36.2 |
Bird Leg | 23.5 | 30.8 | 23.8 | 25.5 |
Bird Foot | 13.9 | 18.3 | 17.7 | 17.4 |
Bird Tail | 28.1 | 35.7 | 32.5 | 33.6 |
Boat | 53.7 | 60.7 | 69.2 | 74.8 |
Bottle Cap | 30.4 | 31.0 | 33.4 | 37.6 |
Bottle Body | 63.7 | 71.4 | 78.7 | 72.4 |
Bus Side | 70.1 | 74.9 | 75.7 | 77.1 |
Bus Roof | 7.5 | 6.3 | 13.5 | 17.1 |
Bus Mirror | 2.1 | 8.0 | 6.6 | 0.0 |
Bus Plate | 0.0 | 0.0 | 0.0 | 0.0 |
Bus Door | 40.1 | 47.4 | 38.1 | 40.6 |
Bus Wheel | 53.8 | 56.6 | 56.7 | 59.8 |
Bus Headlight | 25.6 | 31.0 | 30.4 | 44.7 |
Bus Window | 71.8 | 74.7 | 74.6 | 75.7 |
Car Side | 64.0 | 70.2 | 70.5 | 70.8 |
Car Roof | 21.0 | 22.2 | 22.5 | 27.1 |
Car Plate | 0.0 | 0.0 | 0.0 | 0.0 |
Car Door | 40.1 | 42.1 | 42.3 | 46.8 |
Car Wheel | 65.8 | 68.8 | 70.2 | 72.7 |
Car Headlight | 42.9 | 53.8 | 46.4 | 52.5 |
Car Window | 61.0 | 69.1 | 65.0 | 69.4 |
Cat Head | 73.9 | 76.7 | 77.5 | 77.6 |
Cat Eye | 58.8 | 64.8 | 62.8 | 67.8 |
Cat Ear | 65.5 | 67.9 | 67.1 | 67.8 |
Cat Nose | 39.1 | 46.6 | 46.3 | 44.7 |
Cat Torso | 64.2 | 66.9 | 68.7 | 68.0 |
Cat Neck | 22.8 | 22.4 | 24.4 | 26.2 |
Cat Leg | 36.5 | 39.6 | 39.1 | 40.3 |
Cat Paw | 40.6 | 42.0 | 41.7 | 43.2 |
Cat Tail | 40.2 | 44.5 | 45.8 | 43.9 |
Chair | 35.4 | 35.7 | 49.1 | 59.4 |
Cow Head | 51.2 | 62.5 | 63.8 | 64.7 |
Cow Ear | 51.2 | 57.8 | 60.0 | 60.9 |
Cow Muzzle | 61.2 | 72.4 | 74.9 | 70.6 |
Cow Horn | 28.8 | 45.5 | 44.0 | 34.9 |
Cow Torso | 63.4 | 73.5 | 73.2 | 72.6 |
Cow Neck | 9.5 | 15.9 | 20.3 | 26.8 |
Cow Leg | 46.5 | 54.8 | 54.8 | 54.8 |
Cow Tail | 6.5 | 3.1 | 13.6 | 22.9 |
Dining Table | 33.0 | 45.6 | 50.6 | 58.0 |
Dog Head | 60.5 | 64.7 | 64.0 | 63.3 |
Dog Eye | 50.1 | 57.0 | 54.7 | 60.9 |
Dog Ear | 52.0 | 57.8 | 56.8 | 57.4 |
Dog Nose | 63.5 | 69.8 | 66.0 | 66.7 |
Dog Torso | 58.4 | 62.3 | 63.2 | 62.2 |
Dog Neck | 27.1 | 28.0 | 28.1 | 26.5 |
Dog Leg | 39.2 | 43.2 | 43.7 | 43.1 |
Dog Paw | 39.4 | 45.2 | 43.7 | 47.8 |
Dog Tail | 24.7 | 35.0 | 30.8 | 31.0 |
Dog Muzzle | 65.1 | 70.1 | 68.9 | 67.0 |
Horse Head | 54.4 | 59.9 | 55.9 | 62.4 |
Horse Ear | 49.7 | 56.8 | 52.2 | 59.2 |
Horse Muzzle | 61.3 | 66.6 | 62.9 | 65.3 |
Horse Torso | 56.7 | 61.1 | 60.7 | 63.1 |
Horse Neck | 42.1 | 44.8 | 47.2 | 49.3 |
Horse Leg | 54.1 | 59.3 | 56.4 | 58.0 |
Horse Tail | 48.1 | 51.9 | 51.4 | 53.4 |
Horse Hoof | 22.1 | 19.8 | 25.3 | 18.2 |
Motorbike Wheel | 69.6 | 71.6 | 73.6 | 76.4 |
Motorbike Hbar | 0.0 | 0.0 | 0.0 | 0.0 |
Motorbike Saddle | 0.0 | 0.0 | 0.8 | 0.1 |
Motorbike Hlight | 25.8 | 21.3 | 28.5 | 35.0 |
Person Head | 68.2 | 71.3 | 69.3 | 72.6 |
Person Eye | 35.1 | 44.6 | 38.7 | 49.6 |
Person Ear | 37.4 | 46.4 | 41.4 | 47.6 |
Person Nose | 53.0 | 57.4 | 56.7 | 62.4 |
Person Mouth | 48.9 | 53.1 | 51.3 | 58.4 |
Person Hair | 70.8 | 73.2 | 71.8 | 70.9 |
Person Torso | 63.4 | 66.3 | 65.2 | 66.1 |
Person Neck | 49.7 | 53.1 | 51.2 | 54.5 |
Person Arm | 54.7 | 58.4 | 57.4 | 58.3 |
Person Hand | 43.0 | 50.1 | 44.1 | 47.8 |
Person Leg | 50.8 | 53.8 | 53.0 | 53.6 |
Person Foot | 29.8 | 33.0 | 31.3 | 31.8 |
Potted Plant Pot | 41.6 | 52.3 | 56.0 | 50.1 |
Potted Plant Plant | 42.9 | 56.1 | 56.6 | 47.7 |
Sheep Head | 45.6 | 50.2 | 54.0 | 51.6 |
Sheep Ear | 43.2 | 48.9 | 45.3 | 54.8 |
Sheep Muzzle | 58.2 | 66.6 | 64.9 | 65.5 |
Sheep Horn | 3.0 | 5.1 | 5.4 | 31.8 |
Sheep Torso | 62.6 | 66.3 | 68.8 | 69.9 |
Sheep Neck | 26.9 | 29.8 | 30.3 | 36.0 |
Sheep Leg | 8.6 | 21.1 | 11.7 | 23.9 |
Sheep Tail | 6.7 | 6.3 | 9.1 | 15.2 |
Sofa | 39.2 | 43.0 | 53.9 | 68.9 |
Train Head | 5.3 | 6.0 | 4.5 | 4.0 |
Train Head Side | 61.9 | 60.8 | 60.8 | 66.6 |
Train Head Roof | 23.0 | 19.9 | 21.1 | 26.5 |
Train Headlight | 0.0 | 0.0 | 0.0 | 0.0 |
Train Coach | 28.6 | 35.7 | 31.4 | 36.4 |
Train Coach Side | 15.6 | 18.4 | 14.9 | 15.5 |
Train Coach Roof | 10.8 | 6.3 | 18.1 | 7.7 |
TV Screen | 64.8 | 70.4 | 70.7 | 69.6 |
f.4 Pascal-Part-108 sqIOU comparison
Part | Baseline | BSANet | GMNet | FLOAT |
---|---|---|---|---|
Background | 88.7 | 90.6 | 91.2 | 91.3 |
Aeroplane Body | 55.1 | 63.9 | 62.3 | 64.5 |
Aeroplane Engine | 47.1 | 49.1 | 32.5 | 50.2 |
Aeroplane Wing | 22.9 | 35.5 | 30.0 | 31.0 |
Aeroplane Stern | 29.7 | 40.2 | 50.1 | 34.5 |
Aeroplane Wheel | 25.9 | 33.2 | 29.4 | 35.1 |
Bicycle Wheel | 63.1 | 67.3 | 68.4 | 71.2 |
Bicycle Saddle | 23.4 | 27.0 | 26.0 | 28.2 |
Bicycle Handlebar | 15.7 | 18.9 | 16.3 | 16.7 |
Bicycle Chainwheel | 29.4 | 30.2 | 29.1 | 33.5 |
Bird Head | 42.7 | 47.6 | 45.8 | 52.4 |
Bird Beak | 22.8 | 30.8 | 25.9 | 37.5 |
Bird Torso | 49.6 | 57.0 | 54.2 | 54.7 |
Bird Neck | 23.7 | 30.2 | 26.0 | 33.1 |
Bird Wing | 31.2 | 31.5 | 33.2 | 36.4 |
Bird Leg | 8.2 | 14.6 | 10.0 | 14.3 |
Bird Foot | 7.7 | 10.8 | 9.7 | 12.0 |
Bird Tail | 21.6 | 25.2 | 25.7 | 28.0 |
Boat | 53.5 | 59.2 | 62.3 | 64.0 |
Bottle Cap | 15.0 | 18.0 | 17.8 | 25.4 |
Bottle Body | 40.0 | 45.1 | 49.1 | 49.3 |
Bus Side | 64.2 | 70.5 | 70.2 | 72.3 |
Bus Roof | 7.7 | 9.8 | 17.2 | 24.2 |
Bus Mirror | 0.6 | 4.7 | 2.8 | 0.0 |
Bus Plate | 0.0 | 0.0 | 0.0 | 0.0 |
Bus Door | 27.5 | 33.7 | 28.7 | 32.4 |
Bus Wheel | 43.2 | 48.1 | 48.1 | 52.9 |
Bus Headlight | 12.2 | 23.4 | 16.8 | 30.5 |
Bus Window | 67.1 | 70.6 | 69.2 | 71.7 |
Car Side | 51.8 | 54.9 | 55.5 | 58.7 |
Car Roof | 13.0 | 20.7 | 15.3 | 25.8 |
Car Plate | 0.0 | 0.0 | 0.0 | 0.0 |
Car Door | 34.1 | 37.3 | 40.8 | 46.8 |
Car Wheel | 44.2 | 46.9 | 47.4 | 57.1 |
Car Headlight | 21.2 | 28.5 | 25.5 | 31.4 |
Car Window | 45.0 | 52.6 | 47.7 | 58.4 |
Cat Head | 67.6 | 70.9 | 70.5 | 72.4 |
Cat Eye | 30.9 | 40.4 | 35.0 | 46.6 |
Cat Ear | 54.6 | 58.2 | 56.1 | 60.9 |
Cat Nose | 13.9 | 25.3 | 23.6 | 29.2 |
Cat Torso | 60.4 | 63.4 | 65.0 | 64.6 |
Cat Neck | 23.0 | 22.1 | 24.8 | 28.0 |
Cat Leg | 33.6 | 36.3 | 36.2 | 38.5 |
Cat Paw | 36.9 | 39.5 | 38.8 | 41.5 |
Cat Tail | 32.6 | 36.7 | 36.5 | 35.3 |
Chair | 37.4 | 37.4 | 46.8 | 50.6 |
Cow Head | 46.4 | 51.8 | 52.7 | 56.6 |
Cow Ear | 36.8 | 42.3 | 41.3 | 47.3 |
Cow Muzzle | 50.0 | 60.4 | 60.2 | 63.4 |
Cow Horn | 16.9 | 22.4 | 23.4 | 23.5 |
Cow Torso | 60.8 | 69.6 | 71.8 | 71.3 |
Cow Neck | 9.6 | 13.3 | 15.2 | 23.8 |
Cow Leg | 35.7 | 43.5 | 44.0 | 49.3 |
Cow Tail | 3.7 | 2.2 | 6.9 | 15.7 |
Dining Table | 31.5 | 39.2 | 44.1 | 47.5 |
Dog Head | 54.0 | 58.6 | 56.6 | 60.0 |
Dog Eye | 23.0 | 31.6 | 26.9 | 39.0 |
Dog Ear | 40.1 | 50.0 | 47.4 | 54.5 |
Dog Nose | 34.5 | 45.2 | 37.5 | 48.8 |
Dog Torso | 53.6 | 57.9 | 57.8 | 59.4 |
Dog Neck | 20.3 | 20.8 | 21.2 | 25.3 |
Dog Leg | 33.6 | 40.0 | 37.5 | 41.9 |
Dog Paw | 32.8 | 39.8 | 35.9 | 41.7 |
Dog Tail | 22.6 | 28.1 | 27.4 | 33.2 |
Dog Muzzle | 53.8 | 60.5 | 57.8 | 62.7 |
Horse Head | 46.6 | 49.9 | 49.3 | 54.2 |
Horse Ear | 29.3 | 38.2 | 35.1 | 44.4 |
Horse Muzzle | 54.1 | 54.7 | 55.6 | 61.6 |
Horse Torso | 55.7 | 58.0 | 61.0 | 60.6 |
Horse Neck | 46.1 | 45.9 | 51.4 | 52.7 |
Horse Leg | 46.1 | 51.7 | 48.9 | 53.3 |
Horse Tail | 33.0 | 34.5 | 36.4 | 40.4 |
Horse Hoof | 12.1 | 14.0 | 16.3 | 13.3 |
Motorbike Wheel | 58.8 | 61.7 | 62.1 | 66.3 |
Motorbike Hbar | 0.0 | 0.0 | 0.0 | 0.0 |
Motorbike Saddle | 0.0 | 0.0 | 0.5 | 0.1 |
Motorbike Hlight | 18.3 | 16.9 | 17.4 | 21.7 |
Person Head | 50.5 | 54.0 | 52.4 | 58.5 |
Person Eye | 9.4 | 15.7 | 11.0 | 20.0 |
Person Ear | 14.7 | 22.9 | 17.7 | 26.0 |
Person Nose | 22.6 | 26.8 | 26.0 | 35.9 |
Person Mouth | 19.7 | 24.5 | 21.9 | 31.8 |
Person Hair | 50.9 | 56.2 | 52.3 | 58.8 |
Person Torso | 52.7 | 56.1 | 54.8 | 58.6 |
Person Neck | 36.1 | 38.7 | 37.1 | 46.1 |
Person Arm | 43.5 | 48.6 | 46.5 | 53.0 |
Person Hand | 27.3 | 34.0 | 29.0 | 38.2 |
Person Leg | 43.5 | 46.2 | 45.9 | 49.1 |
Person Foot | 20.6 | 23.3 | 22.7 | 26.6 |
Potted Plant Pot | 23.1 | 29.5 | 31.6 | 31.9 |
Potted Plant Plant | 29.1 | 34.3 | 37.8 | 35.1 |
Sheep Head | 37.0 | 41.7 | 42.3 | 45.8 |
Sheep Ear | 22.2 | 29.1 | 25.1 | 35.3 |
Sheep Muzzle | 35.0 | 40.4 | 40.9 | 50.4 |
Sheep Horn | 1.7 | 2.1 | 2.1 | 17.5 |
Sheep Torso | 63.7 | 66.5 | 70.2 | 69.0 |
Sheep Neck | 24.3 | 28.3 | 26.5 | 36.0 |
Sheep Leg | 4.8 | 12.2 | 6.4 | 30.1 |
Sheep Tail | 4.1 | 5.3 | 4.6 | 14.6 |
Sofa | 47.2 | 48.9 | 59.0 | 68.9 |
Train Head | 3.2 | 3.7 | 3.0 | 2.6 |
Train Head Side | 65.3 | 68.3 | 69.2 | 70.9 |
Train Head Roof | 16.6 | 18.8 | 20.0 | 22.8 |
Train Headlight | 0.0 | 0.0 | 0.0 | 0.0 |
Train Coach | 11.7 | 13.6 | 14.8 | 12.6 |
Train Coach Side | 25.5 | 29.2 | 29.8 | 29.7 |
Train Coach Roof | 9.3 | 6.2 | 14.7 | 8.6 |
TV Screen | 60.6 | 65.3 | 68.4 | 68.2 |
f.5 Pascal-Part-201 mIOU comparison
Part | Baseline | BSANet | GMNet | FLOAT |
---|---|---|---|---|
Background | 91.0 | 91.2 | 90.7 | 92.5 |
Aeroplane Body | 67.3 | 71.1 | 62.2 | 68.9 |
Aeroplane Engine | 27.0 | 30.0 | 19.2 | 28.5 |
Aeroplane Left Wing | 3.8 | 10.7 | 7.2 | 28.6 |
Aeroplane Right Wing | 19.6 | 20.3 | 13.4 | 25.6 |
Aeroplane Stern | 53.3 | 56.7 | 48.1 | 55.3 |
Aeroplane Tail | 0.0 | 0.0 | 0.0 | 0.0 |
Aeroplane Wheel | 50.4 | 52.9 | 36.1 | 49.9 |
Bicycle Back Wheel | 63.8 | 63.6 | 55.6 | 67.9 |
Bicycle Chainwheel | 41.2 | 44.9 | 35.1 | 42.0 |
Bicycle Body | 44.6 | 44.9 | 40.1 | 47.3 |
Bicycle Front Wheel | 68.4 | 70.9 | 61.7 | 72.9 |
Bicycle Handlebar | 27.1 | 26.1 | 18.1 | 24.9 |
Bicycle Headlight | 0.0 | 0.0 | 0.0 | 0.0 |
Bicycle Saddle | 41.1 | 41.6 | 20.9 | 43.5 |
Bird Beak | 53.0 | 57.3 | 37.2 | 49.3 |
Bird Head | 66.5 | 66.4 | 54.3 | 66.5 |
Bird Left Eye | 26.2 | 27.6 | 17.9 | 57.8 |
Bird Left Foot | 5.9 | 12.0 | 2.2 | 9.5 |
Bird Left Leg | 5.1 | 9.3 | 4.8 | 15.9 |
Bird Left Wing | 4.2 | 11.9 | 8.8 | 29.4 |
Bird Neck | 34.0 | 35.8 | 31.7 | 34.4 |
Bird Right Eye | 0.0 | 11.6 | 0.9 | 55.2 |
Bird Right Foot | 0.0 | 1.2 | 0.0 | 7.4 |
Bird Right Leg | 11.1 | 11.1 | 14.6 | 11.2 |
Bird Right Wing | 18.7 | 16.3 | 18.1 | 20.3 |
Bird Tail | 30.0 | 36.2 | 26.1 | 29.5 |
Bird Torso | 60.6 | 65.3 | 61.1 | 61.2 |
Boat | 56.7 | 61.2 | 55.0 | 75.3 |
Bottle Body | 64.6 | 72.5 | 65.5 | 67.6 |
Bottle Cap | 28.1 | 30.9 | 21.4 | 35.1 |
Bus Back Plate | 0.0 | 0.0 | 0.0 | 13.0 |
Bus Back Side | 49.0 | 44.1 | 49.8 | 43.5 |
Bus Door | 40.9 | 46.1 | 31.1 | 38.2 |
Bus Front Plate | 26.3 | 42.2 | 0.0 | 45.3 |
Bus Front Side | 68.9 | 66.9 | 60.9 | 48.6 |
Bus Headlight | 32.6 | 34.8 | 6.1 | 38.8 |
Bus Left Mirror | 0.0 | 0.8 | 0.0 | 7.5 |
Bus Left Side | 21.4 | 25.1 | 27.1 | 34.6 |
Bus Right Mirror | 0.0 | 12.5 | 0.0 | 9.7 |
Bus Right Side | 33.9 | 31.5 | 29.2 | 36.7 |
Bus Roof | 0.0 | 8.0 | 1.0 | 13.5 |
Bus Wheel | 57.1 | 56.2 | 48.8 | 59.3 |
Bus Window | 73.5 | 74.8 | 66.4 | 76.5 |
Car Back Plate | 25.6 | 26.9 | 6.7 | 39.2 |
Car Back Side | 45.0 | 44.5 | 38.0 | 44.6 |
Car Door | 41.4 | 44.1 | 37.8 | 43.6 |
Car Front Plate | 43.0 | 38.1 | 12.5 | 48.6 |
Car Front Side | 66.0 | 65.6 | 60.1 | 56.1 |
Car Headlight | 54.4 | 51.5 | 39.4 | 54.3 |
Car Left Mirror | 12.6 | 14.0 | 1.4 | 21.0 |
Car Left Side | 20.5 | 20.5 | 16.6 | 28.7 |
Car Right Mirror | 0.3 | 7.1 | 0.0 | 17.8 |
Car Right Side | 16.7 | 18.7 | 14.6 | 29.0 |
Car Roof | 17.4 | 27.8 | 11.7 | 22.5 |
Car Wheel | 68.3 | 69.2 | 63.7 | 72.4 |
Car Window | 65.4 | 67.1 | 55.4 | 68.4 |
Cat Head | 75.4 | 76.4 | 72.3 | 77.8 |
Cat Left Back Leg | 9.7 | 9.9 | 6.7 | 11.6 |
Cat Left Back Paw | 9.3 | 10.8 | 4.1 | 10.7 |
Cat Left Ear | 12.6 | 22.3 | 24.2 | 59.7 |
Cat Left Eye | 7.9 | 11.6 | 13.4 | 59.2 |
Cat Left Front Leg | 11.5 | 15.0 | 14.8 | 25.3 |
Cat Left Front Paw | 15.3 | 16.7 | 13.7 | 19.0 |
Cat Neck | 21.5 | 18.5 | 19.6 | 23.3 |
Cat Nose | 39.7 | 46.3 | 32.9 | 43.1 |
Cat Right Back Leg | 1.5 | 10.1 | 6.8 | 16.0 |
Cat Right Back Paw | 0.2 | 7.5 | 7.1 | 16.1 |
Cat Right Ear | 33.2 | 28.6 | 25.1 | 59.7 |
Cat Right Eye | 34.0 | 33.8 | 23.5 | 62.8 |
Cat Right Front Leg | 16.2 | 12.1 | 12.0 | 26.5 |
Cat Right Front Paw | 17.6 | 12.1 | 12.2 | 21.9 |
Cat Tail | 40.6 | 45.4 | 15.3 | 43.0 |
Cat Torso | 65.6 | 66.7 | 64.9 | 67.6 |
Chair | 35.6 | 35.4 | 35.5 | 59.6 |
Cow Head | 60.1 | 60.6 | 54.7 | 61.1 |
Cow Left Back Lower Leg | 0.8 | 3.3 | 1.1 | 15.3 |
Cow Left Back Upper Leg | 13.5 | 16.2 | 12.7 | 19.6 |
Cow Left Ear | 1.9 | 24.2 | 8.2 | 53.0 |
Cow Left Eye | 0.0 | 0.0 | 0.0 | 41.1 |
Cow Left Front Lower Leg | 15.9 | 14.5 | 12.1 | 25.4 |
Cow Left Front Upper Leg | 14.4 | 18.6 | 14.4 | 33.2 |
Cow Left Horn | 0.0 | 13.3 | 0.0 | 28.3 |
Cow Muzzle | 71.0 | 72.1 | 64.7 | 70.4 |
Cow Neck | 5.7 | 15.1 | 9.0 | 21.9 |
Cow Right Back Lower Leg | 16.3 | 18.4 | 1.7 | 17.4 |
Cow Right Back Upper Leg | 5.6 | 12.9 | 5.2 | 22.9 |
Cow Right Ear | 27.4 | 28.8 | 22.1 | 56.5 |
Cow Right Eye | 1.9 | 11.0 | 0.0 | 38.5 |
Cow Right Front Lower Leg | 2.6 | 9.5 | 0.5 | 25.1 |
Cow Right Front Upper Leg | 19.2 | 21.8 | 14.4 | 32.7 |
Cow Right Horn | 0.0 | 30.7 | 2.9 | 24.1 |
Cow Tail | 5.6 | 12.2 | 0.0 | 17.8 |
Cow Torso | 70.0 | 73.0 | 63.1 | 70.8 |
Dining Table | 38.6 | 43.6 | 40.3 | 58.2 |
Dog Head | 61.7 | 63.4 | 58.9 | 62.7 |
Dog Left Back Leg | 5.0 | 8.6 | 6.2 | 17.1 |
Dog Left Back Paw | 6.8 | 6.7 | 3.4 | 12.6 |
Dog Left Ear | 22.1 | 19.5 | 19.9 | 56.6 |
Dog Left Eye | 21.0 | 21.4 | 12.6 | 54.9 |
Dog Left Front Leg | 9.4 | 18.6 | 14.2 | 32.3 |
Dog Left Front Paw | 8.2 | 16.6 | 16.2 | 30.2 |
Dog Muzzle | 67.7 | 70.3 | 63.2 | 64.9 |
Dog Neck | 27.5 | 27.4 | 20.1 | 25.6 |
Dog Nose | 64.0 | 69.2 | 55.1 | 66.0 |
Dog Right Back Leg | 18.0 | 12.3 | 13.5 | 24.4 |
Dog Right Back Paw | 2.5 | 5.5 | 2.5 | 18.1 |
Dog Right Ear | 20.6 | 23.5 | 26.6 | 53.0 |
Dog Right Eye | 20.7 | 17.5 | 17.5 | 59.8 |
Dog Right Front Leg | 20.6 | 14.1 | 17.1 | 33.4 |
Dog Right Front Paw | 23.5 | 18.7 | 17.2 | 31.8 |
Dog Tail | 31.8 | 35.5 | 27.5 | 33.5 |
Dog Torso | 60.3 | 62.3 | 58.8 | 62.0 |
Horse Head | 58.0 | 58.2 | 48.8 | 63.5 |
Horse Left Back Hoof | 0.0 | 1.7 | 1.2 | 8.2 |
Horse Left Back Lower Leg | 4.4 | 8.0 | 3.1 | 19.6 |
Horse Left Back Upper Leg | 0.4 | 10.0 | 14.0 | 24.0 |
Horse Left Ear | 7.0 | 13.5 | 10.2 | 50.1 |
Horse Left Eye | 0.0 | 0.0 | 3.2 | 39.8 |
Horse Left Front Hoof | 0.0 | 3.9 | 0.0 | 2.1 |
Horse Left Front Lower Leg | 15.5 | 20.1 | 11.2 | 23.3 |
Horse Left Front Upper Leg | 14.2 | 24.9 | 14.4 | 30.1 |
Horse Muzzle | 65.0 | 66.4 | 56.0 | 69.7 |
Horse Neck | 50.8 | 48.9 | 38.6 | 51.5 |
Horse Right Back Hoof | 0.0 | 2.6 | 2.1 | 7.7 |
Horse Right Back Lower Leg | 16.1 | 19.6 | 7.2 | 21.2 |
Horse Right Back Upper Leg | 22.4 | 23.9 | 14.3 | 28.6 |
Horse Right Ear | 29.3 | 25.7 | 28.1 | 49.7 |
Horse Right Eye | 17.2 | 19.4 | 1.2 | 52.1 |
Horse Right Front Hoof | 0.0 | 2.2 | 0.0 | 2.9 |
Horse Right Front Lower Leg | 4.1 | 9.6 | 5.3 | 21.7 |
Horse Right Front Upper Leg | 21.5 | 12.6 | 13.2 | 33.3 |
Horse Tail | 47.9 | 49.9 | 39.0 | 49.6 |
Horse Torso | 61.3 | 61.7 | 56.4 | 65.1 |
Motorbike Back Wheel | 60.7 | 63.9 | 52.3 | 63.7 |
Motorbike Body | 67.8 | 70.7 | 64.5 | 70.8 |
Motorbike Front Wheel | 68.9 | 72.2 | 63.4 | 71.9 |
Motorbike Handlebar | 0.0 | 0.0 | 0.0 | 0.1 |
Motorbike Headlight | 30.7 | 17.8 | 11.2 | 34.7 |
Motorbike Saddle | 0.0 | 0.0 | 0.0 | 0.0 |
Person Hair | 73.8 | 74.0 | 68.3 | 72.7 |
Person Head | 70.0 | 70.8 | 63.9 | 71.2 |
Person Left Ear | 19.5 | 16.6 | 8.2 | 45.1 |
Person Left Eye | 4.5 | 12.9 | 3.1 | 42.7 |
Person Left Eyebrow | 0.0 | 3.6 | 0.1 | 17.1 |
Person Left Foot | 18.4 | 16.2 | 11.6 | 17.8 |
Person Left Hand | 8.4 | 15.7 | 13.2 | 33.7 |
Person Left Lower Arm | 19.6 | 18.2 | 12.1 | 37.4 |
Person Left Lower Leg | 16.2 | 18.6 | 17.3 | 23.5 |
Person Left Upper Arm | 21.0 | 19.0 | 16.1 | 47.2 |
Person Left Upper Leg | 15.9 | 10.7 | 13.2 | 30.8 |
Person Mouth | 52.2 | 53.6 | 30.2 | 57.9 |
Person Neck | 51.3 | 52.2 | 42.6 | 52.9 |
Person Nose | 59.2 | 58.2 | 40.1 | 61.1 |
Person Right Ear | 16.3 | 19.8 | 9.8 | 47.8 |
Person Right Eye | 22.8 | 17.9 | 13.7 | 47.3 |
Person Right Eyebrow | 0.0 | 3.6 | 0.4 | 12.2 |
Person Right Foot | 9.3 | 12.4 | 7.2 | 18.4 |
Person Right Hand | 28.7 | 24.0 | 23.4 | 36.0 |
Person Right Lower Arm | 17.9 | 19.6 | 18.5 | 37.2 |
Person Right Lower Leg | 15.6 | 16.6 | 10.2 | 24.8 |
Person Right Upper Arm | 22.0 | 21.4 | 23.4 | 45.9 |
Person Right Upper Leg | 19.7 | 23.8 | 19.8 | 32.9 |
Person Torso | 64.2 | 65.3 | 58.6 | 65.3 |
Potted Plant Plant | 51.6 | 56.2 | 48.2 | 54.5 |
Potted Plant Pot | 50.0 | 53.3 | 40.2 | 49.9 |
Sheep Head | 49.2 | 45.1 | 42.6 | 47.6 |
Sheep Left Back Lower Leg | 2.3 | 3.6 | 1.5 | 7.7 |
Sheep Left Back Upper Leg | 0.0 | 0.0 | 0.0 | 9.7 |
Sheep Left Ear | 18.3 | 27.2 | 6.9 | 51.9 |
Sheep Left Eye | 0.0 | 0.0 | 1.3 | 37.6 |
Sheep Left Front Lower Leg | 0.0 | 0.0 | 2.7 | 15.7 |
Sheep Left Front Upper Leg | 0.0 | 0.0 | 0.0 | 14.4 |
Sheep Left Horn | 0.0 | 0.7 | 0.0 | 26.1 |
Sheep Muzzle | 59.8 | 61.1 | 58.6 | 66.0 |
Sheep Neck | 24.9 | 23.1 | 17.1 | 32.0 |
Sheep Right Back Lower Leg | 0.0 | 1.3 | 0.8 | 4.9 |
Sheep Right Back Upper Leg | 0.2 | 1.6 | 1.1 | 7.1 |
Sheep Right Ear | 15.5 | 12.9 | 24.1 | 49.5 |
Sheep Right Eye | 8.5 | 15.8 | 1.9 | 37.0 |
Sheep Right Front Lower Leg | 0.4 | 0.0 | 1.7 | 12.7 |
Sheep Right Front Upper Leg | 2.3 | 0.0 | 2.7 | 16.3 |
Sheep Right Horn | 0.0 | 8.4 | 0.0 | 25.0 |
Sheep Tail | 6.8 | 5.1 | 0.1 | 12.3 |
Sheep Torso | 65.1 | 65.5 | 62.5 | 69.2 |
Sofa | 42.1 | 40.4 | 43.3 | 69.0 |
Train Coach Back Side | 0.0 | 3.8 | 5.9 | 11.1 |
Train Coach Front Side | 0.0 | 0.0 | 0.0 | 0.5 |
Train Coach Left Side | 5.9 | 6.5 | 6.1 | 6.1 |
Train Coach Right Side | 3.4 | 10.2 | 4.9 | 9.1 |
Train Coach Roof | 0.0 | 9.1 | 1.6 | 0.0 |
Train Coach | 30.7 | 35.2 | 33.5 | 28.0 |
Train Head | 4.3 | 9.0 | 5.7 | 4.4 |
Train Head Back Side | 0.0 | 0.0 | 0.0 | 1.3 |
Train Head Front Side | 71.0 | 72.6 | 62.6 | 34.5 |
Train Head Left Side | 19.3 | 16.0 | 22.8 | 27.2 |
Train Head Right Side | 14.3 | 19.8 | 18.4 | 22.2 |
Train Head Roof | 18.7 | 25.2 | 11.4 | 22.2 |
Train Headlight | 23.1 | 24.1 | 9.1 | 29.5 |
TV Monitor Frame | 46.8 | 47.2 | 40.4 | 44.7 |
TV Monitor Screen | 68.5 | 71.5 | 63.9 | 67.4 |
f.6 Pascal-Part-201 sqIOU comparison
Part | Baseline | BSANet | GMNet | FLOAT |
---|---|---|---|---|
Background | 89.6 | 89.9 | 89.4 | 90.8 |
Aeroplane Body | 61.3 | 65.0 | 54.2 | 62.2 |
Aeroplane Engine | 36.4 | 40.6 | 19.1 | 38.6 |
Aeroplane Left Wing | 3.8 | 10.6 | 3.7 | 22.2 |
Aeroplane Right Wing | 19.6 | 16.1 | 9.1 | 22.7 |
Aeroplane Stern | 47.5 | 49.4 | 39.7 | 46.4 |
Aeroplane Tail | 0.0 | 0.0 | 0.0 | 0.0 |
Aeroplane Wheel | 33.5 | 33.1 | 19.4 | 35.6 |
Bicycle Back Wheel | 53.6 | 53.7 | 43.2 | 59.2 |
Bicycle Chainwheel | 30.3 | 33.6 | 18.6 | 33.2 |
Bicycle Body | 37.6 | 36.7 | 29.6 | 39.9 |
Bicycle Front Wheel | 58.5 | 60.2 | 47.6 | 61.6 |
Bicycle Handlebar | 23.4 | 20.8 | 11.3 | 24.2 |
Bicycle Headlight | 0.0 | 0.0 | 0.0 | 0.0 |
Bicycle Saddle | 32.4 | 29.3 | 14.5 | 32.5 |
Bird Beak | 31.1 | 31.9 | 12.7 | 36.2 |
Bird Head | 49.4 | 49.4 | 40.7 | 52.2 |
Bird Left Eye | 7.4 | 11.4 | 2.4 | 28.3 |
Bird Left Foot | 3.9 | 6.5 | 3.1 | 6.4 |
Bird Left Leg | 2.1 | 4.2 | 0.2 | 8.7 |
Bird Left Wing | 3.9 | 10.5 | 5.2 | 23.4 |
Bird Neck | 24.2 | 26.5 | 14.7 | 27.1 |
Bird Right Eye | 0.0 | 1.0 | 0.1 | 23.1 |
Bird Right Foot | 0.0 | 0.8 | 0.0 | 6.0 |
Bird Right Leg | 4.1 | 4.4 | 8.4 | 6.7 |
Bird Right Wing | 18.3 | 12.9 | 11.7 | 21.4 |
Bird Tail | 24.0 | 25.7 | 17.1 | 27.5 |
Bird Torso | 53.8 | 56.0 | 48.1 | 52.1 |
Boat | 57.4 | 60.2 | 53.1 | 63.9 |
Bottle Body | 46.2 | 45.6 | 39.4 | 47.8 |
Bottle Cap | 18.5 | 17.7 | 12.3 | 24.5 |
Bus Back Plate | 0.0 | 0.0 | 0.0 | 15.5 |
Bus Back Side | 29.5 | 21.1 | 27.6 | 19.9 |
Bus Door | 31.4 | 32.8 | 14.5 | 28.7 |
Bus Front Plate | 17.6 | 32.6 | 0.0 | 39.3 |
Bus Front Side | 66.0 | 67.4 | 58.0 | 46.1 |
Bus Headlight | 20.4 | 28.0 | 1.2 | 25.9 |
Bus Left Mirror | 0.0 | 0.4 | 0.0 | 3.5 |
Bus Left Side | 23.4 | 24.7 | 20.0 | 32.5 |
Bus Right Mirror | 0.0 | 6.3 | 0.0 | 4.0 |
Bus Right Side | 43.1 | 36.5 | 30.3 | 35.1 |
Bus Roof | 0.0 | 10.5 | 1.3 | 19.5 |
Bus Wheel | 51.0 | 49.7 | 38.1 | 53.3 |
Bus Window | 70.4 | 69.7 | 60.6 | 71.6 |
Car Back Plate | 13.8 | 13.6 | 4.9 | 19.7 |
Car Back Side | 22.9 | 24.8 | 16.4 | 25.6 |
Car Door | 45.8 | 40.8 | 38.1 | 44.4 |
Car Front Plate | 21.6 | 19.4 | 4.8 | 28.0 |
Car Front Side | 41.9 | 43.2 | 35.0 | 35.8 |
Car Headlight | 30.3 | 30.0 | 13.5 | 31.7 |
Car Left Mirror | 4.9 | 7.5 | 0.0 | 7.7 |
Car Left Side | 19.2 | 18.9 | 15.1 | 24.6 |
Car Right Mirror | 0.2 | 5.0 | 0.0 | 8.1 |
Car Right Side | 17.3 | 17.6 | 10.3 | 25.9 |
Car Roof | 17.3 | 20.1 | 6.8 | 22.9 |
Car Wheel | 50.7 | 49.2 | 38.9 | 56.6 |
Car Window | 52.5 | 52.9 | 40.4 | 57.8 |
Cat Head | 70.7 | 70.6 | 67.4 | 72.4 |
Cat Left Back Leg | 7.2 | 7.5 | 5.0 | 10.0 |
Cat Left Back Paw | 7.6 | 9.4 | 3.5 | 10.7 |
Cat Left Ear | 10.7 | 16.8 | 18.7 | 51.8 |
Cat Left Eye | 5.0 | 10.1 | 11.1 | 42.4 |
Cat Left Front Leg | 10.6 | 14.1 | 11.1 | 27.3 |
Cat Left Front Paw | 15.2 | 17.3 | 11.7 | 21.9 |
Cat Neck | 20.3 | 18.5 | 16.5 | 23.0 |
Cat Nose | 23.7 | 25.9 | 12.8 | 29.7 |
Cat Right Back Leg | 0.9 | 6.8 | 6.8 | 11.5 |
Cat Right Back Paw | 0.1 | 6.8 | 5.6 | 14.5 |
Cat Right Ear | 30.0 | 23.5 | 19.5 | 53.6 |
Cat Right Eye | 18.4 | 15.0 | 11.5 | 41.9 |
Cat Right Front Leg | 13.5 | 9.8 | 10.3 | 26.6 |
Cat Right Front Paw | 13.7 | 8.2 | 11.2 | 23.0 |
Cat Tail | 38.6 | 36.7 | 24.4 | 36.5 |
Cat Torso | 63.1 | 63.2 | 61.4 | 65.0 |
Chair | 39.8 | 37.8 | 38.2 | 50.8 |
Cow Head | 51.6 | 50.1 | 42.8 | 54.4 |
Cow Left Back Lower Leg | 0.9 | 2.0 | 2.3 | 11.4 |
Cow Left Back Upper Leg | 8.8 | 11.2 | 6.5 | 14.5 |
Cow Left Ear | 3.3 | 17.4 | 3.9 | 42.1 |
Cow Left Eye | 0.0 | 0.0 | 0.0 | 19.1 |
Cow Left Front Lower Leg | 13.2 | 9.6 | 7.1 | 21.7 |
Cow Left Front Upper Leg | 9.8 | 12.0 | 10.5 | 26.0 |
Cow Left Horn | 0.0 | 8.2 | 0.0 | 18.7 |
Cow Muzzle | 61.3 | 60.1 | 48.5 | 63.1 |
Cow Neck | 5.6 | 9.6 | 2.6 | 19.9 |
Cow Right Back Lower Leg | 10.2 | 12.8 | 0.8 | 13.9 |
Cow Right Back Upper Leg | 3.6 | 7.2 | 2.9 | 21.0 |
Cow Right Ear | 25.7 | 21.4 | 18.6 | 43.3 |
Cow Right Eye | 0.4 | 4.1 | 0.0 | 15.9 |
Cow Right Front Lower Leg | 2.4 | 6.7 | 0.4 | 22.0 |
Cow Right Front Upper Leg | 12.6 | 12.6 | 5.2 | 26.4 |
Cow Right Horn | 0.0 | 11.1 | 0.4 | 17.2 |
Cow Tail | 2.8 | 7.4 | 0.0 | 13.6 |
Cow Torso | 69.2 | 68.5 | 61.1 | 70.5 |
Dining Table | 34.7 | 38.0 | 35.2 | 47.6 |
Dog Head | 57.9 | 57.6 | 47.7 | 59.4 |
Dog Left Back Leg | 5.3 | 7.1 | 3.4 | 13.9 |
Dog Left Back Paw | 5.8 | 5.6 | 1.7 | 13.2 |
Dog Left Ear | 15.5 | 14.1 | 8.1 | 48.9 |
Dog Left Eye | 10.8 | 9.9 | 4.2 | 36.9 |
Dog Left Front Leg | 7.0 | 14.6 | 9.5 | 28.6 |
Dog Left Front Paw | 8.9 | 15.5 | 11.8 | 28.8 |
Dog Muzzle | 61.5 | 60.7 | 49.6 | 62.4 |
Dog Neck | 21.8 | 20.2 | 8.2 | 21.3 |
Dog Nose | 44.6 | 45.0 | 24.7 | 48.0 |
Dog Right Back Leg | 12.2 | 8.3 | 7.9 | 17.8 |
Dog Right Back Paw | 2.5 | 5.6 | 0.3 | 17.3 |
Dog Right Ear | 21.5 | 19.5 | 16.8 | 49.7 |
Dog Right Eye | 9.7 | 9.6 | 4.0 | 38.2 |
Dog Right Front Leg | 16.2 | 9.8 | 10.0 | 31.0 |
Dog Right Front Paw | 19.7 | 13.5 | 6.1 | 31.8 |
Dog Tail | 30.0 | 27.4 | 20.1 | 34.4 |
Dog Torso | 57.1 | 57.4 | 52.9 | 58.9 |
Horse Head | 51.4 | 49.0 | 42.5 | 54.3 |
Horse Left Back Hoof | 0.0 | 1.4 | 2.9 | 6.6 |
Horse Left Back Lower Leg | 3.2 | 5.6 | 4.1 | 13.6 |
Horse Left Back Upper Leg | 0.7 | 8.8 | 7.8 | 18.4 |
Horse Left Ear | 3.7 | 9.9 | 2.8 | 35.1 |
Horse Left Eye | 0.0 | 0.0 | 5.1 | 14.7 |
Horse Left Front Hoof | 0.0 | 2.8 | 0.0 | 2.4 |
Horse Left Front Lower Leg | 11.5 | 15.6 | 6.9 | 21.3 |
Horse Left Front Upper Leg | 11.9 | 21.5 | 9.8 | 29.4 |
Horse Muzzle | 58.3 | 54.4 | 47.3 | 62.2 |
Horse Neck | 50.8 | 45.6 | 40.5 | 50.6 |
Horse Right Back Hoof | 0.0 | 1.2 | 5.9 | 5.1 |
Horse Right Back Lower Leg | 12.7 | 15.0 | 3.6 | 15.6 |
Horse Right Back Upper Leg | 18.5 | 17.8 | 12.1 | 20.4 |
Horse Right Ear | 19.6 | 14.4 | 14.9 | 31.8 |
Horse Right Eye | 3.9 | 6.0 | 1.1 | 16.9 |
Horse Right Front Hoof | 0.0 | 1.4 | 0.0 | 2.6 |
Horse Right Front Lower Leg | 2.4 | 5.1 | 3.6 | 20.0 |
Horse Right Front Upper Leg | 16.5 | 7.3 | 9.3 | 29.4 |
Horse Tail | 37.1 | 33.1 | 23.7 | 36.7 |
Horse Torso | 58.2 | 58.7 | 55.3 | 60.7 |
Motorbike Back Wheel | 49.7 | 49.3 | 38.4 | 55.4 |
Motorbike Body | 62.7 | 62.9 | 57.8 | 62.9 |
Motorbike Front Wheel | 57.4 | 59.5 | 50.5 | 62.7 |
Motorbike Handlebar | 0.0 | 0.0 | 0.0 | 0.1 |
Motorbike Headlight | 19.1 | 15.3 | 5.9 | 20.1 |
Motorbike Saddle | 0.0 | 0.0 | 0.0 | 0.0 |
Person Hair | 58.6 | 55.6 | 47.6 | 59.1 |
Person Head | 56.2 | 54.3 | 47.7 | 58.5 |
Person Left Ear | 11.3 | 8.7 | 3.0 | 23.4 |
Person Left Eye | 1.6 | 4.9 | 0.5 | 19.8 |
Person Left Eyebrow | 0.0 | 0.4 | 0.0 | 4.2 |
Person Left Foot | 13.2 | 9.9 | 7.2 | 15.8 |
Person Left Hand | 6.9 | 10.7 | 8.4 | 26.6 |
Person Left Lower Arm | 14.9 | 14.0 | 7.6 | 33.4 |
Person Left Lower Leg | 12.5 | 11.4 | 11.2 | 21.9 |
Person Left Upper Arm | 16.3 | 13.3 | 10.3 | 40.8 |
Person Left Upper Leg | 9.4 | 6.3 | 8.3 | 27.8 |
Person Mouth | 25.3 | 25.4 | 10.5 | 31.9 |
Person Neck | 39.7 | 35.7 | 26.7 | 42.3 |
Person Nose | 30.5 | 27.2 | 14.4 | 36.2 |
Person Right Ear | 9.1 | 9.9 | 1.8 | 23.3 |
Person Right Eye | 5.7 | 3.9 | 5.6 | 19.6 |
Person Right Eyebrow | 0.0 | 0.8 | 0.0 | 3.1 |
Person Right Foot | 6.7 | 6.9 | 7.1 | 16.3 |
Person Right Hand | 22.2 | 16.6 | 16.1 | 29.6 |
Person Right Lower Arm | 15.2 | 14.6 | 12.5 | 34.0 |
Person Right Lower Leg | 12.4 | 11.9 | 5.6 | 22.5 |
Person Right Upper Arm | 16.5 | 16.7 | 16.8 | 40.5 |
Person Right Upper Leg | 19.2 | 19.9 | 15.2 | 28.8 |
Person Torso | 56.7 | 55.5 | 48.1 | 57.7 |
Potted Plant Plant | 38.2 | 35.0 | 29.9 | 38.2 |
Potted Plant Pot | 31.6 | 32.1 | 23.9 | 30.8 |
Sheep Head | 42.5 | 40.7 | 35.8 | 44.2 |
Sheep Left Back Lower Leg | 0.6 | 1.2 | 2.6 | 6.2 |
Sheep Left Back Upper Leg | 0.0 | 0.0 | 0.0 | 5.4 |
Sheep Left Ear | 8.0 | 14.3 | 2.3 | 33.8 |
Sheep Left Eye | 0.0 | 0.0 | 1.4 | 11.5 |
Sheep Left Front Lower Leg | 0.0 | 0.0 | 0.6 | 15.1 |
Sheep Left Front Upper Leg | 0.0 | 0.0 | 0.0 | 11.7 |
Sheep Left Horn | 0.0 | 0.4 | 0.0 | 15.1 |
Sheep Muzzle | 43.9 | 40.1 | 33.4 | 48.1 |
Sheep Neck | 25.4 | 24.6 | 12.6 | 29.0 |
Sheep Right Back Lower Leg | 0.0 | 0.5 | 0.1 | 6.5 |
Sheep Right Back Upper Leg | 0.0 | 1.1 | 1.1 | 8.0 |
Sheep Right Ear | 13.4 | 5.7 | 8.2 | 29.9 |
Sheep Right Eye | 0.8 | 3.9 | 0.1 | 11.1 |
Sheep Right Front Lower Leg | 0.1 | 0.0 | 0.7 | 10.1 |
Sheep Right Front Upper Leg | 0.5 | 0.0 | 0.4 | 10.3 |
Sheep Right Horn | 0.0 | 2.9 | 0.0 | 11.3 |
Sheep Tail | 2.9 | 3.9 | 0.0 | 16.0 |
Sheep Torso | 66.4 | 66.7 | 63.1 | 69.0 |
Sofa | 52.6 | 47.2 | 52.0 | 69.0 |
Train Coach Back Side | 0.0 | 3.5 | 12.8 | 11.5 |
Train Coach Front Side | 0.0 | 0.0 | 0.0 | 2.6 |
Train Coach Left Side | 16.1 | 17.3 | 10.5 | 12.3 |
Train Coach Right Side | 7.8 | 15.2 | 7.5 | 10.7 |
Train Coach Roof | 0.0 | 7.5 | 1.7 | 0.0 |
Train Coach | 10.7 | 13.3 | 13.1 | 11.5 |
Train Head | 3.6 | 6.7 | 4.0 | 3.0 |
Train Head Back Side | 0.0 | 0.0 | 0.0 | 5.2 |
Train Head Front Side | 67.6 | 66.1 | 63.3 | 34.7 |
Train Head Left Side | 32.8 | 33.4 | 37.7 | 22.8 |
Train Head Right Side | 16.2 | 25.4 | 20.6 | 21.6 |
Train Head Roof | 16.5 | 21.9 | 5.6 | 22.1 |
Train Headlight | 15.5 | 16.1 | 3.2 | 18.4 |
TV Monitor Frame | 42.4 | 43.0 | 34.5 | 44.8 |
TV Monitor Screen | 65.3 | 67.8 | 59.4 | 68.7 |