Constructive Interpretability with CoLabel: Corroborative Integration, Complementary Features, and Collaborative Learning

by Abhijit Suprem, et al.
Georgia Institute of Technology

Machine learning models with explainable predictions are increasingly sought after, especially for real-world, mission-critical applications that require bias detection and risk mitigation. Inherent interpretability, where a model is designed from the ground-up for interpretability, provides intuitive insights and transparent explanations on model prediction and performance. In this paper, we present CoLabel, an approach to build interpretable models with explanations rooted in the ground truth. We demonstrate CoLabel in a vehicle feature extraction application in the context of vehicle make-model recognition (VMMR). CoLabel performs VMMR with a composite of interpretable features such as vehicle color, type, and make, all based on interpretable annotations of the ground truth labels. First, CoLabel performs corroborative integration to join multiple datasets that each have a subset of desired annotations of color, type, and make. Then, CoLabel uses decomposable branches to extract complementary features corresponding to desired annotations. Finally, CoLabel fuses them together for final predictions. During feature fusion, CoLabel harmonizes complementary branches so that VMMR features are compatible with each other and can be projected to the same semantic space for classification. With inherent interpretability, CoLabel achieves superior performance to the state-of-the-art black-box models, with accuracy of 0.98, 0.95, and 0.94 on CompCars, Cars196, and BoxCars116K, respectively. CoLabel provides intuitive explanations due to constructive interpretability, and subsequently achieves high accuracy and usability in mission-critical situations.






1. Introduction

Machine learning models that are interpretable and explainable are increasingly sought after in a wide variety of industry applications (Chen et al., 2021; Zhao et al., 2018; Nguyen et al., 2021; Linardatos et al., 2021). Explainable models augment the black-box of deep networks by providing insights into their feature extraction and prediction (Zeiler and Fergus, 2014). They are particularly useful in real-world situations where accountability, transparency, and provenance of information for mission-critical human decisions are crucial, such as security, monitoring, and medicine (Chen et al., 2021; Zheng et al., 2016). Interpretable indicates model features designed from the get-go to be human-readable. Explainable indicates post-hoc analysis of models to determine feature importance in prediction.

Inherently interpretable models (Rudin, 2019) are designed from the get-go to provide explainable results. This contrasts with post-hoc explainability, where a black box model is analyzed to obtain potential explanations for its decisions. Inherently interpretable models provide more faithful explanations (Rudin, 2019), since these are directly dependent on model design. Such models also avoid pitfalls for post-hoc explainability such as unjustified counterfactual explanations (Laugel et al., 2019). Inherently interpretable models have higher trust under adversarial conditions (Lipton, 2018) since their predictions can be directly tied to training ground truth through model-generated explanations.

Figure 1. CoLabel Architecture and Dataflow: CoLabel takes in multiple vehicle datasets, each with a subset of desired annotations. ❶ Corroborative Integration integrates annotations into a single training set. ❷ Complementary Features extracts interpretable features with complementary branches. ❸ Collaborative Learning fuses complementary features to yield interpretable predictions for vehicle classification.

There are several challenges, however, in building models that are inherently interpretable. There is no one-size-fits-all solution since interpretability is domain-specific (Rudin, 2019). Existing datasets may not have completely interpretable annotations; instead, most datasets only have the ground truth annotations without interpretable subsets. For example, existing vehicle classification datasets label vehicle make and model, but not vehicle color, type, or decals (Boukerche and Ma, 2021). Furthermore, deep networks are biased during training towards strong signals, and may ignore more interpretable weaker signals. For example, person re-id models focus primarily on a person’s shirt, and need guidance to focus on more interpretable accessories such as hats, handbags, or limbs (Chen et al., 2021).

CoLabel. In this paper, we present CoLabel: a process for building inherently interpretable models. We use CoLabel to build end-to-end interpretable models that provide explanations rooted in the ground truth. By construction, CoLabel provides predictions along with a composite of interpretable features that comprise the prediction, using a combination of Corroborative Integration, Complementary Features, and Collaborative Learning. We call this approach, which achieves interpretable models through design and implementation, constructive interpretability.

We demonstrate the inherent interpretability and superior accuracy of CoLabel in vehicle feature extraction, an important challenge in monitoring, tracking, and surveillance applications (Lipton, 2018; Rudin, 2019). Specifically, vehicle features are crucial for re-id, traffic monitoring and management, tracking, and make/model classification. Current state-of-the-art vehicle classification models employ black-box models.

These mission-critical applications require interpretable predictions for aiding human decisions, particularly for borderline cases where explanations aligned with human experience can benefit human decisions much more than algorithmic internal specifics. The goal of constructive interpretability is to design and build inherently interpretable models aligned with human understanding of applications. This is where CoLabel comes in. As an inherently interpretable model with constructive interpretability in mind, CoLabel has state-of-the-art accuracy as well as interpretable predictions.

We show CoLabel’s dataflow with respect to vehicle feature extraction in Figure 1. Our constructive interpretability approach for inherently interpretable models begins from selection of interpretation annotations for vehicle features: color, type, and make. CoLabel uses these annotations to generate interpretable vehicle features. These features are usable in a variety of applications, such as vehicle make and model recognition (VMMR), re-id, tracking, and detection (Boukerche and Ma, 2021). In this work, we focus CoLabel on VMMR.

Dataflow. Given our desired annotations of color, type, and make, as well as datasets that each carry a subset of these annotations, CoLabel’s dataflow involves the following three steps:

❶ Corroborative Integration integrates multiple datasets and corroborates annotations of ’natural’ vehicle features across them. It then builds a robust training set with ground truth as well as interpretable annotations.

❷ The Complementary Features module extracts features corresponding to the interpretable annotations. The goal is to maintain interpretable knowledge when integrated. Each complementary feature is extracted with its own branch in the CoLabel model. These color, type, and make features are crucial for interpretable predictions.

❸ Finally, Collaborative Learning harmonizes complementary features, ensuring features from different branches can be fused more effectively. With harmonization, branches collaborate to exploit correlations between complementary features. During training, CoLabel backpropagates a loss on the error between the final prediction and the ground truth, as well as branch-specific losses on the errors between branch predictions and ground truth annotations. Simultaneously, a harmonization loss improves feature fusion by ensuring branches collaborate on the overlaps between complementary branches.

Contributions. We show that CoLabel can achieve excellent accuracy on feature extraction while simultaneously providing interpretable results by construction. CoLabel’s explanations align with human knowledge of vehicles, avoiding potential difficulties of post-hoc explainability (Rudin, 2019; Laugel et al., 2019; Lipton, 2018). The contributions are:

  • CoLabel: Constructive interpretability approach to design and build an inherently interpretable vehicle feature extraction system by integrating diverse interpretable annotations that are aligned with human knowledge of vehicles.

  • Model: Experimental evaluation and demonstration of the superior accuracy and interpretability achieved by CoLabel.

  • Loss: A harmonization loss for the fusion function that integrates complementary feature branches to achieve high accuracy and faster convergence in CoLabel.

2. Related Work

We first cover recent work in inherently interpretable models and post-hoc explanations. We will then cover vehicle feature extraction.

Figure 2. Constructive Interpretability: Models with constructive interpretability provide explanations for their predictions. This is achieved by incorporating human knowledge. Here, an interpretable model explains that its prediction is based on vehicle color, type, seats, and vehicle part similarities.

2.1. Interpretability and Explainability

Interpretability. Interpretable models are designed from the ground up to provide explanations for their features. Intuitively, interpretability is deeply intertwined with human understanding (Laugel et al., 2019). Models that are inherently interpretable directly integrate human understanding into feature generation. Such models are more desirable in mission-critical scenarios such as healthcare, monitoring, and safety management (Rudin, 2019; Chen et al., 2019; Obermeyer et al., 2019). The prototype layers in (Li et al., 2018; Chen et al., 2019) provide interpretable predictions: the layers compare test image samples to similar ground truth samples to provide explanations of the model classification. The decomposable approaches in (Saralajew et al., 2019; Chen et al., 2018) build component classifiers that are integrated for the overall task, e.g. image and credit classification, respectively. Like these, CoLabel is an inherently interpretable model design whose explanations are derived directly from training on annotated ground truth.

Figure 3. CoLabel for VMMR: For VMMR, we use color, type, and make labels as our interpretable annotations. ❶ Corroborative Integration combines various VMMR datasets, each with a subset of desired interpretable annotations, with a labeling team to generate a single dataset with all desired annotations (§ 3.1). ❷ Then, Complementary Features uses 3 branches to extract color, type, and make features. Each branch contains a feature extractor backbone. A dense layer converts features to branch-specific predictions. A second dense layer yields harmonization features (§ 3.2). ❸ Finally, Collaborative Learning fuses features for VMMR classification. Simultaneously, a harmonization loss ensures branch features collaborate on feature correlations (§ 3.3). Predictions are combined from Collaborative Learning and Complementary Features to generate interpretable classifications that correspond to annotations from Corroborative Integration.

Explainability. Currently, most approaches perform post-hoc explanation in a bottom-up fashion, where an existing model’s black-box is ’opened’ (Hardt et al., 2021). These include examining class activations (Selvaraju et al., 2017; Zhang et al., 2020), concept activation vectors (Linardatos et al., 2021), neuron influence aggregation (Hohman et al., 2019), and deconvolutions (Zeiler and Fergus, 2014). In post-hoc explanation, a second model is used to model the original model’s predictions by projecting model features along human-readable dimensions, if possible (Zhang et al., 2020; Chen et al., 2020; Hohman et al., 2019; Das et al., 2020). However, these approaches do not build interpretable models from the ground truth; they merely enhance existing models for explanation. For example, Grad-CAM’s (Selvaraju et al., 2017) outputs are used with human labeling to determine ’where’ and ’what’ a model is looking at (Zhang et al., 2020; Chen et al., 2020). Similarly, the embedding and neuron views in (Hohman et al., 2019; Das et al., 2020) make it easier to visually characterize class similarity clusters. However, there are risks to explainability when it is disconnected from the ground truth (Rudin, 2019; Laugel et al., 2019; Lipton, 2018). Such explanations may not be accurate, because if they were, the explainer model would be sufficient for prediction (Laugel et al., 2019). Furthermore, interpretable models are usually as accurate as black-box models, with the added benefit of interpretability, with several examples provided in (Rudin and Radin, 2019). Thus, the challenges of bottom-up approaches are bypassed with constructive interpretability, since it is a top-down approach for interpretability, as in Figure 2. In constructive interpretability, inherently interpretable models are built with features aligned with human knowledge of application domains, forming intuitively understandable and interpretable models as in (Rudin, 2019; Chen et al., 2019; Laugel et al., 2019; Li et al., 2018; Saralajew et al., 2019; Chen et al., 2018).

2.2. Vehicle Feature Extraction

We demonstrate CoLabel’s interpretability with vehicle feature extraction. This encompasses several application areas, including VMMR (Elkerdawy et al., 2018; Ma et al., 2019; Sánchez et al., 2021), re-id (He et al., 2019; Hsu et al., 2019; Liu et al., 2016b; Suprem and Pu, 2020), tracking (Ristani and Tomasi, 2018; Tang et al., 2018; Xu et al., 2018; Suprem et al., 2019), and vehicle detection (Sochor et al., 2018; Zhang et al., 2018). We cover recent research on feature extraction for these application areas.

Hu et al. presented a framework for identifying vehicle parts to extract more discriminative representations without supervision. CNN and SSD models are used together for logo detection in high-resolution images (Yang et al., 2019). Logo-Net (Hoi et al., 2015) uses such a composition to improve logo detection and classification. Wang et al. (Wang et al., 2017) develop an orientation-invariant approach that uses 20 engineered keypoint regions on vehicles to extract representative features. Liu et al. propose RAM (Liu et al., 2018), which has sub-models that each focus on a different region of the vehicle’s image, because different regions of vehicles have different relevant features.

Additionally, there have been recent datasets with varied annotations for feature extraction. Boukerche and Ma (Boukerche and Ma, 2021) provide a survey of such datasets for feature extraction. Yang et al. (Yang et al., 2015) propose part-attribute-driven vehicle model recognition and develop the CompCars dataset with VMMR labels. BoxCars116K (Sochor et al., 2018) provides a dataset of vehicle annotations with type, and uses conventional vision modules for vehicle bounding box detection.

Summary. Since each application area for vehicle features remains sensitive and mission-critical, interpretable features are crucial for deployment. The above approaches have improved on feature extraction. We augment them with CoLabel to demonstrate interpretability. We will further show that such inherently interpretable models offer additional avenues for increasing model accuracy.

3. CoLabel

We now present CoLabel, our approach for interpretable feature extraction (Figure 3). We have given an overview of CoLabel’s dataflow in Figure 1. Here, we describe CoLabel’s components in detail. We first describe ❶ Corroborative Integration in § 3.1. Then we present ❷ Complementary Features for interpretable annotations in § 3.2. Finally, we present ❸ Collaborative Learning in § 3.3, where CoLabel fuses complementary features for interpretable predictions. We implement CoLabel for vehicle feature extraction, which has a need for interpretability in a variety of mission-critical applications in traffic management, safety monitoring, and multi-camera tracking. Specifically, we apply CoLabel for interpretable vehicle make and model recognition (VMMR), where the task is to generate features from vehicle images to identify the make and model.

3.1. Corroborative Integration

Figure 4. Corroborative Integration: Given a desired annotation (e.g. color), a subset of VMMR datasets contains this annotation. Corroborative Integration employs a team of labeling models, each trained on a corresponding dataset in this subset. Each model employs a JPEG compression ensemble to improve labeling agreement. Using Corroborative Integration, CoLabel can label the remaining datasets with the color annotations.

Constructive interpretability starts from a judicious decomposition of the application domain into interpretable annotations. For vehicle classification with CoLabel, we identified three interpretable annotations for model training in addition to VMMR labels: vehicle color, type, and make. Vehicle color is the overall color scheme of a vehicle. Type is the body type of the vehicle, such as SUV, sedan, or pickup truck. Finally, make is the vehicle brand, such as Toyota, Mazda, or Jeep. We select these annotations as they are broadly common across vehicle feature extraction datasets (Boukerche and Ma, 2021).

One advantage of CoLabel is a very inclusive classifier training process. Of the several VMMR datasets (discussed in § 4.1), each has a subset of the three desired interpretable annotations. CompCars (Yang et al., 2015) labels make, model, and type. VehicleColors (Panetta et al., 2021) labels only the colors. BoxCars116K (Sochor et al., 2018) labels make and model. CoLabel is capable of integrating the partial knowledge in all of these labeled datasets with Corroborative Integration.

Labeling Overview. CoLabel uses Corroborative Integration to ‘complete’ partial annotations, in this case by adding color labels to each image in CompCars. We start with a set of interpretable annotations and a set of datasets, each of which contains some subset of these annotations. Each dataset also contains the overall ground truth label, i.e. VMMR. For CoLabel, there are three interpretable annotations: color, type, and make. We show Corroborative Integration for a single interpretable annotation (e.g. color) in Figure 4.

For a desired annotation such as color, a subset of the datasets contains that annotation. We train a set of labeling models, one for each dataset in this labeled subset, with the dataset’s annotation as the ground truth. Then, we build a team of these models to label, with weighted voting, the remaining datasets that lack the annotation. Since each remaining dataset still has a subset of the desired annotations, we call it a partially unlabeled dataset; these datasets form the complement of the labeled subset.
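The split into a labeled subset and a partially unlabeled complement can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the annotation coverage is copied from Table 1, while the names `ANNOTATIONS` and `split_by_annotation` are hypothetical.

```python
# Annotation coverage per dataset, as listed in Table 1.
# The dictionary and function names are illustrative assumptions.
ANNOTATIONS = {
    "CompCars": {"make", "model", "type"},
    "BoxCars116K": {"make", "model"},
    "Cars196": {"make", "model"},
    "VehicleColors": {"color"},
    "VeRi": {"color", "type"},
    "CrawledVehicles": {"color", "type"},
}


def split_by_annotation(annotation, datasets=ANNOTATIONS):
    """Return (labeled, partially_unlabeled) dataset names for one annotation."""
    labeled = sorted(n for n, a in datasets.items() if annotation in a)
    unlabeled = sorted(n for n, a in datasets.items() if annotation not in a)
    return labeled, unlabeled


# Datasets that can train color labeling models vs. datasets that need color labels.
color_labeled, color_unlabeled = split_by_annotation("color")
```

Under this reading, one labeling model is trained per entry in `color_labeled`, and the resulting team annotates every dataset in `color_unlabeled`.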

Labeling Team. Team member votes are dynamically weighted for each partially unlabeled dataset. First, we partition the unlabeled dataset into clusters with KMeans clustering. We also partition each team member’s training dataset into clusters. Then, for each cluster in the unlabeled dataset, we find the nearest training dataset using cluster overlap as the guiding metric. The corresponding team member is used for labeling. We compute overlap between unlabeled dataset clusters and team member training data using the O-metric (Cardillo and L. Warren, 2016) for point-proximity calculation.

O-metric calculates the overlap between two n-dimensional clusters using a distance metric; in our case, we use cosine distance. It works as follows: given an unlabeled cluster and a labeled cluster from any team member’s training data, we compute the fraction of points in the unlabeled cluster that are closer to points in the labeled cluster than to points in their own cluster. For each point in the unlabeled cluster, we calculate a point proximity value from two distances: the distance from the point to the nearest point in the labeled cluster, and the distance from the point to the nearest other point in its own unlabeled cluster. Then, we compute the proximity ratio over all points in the unlabeled cluster to determine its overlap with the labeled cluster.

The proximity ratio is bounded in [0, 1]. As it gets closer to 1, most points in the unlabeled cluster are closer to some point in the labeled cluster than to points in their own cluster. We can compute the pairwise cluster overlap between every unlabeled cluster and every labeled cluster. For each unlabeled cluster, we only need to take the labeled cluster with the highest overlap. This is similar to the dataset distance approaches in (Suprem et al., 2020; Suprem and Pu, 2022a; Cardillo and L. Warren, 2016).
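The overlap computation described above can be sketched as follows. This is one plausible reading of the point-proximity rule, not the exact O-metric definition: it assumes a point counts as overlapping when its nearest labeled neighbor is strictly closer than its nearest other unlabeled neighbor. The function names are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def overlap(unlabeled, labeled, dist=cosine_distance):
    """Fraction of unlabeled points that sit closer to the labeled cluster
    than to the rest of their own cluster (assumed proximity rule)."""
    closer = 0
    for i, x in enumerate(unlabeled):
        d_labeled = min(dist(x, y) for y in labeled)
        d_own = min(
            (dist(x, z) for j, z in enumerate(unlabeled) if j != i),
            default=math.inf,
        )
        if d_labeled < d_own:
            closer += 1
    return closer / len(unlabeled)


def best_labeled_cluster(unlabeled, labeled_clusters):
    """Index of the labeled cluster with the highest overlap."""
    scores = [overlap(unlabeled, c) for c in labeled_clusters]
    return max(range(len(scores)), key=scores.__getitem__)
```

The brute-force nearest-neighbor search is quadratic in cluster size; a real pipeline would likely use a spatial index or batched matrix operations on the embeddings.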

Each trained model generalizes to the task for cross-domain datasets, as we will show in § 4.2. However, models still encounter edge cases due to dataset overfitting (Huang et al., 2019). We address cross-domain performance deterioration with early stopping, a JPEG compression ensemble, and greater-than-majority agreement among models.

Early Stopping. During training of each labeling model, we compute validation accuracy over the validation sets of the labeled datasets. With early stopping tuned to cross-dataset validation accuracy, we can ensure models do not overfit to their own training dataset.
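A minimal sketch of early stopping on cross-dataset validation accuracy is shown below. The text does not specify the stopping rule, so the mean over validation sets and the `patience` counter are assumptions; `train_epoch` and `validate` are hypothetical callbacks supplied by the caller.

```python
def train_with_cross_dataset_early_stopping(
    train_epoch, validate, val_sets, max_epochs=100, patience=5
):
    """Train until mean accuracy across `val_sets` stops improving.

    train_epoch(epoch) runs one training epoch (hypothetical callback).
    validate(val_set) returns accuracy on one validation set (hypothetical).
    Returns (best_epoch, best_mean_accuracy).
    """
    best_acc, best_epoch, stale = -1.0, -1, 0
    for epoch in range(max_epochs):
        train_epoch(epoch)
        # Average over validation sets from several datasets (assumed aggregation).
        acc = sum(validate(v) for v in val_sets) / len(val_sets)
        if acc > best_acc:
            best_acc, best_epoch, stale = acc, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_epoch, best_acc
```

In practice the caller would also checkpoint the model at `best_epoch` and restore it after the loop exits.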

JPEG Compression. During labeling of any partially unlabeled dataset, each labeling model takes 4 copies of an unlabeled image. The first is the original image. Each of the three remaining copies is the original image compressed using the JPEG protocol, with quality factors of 90, 70, and 50. We use the majority predicted label amongst the four copies as the model’s final prediction. This is similar to the ’vaccination’ step from (Das et al., 2018), where JPEG compression removes high-frequency artifacts that can impact cross-dataset performance.

Using JPEG compression essentially defends against adversarial attack. In this case, the ‘adversarial attack’ is potential label changes due to poor local coverage of the model’s embedding space. Neural networks are typically smooth around their embedding space (Urner and Ben-David, 2013; Chen et al., 2022). The smoothness is computed with the ratio of the change in embedding with respect to change in input. If this ratio is larger than 1, this means nearby points in the input space are not clustered together in the embedding space. Conversely, a ratio smaller than 1 indicates nearby points are clustered together: a desirable property since nearby points are typically in the same class (Chen et al., 2022; Urner and Ben-David, 2013; Suprem and Pu, 2022b). However, calculation of this ratio is a computationally expensive process, since it requires either clustering the input and embedding space (Chen et al., 2022) or generating perturbations for each input and comparing them to embeddings (Suprem and Pu, 2022b). With JPEG compression, we use a well-known technique to slightly perturb the input image, and check if the labels change. If labels change, this indicates the local region around the input image is not smooth, since the model is changing labels due to minor perturbations. In this case, we can use the majority label as a proxy for directly computing the smoothness.
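The voting step of the JPEG compression ensemble can be sketched as below. The quality factors come from the text; the strict-majority rule (more than half of the four copies) is an assumption, and `predict` and `compress` are hypothetical callables standing in for the labeling model and a JPEG encoder.

```python
from collections import Counter

JPEG_QUALITIES = (90, 70, 50)  # quality factors stated in the text


def jpeg_ensemble_label(image, predict, compress):
    """Label the original image plus three JPEG-compressed copies.

    predict(image) returns a label and compress(image, quality) returns a
    compressed copy; both are hypothetical callables supplied by the caller.
    Returns the strict-majority label, or None when no label wins more than
    half of the votes (assumed tie handling).
    """
    copies = [image] + [compress(image, q) for q in JPEG_QUALITIES]
    votes = Counter(predict(c) for c in copies)
    label, count = votes.most_common(1)[0]
    return label if count > len(copies) / 2 else None
```

A `None` result signals that the label is unstable under small perturbations, which is the cheap smoothness proxy the paragraph above describes.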

Team Agreement. If an image has no majority label from a model’s JPEG ensemble, we discard that model’s label. Further, if too few models remain in the labeling team after discarding models without a JPEG ensemble majority label, we leave the annotation for that image blank. This is similar to a reject option that directly uses an estimate for smoothness. In our case, computing the JPEG compressions and evaluating on 4 images is significantly faster than the clustering plus smoothness estimation method in (Chen et al., 2022; Suprem and Pu, 2022b).
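The team agreement step might look like the sketch below. The exact fraction of models that must remain is not recoverable from the text, so `min_fraction` is a placeholder parameter rather than the authors' value. The weighted greater-than-majority vote is likewise an assumption based on the earlier description of weighted voting.

```python
from collections import defaultdict


def team_label(member_labels, weights, min_fraction=0.5):
    """Combine per-member labels into one team label, or None (left blank).

    member_labels: one label per team member, None if that member had no
        JPEG ensemble majority.
    weights: per-member vote weights (hypothetical; e.g. derived from
        cluster overlap).
    min_fraction: placeholder threshold on the fraction of members that
        must remain after discarding None labels.
    """
    kept = [(l, w) for l, w in zip(member_labels, weights) if l is not None]
    if len(kept) <= min_fraction * len(member_labels):
        return None  # too few members remain: leave annotation blank
    totals = defaultdict(float)
    for label, weight in kept:
        totals[label] += weight
    label, score = max(totals.items(), key=lambda kv: kv[1])
    # Require more than half of the remaining weight to agree.
    return label if score > 0.5 * sum(w for _, w in kept) else None
```

Samples that come back as `None` stay unannotated and are skipped by the branch loss during training.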

Summary. Using these steps, we can label most samples in partially unlabeled datasets that lack the desired annotation. We evaluate labeling accuracy given these strategies in § 4.2, where we test on held-out labeled datasets. Our results show an average accuracy improvement of almost 20%, from 0.83 to 0.98, for held-out unlabeled samples when we use labeling teams with early stopping for team members and a JPEG compression ensemble for each model.

3.2. Complementary Features

CoLabel’s next step is Complementary Features extraction, which propagates the three interpretable annotations from Corroborative Integration for feature extraction. Here, CoLabel explicitly learns to extract interpretable features from the interpretable annotations generated by Corroborative Integration. Complementary Features comprises two stages: a shared input stage, followed by complementary feature branches. The shared input stage performs shallow convolutional feature extraction. These features are common to each branch’s input. Then, the complementary branches extract their annotation-specific interpretable features and propagate them to Collaborative Learning.

Shared Input Block. Multi-branch models often use different inputs for each branch (for Vehicle Re-Id, 2019; Zhou and Shao, 2018). CoLabel uses a shared input block to create a common input to each branch to improve feature fusing in Collaborative Learning. Feature fusing requires integration of branches that carry different semantics and scales. This is accomplished with additional dense layers after concatenation (Wang et al., 2017), longer training to ensure convergence, or appropriate selection of loss functions (for Vehicle Re-Id, 2019). CoLabel’s branches perform complementary feature extraction, where the features are semantically different. So, we use a shared input block to ensure branches have a common starting point in shallow convolutional features and use attribute features from those layers (Chen et al., 2021; Basha et al., 2020). The shared input block consists of early layers in a feature extractor backbone such as ResNet, along with attention modules (we use CBAM (Woo et al., 2018)). For CoLabel, we use the first ResNet bottleneck block in the shared input layer, and use the remaining bottleneck blocks in the branches. We show the impact of the shared input block for training in § 4.3.

Complementary Feature Branches. Each interpretable annotation from ❶ Corroborative Integration is matched to a corresponding branch in CoLabel. We describe a single Complementary Features branch here. The features from the shared input block are passed through a feature extractor backbone comprising conv layers, normalization, pooling, and CBAM. This yields intermediate branch features, e.g. color features for the color branch. A fully connected layer projects these features to branch-specific predictions. A second fully connected layer projects them to tentative fused predictions. These tentative fused predictions are only used during training to improve feature harmonization in Collaborative Learning. We defer their discussion to § 3.3. Finally, the branch features are also sent to Collaborative Learning for fusion with the features from the other branches.

Training. During training, we compute 2 local losses to train the branch feature extractors. The first is the branch-specific loss for each branch, computed as the cross-entropy loss between the branch’s predictions and the corresponding ground truth annotations. The second, the local harmonization loss, is discussed in § 3.3.

Impact of Complementary Features. The annotations from Corroborative Integration make each branch’s features interpretable. As such, for any prediction, we can decompose CoLabel into the complementary branches and explain predictions with the component annotations. For prediction errors, CoLabel provides provenance of its classification, so that the specific branch that caused the error can be updated. Further, the interpretable branches are extensible: any new desired annotations need only be added to the training set with Corroborative Integration. Subsequently, Complementary Features will deploy a branch to generate interpretable features for the corresponding annotations. We discuss the impact of Complementary Features on accuracy and interpretability in § 4.3.

3.3. Collaborative Learning

Figure 5. Collaborative Learning: Features from Complementary Features are fused with concatenation. Branches collaborate on feature correlations and interdependencies with the harmonization losses. Each harmonization loss uses the fused features as soft targets for the tentative fused predictions from § 3.2. A fused loss, the cross-entropy loss between VMMR targets and the fused prediction, also backpropagates over the entire model.

Finally, CoLabel performs feature fusion to generate interpretable predictions. The branches of Complementary Features yield their corresponding color, type, and make features. These are concatenated to yield fused features. A fully connected layer projects the fused features to final predictions. CoLabel is trained end-to-end with the cross-entropy loss between these predictions and the ground truth.


Local Harmonization Loss. CoLabel employs a local harmonization loss for each branch in Complementary Features. Since the branches extract complementary features, we need a way for branches to exploit correlation and interdependency between features. We accomplish this with weak supervision on the branch features using the fused feature predictions. Intuitively, we want branches to agree on the overall VMMR task. So, branch features should also accomplish VMMR, in addition to their branch-specific annotation. Using this fused-feature knowledge distillation, we ensure that branch features harmonize on the final VMMR prediction labels. In effect, the final fused prediction is a soft target for each branch. The local harmonization loss is computed for each branch as the cross-entropy loss between the tentative fused predictions from Complementary Features and the final fused predictions.


Losses. CoLabel employs 3 losses during training, shown as red arrows in Figure 3 and Figure 5. The fused loss backpropagates through the entire model. CoLabel’s branches are trained with the branch annotation loss and the local harmonization loss.
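The three losses can be written out explicitly under some assumed notation. The symbols below are illustrative and not taken from the original equations. The averaging over the mini-batch and the unweighted sum in the last line are also assumptions.

```latex
% Assumed notation (illustrative, not the authors' symbols):
%   k \in \{c, t, m\} indexes the color, type, and make branches
%   y: VMMR ground truth (one-hot); y_k: ground truth annotation for branch k
%   \hat{y}_F: final fused prediction; \hat{y}_k: branch prediction
%   \tilde{y}_k: tentative fused prediction of branch k
%   B: mini-batch; B_k \subseteq B: samples with a known annotation for branch k
\begin{align}
\mathcal{L}_F &= -\frac{1}{|B|} \sum_{i \in B} \sum_{j} y^{(i)}_{j} \log \hat{y}^{(i)}_{F,j}
  && \text{(fused loss)} \\
\mathcal{L}_k &= -\frac{1}{|B_k|} \sum_{i \in B_k} \sum_{j} y^{(i)}_{k,j} \log \hat{y}^{(i)}_{k,j}
  && \text{(branch annotation loss)} \\
\mathcal{L}_{H,k} &= -\frac{1}{|B|} \sum_{i \in B} \sum_{j} \hat{y}^{(i)}_{F,j} \log \tilde{y}^{(i)}_{k,j}
  && \text{(local harmonization loss, soft target } \hat{y}_F\text{)} \\
\mathcal{L}_{\mathrm{branch},k} &= \mathcal{L}_k + \mathcal{L}_{H,k}
  && \text{(assumed unweighted sum)}
\end{align}
```

In this reading, the fused prediction acts as the soft target in the harmonization term, matching the knowledge-distillation description in the text.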


The branch annotation loss for each branch is computed over the subset of the mini-batch that has annotations for that branch. We need this because during Corroborative Integration, CoLabel leaves samples without team agreement unlabeled. These unlabeled samples can be processed under an active learning scheme. For this paper, we compute loss using the subset of samples for which the annotation is known.
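Restricting the branch loss to annotated samples can be illustrated with a framework-free sketch. Representing a missing annotation as `None` and averaging over the remaining samples are assumptions made for illustration; the function names are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of softmax(logits) against an integer class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def masked_branch_loss(batch_logits, batch_targets):
    """Mean cross-entropy over samples whose branch annotation is known.

    A target of None marks a sample left unlabeled by Corroborative
    Integration (assumed encoding); such samples contribute no loss.
    """
    losses = [
        cross_entropy(z, t)
        for z, t in zip(batch_logits, batch_targets)
        if t is not None
    ]
    return sum(losses) / len(losses) if losses else 0.0
```

In a deep learning framework the same effect is usually achieved with a boolean mask over the batch or an ignore index in the loss function.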

4. Results

Now, we show the effectiveness and interpretability of CoLabel. First, we will cover the experimental setup. Then we will evaluate each of CoLabel’s components and demonstrate efficacy of design choices. Finally, we demonstrate interpretability as well as high accuracy with the end-to-end model for VMMR.

4.1. Experimental Setup

We cover system setup, as well as datasets.

System Details. We implemented CoLabel in PyTorch 1.4 on Python 3.8. For our backbones, we use ResNet with IBN (Luo et al., 2019), with pretrained ImageNet weights. Experiments are performed on a server with an NVIDIA Tesla P100 and an Intel Xeon 2GHz processor.

Dataset Make Model Color Type
CompCars (Yang et al., 2015) Yes Yes No Yes
BoxCars116K (Sochor et al., 2018) Yes Yes No No
Cars196 (Krause et al., 2013) Yes Yes No No
VehicleColors (Panetta et al., 2021) No No Yes No
VeRi (Liu et al., 2016a) No No Yes Yes
CrawledVehicles (ours) No No Yes Yes
Table 1. Datasets: We use CompCars, BoxCars116K, and Cars196 in our final evaluations. They are partially annotated. To complete their missing (No) annotations, we use the datasets that carry each annotation in Corroborative Integration.

Datasets. We use the following datasets: CompCars (Yang et al., 2015), BoxCars116K (Sochor et al., 2018), Cars196 (Krause et al., 2013), VehicleColors (Panetta et al., 2021), and VeRi (Liu et al., 2016a). We also obtained our own dataset of vehicles labeled with color and type annotations, called CrawledVehicles, using a web crawler on a variety of car-sale sites. Datasets are described in Table 1.

We use CompCars, BoxCars116K, and Cars196 for end-to-end evaluations. Their annotations are incomplete, since none contain all three desired annotations. We complete the ground truth for these datasets with Corroborative Integration.

4.2. Corroborative Integration

VehicleColors VeRi CrawledVehicles
Initial 0.87 0.84 0.86
+Early Stop 0.89 0.9 0.92
+Compression (90) 0.91 0.91 0.92
+Compression(90, 70, 50) 0.94 0.93 0.94
+Labeling Team 0.95 0.95 0.96
+Agreement 0.98 0.97 0.98
Table 2. Color-CM: Accuracy of Color-CM teams in labeling held-out datasets with color annotations. For each column’s evaluation, the team member models are trained with the datasets of the other 2 columns.
CompCars CrawledVehicles VeRi
Initial 0.86 0.88 0.86
+Early Stop 0.89 0.91 0.89
+Compression (90) 0.91 0.91 0.91
+Compression(90, 70, 50) 0.93 0.94 0.94
+Labeling Team 0.96 0.95 0.95
+Agreement 0.98 0.97 0.98
Table 3. Type-CM: Accuracy of the Type-CM team in labeling held-out datasets with type annotations.

CompCars, Cars196, and BoxCars116K are missing annotations from our desired interpretable annotation list of make, color, and type (see Table 1). We use Corroborative Integration to augment these datasets. Specifically, we use VehicleColors, CrawledVehicles, and VeRi to label Cars196, BoxCars116K, and CompCars with color annotations. Then, we use CompCars and CrawledVehicles to label Cars196 and BoxCars116K with type annotations.

Figure 6. Labeled Cars196 Images: Using Corroborative Integration, we can assign color and type annotations to Cars196 images. For some images with multiple vehicles or occlusions, we leave blank annotations if Corroborative Integration cannot find agreement on the labels.

Color Model (Color-CM). Color-CM is a team of 3 models, where each model is trained with VehicleColors, CrawledVehicles, or VeRi, respectively. During training of each model, we use horizontal flipping, random erasing, and cropping augmentations to improve training accuracy. For the JPEG compression ratios of each model, we test 2 schemes: (a) a single additional ratio with quality factor 90, and (b) 3 additional ratios with quality factors 90, 70, and 50. Each model takes a majority vote over its JPEG compression ensemble. Then, with weighted majority voting across the team member models, we arrive at the final prediction.
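The two levels of voting can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the member weights are placeholders (this section does not say how they are chosen), and ties are broken in favor of the label encountered first.

```python
from collections import Counter, defaultdict

def majority_vote(labels):
    """Most common label among one model's predictions on the
    JPEG-compressed copies of an image (ties: first encountered wins)."""
    return Counter(labels).most_common(1)[0][0]

def weighted_team_vote(member_labels, member_weights):
    """Sum each member's weight onto its label and return the heaviest label."""
    scores = defaultdict(float)
    for label, weight in zip(member_labels, member_weights):
        scores[label] += weight
    return max(scores, key=scores.get)

def team_predict(per_member_predictions, member_weights):
    """per_member_predictions[i] holds member i's labels, one per
    compression level of the same image."""
    member_labels = [majority_vote(p) for p in per_member_predictions]
    return weighted_team_vote(member_labels, member_weights)

# Toy example: 3 members, each seeing the original image plus
# copies at quality factors 90, 70, and 50.
predictions = [
    ["red", "red", "blue", "red"],
    ["blue", "blue", "blue", "red"],
    ["red", "black", "red", "red"],
]
print(team_predict(predictions, [1.0, 1.0, 1.0]))
```

With equal weights the team follows the two members that vote "red"; giving the second member a weight larger than the other two combined would flip the result to "blue".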

We evaluate with held-out test sets from the labeled subset of datasets. Specifically, we conduct three evaluations. For each of the 3 labeled datasets VehicleColors, CrawledVehicles, and VeRi, we use one for testing and the remaining 2 for building the team. Results are provided in Table 2.

On each held-out dataset, initial accuracy is around 0.86. With cross-validation early stopping, we can increase this to around 0.91. With JPEG compression with 3 ratios, we can increase accuracy by a further 3%. By teaming several models, we further increase accuracy to 0.95. Finally, we add the agreement constraint, where we accept a label only if a required fraction of the models agree on it. This improves labeling accuracy by an additional 3%, to 0.98. On the held-out test set, we can thus label 90% of the samples, with the other 10% remaining unlabeled due to disagreements.
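The trade-off between coverage and labeling accuracy under an agreement constraint can be illustrated with a short sketch. The threshold `min_agree` is a hypothetical parameter (the required fraction is not recoverable from the text), and the toy votes below are invented purely for illustration.

```python
from collections import Counter

def label_with_agreement(member_labels, min_agree):
    """Return the most common label if at least min_agree members chose it;
    otherwise return None, leaving the sample unlabeled."""
    label, count = Counter(member_labels).most_common(1)[0]
    return label if count >= min_agree else None

def coverage_and_accuracy(team_outputs, truths, min_agree):
    """Fraction of samples that receive a label, and accuracy on those samples."""
    assigned = [
        (label_with_agreement(votes, min_agree), truth)
        for votes, truth in zip(team_outputs, truths)
    ]
    kept = [(pred, truth) for pred, truth in assigned if pred is not None]
    coverage = len(kept) / len(assigned)
    accuracy = sum(pred == truth for pred, truth in kept) / len(kept) if kept else 0.0
    return coverage, accuracy

# Invented toy data: 4 images, 3 team members each.
votes = [
    ["red", "red", "red"],
    ["red", "blue", "red"],
    ["blue", "black", "red"],
    ["blue", "blue", "blue"],
]
truths = ["red", "blue", "red", "blue"]
print(coverage_and_accuracy(votes, truths, min_agree=2))
print(coverage_and_accuracy(votes, truths, min_agree=3))
```

On this toy data, raising the threshold from 2 to 3 lowers coverage from 0.75 to 0.5 while raising accuracy on the labeled samples, which is the same qualitative trade-off as labeling 90% of held-out samples at higher accuracy.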

With these strategies, we label color for BoxCars116K and Cars196 using Corroborative Integration. CoLabel can label 88% of the images in these datasets. Figure 6 shows a sample of these images and their assigned labels.

Type Corroborative Model (Type-CM). The team for type annotation labeling is trained with ground truth in CompCars, CrawledVehicles, and VeRi to label BoxCars116K and Cars196. We use a team of 3 Type-CM models. The training process is similar to Color-CM.

Table 3 shows held-out accuracy on test sets. The initial accuracy is 0.87, and with early stopping and JPEG compression, we can increase accuracy by about 7 points, to 0.94. A team of models increases accuracy to 0.96. By adding JPEG compression agreement to team members, we arrive at a final accuracy of 0.98. We can label 93% of held-out samples with this process. On our desired ground truth of BoxCars116K and Cars196, the team of Type-CM models corroboratively labels 91% of samples.

CompCars BoxCars116K Cars196
Initial 0.73 0.69 0.65
+Bootstrap 0.76 0.71 0.68
+Early Stop 0.77 0.73 0.70
+Compression (90) 0.79 0.74 0.72
+Compression(90, 70, 50) 0.79 0.74 0.73
+Dynamic Weights 0.79 0.74 0.73
+Agreement Threshold 0.82 0.76 0.76
Table 4. Make-CM: Accuracy (in mAP) of make classification. Unlike color and type, makes are different across each vehicle. This effectively converts the problem to one of clustering, similar to vehicle re-id. So, we evaluate directly using the feature output and applying conventional re-id evaluation measures.

Make Corroborative Model (Make-CM). The make annotation labeling team is trained with ground truth in CompCars, BoxCars116K, and Cars196. We use a team of 3 Make-CM models, with a training process similar to Color-CM and Type-CM. The evaluation differs, however. Since makes are distinct across datasets, we evaluate with the mAP metric common in vehicle and person re-id (Zheng et al., 2016). Specifically, the Make-CM members learn to cluster similar makes together. As in re-id, training with batched triplet mining forces backbones to pull similar identities together and push dissimilar identities further apart. In the case of Make-CM, identities are the vehicle makes. To evaluate, we use the mAP metric to check clustering and retrieval accuracy.
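For readers less familiar with retrieval metrics, the following sketch computes mAP in the way commonly used for re-id evaluation. It assumes the gallery has already been ranked for each query (for example, by feature distance); the function names and the toy makes are illustrative, not taken from the evaluation code.

```python
def average_precision(ranked_labels, query_label):
    """Average of precision@k over every rank k where a correct match appears."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(queries):
    """queries: list of (query_label, ranked_gallery_labels) pairs."""
    return sum(average_precision(ranked, q) for q, ranked in queries) / len(queries)

# Toy example with vehicle makes as identities.
queries = [
    ("audi", ["audi", "bmw", "audi"]),  # matches at ranks 1 and 3
    ("audi", ["bmw", "audi"]),          # match at rank 2
]
print(mean_average_precision(queries))
```

A model whose features cluster each make tightly ranks same-make gallery images first, which drives every per-query average precision, and therefore mAP, toward 1.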

Table 4 shows held-out accuracy on test sets. The initial accuracy has more variance, due to the difficulty of re-id clustering and feature learning compared to classification. On CompCars, the initial mAP is 0.73; with bootstrapping, early stopping, and JPEG compression we reach 0.79, and with the agreement threshold 0.82, an increase of roughly 10% overall. The remaining augmentations improve mAP only negligibly across the board. We can label 80% of held-out samples by including agreement; this performance is similar to the accuracy reported in (Bai et al., 2018; Zheng et al., 2019).

4.3. Complementary Features

Figure 7. Convergence of CoLabel: We compare convergence with respect to loss minimization and accuracy of CoLabel against CoLabel-MultiInput and CoLabel-FusionOnly on CompCars. CoLabel-MultiInput has multiple inputs, which slows learning of fuse-able branch features. CoLabel-FusionOnly uses only the fused feature loss without local harmonization, which reduces effectiveness of feature fusion.
Figure 8. Attention masks: We obtain attention masks from the attention modules in Corroborative Integration. We show a few of the attention masks in shallow layers of each branch that help the branch extract their annotation-specific features.

After Corroborative Integration, we have our desired interpretable annotations in the BoxCars116K, CompCars, and Cars196 datasets. Here, we demonstrate interpretable classification with Complementary Features.

Shared Input Block. We first cover the shared input block. The shared block is important for feature harmonization: it ensures complementary features are extracted from a common set of convolutional features, which improves semantic agreement. Further, the shared block is updated with backpropagated loss from all branches, ensuring the shallow features it extracts are usable by all branches. In turn, this improves convergence and training time for CoLabel. We show in Figure 7 the impact of the shared input block on convergence by comparing loss over time between CoLabel and CoLabel-MultiInput, a model with an input for each branch. The shared input block improves weight convergence and training time; we will discuss CoLabel-FusionOnly in the figure in § 4.4.

Task CompCars/CoLabel CompCars/NoAtt Cars196/CoLabel Cars196/NoAtt BoxCars116K/CoLabel BoxCars116K/NoAtt
Classification 0.96 0.89 0.94 0.91 0.89 0.84
Color (0.98) (0.98) (0.98) (0.97) (0.98) (0.98)
Type 0.96 0.92 - - - -
Make 0.95 0.91 0.92 0.87 0.87 0.81
Table 5. Impact of attention: CoLabel with attention outperforms model without attention (NoAtt) across all classification tasks. For VMMR, attention yields between 4-5% improvement across all datasets. Color accuracies are in parentheses because these are evaluated on labels generated by Color-CM models, instead of oracle ground truth labels.

Attention Modules. Complementary Features also uses attention modules in both the shared input block and the complementary feature branches. Attention improves feature extraction in both cases. For the shared input layer, attention masks identify image regions containing relevant features for each branch. For complementary features, attention further improves feature extraction accuracy, as shown in Table 5. For example, on CompCars, the make branch improves classification accuracy from 0.91 without attention (NoAtt) to 0.95 by including attention in the branch.

Attention Masks. We show attention masks of each branch in Figure 8. While attention masks are generally black-boxes, CoLabel’s interpretability allows us to make educated guesses about the masks. Masks for each branch are visually similar to each other, indicating attention has been clustered by Corroborative Integration. Further, the make branch masks indicate the branch focuses on the logo area of vehicles. Similarly, the type branch masks focus on the general shape of the vehicle at the edges. The color branch masks extract overall vehicle color information.

4.4. Collaborative Learning

CompCars Cars196 BoxCars116K
CoLabel 0.96 0.94 0.89
FusionOnly 0.87 0.84 0.81
Table 6. Impact of Harmonization Loss: Across all datasets, using both local harmonization loss and final fused loss significantly improves accuracy.

Finally, CoLabel fuses complementary features to generate final features for VMMR classification. We evaluate CoLabel end-to-end to demonstrate the feasibility of inherently interpretable models with several experiments.

Impact of Loss Functions. First, we show the impact of loss functions on training convergence and accuracy on the CompCars dataset. Figure 7 compares CoLabel to CoLabel-FusionOnly, which uses only the final fused loss, without the local harmonization losses. We also show accuracy across datasets in Table 6. By adding the local harmonization loss to improve feature fusion, we can increase accuracy by almost 10% on average; on CompCars, we increase accuracy from 0.87 to 0.96 for VMMR. Without the harmonization loss, CoLabel-FusionOnly converges more slowly and has lower accuracy.

CompCars BoxCars116K Cars196
R50-Att (Luo et al., 2019) 0.90 0.75 0.89
R152 (Ma et al., 2019) 0.95 0.87 0.92
D161-CMP (Ma et al., 2019) 0.97 - 0.92
R50-CL (Elkerdawy et al., 2018) 0.95 0.86 -
CoLabel 0.96 0.89 0.94
CoLabel-Match 0.97 0.93 0.96
Table 7. CoLabel vs Non-Interpretable Models: We show performance of CoLabel against several state-of-the-art models. CoLabel achieves similar performance with the added benefit of interpretability. Further, interpretability allows us to exploit disagreements between classifications and existing knowledge bases to further improve accuracy. We show this with CoLabel-Match, a model that self-diagnoses mistakes using existing knowledge about vehicle makes, models, and types.

Interpretability and Accuracy. Now, we evaluate CoLabel against several non-interpretable models: (a) R50-Att, a ResNet50 backbone with a single branch with IBN and attention (Luo et al., 2019), (b) R152, a ResNet152 with benchmark results from (Ma et al., 2019), (c) D161-CMP, a DenseNet with channel pooling from (Ma et al., 2019), and (d) R50-CL, a ResNet50 with unsupervised co-occurrence learning (Elkerdawy et al., 2018).

For CoLabel, we use a ResNet34 backbone. The first bottleneck block resides in the shared input block. The remaining three bottleneck blocks are copied to each branch, as described in § 3.2. For each model, we use image size 224×224, and train for 50 epochs with lr=1e-4 and a batch size of 64.

Results are shown in Table 7. CoLabel achieves slightly higher accuracy than both R152 and R50-Att, with accuracy of 0.96 on VMMR on CompCars. For Cars196 and BoxCars116K, CoLabel achieves accuracy of 0.94 and 0.89, respectively. In each case, CoLabel achieves similar or slightly better performance than existing non-interpretable approaches.

However, CoLabel’s results are also interpretable, allowing us to further increase accuracy by retroactive corrections. Given vehicle models and their ground truth types from existing vehicle databases (nht, 2022), we can check where CoLabel’s type detection and vehicle model predictions do not agree. This occurs when the image is difficult to process, due to occlusion, blurriness, or other artifacts (an example of such a disagreement in Corroborative Integration, with 2 cars in the same image, is shown in Figure 6). In such cases, CoLabel generates conflicting interpretations, which are themselves useful in analyzing the model. Using this variation, called CoLabel-Match, we can further increase accuracy solely due to interpretability, to 0.97, 0.96, and 0.93 on CompCars, Cars196, and BoxCars116K, respectively.
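One possible form of such a retroactive check is sketched below. The correction rule is an assumption: the text only states that disagreements between the type prediction and the predicted vehicle model are detected, not how they are resolved. Here, the highest-scoring vehicle model whose known type matches the type branch is chosen. The function name, the scores, and the small knowledge base are hypothetical.

```python
def retroactive_match(model_scores, predicted_type, model_to_type):
    """Return (vehicle_model, disagreement_flag).

    model_scores:   dict mapping vehicle model name -> classifier score
    predicted_type: output of the type branch
    model_to_type:  knowledge base mapping vehicle model -> known type
    """
    ranked = sorted(model_scores, key=model_scores.get, reverse=True)
    top = ranked[0]
    if model_to_type.get(top) == predicted_type:
        return top, False  # interpretations agree; keep the prediction
    # Disagreement: fall back to the best candidate consistent with the type.
    for candidate in ranked[1:]:
        if model_to_type.get(candidate) == predicted_type:
            return candidate, True
    # No consistent candidate exists; keep the top prediction but flag it.
    return top, True

# Hypothetical knowledge base and scores for one image.
knowledge_base = {"Civic": "sedan", "CR-V": "suv", "Accord": "sedan"}
scores = {"CR-V": 0.5, "Civic": 0.4, "Accord": 0.1}
print(retroactive_match(scores, "sedan", knowledge_base))
```

The returned flag is useful on its own: flagged images are the ones where the interpretable outputs conflict, so they can be routed for inspection even when no correction is applied.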

CompCars BoxCars116K Cars196 Params
HML (Buzzelli and Segantin, 2021) 0.65 - - -
CoLabel-SMBL 0.91 0.84 0.89 25M
CoLabel 0.96 0.89 0.94 60M
Table 8. Single-Branch Multi-Labeling: With multi-labeling output from a single branch in CoLabel-SMBL, we can maintain interpretability for a single-branch while sacrificing accuracy.
CompCars BoxCars116K Cars196
D161-CMP 0.97 - 0.92
CoLabel (AVA) 0.96 0.89 0.94
CoLabel-Match 0.97 0.93 0.96
CoLabel-2SC 0.97 0.93 0.95
CoLabel-2SC-Match 0.98 0.94 0.96
Table 9. All-v-All vs 2-Stage Cascade: We compare CoLabel under all-v-all and 2-stage cascade. With CoLabel-2SC, we use a specialized submodel for each make, simplifying the VMMR problem. We can benefit from interpretability by including retroactive correction to further increase VMMR classification accuracy.

Single-Branch Multi-Labeling. Since CoLabel uses multiple branches, a natural question is: could branches be removed while maintaining interpretability? We compare CoLabel’s multi-branch interpretability with a Single-Branch Multi-Labeling approach (CoLabel-SMBL). In CoLabel-SMBL, we use a single branch for feature extraction. The features are then used in 4 parallel dense layers: color, type, make, and VMMR detection. With CoLabel-SMBL, we can reduce model parameters, since we use a single branch. We compare CoLabel-SMBL against CoLabel and HML (Buzzelli and Segantin, 2021) in Table 8. CoLabel-SMBL sacrifices accuracy with the reduced parameters. Further, we also found CoLabel-SMBL more difficult to converge, as it needed fine-tuning of learning rates to contend with the multiple backpropagated losses. We leave further exploration of CoLabel-SMBL with other architecture choices to future work.

All-v-All vs 2-Stage Cascade. Here, we evaluate CoLabel as a 2-stage cascade (CoLabel-2SC) and compare to all-v-all (AVA) in Table 9. AVA is the method we have described in CoLabel, where final features are used for make and model classification. In essence, this is a complex problem where CoLabel’s fused features are trained with every vehicle model in our datasets. In CoLabel-2SC, we simplify the problem by creating a classifier submodel (i.e., a dense layer) for each make. The submodels use CoLabel’s fused features for prediction. So, given the 78 makes in CompCars, we create 78 submodels. For each image, CoLabel-2SC’s vehicle make prediction activates the corresponding classification submodel.
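The routing step of the cascade can be sketched in plain Python. Everything here is illustrative: the dense layers are written out by hand, the weights are toy values rather than trained parameters, and the names (`cascade_predict`, `make_head`, `submodels`) are hypothetical.

```python
def dense(features, weights, bias):
    """A dense layer: one output logit per row of weights."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

def cascade_predict(fused, make_head, submodels, model_names):
    """Stage 1 predicts the make; stage 2 runs only that make's submodel.

    make_head:   (weights, bias) of the make classifier
    submodels:   list of (weights, bias), one dense layer per make
    model_names: model_names[make][i] names output i of that make's submodel
    """
    make = argmax(dense(fused, *make_head))
    weights, bias = submodels[make]  # activate the matching submodel
    model_index = argmax(dense(fused, weights, bias))
    return make, model_names[make][model_index]

# Toy setup: 2 makes with 2 vehicle models each, 2-dimensional fused features.
make_head = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
submodels = [
    ([[0.0, 1.0], [1.0, 0.0]], [0.0, 0.0]),
    ([[0.0, 1.0], [1.0, 0.0]], [0.0, 0.0]),
]
model_names = [["A3", "A4"], ["X1", "X3"]]
print(cascade_predict([1.0, 0.0], make_head, submodels, model_names))
```

Each submodel only has to separate the vehicle models of one make, which is the simplification the cascade relies on; the cost is one extra dense layer per make.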

Since CoLabel-2SC works on a simpler problem, we can improve accuracy over CoLabel, with a trade-off of increased parameters due to the submodels. CoLabel-2SC achieves accuracy of 0.97 on CompCars, 0.95 on Cars196, and 0.93 on BoxCars116K, comparable to D161-CMP (Ma et al., 2019). With CoLabel-2SC-Match, we apply retroactive correction to further improve accuracy on CompCars to 0.98 by verifying predictions against a make-model-type knowledge base (nht, 2022).

4.5. Limitations and Future Work

While CoLabel achieves impressive performance, we have made some assumptions in its design. For example, CoLabel is designed for a single object per image. This can be addressed with an object detector such as YOLO or Mask-RCNN and, in the case of video streams, a detector UDF over a query engine, similar to (Jiang et al., 2018a). We will conduct a comprehensive ablation study to evaluate the impact of interpretable feature branches as well as the overlap between data domains, using recent work on studying high-dimensional dataset overlap in (Suprem and Pu, 2022a, b; Jiang et al., 2018b; Suprem et al., 2020) as a starting point.

5. Conclusion

In this paper, we have presented constructive interpretability with CoLabel, an inherently interpretable model for feature extraction and classification. ❶ Corroborative Integration allows us to complete interpretable annotations in datasets using a variety of corroborative datasets. ❷ Complementary Features perform feature extraction corresponding to interpretable annotations, allowing model predictions to be inherently interpretable. Finally, ❸ Collaborative Learning lets CoLabel fuse features effectively during training using local harmonization losses for each branch. Our evaluations show that CoLabel’s components make an interpretable model, with comparable accuracy to state-of-the-art black box models. We are also able to exploit interpretability to self-diagnose mistakes in classification, further increasing accuracy with CoLabel-Match and CoLabel-2SC-Match.


  • nht (2022) 2022. National Highway Traffic Safety Admin.
  • Bai et al. (2018) Yan Bai, Yihang Lou, Feng Gao, Shiqi Wang, Yuwei Wu, and Ling-Yu Duan. 2018. Group-sensitive triplet embedding for vehicle reidentification. IEEE Transactions on Multimedia 20, 9 (2018), 2385–2399.
  • Basha et al. (2020) S.H. Shabbeer Basha, Shiv Ram Dubey, Viswanath Pulabaigari, and Snehasis Mukherjee. 2020. Impact of fully connected layers on performance of convolutional neural networks for image classification. Neurocomputing 378 (2020), 112–119.
  • Boukerche and Ma (2021) Azzedine Boukerche and Xiren Ma. 2021. Vision-based Autonomous Vehicle Recognition: A New Challenge for Deep Learning-based Systems. ACM Computing Surveys (CSUR) 54, 4 (2021), 1–37.
  • Buzzelli and Segantin (2021) Marco Buzzelli and Luca Segantin. 2021. Revisiting the CompCars Dataset for Hierarchical Car Classification: New Annotations, Experiments, and Results. Sensors 21, 2 (2021), 596.
  • Cardillo and L. Warren (2016) Marcel Cardillo and Dan L. Warren. 2016. Analysing patterns of spatial and niche overlap among species at multiple resolutions. Global Ecology and Biogeography 25, 8 (2016), 951–963.
  • Chen et al. (2019) Chaofan Chen, Oscar Li, Daniel Tao, Alina Barnett, Cynthia Rudin, and Jonathan K Su. 2019. This looks like that: deep learning for interpretable image recognition. NeurIPS 32 (2019).
  • Chen et al. (2018) Chaofan Chen, Kangcheng Lin, Cynthia Rudin, Yaron Shaposhnik, Sijia Wang, and Tong Wang. 2018. An Interpretable Model with Globally Consistent Explanations for Credit Risk. CoRR (2018).
  • Chen et al. (2020) Lei Chen, Jianhui Chen, Hossein Hajimirsadeghi, and Greg Mori. 2020. Adapting Grad-CAM for embedding networks. In WACV. 2794–2803.
  • Chen et al. (2022) Mayee F Chen, Daniel Y Fu, Dyah Adila, Michael Zhang, Frederic Sala, Kayvon Fatahalian, and Christopher Ré. 2022. Shoring Up the Foundations: Fusing Model Embeddings and Weak Supervision. arXiv preprint arXiv:2203.13270 (2022).
  • Chen et al. (2021) Xiaodong Chen, Xinchen Liu, Wu Liu, Xiao-Ping Zhang, Yongdong Zhang, and Tao Mei. 2021. Explainable Person Re-Identification With Attribute-Guided Metric Distillation. In ICCV. 11813–11822.
  • Das et al. (2020) Nilaksh Das, Haekyu Park, Zijie J Wang, Fred Hohman, Robert Firstman, Emily Rogers, and Duen Horng Polo Chau. 2020. Bluff: Interactively deciphering adversarial attacks on deep neural networks. In IEEE VIS. IEEE, 271–275.
  • Das et al. (2018) Nilaksh Das, Madhuri Shanbhogue, Shang-Tse Chen, Fred Hohman, Siwei Li, Li Chen, Michael E Kounavis, and Duen Horng Chau. 2018. Shield: Fast, practical defense and vaccination for deep learning using jpeg compression. In SigKDD.
  • Elkerdawy et al. (2018) Sara Elkerdawy, Nilanjan Ray, and Hong Zhang. 2018. Fine-grained vehicle classification with unsupervised parts co-occurrence learning. In ECCV Workshops.
  • Kanaci et al. (2019) Aytac Kanaci, Minxian Li, Shaogang Gong, and Georgia Rajamanoharan. 2019. Multi-Task Mutual Learning for Vehicle Re-Id. In CVPR Workshops.
  • Hardt et al. (2021) Michaela Hardt, Xiaoguang Chen, Xiaoyi Cheng, Michele Donini, Jason Gelman, Satish Gollaprolu, John He, Pedro Larroy, Xinyu Liu, Nick McCarthy, et al. 2021. Amazon sagemaker clarify: Machine learning bias detection and explainability in the cloud. CoRR (2021).
  • He et al. (2019) Bing He, Jia Li, Yifan Zhao, and Yonghong Tian. 2019. Part-regularized near-duplicate vehicle re-identification. In CVPR. 3997–4005.
  • Hohman et al. (2019) Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. 2019. Summit: Scaling deep learning interpretability by visualizing activation and attribution summarizations. IEEE TVCG 26, 1 (2019), 1096–1106.
  • Hoi et al. (2015) Steven CH Hoi, Xiongwei Wu, Hantang Liu, Yue Wu, Huiqiong Wang, Hui Xue, and Qiang Wu. 2015. Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. CoRR (2015).
  • Hsu et al. (2019) Hung-Min Hsu, Tsung-Wei Huang, Gaoang Wang, Jiarui Cai, Zhichao Lei, and Jenq-Neng Hwang. 2019. Multi-Camera Tracking of Vehicles based on Deep Features Re-ID and Trajectory-Based Camera Link Models.
  • Huang et al. (2019) Yangru Huang, Peixi Peng, Yi Jin, Junliang Xing, Congyan Lang, and Songhe Feng. 2019. Domain adaptive attention model for unsupervised cross-domain person re-identification. CoRR (2019).
  • Jiang et al. (2018b) Heinrich Jiang, Been Kim, Melody Guan, and Maya Gupta. 2018b. To trust or not to trust a classifier. Advances in neural information processing systems 31 (2018).
  • Jiang et al. (2018a) Junchen Jiang, Ganesh Ananthanarayanan, Peter Bodik, Siddhartha Sen, and Ion Stoica. 2018a. Chameleon: scalable adaptation of video analytics. In Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication. 253–266.
  • Krause et al. (2013) Jonathan Krause, Michael Stark, Jia deng, and Li Fei-Fei. 2013. 3D Object Representations for Fine-Grained Categorization. In 3dRR-13.
  • Laugel et al. (2019) Thibault Laugel, Marie-Jeanne Lesot, Christophe Marsala, Xavier Renard, and Marcin Detyniecki. 2019. The Dangers of Post-Hoc Interpretability: Unjustified Counterfactual Explanations. In IJCAI. 2801–2807.
  • Li et al. (2018) Oscar Li, Hao Liu, Chaofan Chen, and Cynthia Rudin. 2018. Deep learning for case-based reasoning through prototypes: A neural network that explains its predictions. In AAAI, Vol. 32.
  • Linardatos et al. (2021) Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. 2021. Explainable ai: A review of machine learning interpretability methods. Entropy 23, 1 (2021), 18.
  • Lipton (2018) Zachary C Lipton. 2018. The Mythos of Model Interpretability: In machine learning, the concept of interpretability is both important and slippery. Queue 16, 3 (2018), 31–57.
  • Liu et al. (2016a) Xinchen Liu, Wu Liu, Huadong Ma, and Huiyuan Fu. 2016a. Large-scale vehicle re-identification in urban surveillance videos. In IEEE ICME. 1–6.
  • Liu et al. (2016b) Xinchen Liu, Wu Liu, Tao Mei, and Huadong Ma. 2016b. A deep learning-based approach to progressive vehicle re-identification for urban surveillance. In ECCV.
  • Liu et al. (2018) Xiaobin Liu, Shiliang Zhang, Qingming Huang, and Wen Gao. 2018. RAM: A Region-Aware Deep Model for Vehicle Re-Identification. In IEEE ICME. 1–6.
  • Luo et al. (2019) Hao Luo, Youzhi Gu, Xingyu Liao, Shenqi Lai, and Wei Jiang. 2019. Bag of tricks and a strong baseline for deep person re-identification. In CVPR Workshops.
  • Ma et al. (2019) Zhanyu Ma, Dongliang Chang, Jiyang Xie, Yifeng Ding, Shaoguo Wen, Xiaoxu Li, Zhongwei Si, and Jun Guo. 2019. Fine-Grained Vehicle Classification With Channel Max Pooling Modified CNNs. IEEE Transactions on Vehicular Technology 68, 4 (2019), 3224–3233.
  • Nguyen et al. (2021) My-Linh Nguyen, Thao Phung, Duong-Hai Ly, and Hong-Linh Truong. 2021. Holistic Explainability Requirements for End-to-End Machine Learning in IoT Cloud Systems. In 2021 IEEE REW Workshop. 188–194.
  • Obermeyer et al. (2019) Ziad Obermeyer, Brian Powers, Christine Vogeli, and Sendhil Mullainathan. 2019. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 6464 (2019), 447–453.
  • Panetta et al. (2021) Karen Panetta, Landry Kezebou, Victor Oludare, James Intriligator, and Sos Agaian. 2021. Artificial Intelligence for Text-Based Vehicle Search, Recognition, and Continuous Localization in Traffic Videos. AI 2, 4 (2021), 684–704.
  • Ristani and Tomasi (2018) Ergys Ristani and Carlo Tomasi. 2018. Features for multi-target multi-camera tracking and re-identification. In CVPR. 6036–6046.
  • Rudin (2019) Cynthia Rudin. 2019. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1, 5 (2019), 206–215.
  • Rudin and Radin (2019) Cynthia Rudin and Joanna Radin. 2019. Why Are We Using Black Box Models in AI When We Don’t Need To? A Lesson From an Explainable AI Competition. Harvard Data Science Review 1, 2 (22 11 2019).
  • Saralajew et al. (2019) Sascha Saralajew, Lars Holdijk, Maike Rees, Ebubekir Asan, and Thomas Villmann. 2019. Classification-by-Components: Probabilistic Modeling of Reasoning over a Set of Components. In NeurIPS, Vol. 32.
  • Selvaraju et al. (2017) Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 2017. Grad-cam: Visual explanations from deep networks via gradient-based localization. In ICCV. 618–626.
  • Sochor et al. (2018) Jakub Sochor, Jakub Špaňhel, and Adam Herout. 2018. Boxcars: Improving fine-grained recognition of vehicles using 3-d bounding boxes in traffic surveillance. IEEE transactions on intelligent transportation systems 20, 1 (2018), 97–108.
  • Suprem et al. (2020) Abhijit Suprem, Joy Arulraj, Calton Pu, and Joao Ferreira. 2020. ODIN: Automated Drift Detection and Recovery in Video Analytics. Proc. VLDB Endow. 13, 12 (jul 2020), 2453–2465.
  • Suprem et al. (2019) Abhijit Suprem, Rodrigo Alves Lima, Bruno Padilha, João Eduardo Ferreira, and Calton Pu. 2019. Robust, Extensible, and Fast: Teamed Classifiers for Vehicle Tracking in Multi-Camera Networks. In IEEE CogMi. 23–32.
  • Suprem and Pu (2020) Abhijit Suprem and Calton Pu. 2020. Looking GLAMORous: Vehicle re-id in heterogeneous cameras networks with global and local attention. CoRR (2020).
  • Suprem and Pu (2022a) Abhijit Suprem and Calton Pu. 2022a. Exploring Generalizability of Fine-Tuned Models for Fake News Detection.
  • Suprem and Pu (2022b) Abhijit Suprem and Calton Pu. 2022b. MiDAS: Multi-integrated Domain Adaptive Supervision for Fake News Detection. CoRR (2022).
  • Sánchez et al. (2021) Héctor Corrales Sánchez, Noelia Hernández Parra, Ignacio Parra Alonso, Eduardo Nebot, and David Fernández-Llorca. 2021. Are We Ready for Accurate and Unbiased Fine-Grained Vehicle Classification in Realistic Environments? IEEE Access 9 (2021), 116338–116355.
  • Tang et al. (2018) Zheng Tang, Gaoang Wang, Hao Xiao, Aotian Zheng, and Jenq-Neng Hwang. 2018. Single-camera and inter-camera vehicle tracking and 3D speed estimation based on fusion of visual and semantic features. In CVPR Workshops. 108–115.
  • Urner and Ben-David (2013) Ruth Urner and Shai Ben-David. 2013. Probabilistic lipschitzness a niceness assumption for deterministic labels. In Learning Faster from Easy Data-Workshop, NIPS, Vol. 2. 1.
  • Wang et al. (2017) Zhongdao Wang, Luming Tang, Xihui Liu, Zhuliang Yao, Shuai Yi, Jing Shao, Junjie Yan, Shengjin Wang, Hongsheng Li, and Xiaogang Wang. 2017. Orientation invariant feature embedding and spatial temporal regularization for vehicle re-identification. In ICCV. 379–387.
  • Woo et al. (2018) Sanghyun Woo, Jongchan Park, Joon-Young Lee, and In So Kweon. 2018. Cbam: Convolutional block attention module. In ECCV. 3–19.
  • Xu et al. (2018) Zhuangdi Xu, Harshit Gupta, and Umakishore Ramachandran. 2018. Sttr: A system for tracking all vehicles all the time at the edge of the network. In DEBS. ACM, 124–135.
  • Yang et al. (2015) Linjie Yang, Ping Luo, Chen Change Loy, and Xiaoou Tang. 2015. A large-scale car dataset for fine-grained categorization and verification. In CVPR. 3973–3981.
  • Yang et al. (2019) Shuo Yang, Junxing Zhang, Chunjuan Bo, Meng Wang, and Lijun Chen. 2019. Fast vehicle logo detection in complex scenes. Optics & Laser Technology (2019).
  • Zeiler and Fergus (2014) Matthew D Zeiler and Rob Fergus. 2014. Visualizing and understanding convolutional networks. In ECCV. Springer, 818–833.
  • Zhang et al. (2018) Xinyu Zhang, Hongbo Gao, Chong Xue, Jianhui Zhao, and Yuchao Liu. 2018. Real-time vehicle detection and tracking using improved histogram of gradient features and Kalman filters. International Journal of Advanced Robotic Systems 15, 1 (2018), 1729881417749949.
  • Zhang et al. (2020) Xinyu Zhang, Rufeng Zhang, Jiewei Cao, Dong Gong, Mingyu You, and Chunhua Shen. 2020. Part-guided attention learning for vehicle instance retrieval. IEEE Transactions on Intelligent Transportation Systems (2020).
  • Zhao et al. (2018) Ya Zhao, Sihui Luo, Yezhou Yang, and Mingli Song. 2018. Deepssh: Deep semantic structured hashing for explainable person re-identification. In 2018 25th IEEE International Conference on Image Processing (ICIP). IEEE, 1653–1657.
  • Zheng et al. (2016) Liang Zheng, Yi Yang, and Alexander G Hauptmann. 2016. Person re-identification: Past, present and future. CoRR (2016).
  • Zheng et al. (2019) Zhedong Zheng, Tao Ruan, Yunchao Wei, and Yezhou Yang. 2019. VehicleNet: Learning Robust Feature Representation for Vehicle Re-identification.. In CVPR Workshops, Vol. 2. 3.
  • Zhou and Shao (2018) Yi Zhou and Ling Shao. 2018. Aware attentive multi-view inference for vehicle re-identification. In CVPR. 6489–6498.