From social networks to media sharing platforms, digital pictures are spreading all over the Internet at an ever-growing pace. However, a major drawback of this phenomenon is the diffusion of illicit or illegal material online, especially visual content. To fight this trend, multimedia forensic researchers have focused on the development of numerous solutions aimed at inferring pieces of information related to the acquisition and editing history of images [Stamm2013, Piva2013, Rocha2011], among others.
A common problem of interest for forensic analysts is camera model identification, i.e., detecting which camera model has been used to shoot a given digital photograph based solely on its content. Indeed, this is a first step toward tracking down the author of distributed illicit content [Kirchner2015]
(e.g., pictures related to acts of violence, images linked to terrorist behavior, sexually exploitative imagery of children, among others). Given the social relevance of this problem, in the last few years, a continuous effort has been put toward the development of more accurate and efficient camera model identification solutions. These can be broadly split into two categories: (i) model-based methods leveraging the study of characteristic traces left behind by specific operations applied by different camera models on acquired images; and (ii) data-driven methods based on machine-learning techniques that seek to “learn” the patterns of such telltales automatically. Considering the first category, we can cite methods relying on traces left by color filter array (CFA) interpolation [Bayram2005, Cao2010, Zhao2016], on histogram equalization footprints [Chen2007a], on traces left by camera lenses [Choi2006], and on characteristic noise analysis [Thai2014]. Considering the second category, in turn, we can cite the works of Chen2015a, Marra2015, and Tuama2016
, which extract statistical features in the pixel domain to train supervised machine-learning classifiers specialized in the problem. More recently, relying upon advancements in deep learning techniques, data-driven solutions based on cnn have outperformed the prior art [Tuama2016a, Bondi2017, Bondi2017a] and are becoming a staple of the area.
The drawback of all the aforementioned data-driven techniques (or, more precisely, of the evaluation setup used to validate them) is that they mainly cope with camera model identification in a closed-set setup. This means that a finite set of camera models is considered when designing the solution, and each image is attributed to one of these models. However, oftentimes analysts must work in open-set scenarios. This means that the investigator must also be able to recognize whether an image does not belong to any of the known models of interest [Kirchner2015].
In this vein, we present herein an in-depth study on open-set camera model attribution based on a supervised learning pipeline. Specifically, we focus on methodologies that perform an analysis at patch level rather than on the whole image, as this opens the door to future development of tampering detection and localization methods, as shown by Bondi2017b. To the best of our knowledge, open-set camera model attribution has only been introduced by Gloe2012a and later approached by Bayar2018. Bayar2018 focus on an open-set binary detection problem, i.e., detecting whether an image comes from a known or unknown camera model. Conversely, we aim to solve the joint problem of (i) detecting whether the image under analysis comes from a known or from an unknown camera model and (ii) determining the image source model if it comes from the set of known models.
In previous work [Jain2014], a general open-set classifier has been proposed along with cross-class validation, a method tailored to open-set scenarios that aims at searching for the parameters of the proposed open-set classifier. In parallel, another previous work [MendesJunior2017], also proposing an open-set classifier, introduced a parameter optimization procedure tailored to searching for the parameters of their proposed classifier, which shares the same essence as cross-class validation. In the latter work, the authors suggested as future work the employment of their parameter optimization method as a general grid-search procedure that could be applied to any open-set classifier. In our work, we follow this direction and evaluate what we call the closed training protocol (the traditional form) and the open training protocol (with the same essence as cross-class validation [Jain2014] and parameter optimization [MendesJunior2017]). We further study those alternatives and formalize and evaluate what we call the networkopen training protocol, specifically tailored to situations in which deep features are employed. As we shall see along with the presented results, the equivalence of open and networkopen indicates the open training protocol as the best and cheapest alternative in terms of the data required.
Camera model identification is not to be confused with camera source attribution at the instance level (i.e., distinguishing pictures shot with different devices of the same model) based on sensor pattern noise (SPN) analysis [Lukas2006]. Indeed, SPN-based solutions [Costa2012, Costa2014] exploit a strong correlation test that is known to sometimes fail in the case of unknown camera models [Lukas2006]. Moreover, the same test cannot be applied to model identification, as SPN is a device-specific trace.
In light of these considerations, our key contributions are the following:
We study the open-set camera model identification problem, analyzing state-of-the-art open-set classification methods.
We evaluate the effectiveness of cnn features, compared to hand-crafted ones, for per-patch classification in open-set setups.
We formalize and evaluate open-set training protocols applied to open-set classification methods during training for a proper estimation of parameters in the open-set scenario.
We carry out the first large-scale testing on the open-set camera model identification problem considering independent datasets and several algorithms, also comparing with known solutions in the literature [Bayar2018].
The best evaluated solution for the problem combines a deep feature extraction method and a state-of-the-art open-set classifier trained with an open-set training protocol of intermediate complexity. This solution works on color patches, making it useful for forgery localization techniques [Bondi2017b]. Moreover, it is capable of reaching state-of-the-art accuracy also in the closed-set framework.
The rest of the paper is structured as follows. Section II formally introduces the camera model identification problem under different points of view. Section III provides all the details about the algorithmic pipeline used in our evaluation. Section IV reports information about the considered experimental setup. Section V presents the performed experiments and achieved results. Finally, Section VI concludes the paper.
II Open-set Camera Model Identification Problem
In this section, we introduce the problem of camera model identification, from the closed-set formulation to the open-set one faced in this paper.
Camera model identification refers to the problem of assigning an image, in a blind fashion, to the camera model that was used to shoot it. This means that no watermarks or side information such as header or EXIF data are used, assuming they will not be available during investigation. Depending on the considered constraints, camera model identification can be cast into different kinds of problems, as shown in Figure 1. In the following, we report the main differences between these problem formulations.
II-A Closed-set Classification
Closed-set camera model classification is the problem of assigning an image to a camera model within a known set of possible models, as depicted in Figure 1(a). In this scenario, one must assume that the investigator is sure that the camera model of the picture under analysis belongs to the set of candidate models.
Formally, let be a color image acquired with the camera model identified by label . Consider further as the set of labels belonging to the known camera model dataset, i.e., the one available to the analyst when developing the solution. The goal in closed-set camera model identification is to estimate the label associated with the picture under analysis.
This is by far the most widely considered scenario in the literature [Kirchner2015]. However, closed-set classification is bound to fail whenever the analyst has no full knowledge of all the possibly used camera models: in real-case open-set scenarios, it happens that , in which , , is the unknown label that represents any unknown class.
II-B Open-set Detection
Relaxing the constraint of knowing all possible camera models, we enter the open-set realm. Indeed, in an open-set scenario, the image under analysis can belong to either known or unknown camera models. In particular, we refer to open-set camera model detection as the problem of detecting whether an image belongs to the set of known models, or to the set of unknown ones, as depicted in Figure 1(b).
Formally, the goal of open-set camera model detection is to estimate whether or for a given image . This is basically a two-class classification problem that does not provide the analyst with information on the actual camera model used. To infer the possibly used camera model, an open-set detection solution should be paired with a subsequent step of closed-set classification, as proposed by Bayar2018.
II-C Open-set Classification
The most complete camera model identification problem formulation is that of open-set classification. As a matter of fact, this refers to the problem of jointly estimating whether the image under analysis comes from a camera in the known set of models or from an unknown model and, if the first condition holds, also detecting which model it is, as depicted in Figure 1(c).
Formally, the goal of open-set camera model identification is to estimate for a given image .
Typically, to properly deal with an open-set classification problem, three different kinds of data are employed:
Known data (train and test): images shot with models that the analyst must correctly detect and classify.
Known-unknown data (optional; train and test): images shot with models available at training time but assumed as unknown in order to model unknown camera models at algorithm validation time. Those data might or might not be available.
Unknown-unknown data (test only): images shot with models and not used for either training or validation, used to properly evaluate a method’s performance in the wild. Those data only appear for classification once the classifier is trained.
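As an illustration, the three regimes above can be simulated from a single labeled dataset by partitioning its classes; the following sketch uses hypothetical camera names and an arbitrary split, purely to make the distinction concrete:

```python
import random

def split_open_set_classes(all_classes, n_known, n_known_unknown, seed=0):
    """Partition class labels into known, known-unknown (validation-only),
    and unknown-unknown (test-only) groups for an open-set experiment."""
    rng = random.Random(seed)
    shuffled = sorted(all_classes)
    rng.shuffle(shuffled)
    known = shuffled[:n_known]
    known_unknown = shuffled[n_known:n_known + n_known_unknown]
    unknown_unknown = shuffled[n_known + n_known_unknown:]
    return known, known_unknown, unknown_unknown

# Hypothetical camera model names.
models = [f"cam_{i}" for i in range(10)]
known, ku, uu = split_open_set_classes(models, n_known=5, n_known_unknown=2)
```

Any class not drawn as known or known-unknown is reserved for test time only, mirroring the unknown-unknown definition above.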
Open-set classification is by far the most complete problem formulation of the overall camera model identification problem. In this paper, we present an algorithmic pipeline to solve this problem, deeply analyzing each building block of the algorithm in all combinations of the alternatives.
Previous works on open-set camera model identification have not fully evaluated the multiclass open-set classification problem. Bayar2018 have considered the performance of the classification methods for detecting known vs. unknown and, independently, the closed-set classification performance among the classes. In the latter evaluation, the classifiers work in a closed-set scenario, i.e., they never predict unknown. As we shall see in Section V, the results we have obtained in our study are lower in terms of accuracy than the ones reported in their work. This happens because our evaluated classifiers always have the ability to predict test instances as unknown at the same time as choosing which known class they belong to, in case they are considered as belonging to one of the known ones. It is worth noting that the accuracy in a problem as described in Section II-C tends to be smaller than the detection accuracy and the closed-set accuracy measured independently without the option for rejection, as in the open-set classification problem the classification methods can make the following types of error: misclassification, false unknown, and false known [MendesJunior2017].
III Evaluation Pipeline
In this section, we provide all the details about the factors we evaluate in this work. We first provide an overview of the overall algorithmic pipeline. Then, we focus on each separate block of it, reporting information about all methodologies employed in this paper.
To solve open-set camera model attribution, we study the possibility of exploiting a supervised classification strategy leveraging image descriptors tailored to capture camera-based traces proposed in the closed-set literature. Specifically, we follow the pipeline depicted in Figure 2, which is composed of three main modules: a feature extractor, a training protocol for preparing training data, and an open-set classifier. For each module, we investigate the possibility of using different strategies.
Feature extraction consists in computing a discriminative feature vector from an image . The feature extractor algorithm is tuned to obtain characteristic camera model information while compacting data dimensionality. Feature vectors extracted from pictures sharing the same camera model should be similar. Conversely, feature vectors extracted from images shot with different models should be, ideally, strongly dissimilar.
Open-set classifiers, as we shall see, tend to associate a bounded region of the feature space to the known classes. A recent work [MendesJunior2017] has shown that the split of training data for parameter search can have an influence on the final model obtained by an open-set classifier. The training protocol splits the training data into fitting data and validation data for parameters search, as depicted in Figure 2. This is a delicate step, as a good open-set classifier must “learn” its parameters taking into account the risk of the unknown, not just the empirical risk measured on known data [Scheirer2013]. In essence, prominent alternatives at this stage aim at employing part of the known training data as known-unknown data, as a form of simulation of the unknown.
The role of an open-set classifier is to learn a mapping between feature vectors and camera labels . This mapping is learned at training time by observing several different pairs for many different images and values, . The open-set classifier partitions the space spanned by all possible vectors , associating different regions of the feature space to different labels .
Once the system has been fully trained, it can be deployed. Whenever a new image under investigation is considered, a feature vector is extracted. The open-set classifier model is then employed to assign the vector a class label .
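The deployment step above can be sketched as follows; the per-channel statistics used as features and the nearest-mean classifier with a distance threshold are placeholder assumptions standing in for the trained modules, not the method of any cited work:

```python
import numpy as np

UNKNOWN = -1  # label reserved for the unknown class

def extract_features(patch):
    """Placeholder extractor: per-channel mean and std of a color patch."""
    return np.concatenate([patch.mean(axis=(0, 1)), patch.std(axis=(0, 1))])

def predict_open_set(feature, class_means, threshold):
    """Assign the nearest known class, or UNKNOWN if every class is too far."""
    dists = np.linalg.norm(class_means - feature, axis=1)
    best = int(np.argmin(dists))
    return best if dists[best] <= threshold else UNKNOWN

rng = np.random.default_rng(0)
patch = rng.random((64, 64, 3))          # a color patch under investigation
class_means = rng.random((5, 6))         # one prototype per known model
label = predict_open_set(extract_features(patch), class_means, threshold=0.8)
```

The key difference from a closed-set pipeline is the extra rejection branch: the classifier may return the unknown label instead of forcing a known model.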
III-B Feature Extractors
Different feature extractors for camera-related features have been proposed in the literature. We decided to focus on recently proposed ones that have shown good performance in closed-set camera model attribution setups.
III-B1 Rich features
Fridrich2012 have proposed the use of statistical descriptors known as rich features for steganalysis. Rich features are obtained by pre-processing an image through high-pass filtering, quantization and truncation. The rich feature vector is then computed by counting the occurrences of different pixel group combinations. The use of rich features has subsequently proved successful for other forensic applications, from tampering detection [Cozzolino2014] to camera model attribution [Marra2015]. We denote as the rich feature vector referred to as SPAM by Marra2015 for camera model identification. It has already proved to be more discriminative than those proposed by Gloe2012a, Xu2012, and Celiktutan2008 as shown by Marra2015.
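The general recipe behind such residual-based features can be sketched as follows; the first-order horizontal filter, quantization step, and truncation threshold are illustrative choices, not the exact SPAM configuration of Fridrich2012 or Marra2015:

```python
import numpy as np

def residual_cooccurrence(img, q=1.0, T=2):
    """High-pass residual -> quantize -> truncate -> co-occurrence histogram
    of horizontally adjacent residual pairs (a simplified rich-feature sketch)."""
    # First-order horizontal high-pass residual.
    res = img[:, 1:].astype(np.int64) - img[:, :-1].astype(np.int64)
    # Quantize and truncate residual values to the range [-T, T].
    res = np.clip(np.round(res / q), -T, T).astype(np.int64)
    # Joint index of horizontally adjacent residual pairs.
    pairs = (res[:, :-1] + T) * (2 * T + 1) + (res[:, 1:] + T)
    hist = np.bincount(pairs.ravel(), minlength=(2 * T + 1) ** 2)
    return hist / hist.sum()  # normalized occurrence counts as the feature

rng = np.random.default_rng(0)
feat = residual_cooccurrence(rng.integers(0, 256, (32, 32)))
```

Real rich-feature sets repeat this recipe over many filters, directions, and higher-order pixel groups, concatenating the resulting histograms into a long vector.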
III-B2 CFA features
As shown by Chen2015a, the concept of rich features can be extended to work across different image color planes. Chen2015a have shown that it is possible to capture characteristics related to color filter arrays (CFA) for camera model identification. For this reason, we denote as the CFA-based feature vector proposed by Chen2015a. As shown by Bondi2017, this can be considered a baseline solution especially when large images are concerned.
III-B3 cnn-derived features
We adopt as a data-driven method the cnn proposed by Bondi2017, with an architecture comprising four convolutional layers followed by two inner product layers. It has been successfully applied to attribute images to different camera models using patches as input. In principle, the output of each cnn layer can be employed as a feature vector . We employ three layers in this work: , obtained after the last convolutional layer; , obtained after the first inner product layer; and , obtained after the second inner product layer, where is the cardinality of the set of known cameras ( in our experiments, as in the work of Bondi2017).
III-C Training Protocols
To train open-set classifiers, a set of hyper-parameters must be tuned through some method of parameter search to maximize classification accuracy and the generalization/specialization capabilities of the employed method. A typical way to do this consists in splitting the training data into fitting and validation data. The selected classifier is then trained on the fitting data using different sets of hyper-parameters. Finally, the parameters whose model provides the highest accuracy on the validation data are selected. The final model is then trained on the entire training set with those parameters, and results are reported on images belonging to a completely separate (independent) test dataset. In this work, we explore three different training strategies for open-set classifiers. The introduction of this stage in the pipeline was inspired by the work of MendesJunior2017, which pointed out their parameter optimization as a general form of grid search for future investigation. In Figure 3, we depict those alternatives, described below.
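The fit/validate grid search described above can be sketched generically; the classifier interface (`fit`) and the toy threshold parameter are hypothetical, and a real run would pass actual fitting and validation data instead of `None`:

```python
from itertools import product

def grid_search(make_classifier, grid, fit_data, val_data, score):
    """Try every hyper-parameter combination: fit on fit_data and keep
    the combination whose model scores best on val_data."""
    best_params, best_score = None, float("-inf")
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        clf = make_classifier(**params)
        clf.fit(fit_data)
        s = score(clf, val_data)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score

class ThresholdClassifier:
    """Toy classifier with a single hyper-parameter t."""
    def __init__(self, t):
        self.t = t
    def fit(self, data):
        pass  # a real classifier would learn from the fitting data

best, _ = grid_search(ThresholdClassifier, {"t": [0.1, 0.5, 0.9]},
                      None, None, lambda clf, val: -abs(clf.t - 0.5))
```

The three protocols below differ only in how `fit_data` and `val_data` are assembled, i.e., whether some known classes are withheld to play the role of known-unknown data during validation.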
III-C1 closed strategy
Depicted in Figure 3(a), this is the simplest training strategy, in which no knowledge on the unknown classes is simulated. Indeed, both fitting and validation datasets contain samples from all known classes (i.e., camera models), and no instance from known-unknown data is used in validation. In other words, parameter search is performed simulating a closed-set setup. This means that the classifier will set the boundaries for each class in the feature space taking into account only the empirical risk aiming at optimizing the separability of the known classes.
III-C2 open strategy
Depicted in Figure 3(b): in order to let the classifier better tune against unknown samples, a straightforward strategy consists in training the classifier on known data and tuning it considering the presence of both known and known-unknown samples. When the open strategy is selected, of the classes are employed as known and the other are employed as known-unknown in validation. The classifier fitting procedure is carried out on the known classes; however, validation during parameter search is carried out on all classes, i.e., known and known-unknown camera models. In doing so, parameter search is performed simulating an open-set setup. After the best parameters are obtained, the final model is trained with all known training classes to provide a fair comparison with the closed strategy, i.e., the same number of classes to correctly detect is employed.
III-C3 networkopen strategy
Depicted in Figure 3(c), the networkopen strategy employs unknown data—from the point of view of the network used for feature extraction—as known-unknown data for validation. When dealing with data-driven features (i.e., those extracted using a cnn), special attention must be paid to the fact that the cnn, as a feature extractor, must also be trained and validated on the known classes in order to enable discrimination within the set of known camera models.
This strategy considers that the cnn has been separately trained using all available known classes. The validation set employed during the cnn training process comes from the set of known classes—as it also happens with the open and closed strategies. For networkopen, to better guide the choice of the classifiers' parameters, in addition to the known classes, the validation set also includes samples from extra known-unknown classes, i.e., classes never employed for cnn training or validation. Parameter search of the classifiers is carried out using all known data along with those extra known-unknown data. Finally, when hyper-parameters have been selected, the final model training is performed using just the known classes, for a paired experiment with the other strategies. In doing so, parameter search is performed simulating an open-set setup also from the point of view of the network.
This approach is appropriate for use with cnn-derived features; however, for the sake of fairness, those extra classes, which are known-unknown from the point of view of the network, are also employed in experiments with and features when the networkopen strategy is applied.
III-D Open-set Classifiers
In the open-set scenario, a classifier should be able to assign one or more bounded regions of the feature space to each known class. In contrast, closed-set classifiers simply split unbounded portions of the feature space among the known classes. This concept is illustrated in Figure 4.
In this work, we evaluate multiple open-set classifiers available in the literature. svm have been applied to solve various classification problems, including open-set ones in recent works. Traditional svm can be straightforwardly employed for open-set problems by means of the one-vs-all [Rocha2014] multiclass-from-binary approach [MendesJunior2017]: when a feature vector is classified as negative by all binary svm that compose the multiclass classifier, it is rejected as unknown. Alternatively, ocsvm can also be easily used in open-set setups, as it focuses on carving a decision boundary around known classes, so that points belonging to unknown classes can be rejected. The same all-negative criterion can be employed for any one-class classifier [Heflin2012, Pritsos2013]. Additionally, other methods derived from svm have been proposed in the literature specifically for open-set problems. In this work, we considered wsvm [Scheirer2014], dbc [Costa2012, Costa2014], ssvm [MendesJunior2018b], and pisvm [Jain2014].
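The one-vs-all rejection rule just described can be sketched with plain linear scorers standing in for the trained binary svm; the weights and biases below are illustrative values, not learned models:

```python
import numpy as np

UNKNOWN = -1  # label reserved for rejection

def one_vs_all_predict(x, weights, biases):
    """One linear score per known class: if every binary scorer is
    negative, reject x as unknown; otherwise pick the top-scoring class."""
    scores = weights @ x + biases
    if np.all(scores < 0):
        return UNKNOWN
    return int(np.argmax(scores))

# Two known classes in a 2-D feature space (illustrative, untrained weights).
weights = np.array([[1.0, 0.0], [0.0, 1.0]])
biases = np.array([-0.5, -0.5])
```

A vector scoring positively for at least one class is attributed to the best-scoring one; a vector rejected by every binary classifier falls outside all known regions and is labeled unknown.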
In addition to these svm-based approaches, we also consider the osnn classifier proposed by MendesJunior2017. This is a recently proposed technique that extends the classic nearest neighbors approach. The main rationale behind this method is to avoid relying on raw similarity scores for thresholding; rejection of unknown instances is accomplished through the ratio of similarity scores instead. Furthermore, we also consider the classifiers employed by Bayar2018, i.e., et [Geurts2006], psvm, softmax, and ncm. Also, following the suggestion of previous work [Bayar2018], we employ a 2psvm, which consists of an ocsvm for solving the known vs. unknown problem; then, if the test instance is classified as known, a psvm is employed for choosing the class, otherwise the image is attributed to an unknown model.
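The distance-ratio idea behind osnn-like methods can be sketched as follows: accept the nearest training sample's class only when it is sufficiently closer than the nearest sample of any other class. The threshold value and the toy training set are assumptions for illustration:

```python
import numpy as np

UNKNOWN = -1

def osnn_predict(x, train_x, train_y, ratio_threshold=0.8):
    """Nearest-neighbor distance ratio rule: classify x as the nearest
    sample's class if that distance is much smaller than the distance to
    the nearest sample of a different class; otherwise reject as unknown."""
    dists = np.linalg.norm(train_x - x, axis=1)
    nearest = int(np.argmin(dists))
    other = train_y != train_y[nearest]
    if not other.any():  # degenerate case: a single class in training
        return int(train_y[nearest])
    if dists[nearest] / dists[other].min() <= ratio_threshold:
        return int(train_y[nearest])
    return UNKNOWN

# Toy training set: two samples of class 0, one of class 1.
train_x = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
train_y = np.array([0, 0, 1])
```

A query sitting roughly halfway between two classes yields a ratio near 1 and is rejected, which is exactly the ambiguous region raw-score thresholding handles poorly.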
IV Experimental Setup
In this section, we provide details regarding the employed datasets and evaluation metrics.
To evaluate all tested methodologies thoroughly, it is important to consider a large enough image database. In this work, we merged three different datasets freely available from previous work.
IV-A1 dresden [gloe2010dresden]
This dataset contains almost images from different camera models. Exactly as in the work of Bondi2017, we selected images from models as the set of images from known camera models (we considered the Nikon D70 and Nikon D70s as a single class due to the negligible differences between them, as reported by gloe2010dresden). This set was split into training, validation, and test sets [Bondi2017]. The training set was used to train the cnn-based feature extractor and all classifiers. All images from the remaining models, not considered in the subset previously selected, have been treated as known-unknown under the networkopen strategy and ignored for both the closed and open strategies.
IV-A2 isa
This dataset contains around images from camera models, and is available at http://www.recod.ic.unicamp.br/~filipe/dataset [Costa2014]. All images from models not overlapping with the Dresden Image Database have been selected as unknown-unknown models for the test set in the open-set experiments.
IV-A3 flickr
This dataset comprises around images from more than camera models. Differently from the previously mentioned datasets, these images have been downloaded from the Flickr image hosting service (https://www.flickr.com). To avoid dealing with images from the same camera taken at different resolutions, only images at the maximum resolution for each model have been selected. All images have been considered as belonging to unknown-unknown camera models in the test set for the open-set experiments.
As done by Bondi2017, we obtain, in a content-aware way, non-overlapping -pixel patches from each image. Provided results are based on majority voting after per-patch classification. All patches coming from the same image have been carefully placed into only one of the training, validation, and test sets in order to avoid overfitting and training/testing contamination.
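Non-overlapping patch extraction with majority voting over per-patch predictions can be sketched as follows; the patch size and the example label sequence are illustrative, and the content-aware selection step of Bondi2017 is omitted:

```python
import numpy as np
from collections import Counter

def extract_patches(img, size):
    """Tile an image into non-overlapping size x size patches,
    discarding any partial patches at the borders."""
    h, w = img.shape[:2]
    return [img[i:i + size, j:j + size]
            for i in range(0, h - size + 1, size)
            for j in range(0, w - size + 1, size)]

def vote(per_patch_labels):
    """Image-level decision: majority vote over per-patch predictions."""
    return Counter(per_patch_labels).most_common(1)[0][0]

img = np.zeros((128, 96, 3))        # a toy color image
patches = extract_patches(img, 32)  # 4 x 3 grid of 32 x 32 patches
label = vote([0, 1, 0, 0, 1])       # hypothetical per-patch predictions
```

Keeping all patches of one image inside a single split, as stated above, prevents near-duplicate content from leaking between training and test sets.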
As evaluation metrics, we employ a set of commonly used ones, as well as others recently proposed for the open-set scenario [MendesJunior2017, Bayar2018]. In particular, we consider different definitions of accuracy and f-measure. Concerning accuracy, we employ the following definitions:
IV-B1 aks [MendesJunior2017]
This is the accuracy in correctly attributing images from known models to the actual models. This metric encompasses two kinds of misclassification errors: known-model images attributed to unknown class (false unknown) and known-model images attributed to wrong known classes (misclassification).
IV-B2 aus [MendesJunior2017]
This is the accuracy in correctly classifying as unknown the images from unknown camera models.
IV-B3 na [MendesJunior2017]
This is the average of aks and aus and provides an overall view of a classifier's performance in terms of both open- and closed-set scenarios.
IV-B4 da [Bayar2018]
This averages the percentage of images from known cameras detected as coming from known models, and the percentage of images from unknown cameras detected as coming from unknown models. This metric does not take into account whether images from known cameras are misclassified to the wrong camera model.
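Under the definitions above, aks, aus, and na can be computed from true and predicted labels as follows; the use of -1 as the unknown label is an implementation convention, not part of the original definitions:

```python
UNKNOWN = -1  # convention: label for any unknown camera model

def open_set_accuracies(y_true, y_pred):
    """aks: accuracy on known-model samples (exact model required);
    aus: fraction of unknown samples correctly rejected;
    na: the average of the two."""
    known = [(t, p) for t, p in zip(y_true, y_pred) if t != UNKNOWN]
    unknown = [(t, p) for t, p in zip(y_true, y_pred) if t == UNKNOWN]
    aks = sum(t == p for t, p in known) / len(known)
    aus = sum(p == UNKNOWN for _, p in unknown) / len(unknown)
    return aks, aus, (aks + aus) / 2

# Toy evaluation: three known-model images, two unknown-model images.
y_true = [0, 0, 1, UNKNOWN, UNKNOWN]
y_pred = [0, 1, 1, UNKNOWN, 0]
aks, aus, na = open_set_accuracies(y_true, y_pred)
```

Note that aks charges both error types it encompasses (false unknown and misclassification among known models) against the same score, while da, as described above, would not.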
Concerning the f-measure, an additional comment is in order. Traditionally, the f-measure is defined in terms of precision and recall as
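The formula is elided in this copy; it is the standard harmonic mean of precision $P$ and recall $R$:

```latex
F = \frac{2 \, P \, R}{P + R}
```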
Depending on the definitions of precision and recall employed, we obtain different f-measure definitions. MendesJunior2017 have pointed out that it might be inappropriate to consider the unknown classes as any other known class in terms of tp, fp, and fn calculations. Therefore, considering the number of known camera models, and the -th class concerning the unknown classes , we resort to the following f-measure definitions:
IV-B5 osfmM [MendesJunior2017]
F-measure using precision and recall defined as
IV-B6 osfmm [MendesJunior2017]
F-measure using precision and recall defined as
IV-B7 fmM [Sokolova2009]
F-measure using precision and recall defined as
IV-B8 fmm [Sokolova2009]
F-measure using precision and recall defined as
The main difference between the traditional and open-set versions of the f-measure is that the latter does not consider the effect of the unknown class in terms of tp, as the unknown cannot represent a single positive class. Indeed, the sum index spans the range rather than , thus excluding the label representing the unknown classes. However, both osfmM and osfmm account for false known and false unknown through fp and fn, respectively, in Equations (1) and (2).
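The equations themselves are elided in this copy; based on the textual description above, the open-set macro-averaged definitions (Equation (1)) presumably take the form (a reconstruction consistent with [MendesJunior2017], with $n$ known classes and the unknown label $n+1$ excluded from the sums):

```latex
P_{\mathrm{macro}} = \frac{1}{n}\sum_{i=1}^{n}\frac{TP_i}{TP_i + FP_i},
\qquad
R_{\mathrm{macro}} = \frac{1}{n}\sum_{i=1}^{n}\frac{TP_i}{TP_i + FN_i}
```

with micro-averaged counterparts (Equation (2)):

```latex
P_{\mathrm{micro}} = \frac{\sum_{i=1}^{n} TP_i}{\sum_{i=1}^{n}\left(TP_i + FP_i\right)},
\qquad
R_{\mathrm{micro}} = \frac{\sum_{i=1}^{n} TP_i}{\sum_{i=1}^{n}\left(TP_i + FN_i\right)}
```

The traditional fmM and fmm of Sokolova2009 would use the same expressions with the sums extended to $n+1$, treating the unknown as one more positive class.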
We have evaluated all combinations of extracted features (i.e., ), training protocols (i.e., ), and classifiers (i.e., ), for a total number of case studies. Results for each metric are reported in a complete and detailed table of all our experiments, provided as supplementary material (available at https://tinyurl.com/ya85fr5h; the webpage will be transferred to Github upon acceptance).
Results show that, overall, better performance is obtained with the pisvm, et, and ssvm classifiers. Regarding the training protocols, interestingly, open has presented slightly superior results compared to networkopen, despite using less known-unknown data. Finally, ip1 presents the best result among the features, although ip2, in general, seems to be the most discriminative one. Hereinafter, we report a subset of the obtained results in order to highlight the most interesting findings in terms of best feature set, training protocol, and classifier.
V-A Feature Extractors
To identify the feature vector most suitable for the open-set camera model identification problem, we analyzed the behavior of all features (i.e., , , , , and ) paired with different training strategies and classifiers. To summarize the achieved results, we rely on na as the preferred analysis metric. As a matter of fact, na clearly takes into account the ability of correctly classifying known samples at the camera level as well as rejecting the unknown. Therefore, an algorithm with a high na value is a good candidate to work well for both known and unknown classes.
Table I reports the best na achieved with each feature extractor. Specifically, it shows which combination of classifier and training strategy enables the achieved na, as well as all the other metric values for the selected classifier. From this table, it is possible to notice that the best results are obtained by cnn-based features. In particular, ip1 achieves the best na, which is close to . This confirms the behavior observed by Bondi2017 for the closed-set scenario: hand-crafted features (i.e., and ) perform better on high-resolution images, whereas the cnn is superior when trained on small pixel patches such as the ones considered in this work. Their explanation for the reduced accuracy of hand-crafted features on such small patches is that these features rely on co-occurrence calculations [Fridrich2012, Chen2015a], which might become less stable and reliable when computed on small patches.
[Tables I-III: for each feature (Table I), training protocol (Table II), and classifier (Table III), the combination achieving the best na, reported together with aks, aus, da, osfmM, osfmm, fmM, and fmm.]
It is interesting to notice how aks and aus are unbalanced for hand-crafted features. For instance, rich and cfa show aus higher than , but aks lower than . This means that the classifiers reject many more images as unknown than they should, making these features unappealing for open-set problems, as the presence of unknown devices greatly hinders their closed-set classification capability. The same behavior is also captured by the f-measure-based metrics. Conversely, ip1 is able to correctly classify unknown images with almost accuracy (aus) and to correctly attribute known-camera images to their model with accuracy (aks).
V-B Training Protocols
To evaluate the different training protocols, we consider na as the reference metric for the same reasons previously mentioned. Table II reports the best na results for each protocol, also showing which feature and classifier are used to obtain the reported result. The other metrics are then reported for each case.
It is possible to notice that the open strategy presents better results, more than % higher than the best result with networkopen. In Table II, although the closed strategy presents better results than networkopen, in general we have observed that closed tends to perform the worst. In a broader evaluation, we also observe that open tends to perform slightly better than networkopen.
It is worth highlighting one aspect of the closed strategy. Despite this strategy's name, all classifiers employed along with it are open-set ones. Therefore, even if trained only on known camera images, they still have the ability to reject new data as unknown (remember, from Section III-C, that the different training protocols refer only to the split of the training data). This explains why, even with the closed strategy, it is still possible to achieve aus higher than . Nevertheless, open approaches , a difference of almost % from the closed strategy.
Furthermore, considering all the measured combinations (), classifiers trained with open obtained better results than versions trained with networkopen in of them, while networkopen wins in cases. This also indicates a slightly better performance for the open protocol. However, when networkopen evinces better results, the classifiers obtain results about % better on average, while open improves results by only % on average.
This is a counterintuitive result, as networkopen uses the same known data as the open strategy does, along with extra known-unknown data from the other dresden classes not employed as known. The small difference between those two training protocols indicates some similarity in the representativeness of the two sets of training data. Therefore, those results indicate that simply having some known-unknown data, even though they are not unknown from the point of view of the network (open strategy), is enough to improve performance compared to the traditional closed form. It means those extra data are not necessary, which is also a good sign for making the training process cheaper.
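The open strategy discussed above can be sketched as follows: instead of gathering extra known-unknown classes, the open-set condition is simulated from the known data alone by holding out some known classes and treating them as unknown during classifier fitting or parameter optimization. This is a hypothetical illustration of the split; the exact protocol follows Section III-C, and the function name and seeding are assumptions.

```python
import random

def open_protocol_split(class_labels, n_simulated_unknown, seed=0):
    """Simulate the open-set condition using only known data: hold out
    n_simulated_unknown of the known classes and treat them as unknown
    while fitting/optimizing the open-set classifier."""
    rng = random.Random(seed)
    classes = sorted(set(class_labels))
    held_out = set(rng.sample(classes, n_simulated_unknown))
    fit_as_known = [c for c in classes if c not in held_out]
    return fit_as_known, sorted(held_out)
```

For instance, with the 18 known dresden classes, a split could fit the classifier on 14 classes while the remaining 4 play the role of the unknown, requiring no data beyond what is already available for the known classes.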
Moreover, those results are good evidence that the representations of unknown instances are as distinct as the representations of known-unknown instances from the point of view of the network. That is, once a trained network is employed for feature extraction, both are similarly distinct from the known instances. Those results are also in tune with the ones presented by Bondi2017: they performed a closed-set experiment with a distinct set of camera models not employed for network training and showed that the representations of those camera models are distinct enough to allow discrimination among them.
V-C Open-set Classifiers
To analyze the effect of different classifiers, Table III reports the best na result obtained with each classifier, also showing the feature and training strategy used in each case. For each selected methodology, all other metrics are also reported.
From these results, as we saw in other tables as well, it is possible to see that pisvm performs better than its counterparts, achieving na close to ; however, the best aks and aus are obtained with ssvm and ocsvm, respectively. Results in Table III show many classifiers with reasonable performance: among the cases, et obtained the best performance for the macro-averaged versions of the f-measure, and osnn presents the best results for the micro-averaged versions. ocsvm also outperforms the other methods in terms of da, despite its high propensity to reject instances as unknown. Additionally, the closed protocol only appears as the best one for the ssvm and 2psvm classifiers; all other classifiers have the open or networkopen variation as the best training protocol, with open appearing in most of the cases.
It is important to notice that 2psvm appears as one of the last methods in the ranking of Table III. This low performance can be justified by its implicit assumption that all known classes can be modeled as a single class. It does not take into account that the known classes can be sparse in the feature space and that some intermediate regions among those classes can belong to the unknown; i.e., it is difficult to specialize on the known classes by means of a single model. Furthermore, the best na result with 2psvm is obtained with the closed training protocol, which indicates that, even when simulation of the open-set scenario is performed for parameter optimization, a one-class classifier is not able to handle the feature space well.
In general, we verify that the straightforward employment of an open-set classifier, as is, improves results for the open-set scenario compared to closed-set classifiers adapted for open-set recognition by means of rejection through thresholding of similarity scores. Further details regarding the comparison with those state-of-the-art solutions are presented in the next section.
V-D Comparison with State-of-the-art
To the best of our knowledge, the only work presenting results for the open-set camera model identification problem is that of Bayar2018. In that work, the authors propose two different approaches. The first one (V-D1) relies on confidence score thresholding: when the classifier is not “sure” about its classification to a certain known class, the test instance is rejected as unknown. The second approach (V-D2) assumes known-unknown data is available for training a classifier to detect whether a test instance is known or unknown. For this approach, previous work evaluated only the detection ability, although in a real open-set scenario a further decision would be required to choose the correct class in case an instance is detected as known.
V-D1 Approach 1
The first approach proposed by Bayar2018 works as follows. A multi-class classifier is trained with the closed training protocol (previous work [Bayar2018] evaluated neither the open nor the networkopen protocol; to the best of our knowledge, our work evaluates them for the first time in this problem). This classifier is chosen so as to also provide a confidence score about the detected class. Instances providing a low confidence score are classified as unknown. For this class of methods, we implemented their solutions based on softmax, ncm, psvm, and et [Geurts2006].
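The thresholding step of this approach can be sketched as below: given the per-class confidence scores (e.g., softmax probabilities) for a batch of test instances, the argmax class is accepted only when its confidence exceeds a threshold; otherwise the instance is rejected as unknown. The `UNKNOWN` sentinel, the function name, and the threshold value are illustrative assumptions.

```python
import numpy as np

UNKNOWN = -1  # sentinel label for rejection (an assumption)

def reject_by_confidence(proba, classes, tau=0.9):
    """Approach-1-style rejection: classify each row of `proba` to the
    argmax class only when the top confidence exceeds `tau`; otherwise
    reject the instance as unknown."""
    proba = np.asarray(proba)
    top = proba.argmax(axis=1)                      # index of most likely class
    conf = proba[np.arange(len(proba)), top]        # its confidence score
    preds = np.asarray(classes)[top]
    preds[conf < tau] = UNKNOWN                     # low confidence -> unknown
    return preds
```

Note that the threshold `tau` trades false knowns for false unknowns, which is precisely the behavior analyzed in the comparisons that follow.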
Table IV reports the metric difference achieved by the best solution we evaluated in the previous sections compared to the baselines, considering, for each method, the setup that maximizes na. From this comparison, in general, it is possible to notice that the best solution we found in our analysis achieves better results than all strategies reported by Bayar2018. For most of the measures, for each of the compared baselines, pisvm improves the accuracy.
We see in the same table that et, as employed by Bayar2018, is the most competitive method compared to a classifier specially designed for the open-set scenario (pisvm). Despite its high accuracy, we should analyze some theoretical properties of the classifier. For instance, consider the ability to bound the region of the feature space in which a possible test instance would be classified as belonging to one of the known classes, i.e., bounding the klos [Scheirer2013, MendesJunior2017]. Figures 5(a) and 5(b) depict the decision boundaries of the pisvm and et classifiers, respectively, in the feature space formed by the first two features of the ip2 layer. For those images, only training samples from the first 4 classes, out of the 18, were employed to avoid cluttering the visualization. Small circles represent training samples. Colored regions indicate that a possible test instance therein would be classified to the class of the same color. The white region represents rejection as unknown. In Figure 5, we observe that pisvm is able to bound the klos, properly ensuring the rejection of any data point that would appear far from the support of the training samples in the feature space. However, by thresholding the probability score of the et classifier, the same property is not ensured. In general, we see that pisvm demonstrates a more controlled behavior.
V-D2 Approach 2
The second approach proposed by Bayar2018 works as follows. A binary classifier is trained to distinguish between images from known and unknown camera models. The objective here is to analyze only the detection ability. All samples from all known classes are collapsed into a single class called known. Extra data from other classes not of interest are employed as the unknown class for the binary classification. As in the previous experiments, we consider the 18 classes of dresden as the known classes of interest. For the extra known-unknown data, we employed the remaining classes of the dresden dataset, as those classes were also employed along with the networkopen training protocol. For this method, we implemented both solutions shown by Bayar2018, i.e., psvm (notice, however, that Platt's probability is not required in this context, as only the class decision matters) and et.
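The label-collapsing setup of this second approach can be sketched as follows, with scikit-learn's `LinearSVC` standing in as a generic stand-in for the SVM used in [Bayar2018] (the function name and the choice of classifier are assumptions for illustration).

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_known_vs_unknown(X_known, X_known_unknown):
    """Approach-2-style detector: collapse all known classes into a single
    'known' label (1) and use the extra known-unknown data as 'unknown' (0),
    then fit a binary classifier on the combined set."""
    X = np.vstack([X_known, X_known_unknown])
    y = np.concatenate([np.ones(len(X_known)), np.zeros(len(X_known_unknown))])
    return LinearSVC(C=1.0).fit(X, y)
```

The key point, discussed below, is that the resulting decision frontier is shaped entirely by whichever known-unknown data happened to be available, with no guarantee it generalizes to the truly unknown models seen at test time.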
In Table V, we present the da for the two baselines as well as for the pisvm solution, which presented the best results throughout our experiments. Furthermore, dks and dus are also presented for a more in-depth evaluation of the performance of the classifiers. networkopen was selected for those results because the baselines require extra known-unknown data for training, even though pisvm obtained its best performance along with the open strategy. ip2 is employed in this case because, as previously seen, it has comparable or better results than ip1 in general and, furthermore, the baselines presented slightly better results with this feature compared to ip1.
Our results for this approach, as seen in Table V, are far from the ones reported by Bayar2018, as the baselines have almost no ability to reject instances as unknown. Our conclusion from those results is that relying solely on known-unknown data to train a classifier to distinguish, in the wild, known versus unknown classes is susceptible to a worst-case scenario. We conjecture that the known-unknown data employed for those classifiers makes them create a decision frontier in the feature space in such a way that most of the real unknown data (from the isa and flickr datasets) is accepted as known. If a different set of known-unknown data were employed in place of the unknown part of the dresden dataset, we believe the results might drastically differ. Taking an essentially distinct approach, pisvm along with the networkopen training protocol does not rely solely on the known-unknown data to define its decision boundary: instead, it minimizes the risk of the unknown while also taking advantage of the inter-class information gathered from the known data [Jain2014].
V-E Post-fusion Analysis
In the machine learning field, it is well known that jointly using a series of different models can help increase classification performance. This is known as ensemble learning [Sagi2018]. In light of this, here we present results achieved with a very simple yet effective ensemble fusion technique: majority voting among different models. Given a set of trained models, we test the image under analysis with all of them and perform majority voting on their outputs. If the majority of the votes is for rejecting as unknown, the image is classified as unknown.
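The voting rule above can be sketched in a few lines. The conservative tie-breaking toward unknown is an assumption for illustration, as the text does not specify how ties are resolved; the `UNKNOWN` sentinel and function name are likewise hypothetical.

```python
from collections import Counter

UNKNOWN = "unknown"  # sentinel output for rejection (an assumption)

def majority_vote(predictions):
    """Fuse the outputs of several trained models for one test image by
    majority voting; ties are broken conservatively as unknown."""
    counts = Counter(predictions).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return UNKNOWN  # no strict majority: reject
    return counts[0][0]
```

For example, if two of three models attribute the image to the same camera model, that model wins; if the plurality of votes is for rejection, the image is rejected.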
We considered all combinations obtained by fusing up to single models achieving na greater than . The top-three results are reported in Table VI. Notice that the selected features are always and . Moreover, the top results include all three training protocols. The classifiers that appear among those selected solutions are pisvm, ssvm, osnn, and et. These results confirm that post-fusion makes it possible to increase na by approximately %, and no more than models are needed. This paves the way for the development of more complex ensemble methods for camera model identification.
|(open, osnn, ip2), (networkopen, pisvm, ip2), (open, ssvm, ip2), (closed, ssvm, ip1), (open, et, ip2), (open, pisvm, ip1)|
|(networkopen, ssvm, ip1), (open, osnn, ip2), (open, ssvm, ip2), (closed, ssvm, ip1), (open, et, ip2), (open, pisvm, ip1)|
|(open, osnn, ip2), (open, ssvm, ip2), (closed, ssvm, ip1), (open, et, ip2), (open, pisvm, ip1)|
V-F Impact of an open-set solution
In Figure 6, we present two confusion matrices: one, in Figure 6(a), obtained by an open-set solution, and the other, in Figure 6(b), by the closed-set output of the neural network employed throughout this work. By comparing Figures 6(a) and 6(b), we observe that the ability to recognize instances of each individual model is affected in the open-set solution. That is expected, since an open-set solution can also commit the error of rejecting known instances as unknown, i.e., false unknowns, while a closed-set solution can only make misclassifications among the known classes. On the other hand, we clearly see the undesirable behavior of the closed-set solution of assigning every unknown instance to one of the known models, i.e., % on aus or, from another perspective, % of false knowns. The false known rate obtained by the open-set solution, in this example, is %. Anyhow, it is worth noticing that most open-set classifiers can be tuned to decrease their false known rate, although at the expense of increasing their false unknown rate.
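The false known rate discussed above (the complement of aus) can be read directly off the predictions, as in this minimal sketch; the `UNKNOWN` sentinel and function name are illustrative assumptions.

```python
import numpy as np

UNKNOWN = -1  # sentinel label for the unknown class (an assumption)

def false_known_rate(y_true, y_pred):
    """Fraction of truly unknown test instances that were accepted as
    some known camera model, i.e., 1 - aus."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    unk = y_true == UNKNOWN
    return float(np.mean(y_pred[unk] != UNKNOWN))
```

A purely closed-set classifier, having no rejection mechanism, always yields a false known rate of 1.0 on unknown data, which is the undesirable behavior visible in Figure 6(b).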
In this paper, we studied the use of a supervised-learning strategy for camera model identification in an open-set scenario. In doing so, we explored the possibility of using multiple camera-related features proposed in the literature for closed-set camera model identification, but under the more challenging open-set regime. We considered pairing feature vectors with different open-set classifiers, also exploring three alternative training protocols. All tests were performed on a selection of three independent image datasets freely available online, comprising a large number of images from more than 300 camera models.
In terms of training protocols, we found that employing extra known-unknown classes, as in the networkopen approach, in general does not help improve the performance of the classifiers compared to the simpler and cheaper open strategy. This result is interesting, as it evinces that extra known-unknown classes, from the point of view of the network, are not required, since their impact is limited. It means one can successfully train any open-set classifier, along with an open training protocol, using only the data available for the known classes. A better intuition of this behavior requires a deeper study of the network's representations for unknown classes not employed in network training, compared against the representations of each of the known classes employed for training the network; this remains as future work.
Further evidence of the limited usefulness of the known-unknown data, from the point of view of the network, was presented by employing a binary classifier for recognizing known versus unknown camera models: when a known-unknown set of data (from the unknown part of dresden) is employed to train this classifier, its performance in detecting unknown camera models from the isa and flickr datasets is highly affected (Section V-D2). This also reinforces previous arguments in the open-set area that more theoretically sound and less data-reliant solutions should be developed for general open-set problems [Scheirer2013].
Our results have shown that appropriate means of dealing with the open-set camera model attribution problem should be sought in order to properly handle it, considering that a recently proposed open-set method [Jain2014], as is, obtains considerably improved results compared to the straightforward idea of thresholding the softmax probability of neural networks for rejection as unknown (Section V-D1). This problem with thresholding the softmax probability for open-set recognition has been evinced in one of our previous works; hence, the current work also confirms that previously more theoretical perspective [MendesJunior2018b, Chapter 7].
For the open-set camera model identification problem, a promising direction for future research is the investigation of recently proposed alternatives to the softmax loss, e.g., the center loss [Wen2019] and the angular softmax loss [Liu2017], as the authors of those works have claimed improvements on the open-set face recognition problem.