Ordinal Regression using Noisy Pairwise Comparisons for Body Mass Index Range Estimation

11/08/2018
by Luisa Polania, et al.
American Family Insurance

Ordinal regression aims to classify instances into ordinal categories. In this paper, body mass index (BMI) category estimation from facial images is cast as an ordinal regression problem. In particular, noisy binary search algorithms based on pairwise comparisons are employed to exploit the ordinal relationship among BMI categories. Comparisons are performed with Siamese architectures, one of which uses the Bradley-Terry model probabilities as target. The Bradley-Terry model is an approach to describe probabilities of the possible outcomes when elements of a set are repeatedly compared with one another in pairs. Experimental results show that our approach outperforms classification and regression-based methods at estimating BMI categories.


1 Introduction

Body mass index is a biometric that provides information about health status and is frequently employed as a measure to diagnose obesity [19]. It is defined as

\mathrm{BMI} = \frac{\text{weight (kg)}}{\text{height (m)}^2}    (1)

The traditional way to measure BMI requires the presence of the subject and external instruments, such as a measuring tape and a weight scale. Therefore, BMI measurement from facial images is of interest for applications where monitored measurement devices are unavailable, for example, health-related analysis using profile images from social media [21, 11], telemedicine kiosks that remotely diagnose patients [15], and face recognition [23]. In the life insurance industry, BMI range estimation from face images has the potential to accelerate the underwriting process by alleviating the need for a medical exam. However, BMI estimation from face images is a challenging problem, both because the BMI distribution varies across races and ages [11] and because a body-dependent measure must be estimated from facial data alone.

Previous methods have been proposed for BMI estimation from facial images [22, 18, 12, 11]. In [22], active shape models were employed to extract geometric features, and three regression methods were used for BMI estimation. Similarly, in [12], facial fiducial points were computed for feature extraction, using a small dataset of 1124 face images. Convolutional neural networks (CNNs) have also been used for BMI estimation; in [11], the VGG-Net and VGG-Face models were used for feature extraction.

In this paper, the problem of BMI category estimation is addressed. The BMI categorization proposed by the World Health Organization [8], as indicated in Table 1, is used. A multi-class classification approach is not recommended for this problem, since it assumes independence between the class labels, which does not hold for BMI categories because they have a strong ordinal relationship. Instead, the BMI category estimation problem is cast as an ordinal regression problem and addressed with a Noisy Binary Search (NBS) [9] approach, where the goal is to insert the BMI associated with the test image into its proper place within the ordered sequence $S = (16, 18.5, 25, 30, 35, 40)$ defined by the boundaries of the BMI categories. Even though it was suggested in [9] that NBS algorithms could be employed for ranking problems, to the best of our knowledge, this is the first work that uses NBS for an ordinal regression application.

Noisy binary search relies on pairwise comparisons. Two deep learning-based models are used to make the comparisons, both consisting of Siamese-type architectures [4]. The two architectures differ in their targets and loss functions: one network uses binary classes as targets with the cross-entropy loss function, while the other uses the Bradley-Terry model probabilities [3] as targets with the Euclidean loss function.

The contributions of this paper are as follows:

- The application of NBS algorithms, which outperform classification methods based on CNNs and handcrafted features, to the problem of BMI category estimation.

- Two Siamese-type architectures to compute pairwise comparisons. The architectures modify the traditional Siamese architecture [11] used to learn similarity metrics; for example, both incorporate the dot product between image feature vectors to further exploit the correlation between the inputs, and one of the architectures uses the Bradley-Terry model probabilities [3] as target.

- To the best of our knowledge, this is the first work that addresses the BMI category estimation problem as an ordinal regression problem.

- This work uses the inmate active-population dataset from the Florida Department of Corrections, which is, to date, the largest dataset used for the problem of BMI estimation.

2 Background

This section describes the NBS problem and algorithms, the Siamese architecture typically used to learn similarity metrics, and the Bradley-Terry model.

2.1 Noisy Binary Search

The goal of NBS is the same as that of the traditional binary search problem: inserting an element into its proper place within an ordered sequence by comparing it with elements of the sequence. However, comparisons in NBS are noisy, i.e., a comparison gives the wrong result with a small probability. Therefore, each element $s_i$ of the sequence has an associated probability $p_i$ corresponding to the probability that the target is greater than $s_i$. The empirical probability $\hat{p}_i$, a proxy for $p_i$, is estimated by performing multiple comparisons between the target and $s_i$.

The work in [9] illustrates the NBS problem with the coin-flip model, in which each element $s_i$ of the sequence is assigned a coin whose heads probability $p_i$ is unknown. However, the heads probabilities are assumed to be ordered, and we are allowed to toss a given coin in order to estimate its empirical heads probability. If a coin is tossed $N$ times, then the probability that the estimated heads probability differs from $p_i$ by more than a small constant is bounded via the Chernoff bound [9]. The problem is solved when a pair of consecutive coins, $s_i$ and $s_{i+1}$, is found such that the interval $[\hat{p}_{i+1}, \hat{p}_i]$ contains the number $1/2$. In this paper, two algorithms that solve the NBS problem are considered, namely the Naive Noisy Binary Search (NNBS) and the Interval Noisy Binary Search (INBS) algorithms.
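As a concrete illustration, the following Python sketch simulates this coin-flip estimation; the comparator interface and the coin probabilities are our own illustrative assumptions, not code from [9].

```python
import random

def empirical_prob(compare, i, n_flips):
    """Estimate the heads probability p_i of coin i by tossing it
    n_flips times; compare(i) returns 1 for heads and 0 for tails."""
    return sum(compare(i) for _ in range(n_flips)) / n_flips

# Hypothetical coins with ordered (but unknown to the searcher) heads
# probabilities; the target lies where the probabilities cross 1/2.
true_p = [0.9, 0.8, 0.6, 0.4, 0.2]
noisy_coin = lambda i: int(random.random() < true_p[i])
print([round(empirical_prob(noisy_coin, i, 200), 2) for i in range(5)])
```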

Category BMI range (kg/m²)
Underweight 16-18.5
Normal 18.5-25
Overweight 25-30
Moderately obese 30-35
Severely obese 35-40
Table 1: BMI categorization by the World Health Organization [8]

2.1.1 Naive Noisy Binary Search

The NNBS algorithm is recursive and resembles the traditional binary search algorithm. It maintains two indexes, $l$ and $r$, initialized to the first and last positions of the sequence, respectively. At each iteration, it tests the sequence element midway between $l$ and $r$, denoted as $s_m$. If the calculated empirical probability associated with $s_m$, denoted as $\hat{p}_m$, is within a small interval around $1/2$, then the algorithm returns $m$. Otherwise, if $\hat{p}_m > 1/2$, then index $l$ is updated with $m$ and $r$ remains unchanged; similarly, if $\hat{p}_m < 1/2$, then index $r$ is updated with $m$ and $l$ remains unchanged. The process repeats until either $\hat{p}_m$ falls within the interval around $1/2$, in which case $m$ is returned, or until $l$ and $r$ become adjacent, in which case the algorithm returns the interval between $s_l$ and $s_r$. Details of the NNBS algorithm can be found in [9].
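A minimal sketch of NNBS is given below, reusing the empirical_prob helper defined above; the index conventions and stopping rule follow our reading of [9] rather than the authors' implementation.

```python
def nnbs(compare, n, n_flips, eps=0.03):
    """Naive Noisy Binary Search over sequence indices 0..n-1.
    compare(i) returns 1 if a noisy comparison says element i lies
    below the target, else 0."""
    lo, hi = 0, n - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        p_hat = empirical_prob(compare, mid, n_flips)
        if abs(p_hat - 0.5) <= eps:   # target sits at this boundary
            return mid
        if p_hat > 0.5:               # element mid is below the target
            lo = mid + 1
        else:                         # element mid is above the target
            hi = mid - 1
    return lo                         # insertion point in the sequence
```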

2.1.2 Interval Noisy Binary Search

Interval Noisy Binary Search modifies the NNBS algorithm by allowing backtracking [6]. It first builds a binary search tree of intervals such that the root node corresponds to the entire sequence $S$. Each non-leaf node interval $I$ has two children corresponding to the left and right halves of $I$. The leaves of the tree are the intervals between consecutive sequence elements. The algorithm starts at the root of the binary search tree and, at every non-leaf node corresponding to an interval $I$, it checks whether the element to be searched, $x$, belongs to $I$ by calculating the empirical probabilities associated with the sequence elements that define the boundaries of $I$. If either the empirical probability of the left boundary is smaller than 0.5 or the empirical probability of the right boundary is greater than 0.5, the algorithm backtracks to the current node's parent. Otherwise, if 0.5 lies between the empirical probabilities of the boundaries, the algorithm checks whether $x$ belongs to the left or the right child by calculating the empirical probability associated with the middle element of $I$: if it is greater than 0.5, the algorithm moves to the right child; otherwise, it moves to the left child. At a leaf node, the algorithm checks whether $x$ belongs to the corresponding leaf interval by maintaining a counter. The counter increases by one if 0.5 lies between the empirical probabilities of the leaf-interval boundaries; otherwise, it decreases by one. If the counter becomes negative, the algorithm backtracks to the leaf's parent. The algorithm stops when the counter reaches a threshold, and INBS returns the corresponding leaf node.

By following the above procedure, the algorithm may end up moving in a loop. To handle this case, the algorithm is run for a maximum number of steps, all visited sequence elements are saved in a set, and NNBS is run on that set. Details of INBS can be found in [6].
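The simplified sketch below captures the descent-and-backtrack logic with the tree kept implicit as index intervals, reusing empirical_prob and nnbs from above; the step bound, counter threshold, and tie handling are compressed into illustrative assumptions relative to [6].

```python
def inbs(compare, n, n_flips, counter_max=10, max_steps=200, eps=0.03):
    """Interval Noisy Binary Search over indices 0..n-1 (simplified).
    Intervals (a, b) are tree nodes; leaves satisfy b == a + 1."""
    node, parents, visited, counter = (0, n - 1), {}, set(), 0
    for _ in range(max_steps):
        a, b = node
        visited.update((a, b))
        p_a = empirical_prob(compare, a, n_flips)
        p_b = empirical_prob(compare, b, n_flips)
        inside = p_a >= 0.5 >= p_b        # target appears to lie in (a, b)
        if b - a == 1:                    # leaf between consecutive elements
            counter = counter + 1 if inside else counter - 1
            if counter >= counter_max:
                return a                  # confident: target in (s_a, s_b)
            if counter < 0:               # backtrack to the leaf's parent
                node, counter = parents.get(node, node), 0
            continue
        if not inside:                    # wrong subtree: backtrack
            node = parents.get(node, node)
            continue
        mid = (a + b) // 2                # descend toward the target
        right = empirical_prob(compare, mid, n_flips) > 0.5
        child = (mid, b) if right else (a, mid)
        parents[child] = node
        node, counter = child, 0
    # No convergence within max_steps: fall back to NNBS on visited elements.
    elems = sorted(visited)
    idx = nnbs(lambda j: compare(elems[j]), len(elems), n_flips, eps)
    return elems[min(idx, len(elems) - 1)]
```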

2.2 Siamese architecture

Siamese convolutional networks have been widely employed to measure similarity in different applications, such as matching of image patches [24], face recognition [4] and signature verification [5]. All these problems fall into the category of matching problems. In  [14], Siamese networks were used to rank images in terms of image quality.

A Siamese network has twin branches that share the same architecture and the same set of weights. Let $f_1$ and $f_2$ denote the feature representations of the inputs produced by the last layers of the twin branches, and let $d = \|f_1 - f_2\|_2$. Siamese architectures are typically trained with the contrastive loss function

L(W) = \sum_{(f_1, f_2) \in P} \left[ y\, d^2 + (1 - y) \max(0, m - d)^2 \right],    (2)

where $P$ is the set of feature representation pairs produced by the last layers of the Siamese network across all the inputs, $W$ are the weights of the network, and $y$ is the label, with 1 and 0 denoting a matching and a non-matching pair, respectively. The first term of the loss function penalizes matching pairs whose feature representations are far apart, while the second term penalizes non-matching pairs whose feature representations are closer than a margin $m$.
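For reference, a minimal NumPy sketch of this loss (batch form; variable names are ours) is shown below; note that the networks in this paper are trained with cross-entropy and Euclidean losses instead, as described in Section 3.2.1.

```python
import numpy as np

def contrastive_loss(f1, f2, y, margin=1.0):
    """Contrastive loss over a batch of feature pairs.
    f1, f2: (B, D) features from the twin branches; y: (B,) labels,
    1 for matching pairs and 0 for non-matching pairs."""
    d = np.linalg.norm(f1 - f2, axis=1)          # Euclidean distances
    return np.mean(y * d**2 + (1 - y) * np.maximum(0.0, margin - d)**2)
```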

2.3 Bradley-Terry Model

The Bradley-Terry model is a probability model used to predict the outcome of a pairwise comparison [3]. Given a pair of individuals $i$ and $j$ drawn from some population, it estimates the probability that $i$ beats $j$ as $P(i \succ j) = s_i / (s_i + s_j)$, where $s_i$ and $s_j$ are positive real-valued scores associated with individuals $i$ and $j$, respectively. For example, $P(i \succ j)$ may denote the probability that player $i$ wins a game against player $j$, and $s_i$ and $s_j$ may represent the players' strengths or abilities.
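As a one-line Python illustration (the strength values are hypothetical):

```python
def bradley_terry(s_i, s_j):
    """P(individual i beats individual j) for positive scores s_i, s_j."""
    return s_i / (s_i + s_j)

print(bradley_terry(30.0, 20.0))   # a strength-30 player beats a
                                   # strength-20 player with p = 0.6
```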

3 Method

This section describes the proposed method for BMI category estimation. Given an image $x$, the goal is to determine to which category from Table 1 the BMI associated with $x$ belongs. We propose to address this problem with an NBS approach.

3.1 Noisy binary search for BMI ordinal regression

Let $S = (16, 18.5, 25, 30, 35, 40)$ be the sequence formed by the boundaries of the BMI categories. Each element $s_i$ of $S$ is represented by a pool of images, referred to as anchors, whose BMI falls in the range $[s_i - \delta, s_i + \delta]$, where $\delta$ is a small constant; the anchor images therefore have BMIs approximately equal to $s_i$. The goal is to insert the BMI of $x$, denoted as $\mathrm{BMI}(x)$, into its proper place within $S$ by performing comparisons between $x$ and the anchors in an NBS fashion. For this purpose, a comparison operator is built that outputs 1 if it predicts that the anchor image has a BMI smaller than or equal to $\mathrm{BMI}(x)$, and 0 otherwise. The comparison operator is noisy, i.e., it gives the wrong result with a small probability.

The proposed approach can be explained through an analogy with the coin-flip model presented in Section 2.1. The equivalent of flipping the coin assigned to $s_i$ is to randomly select an image from the anchors associated with $s_i$ and run it, together with $x$, through the comparison operator. The output of the operator, either 1 or 0, is the equivalent of the flip result, heads or tails. Section 2.1 described how the coin assigned to $s_i$ is flipped multiple times to calculate the empirical probability $\hat{p}_i$. Similarly, the comparison operator is run with several randomly selected anchor images assigned to $s_i$ to calculate the empirical probability that $\mathrm{BMI}(x) \ge s_i$. These probabilities are used by the NBS algorithms to predict the right place for $\mathrm{BMI}(x)$ within the ordered sequence $S$, as explained in Section 2.1.

3.2 Comparison operator

This section presents comparison operators built with Siamese-type networks. The twin branches are truncated versions of traditional CNN architectures. Specifically, AgeNet [13] and the VGG architecture, which was used in [11] for BMI regression, are employed.

AgeNet consists of three convolutional layers, three fully connected (FC) layers, and a softmax layer. The first, second, and third convolutional layers contain 96 filters of size 7×7, 256 filters of size 5×5, and 384 filters of size 3×3, respectively. The first two FC layers contain 512 neurons each, and the last FC layer contains 8 neurons. Truncated versions of the AgeNet architecture, built by excluding the softmax layer and the last FC layer, are used as the twin branches of the Siamese network. The motivation for using a small, simple architecture such as AgeNet is that learning to compare image pairs to predict which image has the higher BMI is intuitively easier than learning the nominal BMI category.

The VGG architecture, which has more representational power than AgeNet, is also employed, at the expense of increased memory and computational cost. VGG contains 16 layers: 13 convolutional layers and 3 FC layers, followed by a softmax. All convolutional layers have a receptive field of size 3×3 and are followed by a ReLU layer. The stack of 13 convolutional layers is followed by three FC layers, of which the first two have 4096 channels each and the third has a number of channels that depends on the classification task. The other configuration for the twin branches of the Siamese network used in this paper is a truncated version of the VGG architecture, built by excluding the softmax layer and the last two FC layers.

Feature outputs from the twin branches are concatenated. To further exploit the correlation between the features, the dot product of the features is appended to the concatenation vector; this dot product is a modification with respect to the traditional Siamese architecture [11]. In the case of the AgeNet-based Siamese network, two FC layers follow the concatenation; the first and second FC layers contain 512 and 1 neurons, respectively. In the case of the VGG-based Siamese network, three FC layers follow the concatenation; the first and second FC layers contain 2048 and 1024 neurons, respectively, and each is followed by ReLU and dropout layers. For both networks, the last FC layer contains a single output, which is fed to a sigmoid function. The architecture of the Siamese network is illustrated in Fig. 1.

The comparison operator can be represented by the function $f(x_a^{(k)}, x; \theta)$, where $\theta$ denotes the network parameters and $x_a^{(k)}$ and $x$ denote the inputs, which are the anchor image used for the $k$th comparison and the given image, respectively. The function takes the value 1 when the network predicts that the BMI of $x_a^{(k)}$ is smaller than or equal to the BMI of $x$; otherwise, it outputs 0. Let $N$ denote the number of comparisons used to calculate the empirical probability that $\mathrm{BMI}(x) \ge s_i$. Then, the empirical probabilities that are fed to the NBS algorithms are defined by

\hat{p}_i = \frac{1}{N} \sum_{k=1}^{N} f\left(x_a^{(k)}, x; \theta\right).    (3)
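In code, computing (3) amounts to averaging thresholded network outputs over randomly drawn anchors. The sketch below assumes a hypothetical siamese_predict(anchor, image) function that returns the sigmoid output of the comparison network.

```python
import random

def empirical_prob_bmi(siamese_predict, image, anchors_i, n_comparisons,
                       threshold=0.5):
    """Empirical probability (Eq. 3) that BMI(image) >= s_i, estimated
    by comparing `image` against anchors randomly drawn for element s_i.
    `siamese_predict` is an assumed interface, not the authors' API."""
    picks = random.choices(anchors_i, k=n_comparisons)
    votes = [int(siamese_predict(a, image) >= threshold) for a in picks]
    return sum(votes) / n_comparisons
```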

3.2.1 Training modes

Two training modes for the Siamese network are used in this paper. Mode I uses the binary classes 1 and 0 as targets, where class 1 means that the BMI of the anchor input image is smaller than or equal to the BMI of the given image, and class 0 means otherwise. Mode I uses the cross-entropy loss function, which is defined as

L = -\frac{1}{M} \sum_{j=1}^{M} \left[ y_j \log \hat{y}_j + (1 - y_j) \log(1 - \hat{y}_j) \right],    (4)

where $M$ is the number of training image pairs, $y_j$ is the truth label for the $j$th image pair, and $\hat{y}_j$ is the corresponding predicted probability. Mode II uses the Bradley-Terry model probabilities as targets and the Euclidean loss function. The Bradley-Terry model is frequently used to model the outcomes of games [3], and the motivation for using it in this paper is that comparisons between a given image and the anchors associated with a sequence element resemble the outcome of a game, in the sense that the comparison operator predicts which subject from the input images wins at having the higher BMI. The randomness in the outcome of a game comes from the fact that the same player may perform differently at different times, while the randomness of the comparison operator at predicting whether $\mathrm{BMI}(x) \ge s_i$ comes from the fact that an anchor image is randomly selected from the pool of anchors at each comparison. The equivalent of the ability score associated with a player is the BMI associated with the image. Therefore, the Bradley-Terry model probabilities, adapted to our problem, are defined by

P\left(\mathrm{BMI}(x_a^{(k)}) \le \mathrm{BMI}(x)\right) = \frac{\mathrm{BMI}(x)}{\mathrm{BMI}(x) + \mathrm{BMI}(x_a^{(k)})},    (5)

where, as before, $x_a^{(k)}$ denotes the image randomly selected from the pool of anchor images associated with $s_i$ to perform the $k$th comparison.

The output probabilities of the mode II-trained network are mapped to binary outputs using the criterion that if the predicted probability is greater than or equal to 0.5, then $f(x_a^{(k)}, x; \theta) = 1$; otherwise, $f(x_a^{(k)}, x; \theta) = 0$.
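As a small sketch, the mode II targets can be generated directly from the pair BMIs following (5); the function and argument names below are illustrative assumptions.

```python
def mode2_target(bmi_anchor, bmi_image):
    """Bradley-Terry training target: probability that the image 'wins'
    at having the higher BMI, i.e., that BMI(anchor) <= BMI(image)."""
    return bmi_image / (bmi_image + bmi_anchor)

# e.g. anchor BMI 25 vs. image BMI 32 -> target ~ 0.561
print(round(mode2_target(25.0, 32.0), 3))
```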

Figure 1: Schematic of the Siamese architecture

3.2.2 Training of the Siamese networks

The procedure to build the training pairs for the networks is as follows. The entire dataset is first divided into training (denoted as training dataset I), validation, and testing datasets. For each sequence element $s_i$, images whose BMI is within the range $[s_i - \delta, s_i + \delta]$ are extracted from training dataset I to build the anchor dataset. Let training dataset II denote the remainder of training dataset I after extracting the anchors. A training budget $B$ is assigned to each $s_i$ and represents the number of training pairs assigned to $s_i$. A training pair is built by randomly selecting an image from training dataset II and an anchor image, and the budget is distributed equally among the anchors belonging to $s_i$. The resulting set of training pairs forms the dataset used to train the Siamese networks. The same procedure is followed to build the validation pairs, but starting from the validation dataset.
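The pair-construction procedure can be sketched as follows; the data structures and the handling of the per-element budget are illustrative assumptions rather than the authors' code.

```python
import random

def build_pairs(train_images, train_bmis, anchors_per_elem, budget):
    """Build (anchor, image, label) training pairs per sequence element.
    anchors_per_elem maps s_i -> list of (anchor_image, anchor_bmi);
    budget pairs are assigned to each s_i, split equally across its
    anchors; label 1 means the anchor's BMI <= the image's BMI."""
    pairs = []
    for s_i, anchors in anchors_per_elem.items():
        per_anchor = max(1, budget // len(anchors))
        for anchor_img, anchor_bmi in anchors:
            for _ in range(per_anchor):
                j = random.randrange(len(train_images))
                label = int(anchor_bmi <= train_bmis[j])
                pairs.append((anchor_img, train_images[j], label))
    return pairs
```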

For training the Siamese architectures, faces are first detected using the algorithm in [16] and cropped to a fixed input size. The top branches of the AgeNet-based and VGG-based Siamese networks are initialized with the weights of the original AgeNet [13] and VGG-Face [17] models, respectively, and the FC layers following the feature concatenation are initialized with the Xavier method [7]. For the AgeNet-based network, the Adam optimizer [10] with its default momentum values ($\beta_1 = 0.9$, $\beta_2 = 0.999$) is used for training, with 300 samples per mini-batch. For the VGG-based network, the weights of the first 10 convolutional layers are kept frozen during training; the Adam optimizer with the same default momentum values is used, with 30 samples per mini-batch due to memory constraints. The dropout probability is set to 0.5. Training stops when the loss on the validation pairs stops decreasing.

4 Experimental Results

In this section, we present details of the database, ordinal regression quality metrics, and experiments used to evaluate the proposed method (code is available at https://github.com/lfpolani/QI_ordinal_regression).

4.1 Database

The Florida Department of Corrections provides a downloadable database with records for all inmates currently incarcerated in the Florida state prison system [1]. The database contains weight and height information. The dataset is filtered to consider subjects with BMI in the range 16-40, which corresponds to the categories of interest; samples outside this range are rare. For the experiments, 67,756, 9,045, and 20,000 samples are randomly selected for training, validation, and testing, respectively.

Fig. 2 shows randomly selected images for each BMI category. Note that distinguishing images from consecutive BMI categories is visually challenging, especially distinguishing between normal and overweight images, which together account for nearly 75% of the entire dataset.

Figure 2: Image samples for each BMI category

4.2 Performance metrics

Three metrics commonly used for ordinal regression problems [20] are used to evaluate the performance of the proposed method, namely accuracy (ACC), Mean Absolute Error (MAE), and the Kendall rank correlation coefficient ($\tau$). The accuracy is the fraction of correctly classified samples. Let $1, \ldots, 5$ be the labels of the underweight, normal, overweight, moderately obese, and severely obese BMI categories, respectively. Let $l_i$ and $\hat{l}_i$ denote the ground-truth and predicted labels for testing sample $i$, respectively, and let $N$ denote the total number of samples in the testing set. Then, the MAE is defined by $\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} |l_i - \hat{l}_i|$. The Kendall rank correlation coefficient is used to measure the correlation between the rankings of the ground-truth and predicted labels. It is defined as

\tau = \frac{2}{N(N-1)} \sum_{i<j} C_{ij},    (6)

where $C_{ij}$ is the concordance indicator function, defined by

C_{ij} = \mathrm{sign}(l_i - l_j)\,\mathrm{sign}(\hat{l}_i - \hat{l}_j).    (7)

The Kendall rank correlation coefficient ranges from -1 to 1. For perfect agreement and perfect disagreement between the rankings of the ground-truth and predicted labels, $\tau$ takes the values 1 and -1, respectively; for completely independent rankings, $\tau$ has the value 0.
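These metrics can be computed directly from the label vectors; the sketch below implements ACC, MAE, and the $\tau$ of (6)-(7) naively in O(N^2) (for large $N$, an implementation such as scipy.stats.kendalltau is preferable).

```python
import numpy as np

def ordinal_metrics(l_true, l_pred):
    """Accuracy, MAE, and Kendall tau for ordinal labels."""
    l_true, l_pred = np.asarray(l_true), np.asarray(l_pred)
    n = len(l_true)
    acc = np.mean(l_true == l_pred)
    mae = np.mean(np.abs(l_true - l_pred))
    c = 0
    for i in range(n):
        for j in range(i + 1, n):   # concordance indicator C_ij
            c += np.sign(l_true[i] - l_true[j]) * np.sign(l_pred[i] - l_pred[j])
    tau = 2.0 * c / (n * (n - 1))
    return acc, mae, tau

print(ordinal_metrics([1, 2, 3, 4, 5], [1, 2, 2, 4, 5]))
```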

For evaluating the performance of the Siamese networks, the accuracy and the area under the curve (AUC) are employed.

4.3 Performance of the Siamese architecture

This section evaluates the performance of the AgeNet-based Siamese network using the validation pairs described in Section 3.2.2. The training budget per sequence element is set to 150,000, and the value of $\delta$ used to build the anchor set is set to 0.3. The classification threshold for the mode I-trained network is chosen such that the difference between the true positive rate and the false positive rate is maximized; this criterion leads to a classification threshold of 0.548. The classification threshold of the mode II-trained network is set to 0.5 to take advantage of the symmetry of the Bradley-Terry model around 0.5.

Table 2 shows the performance of the networks per sequence element. Note that sequence elements 16 and 40 are not included because, due to pre-filtering, the BMI of the dataset always lies between those two numbers. The mode II-trained network outperforms the mode I-trained network for all sequence elements except 25. For mode I, the ground-truth labels of the validation pairs associated with 18.5 are mostly 1, since most subjects have a BMI greater than 18.5; therefore, the ACC associated with 18.5 is expected to be high. Similarly, the ACC for 35 is expected to be high, since the validation pairs are highly unbalanced in that case as well. The AUC is known to be a better performance metric for unbalanced datasets [2]. Table 2 shows that the AUC attained by the Siamese networks mostly ranges from 0.7 to 0.8.

Sequence element | Mode I ACC | Mode I AUC | Mode II ACC | Mode II AUC
18.5 | 0.997 | 0.692 | 0.997 | 0.798
25 | 0.71 | 0.725 | 0.694 | 0.712
30 | 0.748 | 0.729 | 0.762 | 0.745
35 | 0.933 | 0.775 | 0.946 | 0.788
Table 2: Performance of the AgeNet-based Siamese network

4.4 Performance of the Noisy Binary Search algorithms

The performance of both NNBS and INBS is evaluated in this section using the testing dataset. First, the performance of the algorithms is analyzed as a function of the comparison budget. Then, the NBS algorithms are compared with the AgeNet and VGG classification networks and with a handcrafted feature-based method. For the NNBS algorithm, the value of $\epsilon$ is set to 0.03; the part of the INBS algorithm that runs NNBS on the set of visited elements also uses $\epsilon = 0.03$. The counter threshold and the maximum number of steps of INBS are set as functions of the number of sequence elements.

Figure 3: Performance of the NNBS algorithm as a function of the comparison budget.
Figure 4: Comparison of the NNBS and INBS algorithms.

4.4.1 Performance evaluation as a function of the comparison budget

A budget $N$ is assigned to the NBS algorithm to perform pairwise comparisons. The fraction of the budget assigned to a sequence element $s_i$, denoted as $f_i$, depends on the performance of the Siamese network at comparing with that particular sequence element. Thus, $f_i$ is defined as

f_i = \frac{\mathrm{AUC}_i}{\sum_{j} \mathrm{AUC}_j},    (8)

where $\mathrm{AUC}_i$ is the AUC associated with the performance of the Siamese network at predicting whether the BMI of a given image is greater than or equal to $s_i$ (results shown in Table 2). Note that no budget is assigned to the boundary elements 16 and 40, because the BMI of the testing samples always lies between these two values due to pre-filtering of the data.
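Under this reading of (8), the budget fractions are simply normalized AUCs; a small sketch using the mode I AUC values from Table 2:

```python
def budget_fractions(aucs):
    """Fraction of the comparison budget per sequence element,
    proportional to the element's validation AUC (our reading of Eq. 8)."""
    total = sum(aucs.values())
    return {s: auc / total for s, auc in aucs.items()}

# Mode I AUCs from Table 2
print(budget_fractions({18.5: 0.692, 25: 0.725, 30: 0.729, 35: 0.775}))
```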

Figure 3 illustrates the performance of the NNBS algorithm using the AgeNet-based Siamese network as the comparison budget is varied. The results in Figs. 3(a-c) correspond to the mean of the metrics across repetitions, using different random anchor sampling at each repetition; similarly, Figs. 3(d-f) correspond to the standard deviation of the metrics across repetitions. The number of repetitions is set to 50. As expected, Figs. 3(a-c) show that the performance of the NNBS algorithm improves as the comparison budget increases; however, the improvement is very slow for large budgets. The mode II-trained network outperforms the mode I-trained network over the entire comparison-budget range, which suggests that using the Bradley-Terry model probabilities as targets is better than using binary labels. As the comparison budget increases, the variance of the predicted empirical probabilities across repetitions decreases, and therefore the standard deviation of the performance metrics also decreases.

Figure 4 compares the performance of the NNBS and INBS algorithms as a function of the comparison budget. Note that, in the case of the INBS algorithm, the definition of the budget fraction in (8) applies at every step of the algorithm, but since backtracking may happen, the overall number of comparisons with a sequence element may exceed its allotted fraction. A smaller comparison budget implies more variance in the calculated empirical probabilities across comparisons; therefore, for small budgets, the INBS algorithm is expected to backtrack to the parent node more often, retrying comparisons and improving performance, than for large budgets. This behavior is reflected in Fig. 4(a), where the INBS algorithm exhibits larger mean accuracy and smaller mean MAE than the NNBS algorithm for small budgets. Interval Noisy Binary Search also attains higher accuracy than NNBS for large budgets, but the difference is marginal in that case, since there is less variance in the calculated empirical probabilities and therefore less backtracking. The MAE attained by INBS is smaller than that of NNBS for small budgets, but slightly larger for large budgets. Regarding the $\tau$ metric, both algorithms achieve almost the same values for small budgets, but $\tau$ is smaller for INBS than for NNBS for large budgets. An explanation for the behavior of the MAE and $\tau$ metrics at large budgets is that, with a relatively large number of comparisons, less backtracking is expected, and if the counter does not reach its threshold, INBS reduces to NNBS applied only to the visited elements instead of the entire sequence $S$. From Fig. 4, we conclude that choosing INBS over NNBS is only worthwhile when a low comparison budget per iteration is required.

4.4.2 Increasing the representational power of the Siamese Network

Sequence element | AgeNet ACC | AgeNet AUC | VGG ACC | VGG AUC
18.5 | 0.997 | 0.692 | 0.997 | 0.743
25 | 0.71 | 0.725 | 0.7 | 0.739
30 | 0.748 | 0.729 | 0.772 | 0.758
35 | 0.933 | 0.775 | 0.945 | 0.822
Table 3: Comparison between the AgeNet-based and VGG-based Siamese networks
Figure 5: Performance of the NNBS algorithm using the AgeNet-based and VGG-based Siamese networks

This section presents an experiment comparing the performance of the NNBS algorithm using the AgeNet-based and VGG-based Siamese networks. Training mode I is used for the experiment. INBS is not included in this experiment because Section 4.4 showed that it only outperforms NNBS when the comparison budget is low.

Table 3 compares the performance of the AgeNet-based and VGG-based Siamese networks at predicting whether the BMI of a given image is greater than or equal to a sequence element, using the validation pairs described in Section 3.2.2. The results indicate that the VGG-based network outperforms the AgeNet-based network, which is expected given its greater representational power; only for sequence element 25 is the ACC obtained with the VGG-based network slightly smaller than that of the AgeNet-based network.

Fig. 5 compares the performance of the NNBS algorithm using the AgeNet-based and VGG-based Siamese networks. As expected, using the VGG-based network in the NNBS algorithm leads to better results for the entire comparison budget range. For both networks, the performance improves slowly when the comparison budget exceeds 50.

4.4.3 Performance comparison with classification-based and regression-based approaches

In this section, the proposed method is compared with the AgeNet and VGG classification networks, with a handcrafted feature-based method, and with a VGG regression network followed by BMI category mapping. A comparison with previously published results is not presented because this is the first work that addresses BMI category estimation using the category definitions in Table 1.

The AgeNet and VGG classification networks employed in this experiment use the original AgeNet and VGG architectures presented in Section 3.2, except that the last FC layer has 5 outputs. Similarly, the VGG regression network uses the same original VGG architecture presented in Section 3.2, except that the last FC layer has 1 output, the softmax layer is removed, and the Euclidean loss function is used instead of the cross-entropy loss. The training procedure for the networks is similar to that described in Section 3.2.2, with faces cropped to the same input size. The AgeNet and VGG networks are initialized with the weights of the original AgeNet model [13] and VGG-Face model [17], respectively, except for the last FC layer, which is initialized with the Xavier method [7]. The same optimizer, base learning rate, batch size, dropout factor, and training stopping criteria of the AgeNet-based and VGG-based Siamese networks are employed for the corresponding AgeNet and VGG classification and regression networks. The output of the VGG regression network is mapped to the corresponding BMI category using the definitions in Table 1.

The handcrafted feature-based method uses the same geometric features proposed in [22] for BMI regression. A multiclass linear-kernel SVM is employed for BMI range estimation, with the penalty parameter of the error term estimated via grid search.

Table 4 compares the performance of the NNBS and INBS algorithms, evaluated at the largest comparison budget considered in the previous experiments, with the performance of the AgeNet and VGG classification networks, the handcrafted feature-based method, and the VGG regression network followed by BMI category mapping. This budget is selected for the comparison because it corresponds to the best performance achieved by the NBS algorithms. Note that the metrics in Table 4 for the NBS algorithms correspond to the mean values across repetitions, using different random anchor sampling at each repetition. The results show that the NBS algorithms outperform the classification-based methods by exploiting the ordinal nature of the BMI range estimation problem: the NBS algorithms using the AgeNet-based and VGG-based Siamese networks outperform the AgeNet and VGG classification networks, respectively. Moreover, the $\tau$ results indicate that the rankings of the ground-truth labels are much more strongly correlated with the rankings of the labels predicted by the proposed methods than with those predicted by the handcrafted feature-based method. The proposed methods also outperform the VGG regression network followed by BMI category mapping.

Method | ACC | MAE | $\tau$
Handcrafted feature-based method | 0.45 | 0.615 | 0.13
AgeNet (classification) | 0.467 | 0.616 | 0.362
NNBS (AgeNet, Mode I) | 0.473 | 0.598 | 0.38
NNBS (AgeNet, Mode II) | 0.495 | 0.553 | 0.403
INBS (AgeNet, Mode II) | 0.496 | 0.554 | 0.403
VGG (classification) | 0.494 | 0.555 | 0.45
VGG (regression) + BMI category mapping | 0.481 | 0.589 | 0.382
NNBS (VGG, Mode I) | 0.518 | 0.528 | 0.45
Table 4: Comparison of the NBS algorithms with classification-based and regression-based methods

5 Conclusions

Noisy binary search has been studied extensively in the area of computer science. In this paper, it was shown how the ability of NBS algorithms to exploit the ordinal nature of a problem can be leveraged to address the problem of BMI range estimation. As NBS relies on pairwise comparisons, two Siamese networks were proposed to perform the comparisons. The motivation for using a method based on pairwise comparisons was that predicting which subject from two images has higher BMI is intuitively easier than learning the nominal BMI category.

References

  • [1] Florida Department of Corrections. http://www.dc.state.fl.us/ActiveInmates/search.asp. Database of inmate photos [Online database].
  • [2] U. Bhowan, M. Johnston, and M. Zhang. Developing new fitness functions in genetic programming for classification with unbalanced data. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 42(2):406–421, 2012.
  • [3] R. Bradley and M. Terry. Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3/4):324–345, 1952.
  • [4] S. Chopra, R. Hadsell, and Y. LeCun. Learning a similarity metric discriminatively, with application to face verification. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 1, pages 539–546. IEEE, 2005.
  • [5] S. Dey et al. SigNet: Convolutional Siamese network for writer independent offline signature verification. arXiv preprint arXiv:1707.02131, 2017.
  • [6] M. Falahatgar, A. Orlitsky, V. Pichapati, and A. Suresh. Maximum selection and ranking under noisy comparisons. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1088–1096, Sydney, Australia, 2017.
  • [7] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 249–256, 2010.
  • [8] World Health Organization. Global database on body mass index: an interactive surveillance tool for monitoring nutrition transition. Public Health Nutrition, 9(5):658–660, 2006.
  • [9] R. Karp and R. Kleinberg. Noisy binary search and its applications. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pages 881–890. Society for Industrial and Applied Mathematics, 2007.
  • [10] D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • [11] E. Kocabey et al. Face-to-BMI: Using computer vision to infer body mass index on social media. arXiv preprint arXiv:1703.03156, 2017.
  • [12] B. Lee, J. Jang, and J. Kim. Prediction of body mass index from facial features of females and males. International Journal of Bio-Science and Bio-Technology, 4(3):45–62, 2012.
  • [13] G. Levi and T. Hassner. Age and gender classification using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 34–42, 2015.
  • [14] X. Liu, J. van de Weijer, and A. D. Bagdanov. RankIQA: Learning from rankings for no-reference image quality assessment. CoRR, abs/1707.08347, 2017.
  • [15] C. Lowe and D. Cummin. The use of kiosk technology in general practice. Journal of telemedicine and telecare, 16(4):201–203, 2010.
  • [16] M. Mathias, R. Benenson, M. Pedersoli, and L. Van Gool. Face detection without bells and whistles. In European Conference on Computer Vision, pages 720–735. Springer, 2014.
  • [17] O. Parkhi, A. Vedaldi, A. Zisserman, et al. Deep face recognition. In BMVC, volume 1, page 6, 2015.
  • [18] K. Ricanek and T. Tesafaye. Morph: A longitudinal image database of normal adult age-progression. In International Conference on Automatic Face and Gesture Recognition, pages 341–345. IEEE, 2006.
  • [19] A. Romero-Corral et al. Accuracy of body mass index in diagnosing obesity in the adult general population. International journal of obesity, 32(6):959–966, 2008.
  • [20] J. Sánchez-Monedero, P. Gutiérrez, P. Tiňo, and C. Hervás-Martínez. Exploitation of pairwise class distances for ordinal classification. Neural computation, 25(9):2450–2485, 2013.
  • [21] I. Weber and Y. Mejova. Crowdsourcing health labels: Inferring body weight from profile pictures. In Proceedings of the 6th International Conference on Digital Health Conference, pages 105–109. ACM, 2016.
  • [22] L. Wen and G. Guo. A computational approach to body mass index prediction from face images. Image and Vision Computing, 31(5):392–400, 2013.
  • [23] L. Wen, G. Guo, and X. Li. A study on the influence of body weight changes on face recognition. In IEEE International Joint Conference on Biometrics, pages 1–6. IEEE, 2014.
  • [24] S. Zagoruyko and N. Komodakis. Learning to compare image patches via convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4353–4361, 2015.