Barcodes for Medical Image Retrieval Using Autoencoded Radon Transform

09/16/2016 ∙ by Hamid R. Tizhoosh, et al.

Using content-based binary codes to tag digital images has emerged as a promising retrieval technology. Recently, Radon barcodes (RBCs) have been introduced as a new binary descriptor for image search. RBCs are generated by binarizing Radon projections and assembling them into a vector, namely the barcode. A simple local thresholding has been suggested for the binarization. In this paper, we put forward the idea of "autoencoded Radon barcodes". Using images in a training dataset, we autoencode Radon projections and perform binarization on the outputs of the hidden layers. We employed the mini-batch stochastic gradient descent approach for training. Each hidden layer of the autoencoder can produce a barcode using a threshold determined by the range of the logistic function used. The compressing capability of autoencoders apparently reduces the redundancies inherent in Radon projections, leading to more accurate retrieval results. The IRMA dataset with 14,410 x-ray images is used to validate the performance of the proposed method. The experimental results, including comparisons with RBC, SURF, and BRISK, show that the autoencoded Radon barcode (ARBC) has the capacity to capture important information and to learn richer representations, resulting in lower retrieval error measured with the accuracy of the first hit only.




1 Introduction

In recent years, the number of digital images in private and public archives has been steadily growing due to advances in imaging and communication technologies. Given a query image, the purpose of “content-based image retrieval” (CBIR) is to find the most similar images in the database. The general assumption is that images are not annotated with descriptive texts and keywords, or that it is more desirable to search by image content due to domain requirements, e.g., in medical imaging. CBIR systems hence need to numerically characterize the image content, i.e., through some type of feature extraction, in order to facilitate search queries. More recently, CBIR methods have shifted toward “binary” descriptors. Binary information is compact, and its processing is fast. One of the most recent proposals is to use Radon barcodes. Based on a well-established transform with diverse applications, among others in medical imaging, one can assemble binary vectors by proper thresholding of Radon projections.

In this paper, we propose to train “autoencoders” to generate Radon barcodes. Presently, simple local thresholding is used to binarize Radon projections. This, as we will demonstrate, causes loss of information and hence an increase in retrieval error. By employing autoencoders we binarize the compressed version of Radon projections with less loss resulting in higher retrieval accuracy. As the Radon projections of neighbouring angles are highly redundant, the proposed approach constitutes a neural approach to redundancy reduction.

In the following sections, we first briefly review the relevant literature in Section 2. The proposed autoencoded Radon barcodes are described in Section 3. The dataset, error measurements, settings of the autoencoders, and the experimental results are reported in Section 4.

2 Background Review

In this section, we briefly review the relevant literature on image retrieval, autoencoders, and Radon barcodes.

Image Retrieval – Recent annotation or tagging methods for image retrieval have moved toward binary descriptors. Binary codes require less space and can be processed quickly. Calonder et al. used binary strings and developed an efficient feature point descriptor called BRIEF [2]. Rublee et al. proposed another binary descriptor called ORB based on BRIEF, which is meant to be rotation invariant as well as resistant to noise [14]. Leutenegger et al. proposed a novel method, named Binary Robust Invariant Scalable Keypoints (BRISK), for image key point detection, description and matching [12], and reported good performance at a low computational cost. In 2015, Radon barcodes based on the Radon transform were introduced [16]; good overall results for medical image retrieval on the IRMA dataset were reported. Recently, autoencoders have been increasingly used for image retrieval tasks [5, 3]. Krizhevsky et al. first applied a deep convolutional neural network to the ImageNet LSVRC-2010 dataset and achieved the best result in that challenge [7].
Autoencoders – The autoencoder is a type of artificial neural network which is trained to encode its input into a representation from which the input can be reconstructed [1]. An autoencoder transforms an input x into a hidden representation y through a deterministic mapping

y = s(Wx + b),

where s is a non-linear function (e.g., the Sigmoid), W is a weight matrix, and b is an offset vector. A prediction of the original input can be reconstructed by mapping the hidden representation back:

z = s(W'y + b').

Generally, the reconstruction z is not to be interpreted as exactly the same as the input x, “but rather in probabilistic terms as the parameters (typically the mean) of a distribution that may generate x with high probability” [18]. To minimize the reconstruction error, the squared error can be used:

L(x, z) = ||x − z||².
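As a minimal illustration, the encoding, decoding, and squared-error steps above can be sketched as follows (the function and variable names are our own, not taken from any particular implementation):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def encode(x, W, b):
    # y = s(Wx + b): deterministic mapping to the hidden representation
    return sigmoid(W @ x + b)

def decode(y, W_prime, b_prime):
    # z = s(W'y + b'): map the hidden representation back to input space
    return sigmoid(W_prime @ y + b_prime)

def reconstruction_error(x, z):
    # squared error L(x, z) = ||x - z||^2
    return float(np.sum((x - z) ** 2))

rng = np.random.default_rng(0)
n_in, n_hidden = 8, 4
W = rng.normal(size=(n_hidden, n_in))
b = np.zeros(n_hidden)
W_prime = rng.normal(size=(n_in, n_hidden))
b_prime = np.zeros(n_in)

x = rng.random(n_in)
y = encode(x, W, b)
z = decode(y, W_prime, b_prime)
err = reconstruction_error(x, z)
```

Training then adjusts W, b, W', and b' to minimize this error over the training set.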
Radon Barcodes – The idea behind Radon barcodes (RBC) [16, 15] is to apply the Radon transform to an image and to binarize the projections with minimum information loss. If the image is regarded as a 2D function I(x, y), then one can project I(x, y) along a number of projection angles θ. A projection is basically the sum (integral) of the I values along lines defined by the angle θ. The projections assemble a new representation R(ρ, θ) with ρ = x cos θ + y sin θ. Hence, using the Dirac delta function δ(·), the Radon transform can be written as

R(ρ, θ) = ∫∫ I(x, y) δ(ρ − x cos θ − y sin θ) dx dy.

It has been proposed to threshold all projections for each angle based on a “local” threshold for that angle, such that a barcode of all thresholded projections can be generated (Figure 1). Further, it has been proposed to use a typical value (e.g., the median of all non-zero values of each projection) as the threshold [16]. Algorithm 1 describes how Radon barcodes (RBC) are generated. To obtain same-length barcodes, the step Normalize resizes all images to the same fixed dimensions.

Figure 1: All projections (P1, P2, P3, P4) generated by the Radon transform are thresholded to generate code fragments C1, C2, C3, C4, resulting in a barcode [C1 C2 C3 C4] [Source: [16]].
1:  Initialize Radon barcode r ← empty
2:  Initialize projection angle θ ← 0
3:  Normalize the input image
4:  Set the number of projection angles n_p, e.g. n_p = 8
5:  while θ < 180° do
6:     Get all projections p for θ
7:     Find typical value T (e.g., median of non-zero values of p)
8:     Binarize projections: b ← (p ≥ T)
9:     Append the new row: r ← append(r, b)
10:    θ ← θ + 180°/n_p
11:  end while
12:  Return r
Algorithm 1 Radon Barcodes via thresholding [16]
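A minimal sketch of the thresholding in Algorithm 1 follows. For brevity it computes only the 0° and 90° projections (column and row sums); a full Radon transform would sum along lines for every chosen angle, and the tie-handling at the threshold (≥ here) is our assumption:

```python
import numpy as np

def toy_projections(img):
    # Column and row sums stand in for the 0-degree and 90-degree Radon
    # projections; a full implementation would sum along lines for every
    # chosen angle theta.
    return [img.sum(axis=0), img.sum(axis=1)]

def radon_barcode(img):
    # Algorithm 1: threshold each projection at the median of its
    # non-zero values and append the resulting bits to the barcode.
    code = []
    for p in toy_projections(img):
        nonzero = p[p > 0]
        threshold = np.median(nonzero) if nonzero.size else 0.0
        code.append((p >= threshold).astype(np.uint8))  # tie-handling assumed
    return np.concatenate(code)
```

For an N×N image with n_p projections, the resulting barcode has roughly n_p · N bits.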

The main difference between this work (described in the following section) and the original work on Radon barcodes in [16] is that we use the compressing capability of autoencoders to reduce the redundancies inherent in Radon projections. As we will report in the experiments section, this leads to more accurate retrieval results.

3 The Proposed Method

We first pre-process images to subsequently autoencode them with a suitable architecture and setting of the autoencoder.

Pre-Processing of Images – In order to generate same-length inputs for the autoencoder, all images in the training dataset are resized to the same fixed dimensions. The Radon transform with a selected number of projection angles is employed to extract the Radon features (i.e., the projections). This representation of each image is then normalized to values between 0 and 1 by dividing each element in the feature vector by the maximum value of that projection angle. The normalized Radon features can then be fed to the autoencoder.
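The per-angle normalization described above can be sketched as follows (the function name is ours):

```python
import numpy as np

def normalize_projections(projections):
    # Scale each projection into [0, 1] by dividing by the maximum value
    # of that projection angle, so the result can be fed to the autoencoder.
    return [p / p.max() if p.max() > 0 else p for p in projections]
```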

Autoencoded Radon Barcodes –

After the autoencoder is trained, a barcode can be generated by feeding each normalized Radon feature vector to the autoencoder and looking at the output of each hidden layer (the value of the Sigmoid function of each neuron in that layer) that is passed on to the next layer. If this value is greater than 0.5, it becomes a “1” in the barcode; if it is less than 0.5, it becomes a “0”. A barcode can be generated for every hidden layer of the autoencoder.
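The 0.5-thresholding of hidden-layer activations can be sketched as follows; how a value exactly equal to 0.5 should be mapped is not specified in the text, so the strict inequality here is our choice:

```python
import numpy as np

def arbc_bits(hidden_activation):
    # Sigmoid outputs lie in (0, 1); the midpoint 0.5 maps activations to
    # bits. Values exactly at 0.5 are mapped to 0 here (our assumption).
    return (np.asarray(hidden_activation) > 0.5).astype(np.uint8)
```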

Figure 2 illustrates the proposed method. Figure 3 shows four sample images from the IRMA dataset with their Radon barcodes (with local thresholding) and the corresponding autoencoded Radon barcodes. For better display, both barcodes are shown with the same length (the autoencoded Radon barcodes are in fact a fraction of the length of the Radon barcodes).

Figure 2: Schematic illustration of the proposed method. The Radon projections of the preprocessed image are autoencoded. For each hidden layer, a barcode can be generated.
Figure 3: Visual comparison of the Radon barcode (top) and the autoencoded Radon barcode (bottom) for sample x-ray images from the IRMA dataset. Images are normalized to the same size and 16 projections are used.

Autoencoder Setting – We use a traditional, unoptimized autoencoder, which trains fairly slowly and does not provide the most accurate results compared to much more advanced autoencoders (we used an unoptimized Python implementation). The standard learning algorithm for neural networks, mini-batch stochastic gradient descent [13], is adopted. No regularization techniques were implemented. The weights of the autoencoder were initialized from a Gaussian distribution with mean 0 and standard deviation 1 divided by the square root of the number of weights connecting to the same neuron; the biases were initialized from a Gaussian distribution with mean 0 and standard deviation 1. We use the mean squared error as the cost function. Other parameters such as the learning rate, mini-batch size, and number of training epochs are set empirically, as described in the next section.
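The initialization just described might look like the following sketch (`init_params` is a hypothetical helper name):

```python
import numpy as np

def init_params(n_in, n_out, rng):
    # Weights: Gaussian with mean 0 and standard deviation 1/sqrt(n_in),
    # i.e. scaled by the number of connections into each neuron.
    W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in))
    # Biases: Gaussian with mean 0 and standard deviation 1.
    b = rng.normal(0.0, 1.0, size=n_out)
    return W, b
```

The 1/sqrt(fan-in) scaling keeps the pre-activations of each neuron in a reasonable range regardless of layer width.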

4 Experimental Results

We first describe the IRMA dataset used for experimentation, followed by the error measurement. Subsequently, we report the parameter settings for the autoencoder. Finally, we analyze the performance of the proposed approach based on the experimental results; a comparison with the SURF and BRISK methods is reported at the end.

Image Test Data – The Image Retrieval in Medical Applications (IRMA) database is a collection of more than 14,000 x-ray images (radiographs) randomly collected from daily routine work at the Department of Diagnostic Radiology of the RWTH Aachen University [8, 17]. All images are classified into 193 categories (classes) and annotated with the IRMA code, which relies on class-subclass relations to avoid ambiguities in textual classification [10, 9]. The IRMA code consists of four mono-hierarchical axes with three to four digits each: the technical code T (imaging modality), the directional code D (body orientation), the anatomical code A (body region), and the biological code B (the biological system examined). The complete IRMA code is thus a string of 13 characters: TTTT-DDD-AAA-BBB. More information on the IRMA database and code can be found in [8, 9]. The IRMA dataset offers 12,677 images for training and 1,733 images for testing. Figure 4 shows some sample images from the dataset along with their IRMA codes in the format TTTT-DDD-AAA-BBB. IRMA x-ray images are a challenging benchmark, mainly due to the imbalanced class distribution.

(a) 1121-127-700-500
(b) 1121-120-942-700
(c) 1121-120-200-700
(d) 1121-120-918-700
Figure 4: Sample images from IRMA Dataset with their IRMA codes TTTT-DDD-AAA-BBB.

Error Measurements – We used the evaluation scheme provided by ImageCLEFmed09 to compute the difference between the IRMA codes of each testing image and the first hit retrieved by the proposed approach (we used the Python implementation provided by ImageCLEFmed09 to compute the errors according to this scheme). The total error over all test images can then be calculated as follows [8, 9]:

E_total = Σ_m Σ_{l_j} Σ_i (1 / b_{l_j,i}) (1 / i) δ( I_{l_j,i}^m , Ĩ_{l_j,i}^m )

Here, m indexes the test images, l_j indexes the four structures (axes) of an IRMA code, and i indexes the character positions within a structure; each structure contributes as many terms as it has characters. For example, in the IRMA code 1121-4a0-914-700, i = 2 in l_2 refers to the character “a” and i = 1 in l_4 refers to the character “7”. b_{l_j,i} refers to the number of branches, i.e., the number of possible characters, at position i in structure l_j of an IRMA code. I^m refers to the testing image and Ĩ^m refers to its top-1 retrieved image. δ compares a particular position in the IRMA codes of the testing image and the retrieved image and outputs a value in {0, 1}: it is 0 if the characters at all positions up to and including i agree, and 1 otherwise, so that once a character is wrong, all subsequent positions in that structure also count as wrong.
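A sketch of how the per-axis error could be computed under our reading of the scheme above (the `branches` values and the error-propagation rule are assumptions based on the description, not an official implementation):

```python
def axis_error(query_axis, retrieved_axis, branches):
    # branches[i-1] = number of possible characters at position i (b_{l_j,i}).
    # Under our reading of the ImageCLEFmed09 scheme, once a position
    # disagrees, every later position in the axis also counts as an error.
    error, wrong = 0.0, False
    for i, (q, r) in enumerate(zip(query_axis, retrieved_axis), start=1):
        if q != r:
            wrong = True
        if wrong:
            error += (1.0 / branches[i - 1]) * (1.0 / i)
    return error
```

The total error then sums axis_error over all four axes and all test images.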
Parameters for Autoencoder – In our experiments, we used the mean squared error as the quadratic cost function. The network was trained using stochastic gradient descent on the training images for 300 epochs with a mini-batch size of 10 images. The learning rate was set to 0.5. No regularization techniques were implemented. Both a single hidden layer (shallow) and multiple hidden layers (deep) are employed in the experiments. Each hidden layer halves or doubles the number of neurons of the previous layer. For instance, the deep autoencoder with 3 hidden layers compresses the input by a factor of 4 in the second hidden layer and expands it back to half the input size in the third hidden layer (the decoder).
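The layer sizing just described, for the 3-hidden-layer case, amounts to the following (function name is ours):

```python
def autoencoder_layer_sizes(n_input):
    # Three hidden layers: halve, halve again (factor 4 overall), then
    # double back to half the input size before the output layer.
    return [n_input, n_input // 2, n_input // 4, n_input // 2, n_input]
```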

Performance – To validate the performance of the proposed method, the Radon barcodes are generated as well so the error scores can be compared according to the error scheme above. A total of 12,677 images in the training dataset are used to generate the Radon barcodes (RBC) and the autoencoded Radon barcodes (ARBC). The remaining 1,733 images are used to retrieve the top hit for each generated barcode through a direct k-NN search (k = 1) with minimum Hamming distance.
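The direct k-NN search with k = 1 under the Hamming distance can be sketched as follows (function name is ours):

```python
import numpy as np

def hamming_top_hit(query_code, database_codes):
    # Direct k-NN search with k = 1: return the index of the barcode with
    # minimum Hamming distance to the query barcode.
    distances = [int(np.sum(query_code != code)) for code in database_codes]
    return int(np.argmin(distances))
```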

Table 1 reports the error for RBC and ARBC with various normalized image sizes and numbers of projection angles. It can be clearly seen that although the ARBC has been compressed, it achieves better results than RBC. Generally, the ARBC compressed to half the input size works slightly better than the one compressed by a factor of 4. Increasing the image size and the number of Radon projections seems to provide better results, but leads to a longer barcode.

              8 proj.   16 proj.   8 proj.   16 proj.
RBC           605.83    576.45     585.76    559.46
ARBC (1/2)    474.83    465.14     493.37    426.06
ARBC (1/4)    477.27    475.14     477.32    440.69
Table 1: Error for RBC vs. ARBC with one hidden layer of 1/2 or 1/4 the size of the input layer. The left and right column pairs correspond to the smaller and larger normalized image sizes, respectively.

ARBCs generated by multi-layer autoencoders were also implemented. Table 2 compares the errors of ARBC versions generated from the two shallow hidden layers (one in the encoder and one in the decoder) and the deep hidden layer. Each hidden layer halves or doubles the number of neurons of the previous layer.

Hidden layer    8 proj.   16 proj.   8 proj.   16 proj.
1st (encoder)   519.52    453.33     466.92    392.09
2nd (deep)      509.70    463.76     484.57    415.93
3rd (decoder)   679.54    623.16     624.23    583.21
Table 2: Error for different layers/settings of the Radon transform when three hidden layers are used. The left and right column pairs correspond to the smaller and larger normalized image sizes, respectively.

It can be observed that the multi-layer autoencoder behaves similarly to the single-layer one: the error score drops as the normalized image size and the number of projection angles increase. The number of projection angles seems to have a greater impact on the error than the downsampling size. Obviously, more projection angles bring additional image information, whereas a larger image size (less downsampling) only enhances the information from the original image for the given angles. It can also be observed that the barcodes generated by the shallow layer (the first hidden layer) perform slightly better than those from the deep layer (the second hidden layer). However, it should be mentioned that the barcode from the deep layer is half the size of the one from the shallow layer, which in turn is half the length of the Radon barcode.

              8 proj.   16 proj.   8 proj.   16 proj.
RBC           0.006     0.006      0.006     0.007
ARBC (1/2)    0.243     0.869      0.762     5.408
ARBC (1/4)    0.139     0.455      0.431     1.843
ARBC_3        0.364     1.163      1.102     6.791
Table 3: Time comparison (in seconds).

Table 3 reports the processing time for generating the RBC and the proposed autoencoded Radon barcodes (ARBC). The times for ARBC are given for a single epoch; the autoencoder with 3 hidden layers is denoted ARBC_3 (1st hidden layer: 1/2 the RBC length, 2nd: 1/4, 3rd: 1/2). From Tables 1 and 2, it can be seen that larger normalized image sizes and more projections deliver better results in general, but they also increase the computational cost. We used a MacBook Pro with a 2.5 GHz quad-core Intel Core i7 (16 GB RAM); no GPUs were used.

Barcodes versus SURF and BRISK – In this series of experiments, we also examined SURF [4] (as a non-binary method) and BRISK [11] (as a binary method). Using k-NN as before was not an option because initial experiments took considerable time; SURF and BRISK appear to be slower than barcodes. Hence, we used locality-sensitive hashing (LSH) [6] to hash the features/codes into patches of the search space that may contain similar images. We ran several tests to find a good configuration for each method. Likewise, the configuration of LSH (number of tables and key size for encoding) was subject to some trial and error. We set the number of tables for LSH to 30 (with comparable results for 40) and the key size to a third of the feature vectors’ length. We selected the top 10 results of LSH and chose the top hit based on the highest correlation with the input image for each method. The results are reported in Table 4.

As apparent from the results, the main drawback of both SURF and BRISK for medical image retrieval is that they fail to extract descriptors for some images. BRISK, as a binary descriptor, has a very high error rate. Although SURF has a lower error than RBC, its non-binary descriptors constitute a major disadvantage due to the high storage requirements for big image data. ARBC, however, provides the lowest error rate and much lower storage needs.

Method   Total error   Failure
SURF     526.05        4.6% (79 images)
BRISK    761.96        1.1% (19 images)
RBC      559.46        0.0%
ARBC     392.09        0.0%
Table 4: Comparing barcodes with SURF and BRISK.

5 Conclusions

Tagging images with binary descriptors seems to be a powerful approach to image retrieval for big image data. Radon barcodes, recently introduced, constitute an interesting framework for generating content-based binary features and are probably a suitable approach for tagging medical images. Many aspects of Radon barcodes still require investigation. Among others, the method for binarizing Radon projections has a significant effect on the descriptiveness of the barcode. In this paper, we proposed autoencoded Radon barcodes. By making use of an autoencoder with 1 or 3 hidden layers, Radon barcodes can be generated via thresholding of the compressed projections accessed at the outputs of the hidden layers. Using the IRMA dataset with 14,410 x-ray images, we demonstrated that thresholding via autoencoders is superior to local thresholding via the median of non-zero projection values. The experimental results also show that barcodes generated by the deeper autoencoder achieve better performance than those from the shallow network.


  • [1] Y. Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127, 2009.
  • [2] M. Calonder, V. Lepetit, C. Strecha, and P. Fua. Brief: Binary robust independent elementary features. Computer Vision–ECCV 2010, pages 778–792, 2010.
  • [3] Z. Camlica, H. Tizhoosh, and F. Khalvati. Autoencoding the retrieval relevance of medical images. In Int. Conf. on Image Processing Theory, Tools and Applications, 2015.
  • [4] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool. SURF: Speeded up robust features. Computer Vision and Image Understanding, 110(3):346–359, 2008.
  • [5] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.
  • [6] P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, pages 604–613, 1998.
  • [7] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural inf. proc. systems, pages 1097–1105, 2012.
  • [8] T. Lehmann, T. Deselaers, H. Schubert, M. Guld, C. Thies, B. Fischer, and K. Spitzer. The irma code for unique classification of medical images. In SPIE Proceedings, volume 5033, pages 440–451, 2003.
  • [9] T. Lehmann, T. Deselaers, H. Schubert, M. Guld, C. Thies, B. Fischer, and K. Spitzer. Irma – a content-based approach to image retrieval in medical applications. In IRMA International Conference, volume 5033, pages 911–912, 2006.
  • [10] T. M. Lehmann, H. Schubert, D. Keysers, M. Kohnen, and B. B. Wein. The irma code for unique classification of medical images. In Medical Imaging 2003, pages 440–451. International Society for Optics and Photonics, 2003.
  • [11] S. Leutenegger, M. Chli, and R. Siegwart. Brisk: Binary robust invariant scalable keypoints. In IEEE International Conference on Computer Vision, pages 2548–2555, 2011.
  • [12] S. Leutenegger, M. Chli, and R. Y. Siegwart. Brisk: Binary robust invariant scalable keypoints. In IEEE International Conference on Computer Vision, pages 2548–2555, 2011.
  • [13] M. Li, T. Zhang, Y. Chen, and A. J. Smola. Efficient mini-batch training for stochastic optimization. In International Conference on Knowledge Discovery and Data Mining, pages 661–670, 2014.
  • [14] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski. Orb: an efficient alternative to sift or surf. In IEEE International Conference on Computer Vision, pages 2564–2571. IEEE, 2011.
  • [15] H. Tizhoosh, M. Gangeh, H. Tadayyon, and G. Czarnota. Tumour ROI estimation in ultrasound images via Radon barcodes in patients with locally advanced breast cancer. In International Symposium on Biomedical Imaging, 2016.
  • [16] H. R. Tizhoosh. Barcode annotations for medical image retrieval: A preliminary investigation. arXiv preprint arXiv:1505.05212, 2015.
  • [17] T. Tommasi, B. Caputo, P. Welter, M. O. Güld, and T. M. Deserno. Overview of the clef 2009 medical image annotation track. In Multilingual Information Access Evaluation II, pages 85–93. Springer, 2010.
  • [18] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. The Journal of Machine Learning Research, 11:3371–3408, 2010.