User Loss -- A Forced-Choice-Inspired Approach to Train Neural Networks directly by User Interaction

by Shahab Zarei et al.

In this paper, we investigate whether it is possible to train a neural network directly from user inputs. We consider this approach highly relevant for applications in which the point of optimality is not well-defined and user-dependent. Our application is medical image denoising, which is essential in fluoroscopy imaging. In this field, every user, i.e., every physician, has a different preference, and image quality needs to be tailored to each individual. To address this problem, we propose to construct a loss function derived from a forced-choice experiment. In order to make the learning problem feasible, we operate in the domain of precision learning, i.e., we derive the network architecture from traditional signal processing methods in order to reduce the number of trainable parameters. The algorithm used for this is a Laplacian pyramid with only six trainable parameters. In the experimental results, we demonstrate that two image experts who prefer different trade-offs between sharpness and de-noising can be created using our approach. Moreover, models trained for a specific user perform best on this user's test data. This approach opens the way towards the implementation of direct user feedback in deep learning and is applicable to a wide range of applications.




1 Introduction

Deep learning is a technology that has been shown to tackle many important problems in image processing and computer vision [3]. However, all training needs a clear reference in order to apply neural network-based techniques. Such a reference can either be a set of classes or a specific desired output in regression problems. However, there are also problems in which no clear reference can be given. An example of this are user preferences in forced-choice experiments: a user can only select the image he likes best, but he cannot describe or generate an optimal image. In this paper, we tackle exactly this problem by introducing a user loss that can be generated specifically for one user of such a system.

In order to investigate our new concept, we explore its use for image enhancement of interventional X-ray images. Here, the problem arises that different physicians prefer different image characteristics during their interventions. Some users are distracted by noise and prefer strong de-noising, while others prefer crisp and sharp images. Another requirement for our user loss is that we want to spend only a few clicks on training. As such, we have to deal with the problem of having only few training samples, as we cannot ask our users to click more than 50 to 100 times. In order to still work in the regime of deep learning, we employ a framework coined precision learning that is able to map known operators and algorithms onto deep learning architectures [5]. In the literature, this approach is known to reduce maximal error bounds of the learning problem and to reduce the number of required training samples [11]. Fu et al. even demonstrated that complex algorithms such as the vesselness filter can be mapped onto a deep network using this technique [2].

2 Methods

For this paper, we chose a Laplacian pyramid de-noising algorithm as basis [8]. In this section, image denoising using the Laplacian pyramid is described first. Then, we follow the idea of precision learning to derive the network topology from the known approach, followed by a detailed description of the loss function.

2.1 Subband decomposition

Image denoising using a Laplacian pyramid is carried out in two steps. First, the image is decomposed into subbands, followed by a soft threshold to reduce the noise. The Laplacian pyramid [8] is an extension of the Gaussian pyramid using differences of Gaussians (DoG). To construct a layer of the Laplacian pyramid, the input is blurred using a Gaussian kernel with a defined standard deviation σ_l and zero mean, and the blurred result is subsequently subtracted from the unblurred input itself. This difference image is one layer of the Laplacian pyramid, while the blurred input image, downsampled by a defined factor, serves as the input for the next layer. Repeating this smoothing, subtraction, and down-sampling L times constructs a pyramid of depth L. The Gaussian parameters have to be defined for each layer; thus, the construction of the pyramid can be described with:

J_l = G_{σ_l} ∗ I_l
B_l = I_l − J_l
I_{l+1} = J_l ↓ 2

where I_l is the input image for layer l, G_{σ_l} the Gaussian kernel described by the standard deviation σ_l for the respective layer, J_l the low-pass image, and B_l is the bandpass image which represents the l-th layer of the Laplacian pyramid. Here, ∗ denotes convolution and ↓ 2 downsampling by a factor of two.
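The decomposition above can be sketched in a few lines of NumPy/SciPy. The function name `laplacian_pyramid`, the fixed downsampling factor of two, and the use of `scipy.ndimage.gaussian_filter` are illustrative choices, not the paper's implementation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def laplacian_pyramid(image, sigmas):
    """Decompose `image` into one band-pass layer per sigma plus a low-pass residual.

    Layer l: blur with a Gaussian of standard deviation sigmas[l] (J_l),
    keep the difference B_l = I_l - J_l as the band-pass layer, and feed
    the blurred image, downsampled by a factor of 2, to the next layer.
    """
    bands = []
    current = image.astype(np.float64)  # I_0
    for sigma in sigmas:
        low = gaussian_filter(current, sigma)  # low-pass J_l
        bands.append(current - low)            # band-pass B_l (DoG)
        current = low[::2, ::2]                # I_{l+1}: downsample by 2
    return bands, current                      # layers and final low-pass residual
```

For an input of size 16×16 and three sigmas, this yields band-pass layers of size 16×16, 8×8, and 4×4 plus a 2×2 residual.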

2.2 Soft-Thresholding

After sub-band decomposition, we assume that small coefficients are caused by noise of different strength in each sub-band l. Here, we employ a soft-thresholding technique to suppress this noise with magnitudes smaller than t_l:

S_{t_l}(B_l) = sign(B_l) · max(|B_l| − t_l, 0)

Note that for both the Gaussian that is used for the sub-band decomposition and the soft-thresholding function, sub-gradients [9] can be computed with respect to their parameters (cf. Fig. 1). As such, both are suited for use in neural networks [5].
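A minimal NumPy sketch of the soft-thresholding operator (the function name `soft_threshold` is ours); it is differentiable almost everywhere, which is why sub-gradients suffice for training:

```python
import numpy as np

def soft_threshold(x, t):
    """Shrink coefficients toward zero: sign(x) * max(|x| - t, 0).

    Coefficients with magnitude below the threshold t are set to zero
    (assumed noise); larger coefficients are reduced by t.
    """
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)
```

For example, with t = 1 the inputs (−2, 0.5, 3) map to (−1, 0, 2).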

Figure 1: (a) Soft threshold and (b) Gaussian kernel used in this work.

2.3 Neural Network

Following the precision learning paradigm, we construct a three-layer Laplacian pyramid filter as a neural network. A flowchart of the network is depicted in Fig. 2. The low-pass filters are implemented as convolutional layers, in which the actual kernel only has a single free parameter σ_l. Using point-wise subtraction, these low-pass filters are used to construct the band-pass filters. On each of those filters, soft-thresholding with parameter t_l is applied. In a final layer, the soft-thresholded band-pass filters are recombined to form the final image. As such, we end up with a network architecture with nine layers that only has six trainable parameters σ_0, σ_1, σ_2, t_0, t_1, t_2. In the following, we summarize these parameters as a single vector θ that can be trained using the back-propagation algorithm [10].
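Putting decomposition, thresholding, and recombination together, the six-parameter filter chain might look as follows. This is a plain NumPy/SciPy sketch, not the paper's TensorFlow implementation; the function name `pyramid_denoise` and the linear upsampling via `scipy.ndimage.zoom` are assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def pyramid_denoise(image, sigmas, thresholds):
    """Laplacian-pyramid denoiser with 2 * len(sigmas) parameters
    (three sigmas and three thresholds give the paper's six parameters).
    """
    # Analysis: build band-pass layers B_l and the low-pass residual.
    bands, low = [], image.astype(np.float64)
    for sigma in sigmas:
        blurred = gaussian_filter(low, sigma)
        bands.append(low - blurred)
        low = blurred[::2, ::2]
    # Soft-threshold each band with its own t_l.
    bands = [np.sign(b) * np.maximum(np.abs(b) - t, 0.0)
             for b, t in zip(bands, thresholds)]
    # Synthesis: upsample and sum the layers back into one image.
    out = low
    for b in reversed(bands):
        out = zoom(out, 2, order=1) + b
    return out
```

For power-of-two image sizes, the upsampled layers line up exactly and the output has the same shape as the input.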

Figure 2: Schematic of the neural network design used in this work. The architecture mimics a Laplacian pyramid filter with soft-thresholding

2.4 User Loss

Let I* be the user-preferred image and Î(θ) the denoised image produced by our network. The main objective of our neural-network de-noiser would be

min_θ ‖I* − Î(θ)‖²₂.

The main problem with this equation is that the user is not able to produce I*. To resolve this problem, we introduce errors with respect to the optimal image that cannot be observed directly:

e_i = ‖I* − I_i‖²₂.

However, if we provide a forced-choice experiment using four images I_1, …, I_4, we can determine which of the four errors is the smallest. This gives us a set of constraints that need to be fulfilled by our neural network. For the training of the network, we define our error in the following way:

Let N be the total number of frames, s_n denote the choice assigned to frame n, and K denote the number of choices. Assuming I_s is selected by the user, the following expected relationships between the errors emerge:

e_s ≤ e_i  for all i ≠ s.

For a user selection s, these constraints are used to set up our loss function. Since I* is unknown, we approximate it by the network output Î(θ). Similar to the implementation of support vector machines in deep networks, we map the inequality constraints to the hinge loss using the max(0, ·) operator [1], i.e., each violated constraint e_s ≤ e_i contributes a penalty max(0, e_s − e_i).
This gives rise to three different variants of the user loss that are used in this work:

  1. Best-Match: Only the user-selected image is used to guide the loss function:
     L_BM(θ) = ‖I_s − Î(θ)‖²₂.

  2. Forced-Choice: The user loss seeks to fulfill all criteria imposed by the user selection:
     L_FC(θ) = Σ_{i ≠ s} max(0, ‖I_s − Î(θ)‖²₂ − ‖I_i − Î(θ)‖²₂).

  3. Hybrid: The user-selected image drives the parameter optimization while all constraints implied by the forced choice are sought to be fulfilled:
     L_HY(θ) = L_BM(θ) + L_FC(θ).

Note that the hybrid user loss is mathematically very close to the soft-margin support vector machine, where L_BM takes the role of the normal vector length and L_FC the role of the additional constraints.
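The three variants can be sketched as follows. This NumPy function (name `user_loss` is ours) only evaluates the loss values; in the paper, the loss is differentiated through the network via back-propagation, which this sketch omits:

```python
import numpy as np

def user_loss(output, candidates, selected, mode="hybrid"):
    """Forced-choice user loss over K candidate images.

    errors[i] = ||candidates[i] - output||^2. The user-selected candidate
    is assumed closest to the (unobservable) optimum, so every other error
    should be at least as large; violations are penalised with a hinge.
    """
    errors = np.array([np.sum((c - output) ** 2) for c in candidates])
    best_match = errors[selected]                  # pull output toward the pick
    hinge = sum(max(0.0, best_match - errors[i])   # enforce e_s <= e_i, i != s
                for i in range(len(candidates)) if i != selected)
    if mode == "best-match":
        return best_match
    if mode == "forced-choice":
        return hinge
    return best_match + hinge                      # hybrid
```

If the selected candidate already has the smallest error, the hinge term vanishes and the hybrid loss reduces to the best-match term.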

3 Experiments and Results

For generating different scenarios, in a first step the Laplacian pyramid is initialized for each input image. Starting from the center values of our parameter set θ, four different scenes are generated using random parameters. The resulting scenes for each frame are then imported into a GUI in order to collect the user preferences (cf. Fig. 3).

Figure 3: Graphical user interface designed for proposed network training.

The network is implemented in Python using the TensorFlow framework. The ADAM algorithm is used as optimizer, iterating over 5000 epochs with a fixed learning rate, and the batch size is set to 50.

The data used in this work are 2D angiography fluoroscopy images. The dataset contains 50 images with different dose levels. We created 200 scenarios by randomly initializing the Laplacian pyramid parameters. Our dataset is divided such that 60% of the scenarios are used for training, 20% for validation, and 20% for testing. In this work, stratified K-fold cross-validation is used for data set splitting.
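For illustration, a 60/20/20 split of the 200 scenarios could be drawn as below. This is a plain shuffled split for clarity; the paper additionally uses stratified K-fold cross-validation, which this sketch (function name `split_indices` is ours) does not reproduce:

```python
import numpy as np

def split_indices(n, seed=0):
    """Shuffle n scenario indices into 60/20/20 train/val/test partitions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    return (idx[:n_train],                  # 60% training
            idx[n_train:n_train + n_val],   # 20% validation
            idx[n_train + n_val:])          # 20% test
```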

3.1 Qualitative Results

Qualitative results of our approach are presented in Fig. 4 for the first user. These indicate the influence of the different loss functions on the parameter tuning for one user's preferences. The Best-Match loss shows better noise reduction, but reduces the sharpness more than the other losses. In contrast to Best-Match, the Forced-Choice loss yields better sharpness but a higher noise level. In order to favor both targets, the Hybrid loss suppresses noise and preserves the sharpness of the image data as well. Fig. 5 displays the Hybrid loss curves for our two users over the training process. It demonstrates that User 1 favors sharper images than User 2. Note that we set a value of 100 as maximum for the parameters.

Figure 4: Comparison of the original low-dose image and its corresponding results obtained with the different user losses for the first user. For better visualization, windowing is applied on the second row.
Figure 5: Comparison of the Hybrid loss for the two users. Note that we set 100 as maximal value for the parameters. User 1 favors sharper images while User 2 prefers smoother images.

3.2 Quantitative Evaluation

In this section, we evaluate the three loss functions for both of our users against each other. Table 1 displays the models created with the respective loss functions versus the test sets of both users. To set fair conditions for the comparison, we only evaluated models with the respective loss functions that were used in their training. The results indicate that Best-Match and Forced-Choice alone are not able to produce the lowest loss for their respective user. The Hybrid loss models, however, are minimal on the test data of their respective user. Hence, the Hybrid loss seems to be a good choice to create user-dependent de-noising models.

Low dose data          User 1    User 2
Model Nr. 1    BM      1431.1    2436.7
               FC       248.8     253.1
               HY      1771.1    2675.9
Model Nr. 2    BM      1381.5    2391.5
               FC       249.5     964.9
               HY      1781.1    2359.1
Table 1: Quantitative comparison of loss functions: Best-Match (BM), Forced-Choice (FC), Hybrid (HY)

4 Conclusion and Discussion

We propose a novel user loss for neural network training in this work. It can be applied to any image grading problem in which users have difficulties in giving exact answers. As a first experiment for the user loss, we demonstrate that it can be used to train a de-noising algorithm towards a specific user. In our work, 200 decisions using 50 clicks were sufficient to achieve proper parameter tuning. In order to be able to apply this for training, we used the precision learning paradigm to create a suitable network structure with only few trainable parameters.

Obviously, other algorithms would also be suited for the same approach [12, 7, 4, 6, 8]. However, as the scope of the paper is the introduction of the user loss, we omitted these experiments in the present work. How many clicks each filter requires for convergence is still an open question and subject of future work.

We believe that this paper introduces a powerful new concept that is applicable for many applications in image processing such as image fusion, segmentation, registration, reconstruction, and many other traditional image processing tasks.


  • [1]

    Bishop, C.M.: Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Berlin, Heidelberg (2006)

  • [2] Fu, W., Breininger, K., Schaffert, R., Ravikumar, N., Würfl, T., Fujimoto, J., Moult, E., Maier, A.: Frangi-Net: A Neural Network Approach to Vessel Segmentation. In: Maier, A., Deserno, T.M., Handels, H., Maier-Hein, K.H., Palm, C., Tolxdorff, T. (eds.) Bildverarbeitung für die Medizin 2018. pp. 341–346 (2018)
  • [3] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
  • [4] Luisier, F., Blu, T., Unser, M.: A new sure approach to image denoising: Interscale orthonormal wavelet thresholding. IEEE Transactions on image processing 16(3), 593–606 (2007)
  • [5] Maier, A.K., Schebesch, F., Syben, C., Würfl, T., Steidl, S., Choi, J.H., Fahrig, R.: Precision learning: Towards use of known operators in neural networks. CoRR abs/1712.00374 (2017)
  • [6] Motwani, M.C., Gadiya, M.C., Motwani, R.C., Harris, F.C.: Survey of image denoising techniques. In: Proceedings of GSPX. pp. 27–30 (2004)
  • [7] Petschnigg, G., Szeliski, R., Agrawala, M., Cohen, M., Hoppe, H., Toyama, K.: Digital photography with flash and no-flash image pairs. In: ACM transactions on graphics (TOG). vol. 23, pp. 664–672. ACM (2004)
  • [8] Rajashekar, U., Simoncelli, E.P.: Multiscale denoising of photographic images. In: The Essential Guide to Image Processing, pp. 241–261. Elsevier (2009)
  • [9] Rockafellar, R.: Convex Analysis. Princeton Landmarks in Mathematics and Physics, Princeton University Press (1970)
  • [10] Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323(6088), 533–536 (1986)
  • [11] Syben, C., Stimpel, B., Breininger, K., Würfl, T., Fahrig, R., Dörfler, A., Maier, A.: Precision Learning: Reconstruction Filter Kernel Discretization. In: Noo, F. (ed.) Proceedings of the Fifth International Conference on Image Formation in X-Ray Computed Tomography. pp. 386–390 (2018)
  • [12] Tomasi, C., Manduchi, R.: Bilateral filtering for gray and color images. In: Computer Vision, 1998. Sixth International Conference on. pp. 839–846. IEEE (1998)