1 Introduction
Deep learning has been shown to tackle many important problems in image processing and computer vision [3]. However, training with neural network-based techniques requires a clear reference: either a set of classes or a specific desired output in regression problems. There are also problems in which no such reference can be given. One example is user preference in forced-choice experiments: a user can select the image they like best, but they cannot describe or generate an optimal image. In this paper, we tackle exactly this problem by introducing a user loss that can be generated specifically for one user of such a system.

In order to investigate our new concept, we explore its use for image enhancement of interventional X-ray images. Here, the problem arises that different physicians prefer different image characteristics during their interventions: some users are distracted by noise and prefer strong denoising, while others prefer crisp and sharp images. A further requirement for our user loss is that training should cost only a few clicks, as we cannot ask our users to click more than 50 to 100 times. We therefore have to deal with very few training samples. In order to still work in the regime of deep learning, we employ a framework coined precision learning that is able to map known operators and algorithms onto deep learning architectures [5]. In the literature, this approach is known to reduce maximal error bounds of the learning problem and the number of required training samples [11]. Fu et al. even demonstrated that complex algorithms such as the vesselness filter can be mapped onto a deep network using this technique [2].
2 Methods
For this paper, we chose a Laplacian pyramid denoising algorithm as basis [8]. In this section, image denoising using the Laplacian pyramid is described first. Then, we follow the idea of precision learning to derive the network topology from the known approach, followed by a detailed description of the loss function.
2.1 Subband decomposition
Image denoising using a Laplacian pyramid is carried out in two steps: first, the image is decomposed into subbands; then, a soft threshold is applied to reduce the noise. The Laplacian pyramid [8] is an extension of the Gaussian pyramid using differences of Gaussians (DoG). To construct a layer of the Laplacian pyramid, the input is blurred using a Gaussian kernel with a defined standard deviation $\sigma_l$ and zero mean, and the blurred result is subtracted from the unblurred input itself. This difference image is one layer of the Laplacian pyramid, while the blurred input image, downsampled by a defined factor $s$, serves as the input for the next layer. Repeating this smoothing, subtraction, and downsampling $L$ times constructs a pyramid of depth $L$. The Gaussian parameters have to be defined for each layer; thus, the construction of the pyramid can be described as

$L_l = G_{\sigma_l} * I_l, \quad B_l = I_l - L_l$  (1)

$I_{l+1} = (L_l) \downarrow_s$  (2)

where $I_l$ is the input image for layer $l$, $G_{\sigma_l}$ the Gaussian kernel described by the standard deviation $\sigma_l$ for the respective layer, $L_l$ the lowpass image, and $B_l$ is the bandpass image which represents layer $l$ of the Laplacian pyramid.
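The decomposition described above can be sketched in a few lines of NumPy/SciPy. This is a minimal illustration, not the paper's implementation: the function name, the downsampling by slicing, and the per-layer sigma list are our own choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def laplacian_pyramid(image, sigmas, factor=2):
    """Decompose `image` into len(sigmas) bandpass layers.

    Per layer: blur with a Gaussian (lowpass), subtract the blurred
    image from the input (bandpass), then downsample the lowpass
    image to serve as input for the next layer.
    """
    bandpass = []
    current = image.astype(float)
    for sigma in sigmas:
        lowpass = gaussian_filter(current, sigma)   # Gaussian blur
        bandpass.append(current - lowpass)          # DoG-style bandpass layer
        current = lowpass[::factor, ::factor]       # downsample for next layer
    return bandpass, current                        # layers + residual lowpass
```

Each bandpass layer keeps the resolution of its input, while the residual shrinks by the downsampling factor at every level.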
2.2 Soft-Thresholding
After subband decomposition, we assume that small coefficients are caused by noise of different strength in each subband $B_l$. Here, we employ a soft-thresholding technique to suppress this noise with magnitudes smaller than $t_l$:

$\tilde{B}_l = \mathrm{sign}(B_l) \cdot \max(|B_l| - t_l, 0)$  (3)
Note that subgradients [9] with respect to the parameters can be computed both for the Gaussian used in the subband decomposition and for the soft-thresholding function (cf. Fig. 1). As such, both are suited for use in neural networks [5].
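The soft-thresholding operator is compact enough to state directly; a minimal NumPy sketch (with `t` denoting the subband threshold):

```python
import numpy as np

def soft_threshold(b, t):
    """Shrink coefficients toward zero: sign(b) * max(|b| - t, 0).

    Subgradients with respect to both b and t exist everywhere,
    which is what makes the operation usable as a network layer.
    """
    return np.sign(b) * np.maximum(np.abs(b) - t, 0.0)
```

Coefficients with magnitude below the threshold are set to zero; larger coefficients are reduced by the threshold, keeping the function continuous.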
2.3 Neural Network
Following the precision learning paradigm, we construct a three-layer Laplacian pyramid filter as a neural network. A flowchart of the network is depicted in Fig. 2. The lowpass filters are implemented as convolutional layers in which the kernel has only a single free parameter $\sigma_l$. Using pointwise subtraction, these lowpass filters are used to construct the bandpass filters. To each of these, soft-thresholding with threshold $t_l$ is applied. In a final layer, the soft-thresholded bandpass filters are recombined to form the final image. As such, we end up with a network architecture of nine layers that has only six trainable parameters $\sigma_1, \sigma_2, \sigma_3, t_1, t_2, t_3$. In the following, we summarize these parameters as a single vector $\theta$ that can be trained using the backpropagation algorithm [10].

2.4 User Loss
Let $I^*$ be the user-preferred image and $\hat{I}$ the denoised image produced by our network. The main objective of our denoiser would then be

$\min_{\theta} \|\hat{I} - I^*\|_2^2$  (4)
The main problem with this equation is that the user is not able to produce $I^*$. To resolve this problem, we introduce errors $e_i$ with respect to the optimal image that cannot be observed directly:

$e_i = \|I_i - I^*\|_2^2$
However, if we provide a forced-choice experiment using four images $I_1, \dots, I_4$, we can determine which of the four errors $e_1, \dots, e_4$ is the smallest. This gives us a set of constraints that need to be fulfilled by our neural network. For the training of the network, we define our error in the following way:
Let $N$ be the total number of frames, $e_{n,i}$ denote the error dedicated to choice $i$ of frame $n$, and $M$ denote the number of choices. Assuming choice $j$ is selected by the user, the following expected relationships between the errors emerge:

$e_{n,j} \le e_{n,i} \quad \forall\, i \ne j$  (5)
For the user selection $j$ of frame $n$, these constraints are used to set up our loss function. Similar to the implementation of support vector machines in deep networks, we map the inequality constraints to the hinge loss using the $\max(0, \cdot)$ operator [1]:

$\max(0,\; e_{n,j} - e_{n,i})$  (6)
This gives rise to three different variants of the user loss that are used in this work:

Best-Match: Only the user-selected image is used to guide the loss function:

$L_{\mathrm{BM}} = \sum_{n=1}^{N} e_{n,j}$  (7)
Forced-Choice: The user loss seeks to fulfill all criteria imposed by the user selection:

$L_{\mathrm{FC}} = \sum_{n=1}^{N} \sum_{i \ne j} \max(0,\; e_{n,j} - e_{n,i})$  (8)
Hybrid: The user-selected image drives the parameter optimization, while all constraints implied by the forced choice are also sought to be fulfilled:

$L_{\mathrm{HY}} = L_{\mathrm{BM}} + L_{\mathrm{FC}}$  (9)
Note that the hybrid user loss is mathematically very close to the soft-margin support vector machine, where $L_{\mathrm{BM}}$ takes the role of the normal vector length and $L_{\mathrm{FC}}$ the role of the additional constraints.
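The three variants can be sketched in a few lines of NumPy, under the assumption that each per-frame error $e_{n,i}$ is the squared distance of the network output to one of the $M$ displayed images; the function `user_losses` and the error-array layout are our own illustrative choices, not the paper's implementation.

```python
import numpy as np

def user_losses(errors, selected):
    """Compute the three user-loss variants from per-frame errors.

    errors:   array of shape (N, M), entry [n, i] holding the error
              of choice i for frame n.
    selected: array of shape (N,), the index j the user picked per frame.
    Returns (best_match, forced_choice, hybrid).
    """
    n_idx = np.arange(errors.shape[0])
    e_sel = errors[n_idx, selected]                   # e_{n,j}
    # Best-Match: only the selected image guides the loss.
    l_bm = e_sel.sum()
    # Forced-Choice: hinge on every ranking constraint e_{n,j} <= e_{n,i},
    # mapped via max(0, .); the i == j term is identically zero.
    l_fc = np.maximum(0.0, e_sel[:, None] - errors).sum()
    # Hybrid: both terms combined.
    return l_bm, l_fc, l_bm + l_fc
```

Note how the hinge term only contributes where the user's ranking is violated, i.e. where some non-selected choice has a smaller error than the selected one.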
3 Experiments and Results
For generating different scenarios, the Laplacian pyramid is first initialized for each input image. Around the center values of our parameter sets $(\sigma_l, t_l)$, the four different scenes are generated using random parameters. The resulting scenes for each frame are then imported into a GUI in order to record the user preferences (cf. Fig. 3).
The network is implemented in Python using the TensorFlow framework. The Adam algorithm is used as optimizer, iterating over 5000 epochs with a batch size of 50. The datasets used in this work are 2D angiography fluoroscopy image data. The dataset contains 50 images with different dose levels. We created 200 scenarios by randomly initializing the Laplacian pyramid parameters. Our dataset is divided such that 60% is used for training, 20% for validation, and 20% for testing. For this splitting, stratified k-fold cross-validation is used.
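The stratified splitting step can be sketched as follows. This is a minimal single-split illustration in plain NumPy: stratifying by dose level and the 60/20/20 fractions follow the text, while the function itself and the label layout are hypothetical.

```python
import numpy as np

def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split indices into train/val/test, preserving label proportions.

    Each class is shuffled and divided according to `fractions`,
    so every split contains the same mix of (e.g.) dose levels.
    """
    rng = np.random.default_rng(seed)
    splits = ([], [], [])
    for label in np.unique(labels):
        idx = np.flatnonzero(labels == label)
        rng.shuffle(idx)
        n_train = int(round(fractions[0] * len(idx)))
        n_val = int(round(fractions[1] * len(idx)))
        splits[0].extend(idx[:n_train])                 # 60% training
        splits[1].extend(idx[n_train:n_train + n_val])  # 20% validation
        splits[2].extend(idx[n_train + n_val:])         # 20% test
    return tuple(np.array(s) for s in splits)
```

A full k-fold evaluation would repeat this with rotated fold assignments; the per-class shuffling is what keeps each split representative of all dose levels.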
3.1 Qualitative Results
Qualitative results of our approach are presented in Fig. 4 for the first user. They indicate the influence of the different loss functions on the parameter tuning for one user's preferences. The Best-Match loss shows better noise reduction, but reduces sharpness more than the other losses. In contrast to Best-Match, the Forced-Choice loss yields better sharpness at a higher noise level. Favoring both targets, the Hybrid loss eliminates noise while also preserving the sharpness of the image data. Fig. 5 displays the Hybrid loss curves for our two different users over the training process. It demonstrates that User 1 favors sharper images than User 2. Note that we set a value of 100 as maximum for the parameters.
3.2 Quantitative Evaluation
In this section, we evaluate the three loss functions for both of our users against each other. Table 1 displays the models created with the respective loss functions versus the test sets of both users. To set fair conditions for the comparison, we only evaluated models with the loss functions that were used in their training. The results indicate that Best-Match and Forced-Choice alone are not able to yield the lowest loss for their respective user. The Hybrid loss models, however, are minimal on the test data of their respective user. Hence, the Hybrid loss seems to be a good choice to create user-dependent denoising models.
Low-dose data           User 1                     User 2
                        BM       FC      HY        BM       FC      HY
Model Nr. 1    BM       1431.1   —       —         2436.7   —       —
               FC       —        248.8   —         —        253.1   —
               HY       —        —       1771.1    —        —       2675.9
Model Nr. 2    BM       1381.5   —       —         2391.5   —       —
               FC       —        249.5   —         —        964.9   —
               HY       —        —       1781.1    —        —       2359.1
4 Conclusion and Discussion
We propose a novel user loss for neural network training in this work. It can be applied to any image grading problem in which users have difficulties in giving exact answers. As a first experiment for the user loss, we demonstrate that it can be used to train a denoising algorithm towards a specific user. In our work, 200 decisions using 50 clicks were sufficient to achieve proper parameter tuning. In order to be able to apply this for training, we used the precision learning paradigm to create a suitable network structure with only few trainable parameters.
Obviously, other algorithms would also be suited for the same approach [12, 7, 4, 6, 8]. However, as the scope of this paper is the introduction of the user loss, we omitted these experiments from the present work. Which filter requires how many clicks for convergence is still an open question and subject of future work.
We believe that this paper introduces a powerful new concept that is applicable for many applications in image processing such as image fusion, segmentation, registration, reconstruction, and many other traditional image processing tasks.
References

[1]
Bishop, C.M.: Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Berlin, Heidelberg (2006)
[2] Fu, W., Breininger, K., Schaffert, R., Ravikumar, N., Würfl, T., Fujimoto, J., Moult, E., Maier, A.: FrangiNet: A Neural Network Approach to Vessel Segmentation. In: Maier, A., Deserno, T., Handels, H., Maier-Hein, K., Palm, C., Tolxdorff, T. (eds.) Bildverarbeitung für die Medizin 2018. pp. 341–346 (2018)
[3] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436 (2015)
[4] Luisier, F., Blu, T., Unser, M.: A new SURE approach to image denoising: Interscale orthonormal wavelet thresholding. IEEE Transactions on Image Processing 16(3), 593–606 (2007)
 [5] Maier, A.K., Schebesch, F., Syben, C., Würfl, T., Steidl, S., Choi, J.H., Fahrig, R.: Precision learning: Towards use of known operators in neural networks. CoRR abs/1712.00374 (2017), http://arxiv.org/abs/1712.00374
 [6] Motwani, M.C., Gadiya, M.C., Motwani, R.C., Harris, F.C.: Survey of image denoising techniques. In: Proceedings of GSPX. pp. 27–30 (2004)
[7] Petschnigg, G., Szeliski, R., Agrawala, M., Cohen, M., Hoppe, H., Toyama, K.: Digital photography with flash and no-flash image pairs. In: ACM Transactions on Graphics (TOG). vol. 23, pp. 664–672. ACM (2004)
 [8] Rajashekar, U., Simoncelli, E.P.: Multiscale denoising of photographic images. In: The Essential Guide to Image Processing, pp. 241–261. Elsevier (2009)
 [9] Rockafellar, R.: Convex Analysis. Princeton landmarks in mathematics and physics, Princeton University Press (1970), https://books.google.de/books?id=1TiOka9bx3sC
[10] Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323(6088), 533 (1986)
[11] Syben, C., Stimpel, B., Breininger, K., Würfl, T., Fahrig, R., Dörfler, A., Maier, A.: Precision Learning: Reconstruction Filter Kernel Discretization. In: Noo, F. (ed.) Proceedings of the Fifth International Conference on Image Formation in X-Ray Computed Tomography. pp. 386–390 (2018)
 [12] Tomasi, C., Manduchi, R.: Bilateral filtering for gray and color images. In: Computer Vision, 1998. Sixth International Conference on. pp. 839–846. IEEE (1998)