The wide application of partial differential equations (PDEs) in computer vision and image processing can be attributed to two main factors Tony:2005:PDEIP . First, PDEs in classical mathematical physics are powerful tools to describe, model, and simulate many dynamics such as heat flow, diffusion, and wave propagation. Second, many variational problems, or their regularized counterparts, can often be effectively solved via their Euler-Lagrange equations. Accordingly, there are in general two types of methods for designing PDEs for vision tasks. In the first kind of methods, PDEs are written down directly (e.g., anisotropic diffusion Pietro:1990:PM , shock filters Osher:1990:ShockFilter , and curve-evolution-based equations Sapiro:2001:GPDE ), based on some understanding of the properties of mathematical operators and the physical nature of the problems. The second kind of methods first defines an energy functional and then derives the evolution equations by computing the Euler-Lagrange equation of the functional (e.g., total-variation-based variational methods Rudin:1992:ROF Chan:2005:TVL1 David:2003:TVEdge ). Either way, people have to rely heavily on their intuition about the vision task. Traditional PDE-based methods therefore require good mathematical skills for choosing appropriate PDE forms and predicting the combined effect of the related operators, so that the obtained PDEs roughly meet the goals. If people do not have enough intuition about a vision task, they may have difficulty acquiring effective PDEs. For example, although there has been much work on PDE-based image segmentation Bresson:2007:TVSeg Chambolle:2005:TVLevelSet Li:2005:LevelSet Gao:11:levelset , the basic philosophy is always to follow the strong edges in the image while requiring the edge contour to be smooth. Can we have a PDE system for object detection (Fig. 1) that locates the object region if the object is present and does not respond if the object is absent?
We believe that this is a big challenge to human intuition and is much more difficult than traditional segmentation tasks if a PDE-based method is required, because it is hard to describe an object class, which may have significant variation in shape, texture and pose. Without using additional information to judge the content, the existing PDEs for segmentation, e.g., Li:2005:LevelSet
, always output an “object region” for any non-constant image. In short, current PDE design methods greatly limit the applications of PDEs to wider and more complex scopes. This motivates us to explore whether we can acquire PDEs that are less artificial yet more powerful. In this paper, we give an affirmative answer to this question. We demonstrate that learning the coefficients of a general intelligent PDE system from a given training data set might be a practical way of designing PDEs for computer vision in a lazy manner. Furthermore, borrowing this learning strategy from machine learning can generalize PDE techniques to more complex vision problems.
Inspired by electromagnetic field theory and Maxwell’s equations Cheng:1989:EM , we assume that visual processing involves two coupled evolutions in different scale spaces: one in the image scale space, which controls the evolution of the output, and the other in the indicator scale space, which collects the global information that guides the evolution in the image scale space. Accordingly, our general intelligent PDE system consists of two coupled evolutionary PDEs, each coupling the image and the indicator through their partial derivatives up to second order. Another key idea of our general intelligent PDE system is to assume that the sought PDEs can be written as combinations of “atoms” that satisfy the general properties of vision tasks. As a preliminary investigation, we use all the translational and rotational invariants as such “atoms” and propose the general intelligent PDE system as a linear combination of all these invariants Olver:93:applications . The problem then boils down to determining the combination coefficients of the “atoms”.
The theory of optimal control Kirk:1970:Opt has been well developed for over fifty years. With the enormous advances in computing power, optimal control is now widely used in multi-disciplinary applications such as biological systems, communication networks, and socio-economic systems Ababnah:11:control . Optimal design and parameter estimation of systems governed by PDEs give rise to a class of problems known as PDE-constrained optimal control Lions:1971:OptPDE . In this paper, we introduce a PDE-constrained optimal control technique as the training tool for our PDE system. We further propose a general framework for learning PDEs to accomplish a specific vision task via PDE-constrained optimal control, where the objective functional minimizes the difference between the expected outputs and the actual outputs of the PDEs, given the input images. Such input-output image pairs can be provided in multiple ways (e.g., ground truth, results from other methods, or results manually generated by humans) for different tasks. Therefore, we can train the general intelligent PDE system to solve various vision problems that traditional PDEs may find difficult or even impossible.
In summary, our contributions are as follows:
Our intelligent PDE system provides a new way to design PDEs for computer vision. Based on this framework, we can design particular PDEs for different vision tasks using different sources of training images. (A similar idea also appeared in Liu:2010:LPDE . But in that work, the authors only train special PDEs involving the curvature operator for basic image restoration tasks. In contrast, our work proposes a more unified and elegant framework for more problems in computer vision.) This may be very difficult for traditional PDE design methods. However, we would like to remind the readers that we have no intention of beating all the existing approaches on each task, because those approaches have been carefully and specially tuned for their tasks.
We propose a general data-based optimal control framework for training the PDE system. Fed with pairs of input and output images, the proposed PDE-constrained optimal control training model can automatically learn the combination coefficients in the PDE system. Unlike previous design methods, our approach requires much less human ingenuity and can solve more difficult problems in computer vision.
The rest of the paper is structured as follows. We first introduce in Section 2 the general intelligent PDE system. In Section 3 we utilize the PDE-constrained optimal control technique as the training framework for our intelligent PDE system. Then in Section 4 we evaluate our intelligent PDE system with optimal control training framework by a series of computer vision and image processing problems. Finally, we give concluding remarks and a discussion on the future work in Section 5.
2 General PDE System for Computer Vision
2.1 Electromagnetic Field vs. Image Evolution
Electromagnetism is the force that causes the interaction among electrically charged particles; the regions in which this interaction happens are called electromagnetic fields. In physics, electrically charged objects were first thought to produce two distinct fields associated with their charge: an electric field and a magnetic field. Over time, it was realized that the electric and magnetic fields are better thought of as two parts of a greater whole, the electromagnetic field Cheng:1989:EM . It affects the behavior of charged objects in its vicinity, extends throughout space, and describes the electromagnetic interaction. The theoretical implications of electromagnetism also led to the development of special relativity by Albert Einstein in 1905.
In this paper, inspired by this fundamental force of nature, we model image evolution in a similar way. Unlike most traditional approaches, which only consider the evolution in the image scale space, we define for the target image signal a companion signal, named the indicator signal. It changes with time and guides the evolution of the image by collecting large-scale information in the image. In this way, the two signals evolve in two coupled scale spaces.
2.2 The Intelligent PDE System
Similar to Maxwell’s equations Cheng:1989:EM , which are a set of PDEs describing how the electric and magnetic fields relate to their sources and how they develop with time, we propose a general PDE system for the evolution of our coupled signals.
The space of all PDEs is infinite-dimensional. To find the right form, we start with the properties that our PDE system should have, in order to narrow down the search space. We notice that translational and rotational invariance are very important in computer vision: in most vision tasks, when the input image is translated or rotated, the output image is translated or rotated by the same amount. So we require that our PDE system be translationally and rotationally invariant. According to differential invariant theory Olver:93:applications , our PDEs must be functions of the fundamental differential invariants under the group of translations and rotations. The fundamental differential invariants are invariant under translation and rotation, and other invariants can be written as functions of them. We list those up to second order in Table 1, using notation explained in Table 2. In the sequel, we shall use to refer to them in order. Note that those invariants are ordered with going before . We may reorder them with going before . In this case, the -th invariant will be referred to as . The simplest choice of our general PDE system is then a linear combination of the differential invariants, leading to the following form:
is the rectangular region occupied by the input image , is the time at which the PDE system finishes the visual information processing and outputs the results, and and are the initial functions of and , respectively. The meaning of the other notations in (1) can be found in Table 2. For computational reasons and for ease of mathematical deduction,
will be padded with zeros of several pixels' width around it. As we can change the unit of time, it is harmless to fix . and are sets of functions defined on that are used to control the evolution of and , respectively. As and change to and , respectively, when the image is rotated by a matrix , it is easy to check the rotational invariance of those quantities. So the PDE system (1) is rotationally invariant. Furthermore, the following proposition implies that the control functions and can be functions of only.
Suppose the PDE system (1) is translationally invariant, then the control functions and must be independent of .
Table 2. Notation:
an open bounded region | its boundary
the spatial variable | the temporal variable
the area of a region
the transpose of a matrix (or vector)
the norm | the trace of a matrix
the gradient | the Hessian
the index set for partial differentiation
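As an illustration of such rotationally invariant “atoms”, the sketch below computes a few differential invariants of an image with central differences. This is only an illustrative subset under our own choice of quantities; the paper's Table 1 lists the full set, including invariants coupling the image and the indicator.

```python
import numpy as np

def invariants(u):
    """A few translationally and rotationally invariant differential
    quantities of u, up to second order, via central differences.
    Illustrative subset only, not the paper's full Table 1."""
    uy, ux = np.gradient(u)          # first-order derivatives (rows = y)
    uxy, uxx = np.gradient(ux)       # second-order derivatives
    uyy, _ = np.gradient(uy)
    return {
        "u": u,                                   # zeroth order
        "|grad u|^2": ux**2 + uy**2,              # squared gradient norm
        "tr(H)": uxx + uyy,                       # Laplacian = trace of Hessian
        "tr(H^2)": uxx**2 + 2 * uxy**2 + uyy**2,  # trace of squared Hessian
    }
```

Each returned map is unchanged (up to resampling error) when the input image is translated or rotated, which is the defining property exploited above.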
3 Training the PDE System via Data-based Optimal Control
In this section, we propose a data-based optimal control framework to train the intelligent PDE system for particular vision tasks.
3.1 The Objective Functional
Given the forms of PDEs shown in (1), we have to determine the coefficient functions and . We may prepare training samples , where is the input image and is the expected output image, and compute the coefficient functions that minimize the following functional:
where is the output image at time computed from (1) when the input image is , and and are positive weighting parameters. The first term requires that the final output of our PDE system be close to the ground truth. The second and the third terms are for regularization so that the optimal control problem is well posed, as there may be multiple minimizers for the first term.
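After discretization, the training functional can be sketched as follows. The weight names lam1, lam2 and the quadratic form of the regularizer are illustrative stand-ins for the paper's positive weighting parameters and regularization terms.

```python
import numpy as np

def objective(outputs, targets, a_coeffs, b_coeffs, lam1=1e-3, lam2=1e-3):
    """Discrete stand-in for the training functional: a squared data term
    over all training pairs plus Tikhonov regularization on the
    coefficient functions (weights and regularizer form are illustrative)."""
    data = sum(np.sum((O - Og)**2) for O, Og in zip(outputs, targets)) / 2
    reg = (lam1 * np.sum(np.asarray(a_coeffs)**2)
           + lam2 * np.sum(np.asarray(b_coeffs)**2))
    return data + reg
```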
3.2 Solving the Optimal Control Problem
Then we have the following optimal control problem with PDE constraints:
By introducing the adjoint equation of (4), the Gâteaux derivative of can be computed and consequently, the (local) optimal and can be computed via gradient-based algorithms (e.g., conjugate gradient). Here, we give the adjoint equation and Gâteaux derivative directly:
3.2.1 Adjoint Equation
3.2.2 Gâteaux Derivative of the Functional
With the help of the adjoint equation, at each iteration the derivative of with respect to and are as follows:
where the adjoint functions and are the solutions to (5).
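Once the adjoint system supplies the gradient, the coefficients can be updated with any first-order scheme. A minimal steepest-descent loop, standing in for the conjugate-gradient iteration (the callback `J_and_grad` and all parameter names are illustrative):

```python
import numpy as np

def train_coeffs(J_and_grad, coeffs0, lr=0.1, iters=100, tol=1e-8):
    """Plain steepest descent on the functional J; in the paper the gradient
    comes from the adjoint system (5), here J_and_grad supplies both the
    functional value and its gradient at the current coefficients."""
    c = np.asarray(coeffs0, dtype=float)
    for _ in range(iters):
        J, g = J_and_grad(c)
        c_new = c - lr * g
        if np.linalg.norm(c_new - c) < tol:  # stop when the update stalls
            return c_new
        c = c_new
    return c
```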
3.2.3 Initialization
Good initialization increases the approximation accuracy of the learnt PDEs. In our current implementation, we simply set the initial functions of and to the input image:
Then we employ a heuristic method to initialize the control functions. At each time step, is expected to be such that moves towards the expected output, and by the form of (1) we may solve such that
the difference between the left- and right-hand sides of (1) is minimized. In this way, we initialize successively in time while fixing .
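This heuristic amounts to a per-time-step least-squares fit. Assuming the invariant maps and the desired temporal increment are available as arrays (names illustrative), it could look like:

```python
import numpy as np

def init_coeffs(dO_dt, invariant_maps):
    """Heuristic initialization of the combination coefficients at one time
    step: least-squares fit of sum_i a_i * inv_i to the desired increment,
    i.e. minimizing the gap between the two sides of (1)."""
    # Stack each invariant map as a column: (num_pixels, num_invariants).
    A = np.stack([inv.ravel() for inv in invariant_maps], axis=1)
    coeffs, *_ = np.linalg.lstsq(A, dO_dt.ravel(), rcond=None)
    return coeffs
```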
3.2.4 Finite Difference Method for Numerical Solution
To solve the intelligent PDE system numerically, we design a finite difference scheme Jain:77:partial for the PDEs. We discretize the PDEs, i.e., replace the derivatives , and with finite differences as follows:
The discrete forms of , and can be defined similarly. In addition, we discretize the integrations as
where is the number of pixels in the spatial area, is a properly chosen time step size and is the index of the expected output time. Then we use an explicit scheme to compute the numerical solutions.
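As a sketch of the scheme, one explicit time step on a toy scalar evolution is shown below, with central differences for the spatial derivatives and forward Euler in time. The actual system evolves the image and indicator jointly; the invariant subset here is illustrative.

```python
import numpy as np

def step_explicit(O, coeffs, dt=0.05):
    """One explicit (forward-Euler) step of a toy evolution
    O_t = sum_i a_i * inv_i(O), using numpy.gradient central differences.
    A sketch of the numerical scheme, not the full coupled system."""
    Oy, Ox = np.gradient(O)
    Oxy, Oxx = np.gradient(Ox)
    Oyy, _ = np.gradient(Oy)
    invs = [np.ones_like(O), O, Ox**2 + Oy**2, Oxx + Oyy]  # illustrative subset
    return O + dt * sum(a * f for a, f in zip(coeffs, invs))
```

The time step dt must be chosen small enough for the explicit scheme to remain stable.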
3.3 The Optimal-Control-Based Training Framework
4 Experimental Results
In this section, we apply our data-based optimal control framework to learn PDEs for four groups of basic computer vision problems: natural image denoising, edge detection, blurring and deblurring, and image segmentation and object detection. As our goal is to show that the data-based optimal control framework can be a new approach to designing PDEs and an effective regressor for many computer vision tasks, not to propose better algorithms for these tasks, we do not fine-tune our PDEs or compare them with the state-of-the-art algorithms in every task.
4.1 Learning from Ground Truth: Natural Image Denoising
Image denoising is one of the most fundamental low-level vision problems. For this task, we compare our learnt PDEs with existing PDE-based denoising methods, ROF Rudin:1992:ROF and TV- Chan:2005:TVL1 , on images with unknown natural noise. This task is designed to demonstrate that our method can solve problems by learning from the ground truth, which is the first advantage of our data-based optimal control model. We take images, each with a size of pixels, of objects using a Canon 30D digital camera, setting its ISO to . For each object, images are taken without changing the camera settings (fixing the focus, aperture, and exposure time) and without moving the camera position. The average of these images can be regarded as the noiseless ground truth image. We randomly choose objects, and for each object we randomly choose noisy images. These noisy images and their ground truth images are used to train the PDE system. Then we compare our learnt PDEs with the traditional PDEs in Rudin:1992:ROF and TV- Chan:2005:TVL1 on images of the remaining objects.
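The ground-truth construction and the PSNR evaluation used below can be sketched as follows; the peak value of 255 assumes 8-bit images.

```python
import numpy as np

def make_ground_truth(noisy_stack):
    """Average aligned noisy shots of a static scene to approximate a
    noiseless ground-truth image, as done with the camera data set."""
    return np.mean(np.stack(noisy_stack).astype(float), axis=0)

def psnr(img, ref, peak=255.0):
    """Peak signal-to-noise ratio in dB (peak=255 assumes 8-bit images)."""
    mse = np.mean((img.astype(float) - ref.astype(float))**2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak**2 / mse)
```

Averaging N independent noisy shots reduces the noise variance by a factor of N, which is why the mean image serves as a practical ground truth.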
Fig. 2 shows the comparison results. One can see that the PSNRs of our intelligent PDEs are dramatically higher than those of the traditional PDEs. This is because our data-based PDE learning framework can easily adapt to unknown types of noise and obtain PDE forms that fit the natural noise well, whereas most traditional PDE-based denoising methods were designed under specific assumptions on the type of noise (e.g., ROF is designed for Gaussian noise Rudin:1992:ROF while TV- is designed for impulsive noise Nikolova:2004:TVL1_imno ). Therefore, they may not handle unknown types of noise as well as our intelligent PDEs. The curves of the learnt coefficients for image denoising are shown in Fig. 3.
4.2 Learning from Other Methods: Edge Detection
The image edge detection task is used to demonstrate that our PDEs can be learnt from the results of different methods and achieve a better performance than all of them. This is another advantage of our data-based optimal control model. For this task, we use three simple first order edge detectors Parker:1997:IPCV (Sobel, Roberts Cross, and Prewitt) to generate the training data. We randomly choose images from the Berkeley image database Martin:2001:BSDS and use the above three detectors to generate the output images, which, together with the input images, are used to train our PDE system for edge detection. (This implies that we actually use a kind of combination of the results from different methods to train our PDE system.)
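These first-order operators reduce to small convolution kernels. A pure-NumPy sketch of how the training edge maps could be generated (Sobel and Prewitt shown; Roberts Cross uses 2x2 kernels and fits the same pattern):

```python
import numpy as np

SOBEL_X   = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
PREWITT_X = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], float)

def conv2(img, k):
    """'Same'-size 2-D correlation with zero padding (pure NumPy)."""
    kh, kw = k.shape
    p = np.pad(img.astype(float),
               ((kh // 2, kh - kh // 2 - 1), (kw // 2, kw - kw // 2 - 1)))
    out = np.zeros(img.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * p[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def edge_magnitude(img, kx):
    """Gradient-magnitude edge map from a first-order operator (ky = kx^T)."""
    return np.hypot(conv2(img, kx), conv2(img, kx.T))
```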
Fig. 4 shows part of the edge detection results on other images in the Berkeley image database. One can see that our PDEs respond selectively to edges and basically produce visually significant edges, while the edge maps of the other three detectors are more chaotic. Note that the solution to our PDEs is supposed to be a more or less smooth function, so one cannot expect our PDEs to produce an exactly binary edge map; instead, an approximation of a binary edge map is produced. The curves of the learnt coefficients for edge detection are shown in Fig. 5.
4.3 Learning to Solve Both Primal and Inverse Problems: Blurring and Deblurring
The traditional PDEs for solving different problems are usually of very different appearance. The task of solving both blurring and deblurring is designed to show that the same form of PDEs can be learnt to solve both the primal and inverse problems. This is the third advantage of our data-based optimal control model.
For the image blurring task (the primal problem), the output image is the convolution of the input image with a Gaussian kernel. So we generate the output images by blurring high resolution images using a Gaussian kernel with . The original images are used as the input. As shown in the third row of Fig. 6, the output is nearly identical to the ground truth (the second row of Fig. 6). For the image deblurring task (the inverse problem), we just exchange the input and output images for training. One can see in the bottom row of Fig. 6 that the output is very close to the original image (first row of Fig. 6). The curves of the learnt coefficients for image deblurring are shown in Fig. 7.
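The (sharp, blurred) training pairs can be synthesized with a separable Gaussian filter; the kernel width used in the paper is not shown here, so sigma is left as a parameter.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius=None):
    """Normalized 1-D Gaussian kernel, truncated at ~3 sigma by default."""
    radius = int(3 * sigma) if radius is None else radius
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable Gaussian blur used to synthesize (sharp, blurred) training
    pairs; swapping input and output trains the inverse (deblurring) map."""
    k = gaussian_kernel1d(sigma)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1,
                              img.astype(float))
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, out)
    return out
```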
4.4 Learning from Humans: Image Segmentation and Object Detection
Image segmentation and object detection are designed to demonstrate that our PDE system can learn from the human behavior directly (learn the segmentation and detection results provided by humans, e.g., manually segmented masks).
Image segmentation is a highly ill-posed problem, and there are many criteria that define its goal, e.g., breaking an image into regions with similar intensity, color, texture, or expected shape. As none of the current image segmentation algorithms can perform object-level segmentation well against complex backgrounds, we set our PDEs a reasonable goal: segmenting relatively darker objects against relatively simple backgrounds, where both the foreground and the background can be highly textured and simple thresholding cannot separate them. So we select 60 images from the Corel image database Corel that have relatively darker foregrounds and relatively simple backgrounds, but whose foregrounds are not of uniformly lower graylevels than the backgrounds, and we also prepare manually segmented binary masks as the outputs of the training images, where the black regions are the backgrounds (Fig. 8).
Part of the segmentation results are shown in Fig. 9, where we have thresholded the output mask maps of our learnt PDEs with a constant 0.5. We see that our learnt PDEs produce fairly good object masks. We also test the active contour method by Li et al. Li:2005:LevelSet (code available at http://www.engr.uconn.edu/cmli/) and the normalized cut method Shi:2000:Ncut (code available at http://www.cis.upenn.edu/jshi/software/). One can see from Fig. 9 that the active contour method cannot segment object details due to the smoothness constraint on the object shape, and the normalized cut method cannot produce a closed foreground region. To provide a quantitative evaluation, we use the
-measures that merge the precision and recall of segmentation:
in which is the ground truth mask and is the computed mask. The most common choice of is 2. On our test images, the measures of our PDEs, Li:2005:LevelSet and Shi:2000:Ncut are , and , respectively. One can see that the performance of our PDEs is better than theirs in both visual quality and quantitative measure. The curves of the learnt coefficients for image segmentation are shown in Fig. 10.
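The precision/recall merge can be sketched as below. The paper's exact weighting formula was lost in extraction, so the common F-beta-style form with weight alpha is an assumption.

```python
import numpy as np

def f_measure(pred_mask, gt_mask, alpha=2.0):
    """Precision/recall merge for binary segmentation masks. The exact
    weighting in the paper's formula is not reproduced here; this is the
    common F-beta-style form with weight alpha as an assumption."""
    tp = np.sum(pred_mask & gt_mask)          # true positives
    p = tp / max(np.sum(pred_mask), 1)        # precision
    r = tp / max(np.sum(gt_mask), 1)          # recall
    if p + r == 0:
        return 0.0
    return (1 + alpha) * p * r / (alpha * p + r)
```

A perfect mask scores 1; missing half of the ground truth with perfect precision scores 0.6 under this weighting.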
We also present the evolution process of the mask maps across time (Fig. 11). One can see that although the foreground is relatively darker than the background, the PDEs correctly detect the most salient points/edges and then propagate the information across the foreground region, resulting in a brighter output region for the foreground.
Now we apply our intelligent PDEs to a more complex task: object detection. Namely, the PDEs should respond strongly to the object of interest while not responding (or responding much more weakly) if the object is absent from the image. It would be challenging for anyone to manually design PDEs for such a problem; indeed, we are unaware of any PDE-based method that can accomplish this task. The existing PDE-based segmentation algorithms always output an “object region” even if the image does not contain the object of interest. In contrast, we will show that our PDEs are able to respond selectively, as desired. We choose the “plane” data set in Corel Corel . We select images from this data set as positive samples and also prepare images without the object of interest as negative samples. To complete the training data, we also provide ground truth object masks: for the positive samples, we manually segment binary masks as the output images; for the negative samples, the ground truth output masks are all-zero images.
In Fig. 12, one can see that our learnt PDEs respond well to the objects of interest (first three images), while the response to images without the objects of interest is relatively low across the whole image (last two images). It seems that our PDEs automatically identify that concurrent high-contrast edges/junctions/corners are the key features of planes. These examples show that our learnt PDEs are able to differentiate object from non-object regions without requiring the user to specify what features are and what factors to consider. The curves of the learnt coefficients for object detection are shown in Fig. 13.
5 Conclusions and Future Work
In this paper, we have presented a framework that uses data-based optimal control to learn PDEs as a general regressor for approximating the nonlinear mappings of different visual processing tasks. The experimental results on several computer vision and image processing problems show that our framework is promising. However, the current work is still preliminary, and we plan to improve and enrich it in the following aspects. First, more theoretical issues should be addressed for this PDE system. For example, we will try to apply the Adomian decomposition method Wazwaz:2009:ADM to express the exact analytical solution to (1) and then analyze its physical properties. Second, we would like to develop more computationally efficient numerical algorithms for solving our PDE-constrained optimal control problem (4). Third, we will apply our framework to more vision tasks to find out to what extent it works.
Appendix A Proof of Property 2.1
We prove that the coefficients and must be independent of .
We prove for in (1) only. We may rewrite
Then it suffices to prove that is independent of .
By the definition of translational invariance, when changes to by shifting with a displacement , and will change to and , respectively. So the pair and fulfils (1), i.e.,
Next, we replace in the above equation with and have:
On the other hand, the pair also fulfils (1), i.e.,
Therefore, , which confines the input image inside . So is independent of . ∎
This work was partially supported by grants from the National Science Foundation of China, Nos. U0935004 and 60873181.
- (1) T. Chan and J. Shen, Image processing and analysis: variational, PDE, wavelet, and stochastic methods. SIAM Publisher, 2005.
- (2) P. Perona and J. Malik, “Scale-space and edge detection using anisotropic diffusion,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, pp. 629–639, 1990.
- (3) S. Osher and L. Rudin, “Feature-oriented image enhancement using shock filters,” SIAM Journal on Numerical Analysis, vol. 27, pp. 919–940, 1990.
- (4) G. Sapiro, Geometric partial differential equations and image analysis. Cambridge University Press, 2001.
- (5) L. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D., vol. 60, pp. 259–268, 1992.
- (6) T. Chan and S. Esedoglu, “Aspects of total variation regularized function approximation,” SIAM Journal on Applied Mathematics, vol. 65, pp. 1817–1837, 2005.
- (7) D. Strong and T. Chan, “Edge-preserving and scale-dependent properties of total variation regularization,” Inverse Problems, vol. 19, pp. 165–187, 2003.
- (8) X. Bresson, S. Esedoglu, P. Vandergheynst, J.-P. Thiran, and S. Osher, “Fast global minimization of the active contour/snake model,” Journal of Mathematical Imaging and Vision, vol. 28.
- (9) A. Chambolle, “Total variation minimization and a class of binary MRF models,” in Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR), 2005.
- (10) C. Li, C. Xu, C. Gui, and M. Fox, “Level set evolution without re-initialization: a new variational formulation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005.
- (11) X. Gao, B. Wang, D. Tao, and X. Li, “A relay level set method for automatic image segmentation,” IEEE Transactions on Systems, Man and Cybernetics Part B: Cybernetics, vol. 41, pp. 518–525, 2011.
- (12) D. Cheng, Field and wave electromagnetics (2nd Edition). Prentice-Hall, 1989.
- (13) P. Olver, Applications of Lie groups to differential equations. Springer-Verlag, 1993.
- (14) D. Kirk, Optimal control theory: an introduction. Prentice-Hall, 1971.
- (15) A. Ababnah and B. Natarajan, “Optimal control-based strategy for sensor deployment,” IEEE Transactions on Systems, Man and Cybernetics Part B: Cybernetics, vol. 41, pp. 97–104, 2011.
- (16) J. Lions, Optimal control systems governed by partial differential equations. Springer-Verlag, 1971.
- (17) R. Liu, Z. Lin, W. Zhang, and Z. Su, “Learning PDEs for image restoration via optimal control,” in European Conference on Computer Vision (ECCV), 2010.
- (18) A. Jain, “Partial differential equations and finite-difference methods in image processing, part 1,” Journal of Optimization Theory and Applications, vol. 23, pp. 65–91, 1977.
- (19) J. Stoer and R. Bulirsch, Introduction to numerical analysis (2nd Edition). Springer-Verlag, 1998.
- (20) M. Nikolova, “A variational approach to remove outliers and impulse noise,” Journal of Mathematical Imaging and Vision, vol. 20, pp. 99–120, 2004.
- (21) J. Parker, Algorithms for image processing and computer vision. John Wiley & Sons Inc, 1997.
- (22) D. R. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in International Conference on Computer Vision (ICCV), 2001.
- (23) Corel photo library, corel corp., Ottawa, Canada.
- (24) J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888–905, 2000.
- (25) A. Wazwaz, Partial differential equations and solitary waves theory. Springer-Verlag, 2009.