The last fifty years have seen an impressive development of mathematical methods for the analysis and processing of digital images, mostly in the context of photography, biomedical imaging and various forms of engineering. The arts have been mostly overlooked in this process, apart from a few exceptional works in the last ten years. With the rapid emergence of digitisation in the arts, however, the arts domain is becoming increasingly receptive to digital image processing methods and the importance of paying attention to this therefore increases. In this paper we discuss a range of mathematical methods for digital image restoration and digital visualisation for illuminated manuscripts. The latter provide an interesting opportunity for digital manipulation because they traditionally remain physically untouched. At the same time they also serve as an example for the possibilities mathematics and digital restoration offer as a generic and objective toolkit for the arts.READ FULL TEXT VIEW PDF
Image processing can be used for digital restoration of ancient papyri, ...
In the words of the twentieth-century British mathematician G. H. Hardy,...
Explaining Cybersecurity with Films and the Arts...
Digital Geometry software should reflect the generality of the underlyin...
Transform image processing methods are methods that work in domains of i...
There are many resources useful for processing images, most of them free...
Creativity and the understanding of cognitive processes involved in the
Text and results for this section, as per the individual journal’s instructions for authors.
The digital processing, analysis and archiving of databases and collections in the arts and humanities is becoming increasingly important. This is because of a myriad of possibilities that digitisation opens up that go well beyond the organisation and manipulation of the actual physical objects, allowing, for instance, the creation of digital databases that are searchable with respect to several parameters (keywords), the digital processing and analysis of objects that are non-destructive to the original object, and the application of automated algorithms for sorting newly found objects into existing digital databases by classifying them into pre-defined groups in the database. These possibilities go hand-in-hand with ever-growing advances in data science that are developing mathematical methodology for analysing and processing digital data. A large component of digital data in the arts and humanities is composed of digital images. Despite many developments of mathematical image analysis methods in applications in biomedicine, the physical sciences and various forms of engineering, the arts and humanities have been, apart from only a few exceptions and more recent works of, for instance, Daubechies and co-authors[polatkan2009detection, ruvzic2011virtual, anitha2013restoration, cornelis2013crack, pizurica2015digital, yin2016removing], Fornasier [fornasier2005fast, fornasier2007restoration], Abry, Jaffard and co-authors [Wendt2013a, abry2015multiscale], and [kirchner2018digitally, kirchner2018digitally2, kirchnerdigitally3, van2017learned, van2015toward], have been mostly overlooked as an application in need of bespoke mathematical image analysis methods.
In this work we discuss a range of mathematical methods for automated digital restoration based on partial differential equations, exemplar-based image inpainting and osmosis filtering, and, by examples, their translation to the digital restoration and interpretation of illuminated manuscripts. The pre-sequel of this article is an article in the exhibition catalogue[panayotova2016].
Medieval and Renaissance illuminated manuscripts present a particular challenge, but also an opportunity to transform current understanding of European visual culture between the 6th and 16th century. Illuminated manuscripts are the largest and best preserved resource for the study of European painting before 1500. Nevertheless, the images in some manuscripts have been affected by wear-and-tear, degradation over time, iconoclasm, censorship or updating. Unlike the conservation of other painted artefacts, the conservation of illuminated manuscripts preserved in institutional collections is non-invasive, usually restricted to repairs of the binding and of torn parchment or paper, and rarely involves the consolidation of flaking pigments. Damaged areas in the images are never restored; pigment losses are never filled in; over-painting added on top is never removed to reveal the original images. This minimal approach is due largely to the fact that when compared to wall or easel paintings, the images in illuminated manuscripts are relatively small and their pigment layers are few and very delicate. It is not possible to remove over-painting without damaging or completely removing the original painting beneath. The removal of even the smallest sample or the restoration of even the smallest painted area would constitute a considerable change to the overall image. Virtual restoration is thus the only way to recover damaged illuminations, whether by infilling paint losses or by removing over-painted layers or indeed both. Bringing the images as close as possible to their original form would ensure both their accurate scholarly interpretation and their full appreciation by wider audiences. Damaged or inaccurately restored illuminations can lead to the exclusion of seminal works of art from academic debates or to incomplete and misleading interpretations of the dating, origin and artists involved. Preserving the current state of the illuminations in line with conservation ethics, faithful digital restoration would serve as a reliable surrogate for multiple reconstructions, enabling research, teaching and wider appreciation for manuscripts.
Reliable digital restoration for illuminated manuscripts requires a multi-disciplinary collaboration as the current work is based on. In what follows we discuss a range of new adaptive, semi-automated restoration methods that (a) reconstruct image-structures using partial differential equations [bertalmio2000image, masnou1998level, Bornemann2007, Baatz2008, Baatz2009, Burger2009, CarolaBook], (b) mimic the human-expert behaviour, using texture- and structure patches sampled from the intact part of the illuminated manuscript at hand and integrating them in exemplar-based inpainting approaches [Criminisi2004, Arias2011], (c) exploit infrared imaging data, correlating the visible image content with its traces in the hidden layers of paint [weickert2013linear, panayotova2016], and (d) create new 3D interpretations of illuminated manuscripts through a new 3D conversion pipeline [Guidefill].
In Section 2 we propose a semi-supervised approach for the segmentation of damaged areas in manuscripts and for the retrieval of missing information within them via an exemplar-based image inpainting model. In section 3 we consider the mathematical model of image osmosis to integrate super-painted visible image information on a manuscript with hidden infrared ones for looking through the layers of a restoration process. Finally, in Section 4 we present a mathematical pipeline to convert a 2D painting into a 3D scene by means of the construction of an appropriate depth map.
The problem of image inpainting can be described as the task of filling in the damaged (or occluded) areas in an image defined on a rectangular domain by transferring the information available in the visible areas to the damaged areas in the image. Over the last thirty years a large variety of mathematical models solving the image inpainting problem have been proposed, see, e.g., [ChanShenBook, CarolaBook] for a review. In some of them, image information is transferred into the damaged areas (the so-called inpainting domain, denoted by in the following) by using local
information only, i.e. by means of suitable diffusion and transport processes which interpolate image structures in the immediate vicinity of the boundary ofinside the occluded region. Such techniques have been shown to be effective for the transfer of geometrical image structures, even in the presence of large damaged areas [CarolaBook]. However, because of their local nature, such methods do not make use of the entire information contained in the intact image regions. In particular, such methods do not take into account image information located far apart of
to estimate the missing contents. For such a reason, more sophisticated (non-local) mathematical models exploiting self-similarities in the whole image at hand have been proposed[Criminisi2004, Arias2011, Newson2014]
. Heuristically, they can be thought as copy and paste models where small patches insideare iteratively reconstructed by comparison with patches outside in terms of a suitable distance. These models have been proven to be impressively effective in a very large variety of applications and rendered computationally feasible in recent years with the well-known PatchMatch algorithm [Barnes2009].
The very first basis step of any inpainting algorithm consists in the localisation of the damaged areas. This is essentially an image segmentation problem which may be rendered very hard in the presence of fuzzy and irregular region boundaries and small scale objects.
In the following we describe a semi-supervised algorithm for the detection of the damaged areas in images with possibly large and non-homogeneous missing regions. This is then used as a necessary initial step for the subsequent application of the exemplar-based non-local image inpainting model proposed in [Newson2014] to the reconstruction of image contents in two highly-damaged images 1 of illuminated manuscripts.
Our dataset is made by two manuscripts, reported in Figure 1: William de Brailes, Last Judgement (left) and Christ in Majesty with King David playing the harp (right), Fitzwilliam Museum, MSS 330.iii and 330.v, England, Oxford, c. 1230-1250. The images to be processed are very large .tif data of pixels and approximatively MB each.
In order to identify the damaged areas in the image, we propose in the following a combined algorithm where a classical binary segmentation model is used for the extraction of a small training region which serves as an input for a subsequent labelling algorithm which segments the whole inpainting domain in terms of appropriate intensity-based image features.
A binary image segmentation task consists in the partitioning of an image into two different regions characterised each by distinctive features. Typically, RGB intensity values are used to describe image contents and mathematical image segmentation methods compute the required segmented image as the minimiser of an appropriate functional where information on the magnitude of the gradient of the image are encoded in order to identify the regions (boundaries) where intensity values drastically change.
If by simplicity one assumes that the image at hand can be approximated by a binary function so that
where is a closed curve, a very well-known model computing the segmentation of in two classes is the Chan-Vese model presented in [ChanVese2001]. For such modelling, the functional to minimise seeks the optimal and , i.e. the optimal of the form (1), and contour to minimise
where and denote the inner and the outer part of , respectively. In (2.2.1) the first term penalises the length of , ensuring its regularity. The size of is controlled by the second term which is penalising the area in the interior of , while the two other terms penalise the discrepancy between the fitting of the piecewise constant with the model (1) and the given image in the interior and exterior of , respectively. By computing the local minima of (2.2.1) one retrieves the optimal binary approximations of .
Despite being very popular and widely used in applications, the Chan-Vese model and its extensions present intrinsic limitations. Firstly, the segmentation result is strongly dependent on the initialisation: in order to get a good result, the initial condition needs to be chosen within (or sufficiently close to) the domain one aims to segment. Secondly, due to the modelling assumption (1), the Chan-Vese model works well for images whose intensity is locally homogeneous. If this is not the case, the contour curve may evolve along image information different from the one we want to detect. Images with significant presence of texture, for instance, can exhibit similar problems. Furthermore, the model is very sensitive to the length and area parameters and , which may make the segmentation of very small objects in the image very difficult.
For our application, we make use of the Chan-Vese model to segment a sub-region included in out of the whole image and with a high level of precision 111For our computation we used the inbuilt MATLAB Chan-Vese segmentation code.. To do that, we ask the user (typically, an expert in the field) simply to click in few pixels of the inpainting domain to identity a candidate initial condition for the segmentation model (1), which is then run to segment the subregion . In Figure 2 and 3 we show the results with a superimposed mask of the computed region for some details cropped from the original images.
Because of the intrinsic limitations of the Chan-Vese approach, we observe that the segmentation result is not satisfactory (see, for instance, Figure 2) since it generally detects with high precision only the largest uniform region around the user selection. To detect the whole inpainting domain , the user should give in principle many initialisation points, which may be very demanding in the presence of several disconnected and possibly tiny inpainting regions.
For this reason, we proceed differently and make use of a feature-based approach to use the area as a training region for a classification algorithm running over the whole set of image pixels.
We now extract some intensity-type features from the image in order to build feature vectors which will be then used for classification. Namely, for every pixel
in the image we apply non-linear channel transformations to compute the HSV (Hue, Saturation, Value), the geometric mean chromaticity GMCR[Finlayson2009], the CIELAB and the CMYK (Cyan, Magenta, Yellow, Key) values (see [wyszecki2000color] for more details). Once this is done, we append all these values together so that multi-dimensional feature vector of the form
is built. Note that having only the RGB intensity data at hand, unfortunately not much information can be added when building the feature vectors. In this respect, if one would have multi-spectral measurements (such as IR, UV or other types) additional features extracted from the complementary data could be added for an improved result.
Once the feature vectors are built for every point in the image, we use the training region detected as described in Section 2.2.1 as a dictionary to drive the segmentation procedure extended to the whole image domain. We proceed as follows. First, we run a labelling algorithm over the whole image domain comparing appropriately the features defined in (3) in order to partition the image in a fixed number of classes. To do that, we use the well-known -means algorithm. After this preliminary step, we check which label has been assigned to the training region and simply identify in the labelled image which pixels share the same label. By construction, this corresponds to find the regions in the image ‘best-fitting’ the training region in terms of the features defined in Section 2.2.2, which is our objective. After a refinement step for the removal of possibly small details due to noise and/or misclassification, we can finally extract the whole area to inpaint . We report the results corresponding to Figure 2 and Figure 3 in Figure 3(a) and 3(b), respectively.
Note that in order to reduce the computational times, our results are run not on the whole image, but only on details cropped by the user as a very first step. Additionally, in order to provide a suitable initialisation for the Chan-Vese segmentation, the user (typically, an expert) is required to select few ‘training’ pixels in the inpainting domain to guide the first segmentation procedure. As such, the method is semi-supervised.
Once an accurate segmentation of the damaged areas is provided, the task becomes the actual restoration of the image contents in by means of the available information in the region . The standard mathematical approach solving an inpainting problem consists in minimising an appropriate function defined over the image domain , i.e. in solving
A standard choice for in the case of local inpainting models is the functional
where denotes the given image to restore, is the Euclidean norm and
denotes the characteristic function of the unoccluded image areas, so that for every pixel:
Because of this modelling, the second term in (5) can be interpreted as a distance function between the given image in the intact and the unknown image to reconstruct. Indeed, the multiplication by the characteristic function implies that such term is simply zero for the points in , since there no information is available, while the contribution of all the points in has to be as small as possible. The term typically encodes local information (such as gradient magnitude) which is the responsible of the transfer of information inside by means of possibly non-linear models[ChanShenBook, CarolaBook]. The transfer process is balanced with the trust in the data by the positive parameter . A classical choice of a gradient-based inpainting model consists in choosing
i.e. the Total Variation of [ShenChan2002]. As mentioned above such an image inpainting technique is not designed to transfer texture information. Furthermore, it fails in the inpainting of large missing areas. For our purposes we use (6) as an initial ‘good’ guess with which we initialise a different approach based on a local copy-and-paste procedure as described in the following section.
We describe here the non-local patch-based inpainting procedure studied in [Arias2011, Newson2014] and carefully described in [NewsonIPOL] from an implementation point of view 222The code is freely available at IPOL: https://doi.org/10.5201/ipol.2017.189. In the following, we define for any point the patch neighbourhood as the set of points in in a neighbourhood of . Assuming that the patch neighbourhood has cardinality , by patch around we denote the -dimensional vector where the points belong to patch neighbourhood . In order to measure ‘distance’ between patches, a suitable patch measure can be defined, so that stands for the patch measure between the patches around the two points and . We define then the Nearest Neighbour (NN) of as the patch around some point minimising .
For an inpainting application the task consists then in finding for each point in the inpainting domain the best-matching patch outside . Assuming that each NN patch can be characterised in terms of a shift vector defined for every point in (i.e. assuming there exists a rigid transformation which shifts any patch to its NN), the problem can be formulated as the minimisation problem
Heuristically, every patch in the solution of the problem above is constructed in such a way that in the damaged region has a correspondence (in the sense of the measure ) with its NN patch in the intact region . Following [NewsonIPOL], we use the following distance:
From an algorithmic point of view, solving the model above corresponds to consider mainly two phases: the former consists in computing (approximately) the NN patch for each point in , so as to provide a complete representation of the shift map This can be computationally expensive for large image data. In order to solve this efficiently, a PatchMatch [Barnes2009] strategy can be applied. Afterwards a proper image reconstruction step is performed, where for every point in the actual corresponding patch is computed. We refer the reader to [NewsonIPOL] for the numerical details.
A crucial ingredient for a good performance of the non-local algorithm is its initialisation. In particular, once the inpainting domain is known, a pre-processing step where a local inpainting model, such as the TV inpainting model (5) with (6), can be run to provide a rough, but reliable initialisation of the algorithm333The code is freely available at IPOL: https://doi.org/10.5201/ipol.2012.g-tvi.
For the segmentation of the the training region within the inpainting domain we use the activecontour MATLAB function by which the Chan-Vese algorithm can be called. For this we fixed the maximum number of iterations to maxiter and use the default value as a tolerance on the relative error between iterates as a stopping criterion. We use the default values for the parameters and in (2.2.1). The subsequent labelling phase was performed by means of the standard MATLAB kmeans function after specifying a total of labels to assign. The labelling was iteratively repeated times to avoid misclassification. Once the detection of the inpainting domain is completed, in order to provide a good initialisation to the exemplar-based model we use the TV inpainting model (4) with (6) with the value and a maximum number of iterations equal to maxiter2 with a stopping criterion on the relative error between iterates depending on a default tolerance. Finally, we followed [NewsonIPOL] for the implementation of the exemplar-based inpainting model: for this we specified propagation of iterations and tested different size of the patches. In order to avoid memory shortage, we restricted to patches of the size , and .
The numerical tests were performed here on a standard MacBook Pro (Retina, 13-inch, Early 2015), 2,9 GHz Intel Core i5, 8 GB 1867 MHz DDR3 using MATLAB 2016b.
We proposed in this section a combined algorithm to retrieve image contents from two images of illuminated manuscripts shown in Figure 1 where very large regions have been damaged. At first, our algorithm computes an accurate segmentation of the inpainting domain which is performed by means of a semi-supervised method exploiting distinctive features in the image. Then, taking the segmentation result as a input, the procedure is followed by an exemplar-based inpainting strategy (upon suitable initialisation) by which the damaged regions are filled by using sensible patches.
The results reported in Figure 5 and 6 confirm the effectiveness of the combined method proposed. In particular, when looking at the difference between standard local (TV) image inpainting methods and the non-local one, we immediately appreciate the higher reconstruction quality in the damaged regions, especially in terms of texture information. The method has been validated on several image details extracted from the entire images, and has been shown effective also for very large image portions with highly damaged regions.
In term of computational times, the segmentations in Figure 4 are obtained in approximatively 15 minutes. The inpainting results in Figure 5 are obtained in about 3 minutes for patches of size 5x5 and about 7 minutes for patches of size 7x7. Overall the whole task of segmenting and inpainting the occluded regions takes approximatively 20 minutes per images of size 690x960, approximatively. However, these results highly depends on the size of the image, the size of the inpainting domain and the size of the patches chosen.
Future work could address the use of different features for the segmentation of the inpainting domain with similar methodologies, such as for instance texture features [Varma2005]. Furthermore, at an inpainting level, we observe that the reconstruction of fine details in very large damaged regions (such as the strings of the harp in Figure 6) is very challenging due to the lack of correspondence with similar training patches in the undamaged region. For solving this problem a combination of exemplar-based and structure-preserving inpainting models could be used.
In the previous section, the content of the damaged areas is completely lost and can only be estimated by assuming a content similar to the rest of the picture. But this is not the only kind of degradation encountered in illuminated manuscripts. In some cases parts of an illumination are painted over. The illuminations displayed in this section are from the primer of Claude de France. A prudish later owner could not stand Adam and Eve being nude and had the offensive areas painted over. In cases like this one, we can use infrared information to actually look through the layers and recover the apparently lost information. In the following we present the linear osmosis model, how it is generally used and we apply it to the illuminations from the primer.
The osmosis model has been introduced in [weickert2013linear] as a non-symmetric generalization of diffusion filters and a new tool for image processing problems such as seamless cloning and shadow removal. The original parabolic equation for this model is
where is the input vector field defined on the image domain with values in . The equation is typically solved on the whole image with homogeneous Neumann boundary conditions and this model has already been successfully used to solve, by dimensional splitting techniques, the light balance problem in Thermal-Quasi Reflectography (TQR) imaging, a non-destructive way to support the restoration of large mural paintings [DAFFARA2018266, parisotto2018digital].
For the work presented in the following however, we look directly for the solution of the stable state of the previous equation, the elliptic equation,
on a small sub-domain of the input image , with mixed boundary conditions. Restricting ourselves to a small domain has two main advantages: first most of the image is supposed to already be the ground truth and is therefore best left untouched and second the computation speed is much faster. Moreover having mixed boundary conditions allows for more flexibility depending on the problem at hand. On the one hand Dirichlet boundary conditions enforce the colour values on and a smooth transition when there shouldn’t be any discontinuity with the untouched part of the image. On the other hand Neumann boundary conditions prevent any diffusion across the boundary, ensuring clear colour discontinuities which is useful when the border of the mask is along an edge between two different colours.
The drift vector field encodes information from the gradient of the desired solution and serves as a guide to the diffusion. When , is an evident solution of the equation and this vector field is then called the canonical drift vector field of the image . In the following, we will note the canonical drift vector field of the image . The final method used for the primer is a combination of the two following applications.
Different results come from different manipulations of this canonical vector field. The problem of seamless cloning can be described as an improvement of a copy-and-paste process where the pasted part is modified in such a way that the pasting is indistinguishable. It typically involves two images, the background one and the one from which comes the pasted region. To seamlessly paste some area of an image on the domain of an image , the drift vector field is created by replacing by on the domain and taking the mean of both on the boundary and the equation is then solved either on the whole image domain with Neumann boundary conditions on or only on the sub-domain with Dirichlet boundary conditions.
The problem of shadow removal involves only one image and is, as its name indicates, a process that takes an image with shadowed areas and gives as a result the same view without any shadows. A shadow can be reduced as a multiplicative change in the domain of the shadowed region of the image while the canonical drift vector field is invariant to multiplicative change. The presence of the shadow is therefore only encoded in the drift vector field on the edge of the shadow. In an ideal case with a sharp constant shadow, negating the drift field on the edge of the shadow creates pure diffusion on the edge and results in a perfect shadowless image.
In the ideal case, the added pigments do not appear on the infrared image while the colours to be restored are perfectly encoded in the infrared image. In this case the problem is reduced to a simple seamless cloning application with Dirichlet boundary conditions. The drift-field of the RGB (Red Green Blue) image is replaced by the one from the infrared image on the sub-domain to be restored. But such an ideal case is uncommon. For the illuminations of the primer, several issues with the infrared appear. In most cases, the osmosis model is enough to get a nice output. For more challenging cases, we have to add some workaround.
A problem encountered in Figure 9 is that green pigments are not well encoded in the infrared image. This has the nice effect that the green leaves added are completely absent from the infrared image. Yet the original fig leaf is also green and the colour distinction between the fig leaf and the skin is slight on the infrared image. Therefore simply following the seamless cloning method doesn’t lead to satisfactory results. On the RGB image, the original fig leaf colour is present. To enforce that the skin colour and the green of the original fig leaf do not mix, we have to add Neumann boundary conditions along the edges involved. This is illustrated in Figure 9, the Neumann boundary conditions being enforced along the red lines in the mask.
Another issue encountered is that the added pigments often appear on the infrared image. If the added cloth has little to no texture discernible on the infrared and the original image appears by transparency, a mix of seamless cloning and shadow removal is applied. The added cloth is treated as a shadow. The drift-field of the RGB image is replaced by the drift-field of the infrared image on the sub-domain to be restored but the drift-field is put to zero on the edge of this sub-domain. This method is illustrated in Figure 10. The white lines of the mask are the areas where the drift-field is put to zero. If a texture from the added cloth is present in the infrared image, it appears in the final result. In Figure 10, the result is still visually acceptable.
In the case of Figure 11, the infrared image adds some useful information to the RGB image. Yet, a large amount of the added skirt texture appears on the infrared. This prevents us from using the process previously described. To get some results, we manually draw the main edges appearing in the infrared and use this hand drawn sketch as the second image for the seamless cloning process. However, this leads to a complete loss of texture in the inpainted domain. Therefore to get back texture in the restored area, we use this intermediate result as initialisation for the exemplar-based inpainting described in the previous section. This gives more natural outputs but the results are not as satisfying as the ones displayed in Figure 9 and 10.
As we just saw, such a complex restoration process necessitates significant user decisions. In fact the mask containing the sub-domain to be restored must be drawn by the user as well as the edges along which Neumann boundary conditions should be applied and the sub-domain edges where the drift-field should be put to zero.
For our experiments we used the discretisation proposed in [weickert2013linear]. Then the linear system was solved using the MATLAB UMFPACK V5.4.0 LU solver. It took 8.61 seconds to numerically obtain the final result of Figure 10 and 6.10 seconds for the final result of Figure 9. For the inpainting step of Figure 11, we used the implementation of the exemplar-based inpainting algorithm from [ipol.2015.136] 444The code is freely available at IPOL: https://doi.org/10.5201/ipol.2015.136 with the nlmedians method, patches, two scales and 3 iterations.
The numerical tests were performed on a standard MacBook Pro (Retina, 13-inch, 2017), 3,5 GHz Intel Core i7, 16 GB 2133 MHz LPDDR3 using MATLAB 2017b.
We proposed in this section a method to restore covered up parts of a manuscript using infrared information. For the illuminations from the primer that we worked on, the results are mostly satisfying, especially when the added pigments do not appear on the infrared or when the addition doesn’t have too much texture. This method necessitates some patient work for the mask creation and is heavily dependent on the infrared wavelength. Moreover it hasn’t been tested on other illuminated manuscripts.
Future work should address these difficulties and test the method on a larger dataset. For the mask creation phase, a semi-supervised segmentation step could be inserted. In this work, we have only used the visible image and a single infrared image. Some other multispectral images may help in distinguishing the added texture from the original one. But the expert would have to specify for each area which images should be used.
Having restored a damaged manuscript based on the techniques in the proceeding sections, we can now go a step further and build a 3D reconstruction. Specifically, we convert the manuscript Simon Bening, Annunciation, Fitzwilliam Museum, MS 294b, Flanders, Bruges, (1522-1523) (shown in Figure 13(a)) into stereo 3D. We do so using a 3D conversion pipeline originally developed for the conversion of Hollywood films. There, one is given the video shot from camera position and orientation (corresponding to, for example, the left eye view), and the objective is to generate a plausible reconstruction of the the video as it would appear from a perturbed position and orientation , (corresponding to the view from the other eye). In some cases and , along with other relevant camera parameters such as field of view, may be given. In other cases, they must be estimated. In our case the process is the same, except that we have a manuscript rather than a video. However, this introduces is a subtle difference. Before, although we might not know the camera parameters associated with a given video, we at least knew that they exist - but here, because the input is drawn by a human, we cannot take existence for granted. In particular, depending on the artist, the drawing may or may not obey the laws of perspective. We will show an example of this at work in the famous painting The Scream by Edvard Munch in Figure 15.
Here we briefly go over the 3D conversion pipeline used in this paper. For more details, please see [Guidefill] or [CarolaBook, Ch. 9.4]. Note that in industry, this pipeline is executed by teams of artists each responsible for different stages. Each stage is either done with a toolkit of algorithms that are then touched up by hand, or is done completely manually. The basic workflow is as follows:
Generate a rough but plausible 3D model of the scene, including a virtual camera with plausible parameters (parameters include position, orientation, field of view, possibly lense distortion, etc) placed within it. In industry, artists may be given a 3D model of the scene directly from the client, or may build it themselves. For scenes with motion, camera parameters can be estimated with the matchmove algorithm [MatchMove, MatchMove2]. The 3D models do not have to be perfect, and are typically made to a little larger than the objects they correspond to. This is because they will be “clipped” in the next step.
Generate accurate masks for all objects in the scene. This is typically done by hand, but could also be done with the help of segmentation algorithms that are then touched up. These masks are then used, much like a cookie cutter, to “clip” the 3D geometry (which, remember, was generated to be a little bit too large), throwing away the part of the geometry that is unneeded (an illustration is provided in Figure 12).
The camera is transformed into a projector, projecting the input image (or video) onto the 3D geometry.
One or more new virtual cameras are added to the scene. If the original camera is taken to be either the right or left eye, then one additional virtual camera corresponded to the other eye is needed. However, sometimes the original camera position is taken to be half way between the two eyes, so that two virtual cameras (corresponding to the left and right eyes) are needed. These camera(s) will be used to render the 3D scene from one or more new viewpoints, in order to create a stereo pair.
Because the new camera(s) will typically see bits of background previously hidden behind foreground objects in the original view, inpainting of occluded areas is required. This is typically done using a toolbox of inpainting algorithms that are then touched up by hand. In our example, inpainting was done in Photoshop, using a combination of Content Aware fill and manual copy pasting of patches by hand.
Steps one and two can be thought of as generating a depth map for the image. The rough geometry generated in step one provides the smooth component of the depth map, while the masks generated in step two generate the depth discontinuities. Because the human eye is most sensitive to depth discontinuities, these have to be very accurate, but the 3D models do not. For example, in the conversion of Figure 13(a), the virgin Mary is modelled using just a few simple geometric primitives including an ellipsoid for her body, a sphere for her head, a cylindrical halo and a cone for the bottom of her dress. This is illustrated in 12, where we also show the geometry for the angel Gabriel.
The results of our 3D conversion are presented in Figure 13, where we show the original manuscript (assumed to be the right eye view) side by side with the reconstructed left eye view. A limitation of the pipeline we have used is that it does not handle partially transparent objects properly. In this case, bits of background in the original right eye view are visible through the halos of both the virgin Mary and the angel Gabriel. In particular, in the original right eye view, a bit of Mary’s bed is visible through her halo. When rendered from the new left eye vantage point, we should now be seeing the window through her halo, but instead we continue to see the bed. See Figure 14 for a closeup of this defect. To overcome this, one could modify the pipeline in Section 4.1 to first decompose semi-transparent objects into two images (in this case, the pure halo and the background). This is something we would like to investigate in the future.
An adequate mathematical analysis and processing of images arising in the arts and humanities needs to meet special requirements:
There is often particular domain expertise which any analysis should ideally make use of. For instance, when digitally restoring an image, the integration of related images such as paintings from the same artist, could be taken into account. In what we have discussed this concept is used to the extent that a dictionary of characteristic structures in the undamaged part of the illuminations was created and used to fill in the lost contents in the damaged regions, compare Figures 5, 6. This could be driven much further, expanding the dictionary by illuminations or details of illuminations from the same artist.
The results achieved in Figures 9, 10, 11 show a possible use-case for scientific imaging in art restoration or art interpretation. Indeed, we believe that the integration of different types of scientific imaging such as infrared imaging, are likely to give benefit to image analysis methods and so the latter should be able to capture those.
Explainability of results is crucial. There is clearly a balancing act to be made between hand-crafted analysis that captures expert knowledge and a black-box, data-driven image analysis approach. In particular, the latter should ideally have an interpretable mathematical representation that gives rise to new conclusions. In this paper we have solely considered model-based and hence explainable solutions to art restoration and interpretation problems. The growing emergence of deep learning solutions to various image analysis tasks provides an alternative approach to these problems, at the moment however without a proper explanation.
Relevant characteristics are often hidden in very fine details of the artwork, like a brushstroke in a painting. Capturing these fine details in a digital format results in high-resolution images that an image analysis method should be capable of processing. This means there is a demand for computationally-efficient image analysis methods.
With the above in mind, we have discussed a selected subset of mathematical approaches and their possible use-cases in the restoration and interpretation of illuminated manuscripts. These approaches are not perfect yet by all means and there is plenty of room for improvement, compare our discussion in Sections 2.4, 3.4 and 4.2.
Luca Calatroni acknowledges the support of Fondation Mathémathique Jacques Hadamard (FMJH). Simone Parisotto acknowledges the UK Engineering and Physical Sciences Research Council (EPSRC) grant EP/L016516/1 for the University of Cambridge Centre for Doctoral Training, the Cambridge Centre for Analysis. Carola-Bibiane Schöenlieb acknowledges support from the Engineering and Physical Sciences Research Council (EPSRC) ’EP/K009745/1’, the Leverhulme Trust project ’Breaking the non-convexity barrier’, the EPSRC grant ’EP/M00483X/1’, the EPSRC centre ’EP/N014588/1’, the Alan Turing Institute ’TU/B/000071’, CHiPS (Horizon 2020 RISE project grant), the Isaac Newton Institute, and the Cantab Capital Institute for the Mathematics of Information
The authors declare that they have no competing interests.
All authors contributed to the manuscript in equal parts.