Innovative Non-parametric Texture Synthesis via Patch Permutations

01/14/2018 ∙ by Ryan Webster, et al.

In this work, we present a non-parametric texture synthesis algorithm capable of producing plausible images without copying large tiles of the exemplar. We focus on a simple synthesis algorithm, where we explore two patch match heuristics: the well known Bidirectional Similarity (BS) measure and a heuristic that finds near permutations using the solution of an entropy regularized optimal transport (OT) problem. Innovative synthesis is achieved with a small patch size, where global plausibility relies on the qualities of the match. For OT, less entropic regularization also meant near permutations and more plausible images. We examine the tile maps of the synthesized images, showing that they are indeed novel superpositions of the input and contain few or no verbatim copies. Synthesis results are compared to a statistical method, namely a random convolutional network. We conclude by remarking that simple algorithms using only the input image can synthesize textures decently well, and we call for more modest approaches in future algorithm design.




1 Introduction

Non-parametric texture synthesis algorithms, initiated by [7], operate by copying patches from the exemplar to the synthesis. Since then, these methods have improved considerably in terms of their realism, synthesizability domain, and space and time efficiency. Nearly all modern formulations follow the optimization-based method of [14], where patches are iteratively matched and re-averaged in the synthesis. Such a formulation allows local regions to propagate to global plausibility. Later methods have sought to improve the synthesizability domain by enforcing various constraints on the match. Notably, the Bidirectional Similarity method [18] promotes matches that use all patches from the input. For example, it will not converge on implausible low energy configurations, such as a constant image using a single patch, whereas the nearest neighbor (NN) match in [14] does in practice. Very recent methods, such as [15], find optimal permutations of patches using the Hungarian algorithm. On a similar note, [8] formulates the problem of color transfer as a regularized discrete optimal transport problem, whose solution is computed using linear programming. Also in a similar vein, [19] enforces patch usage statistics in a learned dictionary. In this work, we take a slightly different approach to [15], instead obtaining an approximate solution using Sinkhorn's algorithm. This turns out to be critical for synthesizing high resolution images, where the cost matrix does not fit into memory and all computation has to be sliced.

1.1 Contributions

We present two contributions in this work. Our first contribution, in Section 2, is the demonstration that entropic optimal transport can be used for texture synthesis. We do not utilize the transport plan directly, instead using it to estimate a near permutation between input and synthesis patches. Low values of the entropic regularizer yield near permutations in practice. Our second contribution, in Section 3, is a demonstration that non-parametric algorithms can produce novel images by using small patch sizes. Synthesis results are presented in Figure 3, alongside the BS matching heuristic and synthesis with a random convolutional network [20].

2 Match Heuristics

The method of [14] aims to minimize the following objective

$$\min_{X, M} \; \sum_i \| (P X)_i - (M P Y)_i \|^r \qquad (1)$$

Here $Y$ is the exemplar image as a tensor, $X$ is the current synthesis image, $P$ is the patchifying linear operator and $M$ is a binary matrix such that $M \mathbf{1} = \mathbf{1}$, i.e. every patch in $X$ has a match in $Y$. $P$ reshapes all patches in periodic extensions of $X$ and $Y$ into tensors $P X$ and $P Y$, whose rows contain patches, and $i$ indexes those rows. In [14] they use $r = 0.8$ to promote image sharpness; in this work, however, we use $r = 2$, as it allows a fast parallel computation of distances using matrix multiplication. [14] solves (1) by alternating between a solution for $X$, which can be obtained by solving a sparse system, and a solution for $M$, obtained with a nearest neighbor search. In practice, this optimization method can fail because there are no constraints upon $M$: for some images the synthesis will reuse a very small set of patches of $Y$ across the synthesis. [18] mitigates this by minimizing an objective similar to the following

$$\min_{X, M_X, M_Y} \; \alpha \sum_i \| (P X)_i - (M_X P Y)_i \|^2 + (1 - \alpha) \sum_j \| (M_Y P X)_j - (P Y)_j \|^2 \qquad (2)$$

Here, $M_X$ matches every synthesis patch to an exemplar patch, $M_Y$ matches every exemplar patch to a synthesis patch, and $\alpha$ is a parameter that balances $X$ and $Y$'s nearest neighbor match choice. Both (1) and (2) are minimized in a multi-resolution framework, as are the majority of texture synthesis algorithms, non-parametric or otherwise. Algorithm 1 represents a vanilla multi-resolution texture synthesis algorithm.

1: Input: Exemplar image Y, number of scales S, number of synthesis iterations, match heuristic
2: Output: Synthesized texture X
3: X ← noise sampled from a uniform distribution
4: for s = 1, …, S (coarse to fine) do
5:      Y_s ← Y at resolution s
6:      X_s ← X at resolution s
7:     for number of synthesis iterations do
10:          M ← result of match heuristic on P X_s and P Y_s
12:          X_s ← M P Y_s re-averaged back into the image domain
13:     end for
15: end for
Algorithm 1 Multi-Resolution Texture Synthesis
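One inner iteration of Algorithm 1 with the plain nearest neighbor heuristic can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the MATLAB implementation referred to in this work: the function names (`extract_patches`, `nn_match`, `reaverage`, `synthesis_step`) are hypothetical, every patch is used (no random subsampling), and the squared Euclidean distance is assumed so that the update reduces to averaging overlapping patches.

```python
import numpy as np

def extract_patches(img, w):
    """Rows are all w-by-w patches of the periodic extension of img (H, W, C)."""
    H, W, C = img.shape
    cols = []
    for dy in range(w):
        for dx in range(w):
            # Rolling by (-dy, -dx) brings pixel (y+dy, x+dx) to position (y, x).
            cols.append(np.roll(img, (-dy, -dx), axis=(0, 1)).reshape(H * W, C))
    return np.concatenate(cols, axis=1)  # shape (H*W, w*w*C)

def nn_match(PX, PY):
    """Index of the nearest exemplar patch for every synthesis patch."""
    # ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2, so one matrix product suffices.
    d = (PX ** 2).sum(1)[:, None] - 2.0 * PX @ PY.T + (PY ** 2).sum(1)[None, :]
    return d.argmin(axis=1)

def reaverage(patches, shape, w):
    """Average overlapping patches back into an image of the given shape."""
    H, W, C = shape
    out = np.zeros(shape)
    k = 0
    for dy in range(w):
        for dx in range(w):
            block = patches[:, k * C:(k + 1) * C].reshape(H, W, C)
            # Undo the roll used in extract_patches.
            out += np.roll(block, (dy, dx), axis=(0, 1))
            k += 1
    return out / (w * w)

def synthesis_step(X, Y, w):
    """Match every synthesis patch to the exemplar, then re-average."""
    PX, PY = extract_patches(X, w), extract_patches(Y, w)
    match = nn_match(PX, PY)
    return reaverage(PY[match], X.shape, w)
```

Swapping `nn_match` for another heuristic leaves the rest of the step unchanged, which is the sense in which the match heuristic is an input of Algorithm 1.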

For BS (2), the match heuristic is performed slightly differently than in line 10: two separate NN searches are performed and the synthesis is updated as a convex combination of the two corresponding updates, weighted by the balance parameter of (2). In practice, the patchifying operator takes a random subset of patches, whose size was the same for every experiment in this work. Additionally, when using small patch sizes in tandem with subsampling, high resolutions can drift from the previous low resolution synthesis. To fix this, we re-average the synthesis with lower resolutions at every iteration.

As we suggested before, (2) alleviates the shortcomings of (1) because it promotes a uniform usage of patches from the exemplar. We quantify this notion empirically as the Match Cardinality

$$\mathrm{MC}(M) = \frac{c(M)}{N} \qquad (3)$$

where $c(M)$ is the number of columns in the match matrix $M$ with at least one nonzero element and $N$ is the total number of columns, i.e. the number of exemplar patches. In the next section, we'll discuss methods that can achieve permutations or high match cardinalities, using an optimal transport approach.
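The measure can be sketched in a few lines. The helper name `match_cardinality` is hypothetical, and normalizing the column count by the total number of exemplar patches is an assumption made here so that a permutation scores 1.

```python
import numpy as np

def match_cardinality(M):
    """Fraction of exemplar patches (columns of M) used by at least one match."""
    M = np.asarray(M)
    used = (M != 0).any(axis=0)  # columns with at least one nonzero element
    return used.sum() / M.shape[1]

# Three synthesis patches, three exemplar patches: two rows reuse column 0.
M_reuse = np.array([[1, 0, 0],
                    [1, 0, 0],
                    [0, 0, 1]])
print(match_cardinality(M_reuse))    # 2 of 3 exemplar patches are used
print(match_cardinality(np.eye(3)))  # a permutation uses every patch: 1.0
```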

2.1 Optimal Transport Formulation

In recent years, optimal transport (OT) has seen a rich set of applications in computer graphics and machine learning. Recent works such as [15] have applied OT to non-parametric texture synthesis. The original optimal transport problem seeks to minimize

$$\min_{T \in U} \; \langle T, C \rangle = \min_{T \in U} \sum_{i,j} T_{ij} C_{ij} \qquad (4)$$

where $U$ is the set of doubly stochastic matrices and $C$ is a distance matrix. In relation to (1), $C$ represents the Euclidean pairwise distances between patches, i.e. $C_{ij} = \| (P X)_i - (P Y)_j \|^2$. (4) is solved exactly with the Hungarian algorithm. [15] employs this approach to find a global minimum of (1), with a different norm in place of the squared Euclidean norm and a permutation in place of the unconstrained binary matrix $M$. Unfortunately, the Hungarian algorithm runs in $O(n^3)$ time, which severely limits the resolution of images before the problem becomes intractable. A myriad of approximations exist that solve (4) with a permutation (or assignment), for example the auction algorithm [3]. However, for the image resolutions we present, $C$ does not fit into memory and has to be computed with a sliced matrix multiplication. In addition, for problems of this size, we would like the matching algorithm to be parallelized and to run on the GPU. In [5], the authors discuss parallelization schemes for maximum cardinality matching in bipartite graphs. They note that when the problem does not fit into memory, these parallelization schemes may be inefficient. For this reason, we turn to a regularized version of (4), whose solution can be computed on the GPU with linear memory.
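To make the memory argument concrete, the following back-of-the-envelope sketch compares a dense cost matrix with the two scaling vectors that a linear-memory method needs. The 512 × 512 image, one patch per pixel, and 4-byte floats are illustrative assumptions, not figures from this work, and `cost_matrix_gib` is a hypothetical helper.

```python
def cost_matrix_gib(n_synthesis, n_exemplar, bytes_per_entry=4):
    """Memory in GiB for a dense cost matrix of 4-byte distances."""
    return n_synthesis * n_exemplar * bytes_per_entry / 2 ** 30

n = 512 * 512  # one patch per pixel of a hypothetical 512 x 512 image
print(cost_matrix_gib(n, n))  # dense cost matrix: 256.0 GiB
print(2 * n * 4 / 2 ** 20)    # two scaling vectors: 2.0 MiB
```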

2.2 Entropy Regularized Transport

Much of the recent attention to OT has been driven by useful approximations to (4), most notably the entropic regularization schemes initiated by [4]. [4] proposed modifying (4) to penalize plans $T$ that have low entropy

$$\min_{T \in U} \; \langle T, C \rangle - \epsilon H(T) \qquad (5)$$

where $H(T) = -\sum_{i,j} T_{ij} \log T_{ij}$ is the entropy of the matrix $T$ and $\epsilon > 0$ is the entropic regularizer. Not only does this make (4) strictly convex, but an optimal solution can also be computed extremely efficiently with the Sinkhorn-Knopp (SK) matrix scaling algorithm. What is especially attractive about the algorithm in our scenario is that it can be adapted trivially to a low memory setting, where the matrix $C$ will not fit into memory, and only requires $O(n)$ memory for the scaling vectors. We present this simple modification in Algorithm 2.


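Input: Row stacked patch matrices PX and PY, memory parameter m, entropic regularizer ε
Output: Scaling vectors u, v
Concatenate ones and norms of PX, PY, so that a slice of C is a single matrix product
u, v ← vectors of ones
for number of scaling iterations do
     for number of slices according to m do
          I ← row indices for current slice
          u_I ← 1 / (K_I v), with K_I = exp(−C_I / ε)
     end for
     for number of slices according to m do
          J ← row indices for current slice of the transposed problem
          v_J ← 1 / (K_J^T u), with K_J = exp(−C_J / ε)
     end for
end for
Algorithm 2 Low Memory Sinkhorn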
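A sliced Sinkhorn iteration in the spirit of Algorithm 2 can be sketched in NumPy as below. This is an illustration under stated assumptions rather than the implementation used in this work: the function name `sliced_sinkhorn` and its parameters are hypothetical, the standard Sinkhorn-Knopp updates with unit marginals are used, both patch sets are assumed to have the same number of rows, and no log-domain stabilization is included, so very small values of `eps` can underflow.

```python
import numpy as np

def sliced_sinkhorn(PX, PY, eps, n_iter=50, slice_rows=256):
    """Scaling vectors u, v such that diag(u) K diag(v) is nearly doubly
    stochastic, where K = exp(-C / eps) is never stored in full."""
    n, m = PX.shape[0], PY.shape[0]
    u, v = np.ones(n), np.ones(m)

    def kernel(A, B):
        # Squared Euclidean distances via one matrix product, then Gibbs kernel.
        C = (A ** 2).sum(1)[:, None] - 2.0 * A @ B.T + (B ** 2).sum(1)[None, :]
        return np.exp(-np.maximum(C, 0.0) / eps)

    for _ in range(n_iter):
        for start in range(0, n, slice_rows):  # update u one slice at a time
            idx = slice(start, start + slice_rows)
            u[idx] = 1.0 / (kernel(PX[idx], PY) @ v)
        for start in range(0, m, slice_rows):  # update v one slice at a time
            idx = slice(start, start + slice_rows)
            v[idx] = 1.0 / (kernel(PY[idx], PX) @ u)
    return u, v
```

Only one slice of the kernel, of size `slice_rows` by the number of patches, exists at any time, so the persistent memory is the two scaling vectors. Because each slice update depends only on the other vector, the result does not depend on the slice size.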

As is shown in [4], the optimal solution of (5) is necessarily of the form $T = \mathrm{diag}(u) \, K \, \mathrm{diag}(v)$ with $K = e^{-C/\epsilon}$, where $u, v$ are the output of sufficiently many iterations of the SK algorithm. In our setting, we typically run only a few iterations of Algorithm 2, implicitly obtaining an approximately doubly stochastic matrix $T$. The nature of this doubly stochastic matrix, also called the transport plan, is highly related to the entropic regularizer $\epsilon$, where high values of $\epsilon$ represent more uncertainty in the solution, spreading the values of $T$ across rows and columns. As $\epsilon \to 0$, $T$ resembles more of a hard assignment. A hard assignment promotes fast convergence of Algorithm 1, whereas even for small values of $\epsilon$, using $T$ directly results in a blurry synthesis, as each patch is updated as a convex combination of many input patches. We could define a match in the following way

$$M_{ij} = \begin{cases} 1 & \text{if } j = \arg\max_k T_{ik} \\ 0 & \text{otherwise} \end{cases} \qquad (6)$$

This already works well because it returns a much higher match cardinality than taking minimums on $C$ (i.e. nearest neighbors) and prevents the method of [14] from outright failing at small patch sizes. A very similar approach was applied to bipartite graphs in [6], where the adjacency matrix was normalized using SK, and then the columns, which comprise probability distributions, were sampled to provide a match with a guaranteed cardinality in expected value. [6] proceeds by using a Karp-Sipser heuristic to resolve columns that were not matched uniquely. We take a simpler and more vectorized approach. We resolve columns that were not matched uniquely, i.e. where $\sum_i M_{ij} > 1$, by taking an argument maximum along the rows of $T$ at the nonzero locations of $M$, obtaining a permutation on the support of $M$. Iterating this process yields Algorithm 3. Algorithm 3 is implemented with the same memory slicing as Algorithm 2 but is presented in this form for brevity.

Input: T, an approximately doubly stochastic matrix
Output: M, a high cardinality match
R, Q ← sets of unmatched rows and columns
for i = 1, …, number of iterations, or until the desired match cardinality, do
      M_i ← arg max of T restricted to (R, Q) along columns, as a binary matrix of indices
     if i < number of iterations then
          M_i ← arg max along rows of M_i ⊙ T, which is a permutation on its support
     end if
      (r, q) ← rows and columns indexing the support of M_i
      Remove matched indices r, q from R, Q
end for
Algorithm 3 Greedy High Cardinality Match
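The greedy resolution can be sketched on a dense plan as follows. This is a simplified illustration, not the sliced implementation described above: `greedy_match` and `n_iter` are hypothetical names, the plan `T` is assumed to fit in memory, and the match is returned as an index vector rather than a binary matrix.

```python
import numpy as np

def greedy_match(T, n_iter=10):
    """Return match[i], the exemplar column assigned to synthesis row i."""
    n, m = T.shape
    match = np.full(n, -1)
    rows, cols = np.arange(n), np.arange(m)  # unmatched rows and columns
    for it in range(n_iter):
        if rows.size == 0 or cols.size == 0:
            break
        sub = T[np.ix_(rows, cols)]
        choice = sub.argmax(axis=1)          # each row picks its best column
        if it == n_iter - 1:
            match[rows] = cols[choice]       # final pass: duplicates allowed
            break
        chosen = np.zeros(sub.shape, dtype=bool)
        chosen[np.arange(rows.size), choice] = True
        # Among the rows competing for a column, keep the largest plan entry.
        winner = np.where(chosen, sub, -np.inf).argmax(axis=0)
        hit = chosen.any(axis=0)             # columns picked by some row
        match[rows[winner[hit]]] = cols[hit]
        keep = np.ones(rows.size, dtype=bool)
        keep[winner[hit]] = False
        rows, cols = rows[keep], cols[~hit]  # remove matched indices
    return match
```

Every pass matches at least one row, so for a square plan a number of passes equal to the number of rows is always enough to reach a permutation. With a single pass the function reduces to the row-wise arg max of (6).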

In practice, small values of the entropic regularizer reach permutations extremely quickly for most texture inputs, especially as the synthesis image approaches a permutation of the exemplar. Because matched rows and columns are removed from the transport plan at each iteration, a high match cardinality at each iterate greatly accelerates the algorithm. The effect of the regularizer on image synthesis can be seen in Figure 1, where lower values yield more accurate image structure and color histograms.

Algorithm 2 is implemented trivially on the GPU, as is the original SK algorithm, where it enjoys a significant speedup [4]. In fact, Algorithms 2 and 3 are simple enough to have a fast implementation in native MATLAB, using only vectorized tensor and matrix products and calling gpuArray on the inputs to run the algorithms on the GPU. Native MATLAB code for every algorithm and experiment in this document is provided on page 1.

Figure 1: Entropic regularization. The first image is the exemplar and subsequent images are synthesized with different values of the entropic regularizer. When the regularizer is small enough, Algorithm 3 returns a near permutation. This helps impose global image structure when synthesizing with a small patch size of 4.

3 Innovative Synthesis

Texture synthesis is an ambiguously defined problem and ultimately depends on human observation. Even today, with the explosion of generative methods initiated by Generative Adversarial Networks [11], the success of generative methods is still largely determined by human inspection. Nevertheless, we supplement our inspection with various methods, such as the Inception Score [17], to provide non-visual statistical cues or confidence over large datasets for generated images.

For exemplar based texture synthesis, the ultimate goal is to be as perceptually close to the input without egregiously copying it. For example, while Self Tuning Texture Optimization [13] can achieve a high resolution synthesis extremely quickly, its match heuristic is limited to tilings and thus it always contains salient copies of the input, see Figure 2. We propose to measure what percentage of the image was tiled as the Innovation Capacity. Of course, euclidean distance in RGB space is sensitive to noise and diffeomorphism. To make this notion slightly more robust, we average the Innovation Capacity over every synthesis resolution in Algorithm 4.

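1: Input: Exemplar image Y, synthesis X, number of scales S
2: Output: Innovation Capacity
3: for each scale s = 1, …, S do
4:      Y_s ← Y at resolution s
5:      X_s ← X at resolution s
6:      Nearest neighbor match between X_s and Y_s, computed as in Algorithm 1
7:      Identity tile map, i.e. the pixel indices of Y_s reshaped to the image size
8:      Patches of the identity tile map, with center index removed
10:      Per-scale score, i.e. percentage of tiled pixels
11: end for
12: Mean of the per-scale scores over s
Algorithm 4 Multi-Resolution Innovation Capacity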
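A single-scale version of this measurement can be sketched as below. It is a simplified reading under several assumptions: the function name `innovation_capacity` is hypothetical, a pixel counts as tiled only when its whole neighborhood in the tile map agrees with the identity tile map, both images are treated periodically, and the score is reported as one minus the tiled fraction so that a high value means few copies, in line with how the measure is used in Figure 2.

```python
import numpy as np

def innovation_capacity(tile_map, exemplar_shape, radius=1):
    """One minus the fraction of synthesis pixels whose neighbourhood is a
    verbatim copy. tile_map[y, x] is the flat index of the exemplar pixel
    matched to synthesis pixel (y, x)."""
    h, w = exemplar_shape
    ey, ex = np.divmod(tile_map, w)  # matched exemplar coordinates
    tiled = np.ones(tile_map.shape, dtype=bool)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue  # the centre index is removed
            # Index a verbatim copy would show at offset (dy, dx).
            expected = ((ey + dy) % h) * w + (ex + dx) % w
            actual = np.roll(tile_map, (-dy, -dx), axis=(0, 1))
            tiled &= expected == actual
    return 1.0 - tiled.mean()

identity = np.arange(8 * 8).reshape(8, 8)
print(innovation_capacity(identity, (8, 8)))                   # full copy: 0.0
print(innovation_capacity(np.zeros((8, 8), dtype=int), (8, 8)))  # no tiling: 1.0
```

Averaging this score over the synthesis resolutions gives a multi-resolution value of the kind Algorithm 4 describes.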


Figure 2: Innovation capacity. The second and third columns were synthesized with OT and patch sizes of 4 and 7 respectively, the fourth image with [13] and a patch size of 4. Listed are the multi-resolution innovation capacities. The third image converged to a tiling at low resolution due to its large patch size. [13] can synthesize with a small patch size, but its search is intrinsically limited to tilings via PatchMatch [1]. The second column, however, does not contain any salient copies, which is corroborated by its high innovation capacity and tile map.

3.1 Comparison with Statistical Synthesis

To give perspective on how innovative our algorithm is, we compare its innovation capacity to that of a statistical algorithm. Modern statistical algorithms, such as [9], are breathtaking in their ability to synthesize textures without copying them. [9], however, implicitly uses millions of labeled images, as it uses the pre-trained VGG-19 network and optimizes over an enormous set of parameters. This complexity makes the algorithm somewhat uninsightful, beyond showing that Gram matrices of filter responses are well suited for synthesis, which was known over a decade earlier [16].

As the flavor of this work is simplicity, we disregard [9] and instead compare our algorithm to synthesis with random filters [20]. This turns out to be a more appropriate comparison, as both random filter methods and our method use only the exemplar. The algorithm can be viewed as a single layer network, with a single convolution followed by a rectified linear unit. The synthesis is then optimized to minimize the distance between its Gram matrix of random features and that of the exemplar. Algorithm 5 presents the objective function.

1: Input: Current optimizer state, including the current synthesis X
2: Output: Gram matrix loss and derivative
3: F ← filter bank convolution of X with Gaussian noise filters
4: F ← max(F, 0), i.e. the rectified linear unit
5: F ← reshape F to two dimensions, with the number of filters in dimension two
6: G_X ← F^T F, the Gram matrix of features
7: G_Y ← Gram matrix of the exemplar Y, computed with the same steps
8: loss ← distance between G_X and G_Y
9: Compute the derivative with auto differentiation using back propagation
Algorithm 5 Gram Loss Objective
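The forward pass of this objective can be sketched in NumPy as follows. The sketch makes several assumptions that are not taken from this work: the names `random_features`, `gram` and `gram_loss` are hypothetical, the convolution is periodic and written as a patch-by-filter matrix product, the Gram matrix is normalized by the number of pixels, and the distance is the squared Frobenius norm. The derivative is omitted; in practice it would come from an automatic differentiation framework.

```python
import numpy as np

def random_features(img, filters, w):
    """ReLU of a periodic convolution of img (H, W, C) with a filter bank."""
    H, W, C = img.shape
    cols = [np.roll(img, (-dy, -dx), axis=(0, 1)).reshape(H * W, C)
            for dy in range(w) for dx in range(w)]
    patches = np.concatenate(cols, axis=1)     # (H*W, w*w*C)
    return np.maximum(patches @ filters, 0.0)  # (H*W, number of filters)

def gram(features):
    """Gram matrix of the features, averaged over pixel positions."""
    return features.T @ features / features.shape[0]

def gram_loss(X, Y, filters, w):
    """Squared Frobenius distance between the two Gram matrices."""
    GX = gram(random_features(X, filters, w))
    GY = gram(random_features(Y, filters, w))
    return ((GX - GY) ** 2).sum()

rng = np.random.default_rng(0)
w, C, n_filters = 3, 3, 16
filters = rng.standard_normal((w * w * C, n_filters))  # Gaussian noise filters
Y = rng.random((8, 8, 3))
print(gram_loss(rng.random((8, 8, 3)), Y, filters, w))  # positive for a different image
```

Because the Gram matrix sums over all positions, a circular shift of the exemplar has the same Gram matrix as the exemplar under this periodic convolution, so the loss cannot tell the two apart.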

We use the same multi-resolution framework as Algorithm 1, starting a new optimization at each increasing resolution. Finally, we use L-BFGS as it substantially improves convergence, although it still takes 500 or so iterations. Algorithm 5 also has serious memory issues. When too few random features are used, the optimizer quickly converges on a minimum, and synthesis is only successful with more filters. The sweet spot is a fairly large filter bank (the results in Figure 3 use 256 filters), which still limits the resolution at which convolution can be performed in one shot.

3.2 Discussion

Figure 3 compares synthesis results of the non-parametric OT (5) and BS (2) methods and the statistical random convolution method of Algorithm 5. All methods are fairly plausible and contain few salient copies, in accordance with their high innovation capacities. We invite the reader to zoom in on the images and verify this by inspection. There are some copies present, which tend to be the non-textural pieces of the input image. For example, if the input color distribution contains a peculiarity, such as a small blotch of red, then Algorithm 3 will reach a consistent strong match. The greedy search in BS (2) was slightly more likely to copy verbatim, for example in the oil image, where it obtained a lower innovation capacity than OT and random convolution. Finally, Algorithm 5 seems to represent the color distributions slightly better than the non-parametric methods, while being noisier and representing geometric structures slightly worse. OT's advantage over BS was in the color distributions, which is not surprising as its matches are nearly permutations, while BS had a lower match cardinality in practice.

Interestingly, when the statistical method of Algorithm 5 starts at too low a resolution or uses too many filters, the Gram matrix becomes uniquely defined and Algorithm 5 converges on a circular shift of the image. At higher resolutions, it returns a high innovation capacity because the copy is slightly distorted, while the egregious copies are still noticeable by inspection. This helps justify the multi-resolution Innovation Capacity, as it would sometimes be extremely low for this method under certain parameters. In fact, [9] can be optimized with far fewer parameters (stopping at relu3_1) when optimized over multiple resolutions, while potentially suffering this same pitfall. Additionally, it will also be prone to noticeably copying geometric and color peculiarities in the input, even under its original formulation.

3.3 Future Work

Synthesis with the VGG-19 convolutional network [9] is undeniably the state of the art for exemplar based texture synthesis, up to small improvements based upon that method. However, this work shows that texture synthesis algorithms that use only the input image can synthesize images fairly well, and that training a massive complex network on millions of labeled images is likely overkill. This is in accordance with a number of recent methods, including the Spatial GAN [12], which learns a generative image representation from texture patches of the input image. Spatial GANs build the textures "from scratch", as they begin generation from a noise vector, and because of this require a huge number of parameters, typically more than the number of pixels in the image. Of course, a non-parametric algorithm needs the entire exemplar as well, but images are highly compressible while convolution filters are not.

The biggest pitfall of the non-parametric algorithms in this document is that Euclidean distance in RGB space is unstable to diffeomorphism, especially with a larger patch size. That is, image regions should be expected to be slightly deformed relative to corresponding regions in the synthesis, as they are in [9]. Convolutional networks, such as wavelet scattering networks [2] or the VGG-19 network, create image representations that are stable to diffeomorphism, which aids their ability to recognize the same objects under slightly different appearances. We think the simplicity of non-parametric methods is still valuable, especially when equipped with an elegant distance metric such as OT. A hybrid method may capture the best of both worlds, using a shallow learned or fixed patch representation that is stable to diffeomorphism. The challenge is fully integrating the patch representation into the texture optimization, where patch representations will need to be inverted. Perhaps one could use the recent method of SinkhornAutoDiff [10] to provide a differentiable entropic OT loss between synthesis and exemplar representations, so that the optimization could be accomplished with SGD. This way, one has the benefit of not needing to build the image from scratch via OT and a more meaningful distance metric through the patch representation.


Figure 3: Synthesis results. The first column shows the exemplars; the second and third columns are computed with OT (5) and BS (2) respectively, with a patch size of 4; the fourth column is synthesis with the random filter Gram loss of Algorithm 5 with 256 filters. Under each column is the average multi-resolution innovation capacity.

4 Conclusion

In this work, we demonstrated that non-parametric algorithms can produce novel images. Innovative synthesis is achieved with a small patch size, where global plausibility depends on the qualities of the match. A match heuristic using entropic optimal transport was well suited for memory intensive applications such as texture synthesis. Less entropic regularization corresponded with more plausible images. Finally, we defined a metric to help determine how novel a synthesized image is. The OT and BS methods were capable of plausible synthesis with high innovation capacity, corroborated by a visual inspection revealing few or no egregious copies.


References

  • [1] Barnes et al. [2009] Connelly Barnes, Eli Shechtman, Adam Finkelstein, and Dan B Goldman. PatchMatch: A randomized correspondence algorithm for structural image editing. ACM Trans. Graph., 28(3):24–1, 2009.
  • [2] Bruna and Mallat [2013] Joan Bruna and Stéphane Mallat. Invariant scattering convolution networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1872–1886, 2013.
  • [3] Burkard and Cela [1999] Rainer E Burkard and Eranda Cela. Linear assignment problems and extensions. In Handbook of Combinatorial Optimization, pages 75–149. Springer, 1999.
  • [4] Cuturi [2013] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, pages 2292–2300, 2013.
  • [5] Deveci et al. [2013] Mehmet Deveci, Kamer Kaya, Bora Uçar, and Ümit V Çatalyürek. GPU accelerated maximum cardinality matching algorithms for bipartite graphs. In European Conference on Parallel Processing, pages 850–861. Springer, 2013.
  • [6] Dufossé et al. [2015] Fanny Dufossé, Kamer Kaya, and Bora Uçar. Two approximation algorithms for bipartite matching on multicore architectures. Journal of Parallel and Distributed Computing, 85:62–78, 2015.
  • [7] Efros and Leung [1999] Alexei A Efros and Thomas K Leung. Texture synthesis by non-parametric sampling. In Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on, volume 2, pages 1033–1038. IEEE, 1999.
  • [8] Ferradans et al. [2013] Sira Ferradans, Nicolas Papadakis, Julien Rabin, Gabriel Peyré, and Jean-François Aujol. Regularized discrete optimal transport. In International Conference on Scale Space and Variational Methods in Computer Vision, pages 428–439. Springer, 2013.
  • [9] Gatys et al. [2015] Leon Gatys, Alexander S Ecker, and Matthias Bethge. Texture synthesis using convolutional neural networks. In Advances in Neural Information Processing Systems, pages 262–270, 2015.
  • [10] Genevay et al. [2017] Aude Genevay, Gabriel Peyré, Marco Cuturi, et al. Learning generative models with Sinkhorn divergences. Technical report, 2017.
  • [11] Goodfellow et al. [2014] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
  • [12] Jetchev et al. [2016] Nikolay Jetchev, Urs Bergmann, and Roland Vollgraf. Texture synthesis with spatial generative adversarial networks. arXiv preprint arXiv:1611.08207, 2016.
  • [13] Kaspar et al. [2015] Alexandre Kaspar, Boris Neubert, Dani Lischinski, Mark Pauly, and Johannes Kopf. Self tuning texture optimization. In Computer Graphics Forum, volume 34, pages 349–359. Wiley Online Library, 2015.
  • [14] Kwatra et al. [2005] Vivek Kwatra, Irfan Essa, Aaron Bobick, and Nipun Kwatra. Texture optimization for example-based synthesis. ACM Transactions on Graphics (ToG), 24(3):795–802, 2005.
  • [15] Ortega et al. [2017] Jorge Alberto Gutierrez Ortega, Julien Rabin, Bruno Galerne, and Thomas Hurtut. Optimal patch assignment for statistically constrained texture synthesis. 2017.
  • [16] Portilla and Simoncelli [2000] Javier Portilla and Eero P Simoncelli. A parametric texture model based on joint statistics of complex wavelet coefficients. International Journal of Computer Vision, 40(1):49–70, 2000.
  • [17] Salimans et al. [2016] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pages 2234–2242, 2016.
  • [18] Simakov et al. [2008] Denis Simakov, Yaron Caspi, Eli Shechtman, and Michal Irani. Summarizing visual data using bidirectional similarity. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8. IEEE, 2008.
  • [19] Tartavel et al. [2015] Guillaume Tartavel, Yann Gousseau, and Gabriel Peyré. Variational texture synthesis with sparsity and spectrum constraints. Journal of Mathematical Imaging and Vision, 52(1):124–144, 2015.
  • [20] Ustyuzhaninov et al. [2016] Ivan Ustyuzhaninov, Wieland Brendel, Leon A Gatys, and Matthias Bethge. Texture synthesis using shallow convolutional networks with random filters. arXiv preprint arXiv:1606.00021, 2016.