TGHop: An Explainable, Efficient and Lightweight Method for Texture Generation

07/08/2021
by   Xuejing Lei, et al.
University of Southern California

An explainable, efficient and lightweight method for texture generation, called TGHop (an acronym of Texture Generation PixelHop), is proposed in this work. Although synthesis of visually pleasant texture can be achieved by deep neural networks, the associated models are large in size, difficult to explain in theory, and computationally expensive in training. In contrast, TGHop is small in its model size, mathematically transparent, efficient in training and inference, and able to generate high quality texture. Given an exemplary texture, TGHop first crops many sample patches out of it to form a collection of sample patches called the source. Then, it analyzes pixel statistics of samples from the source and obtains a sequence of fine-to-coarse subspaces for these patches by using the PixelHop++ framework. To generate texture patches with TGHop, we begin with the coarsest subspace, which is called the core, and attempt to generate samples in each subspace by following the distribution of real samples. Finally, texture patches are stitched to form texture images of a large size. It is demonstrated by experimental results that TGHop can generate texture images of superior quality with a small model size and at a fast speed.

1 Introduction

Automatic generation of visually pleasant texture that resembles exemplary texture has been studied for several decades since it is of theoretical interest in texture analysis and modeling. Research in texture generation benefits texture analysis and modeling research [1, 2, 3, 4, 5, 6, 7] by providing a perspective to understand the regularity and randomness of textures. Texture generation finds broad applications in computer graphics and computer vision, including visual special effects generation, digital image restoration, texture and image compression, etc. It is also closely related to image enhancement tasks including image de-noising and image super-resolution.

Early work on texture generation produced textures in pixel space. Based on exemplary input, texture can be generated pixel-by-pixel [8, 9, 10] or patch-by-patch [11, 12, 13, 14, 15], starting from a small unit and gradually growing to a larger image. These methods, however, suffer from slow generation [9, 12] or limited diversity of generated textures [11, 13, 16]. Later work transforms texture images to a feature space with kernels and exploits the statistical correlation of features for texture generation. Commonly used kernels include the Gabor filters [17] and the steerable pyramid filter banks [18]. This idea is still being actively studied with the resurgence of neural networks. Various deep learning (DL) models, including Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs), yield visually pleasing results in texture generation. Compared to traditional methods, DL-based methods [19, 20, 21, 22, 23, 24, 25] learn weights and biases through end-to-end optimization. Nevertheless, these models are usually large in model size, difficult to explain in theory, and computationally expensive in training. It is desirable to develop a new generation method that is small in model size, mathematically transparent, efficient in training and inference, and able to offer high quality textures at the same time. Along this line, we propose the TGHop (Texture Generation PixelHop) method in this work.

TGHop consists of four steps. First, given an exemplary texture, TGHop crops numerous sample patches out of it to form a collection of sample patches called the source. Second, it analyzes pixel statistics of samples from the source and obtains a sequence of fine-to-coarse subspaces for these patches by using the PixelHop++ framework [26]. Third, to generate realistic texture patches, it begins with generating samples in the coarsest subspace, which is called the core, by matching the distribution of real and generated samples, and attempts to generate spatial pixels given spectral coefficients from coarse to fine subspaces. Last, texture patches are stitched to form texture images of a larger size. Extensive experiments are conducted to show that TGHop can generate texture images of superior quality with a small model size, at a fast speed, and in an explainable way.

It is worthwhile to point out that this work is an extended version of our previous work in [27], where a method called NITES was presented. The two works share the same core idea, but this work provides a more systematic study of the texture synthesis task. In particular, a spatial Principal Component Analysis (PCA) transform is included in TGHop. This addition improves the quality of generated textures and reduces the model size of TGHop as compared with NITES. Furthermore, more experimental results are given to support our claims on efficiency (i.e., a faster computational speed) and lightweight design (i.e., a smaller model size).

The rest of the paper is organized as follows. Related work is reviewed in Sec. 2. A high-level idea of successive subspace analysis and generation is described in Sec. 3. The TGHop method is detailed in Sec. 4. Experimental results are shown in Sec. 5. Finally, concluding remarks and future research directions are given in Sec. 6.

2 Related Work

2.1 Early Work on Texture Generation

Texture generation (or synthesis) has been a long-standing problem of great interest. The methods for it can be categorized into two types. The first type generates one pixel or one patch at a time and grows synthesized texture from small to large regions. Pixel-based methods synthesize a center pixel conditioned on its neighboring pixels. Efros and Leung [9] proposed to synthesize a pixel by randomly choosing from the pixels that have a neighborhood similar to that of the query pixel. Patch-based methods [11, 12, 13, 14, 15] usually achieve higher quality than pixel-based methods [8, 9, 10]. Methods of the first type suffer from two problems. First, searching the whole space to find a matched patch is slow [9, 12]. Second, the methods [11, 13, 16] that stitch small patches to form a larger image offer limited diversity of generated patches, though they are capable of producing high quality textures at a fast speed. A certain pattern may repeat several times in these generated textures without sufficient variation because these methods do not model the perceptual properties of texture images. The second type addresses this problem by analyzing textures in feature spaces rather than pixel space. A texture image is first transformed to a feature space with kernels. Then, statistics in the feature space, such as histograms [17] and handcrafted summaries [18], are analyzed and exploited for texture generation. For the transform, some pre-defined filters such as Gabor filters [17] or steerable pyramid filter banks [18] were adopted in early days. The design of these filters, however, relies heavily on human expertise and lacks adaptivity. With the recent advances of deep neural networks, filters from a pre-trained network such as VGG provide a powerful transformation for analyzing texture images and their statistics [19, 23].

2.2 Deep-Learning-based (DL-based) Texture Generation

DL-based methods often employ a texture loss function that computes the statistics of the features. Fixing the weights of a pre-trained network, the method in [19] applies the Gram matrix as the statistical measurement and iteratively optimizes an initial white-noise input image through backpropagation. The method in [23] computes feature covariances of a white-noise image and a texture image, and matches them through whitening and coloring. Both methods utilize a VGG-19 network pre-trained on the ImageNet dataset to extract features. The method in [24] abandons the deep VGG network and adopts only one convolutional layer with random filter weights. Although these methods can generate visually pleasant textures, the iterative optimization process (i.e., backpropagation) is computationally expensive. There are many follow-ups to [19], such as incorporating other optimization terms [20, 21] and improving inference speed [22, 25]. However, there is a price to pay. The former aggravates the computational burden while the latter increases the training time.

Another problem of these methods lies in the difficulty of explaining the usage of a pre-trained network. The methods in [19, 23] build upon a VGG-19 network pre-trained on the ImageNet dataset. The ImageNet dataset is designed for understanding the semantic meaning of a large number of natural images. Textures, however, mainly contain low-level image characteristics. Although shallow layers (such as conv_1) of VGG are known to capture low-level characteristics of images, generating texture only with shallow layers does not give good results in [19]. It is hard to determine whether the VGG feature contains redundancy for textures or ignores some texture-specific information. Lack of explainability also makes it challenging to inspect the methods when unexpected generation results occur. Thus, these drawbacks motivate us to design a method that is efficient, lightweight and dedicated to texture.

2.3 Successive Subspace Learning (SSL)

To reduce the computational burden in training and inference of DL-based methods, we adopt spatial-spectral representations for texture images based on the successive subspace learning (SSL) framework [28, 29, 30]. To implement SSL, PixelHop [31] and PixelHop++ [26] architectures have been developed. PixelHop consists of multi-stage Saab transforms in cascade. PixelHop++ is an improved version of PixelHop that replaces the Saab transform with the channel-wise (c/w) Saab transform, exploiting weak correlations among spectral channels. Both the Saab transform and the c/w Saab transform are data-driven transforms, which are variants of the PCA transform. PixelHop++ offers powerful hierarchical representations and plays a key role in dimension reduction in TGHop. SSL-based solutions have been proposed to tackle quite a few problems, including [32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43]. In this work, we present an SSL-based texture image generation method. The main idea of the method is described in the next section.

3 Successive Subspace Analysis and Generation

Figure 1: Illustration of successive subspace analysis and generation, where a sequence of subspaces $S_1, \cdots, S_n$ is constructed from source space, $S_0$, through a successive process indicated by blue arrows while red arrows indicate the successive subspace generation process.

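In this section, we explain the main idea behind the TGHop method, successive subspace analysis and generation, as illustrated in Fig. 1. Consider an input signal space denoted by $\tilde{S}_0$, and a sequence of subspaces denoted by $\tilde{S}_1, \cdots, \tilde{S}_n$. Their dimensions are denoted by $\tilde{D}_0$, $\tilde{D}_1$, $\cdots$, $\tilde{D}_n$. They are related with each other by the constraint that any element in $\tilde{S}_{i+1}$ is formed by an affine combination of elements in $\tilde{S}_i$, where $i = 0, \cdots, n-1$.

An affine transform can be converted to a linear transform by augmenting vector $\tilde{\mathbf{a}}$ in $\tilde{S}_i$ via $\mathbf{a} = (\tilde{\mathbf{a}}^T, 1)^T$. We use $S_i$ to denote the augmented space of $\tilde{S}_i$ and $D_i = \tilde{D}_i + 1$. Then, we have the following relationship

$$S_n \subset S_{n-1} \subset \cdots \subset S_1 \subset S_0, \tag{1}$$

and

$$D_n < D_{n-1} < \cdots < D_1 < D_0. \tag{2}$$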
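The augmentation step can be illustrated with a minimal NumPy sketch. The matrix `W`, the offset `b` and the helper names `augment` and `affine_as_linear` are hypothetical and chosen only for illustration; the sketch checks that an affine map on a vector equals a purely linear map on the vector with a constant 1 appended.

```python
import numpy as np

def augment(x):
    """Append a constant 1 to each row vector."""
    x = np.atleast_2d(x)
    return np.hstack([x, np.ones((x.shape[0], 1))])

def affine_as_linear(W, b):
    """Stack W and b so that augment(x) @ M equals x @ W + b."""
    return np.vstack([W, b[None, :]])

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))   # five samples in a 4-D space
W = rng.normal(size=(4, 2))   # hypothetical affine map to a 2-D subspace
b = rng.normal(size=2)

M = affine_as_linear(W, b)
affine_out = x @ W + b        # affine transform in the original space
linear_out = augment(x) @ M   # linear transform in the augmented space
```

The augmented space has one more dimension than the original space, which is why each augmented dimension exceeds the original one by exactly one.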

We use texture analysis and generation as an example to explain this pipeline. To generate homogeneous texture, we collect a number of texture patches cropped out of exemplary texture as the input set. Suppose that each texture patch has three RGB color channels, and a spatial resolution $N \times N$. The input set then has a dimension of $3N^2$ and its augmented space has a dimension of $D_0 = 3N^2 + 1$. If $N = 32$, we have $D_0 = 3073$, which is too high to find an effective generation model directly.

To address this challenge, we build a sequence of subspaces $S_1$, $\cdots$, $S_n$ with decreasing dimensions. We call $S_0$ and $S_n$ the "source" space and the "core" subspace, respectively. We need to find an effective subspace $S_{i+1}$ from $S_i$, and such an analysis model is denoted by $E_i^{i+1}$. Proper subspace analysis is important since it determines how to decompose an input space into the direct sum of two subspaces in the forward analysis path. Although we choose one of the two for further processing and discard the other one, we need to record the relationship of the two decomposed subspaces so that they are well-separated in the reverse generation path. This forward process is called fine-to-coarse analysis.

In the reverse path, we begin with the generation of samples in $S_n$ by studying its own statistics. This is accomplished by generation model $G_n$. The process is called core sample generation. Then, conditioned on a generated sample in $S_{i+1}$, we generate a new sample in $S_i$ through a generation model denoted by $G_{i+1}^{i}$. This process is called coarse-to-fine generation. In Fig. 1, we use blue and red arrows to indicate analysis and generation, respectively. This idea can be implemented as a non-parametric method since we can choose subspaces $S_1$, $\cdots$, $S_n$ flexibly in a feedforward manner. One specific design is elaborated in the next section.

4 TGHop Method

The TGHop method is proposed in this section. An overview of the TGHop method is given in Sec. 4.1. Next, the forward fine-to-coarse analysis based on the two-stage c/w Saab transforms is discussed in Sec. 4.2. Afterwards, sample generation in the core is elaborated in Sec. 4.3. Finally, the reverse coarse-to-fine pipeline is detailed in Sec. 4.4.

4.1 System Overview

An overview of the TGHop method is given in Fig. 2. The exemplary color texture image has a spatial resolution of and three RGB channels. We would like to generate multiple texture images that are visually similar to the exemplary one. By randomly cropping patches of size $32 \times 32$ out of the source image, we obtain a collection of texture patches serving as the input to TGHop. The dimension of these patches is $32 \times 32 \times 3 = 3072$. Their augmented vectors form source space $S_0$. The TGHop system is designed to generate texture patches of the same size that are visually similar to samples in $S_0$. This is feasible if we can capture both global and local patterns of these samples. There are two paths in Fig. 2. The blue arrows go from left to right, denoting the fine-to-coarse analysis process. The red arrows go from right to left, denoting the coarse-to-fine generation process. We can generate as many texture patches as desired using this procedure. In order to generate a texture image of a larger size, we perform image quilting [11] based on synthesized patches.

Figure 2: An overview of the proposed TGHop method. A number of patches are collected from the exemplary texture image, forming source space $S_0$. Subspaces $S_1$ and $S_2$ are constructed through analysis models $E_0^1$ and $E_1^2$. Input filter window sizes to Hop-1 and Hop-2 are denoted as $I_0$ and $I_1$. Selected channel numbers of Hop-1 and Hop-2 are denoted as $K_1$ and $K_2$. A block of size $I_i \times I_i$ of $K_i$ channels in space/subspace $S_i$ is converted to the same spatial location of $K_{i+1}$ channels in subspace $S_{i+1}$. Red arrows indicate the generation process beginning from core sample generation followed by coarse-to-fine generation. The model for core sample generation is denoted as $G_2$ and the models for coarse-to-fine generation are denoted as $G_2^1$ and $G_1^0$.

4.2 Fine-to-Coarse Analysis

The global structure of an image (or an image patch) can be well characterized by spectral analysis, yet it is limited in capturing local detail such as boundaries between regions. Joint spatial-spectral representations offer an ideal solution to the description of both global shape and local detail information. Analysis model $E_0^1$ finds a proper subspace, $S_1$, in $S_0$ while analysis model $E_1^2$ finds a proper subspace, $S_2$, in $S_1$. As shown in Fig. 2, TGHop applies two-stage transforms. They correspond to $E_0^1$ and $E_1^2$, respectively. Specifically, we can apply the c/w Saab transform in each stage to conduct the analysis. In the following, we provide a brief review of the Saab transform [30] and the c/w Saab transform [26].

We partition each input patch into non-overlapping blocks, each of which has a spatial resolution of $I_0 \times I_0$ with $K_0$ channels. We flatten 3D tensors into 1D vectors, and decompose each vector into the sum of one Direct Current (DC) and multiple Alternating Current (AC) spectral components. The DC filter is an all-ones filter weighted by a constant. AC filters are obtained by applying principal component analysis (PCA) to the DC-removed residual tensors. By setting $I_0 = 2$ and $K_0 = 3$, we have a tensor block of dimension $2 \times 2 \times 3 = 12$. Filter responses of PCA can be positive or negative. There is a sign confusion problem [28, 29] if both of them are allowed to enter the transform in the next stage. To avoid sign confusion, a constant bias term is added to all filter responses to ensure that all responses become positive, leading to the name of the "subspace approximation with adjusted bias (Saab)" transform. The Saab transform is a data-driven transform, which is significantly different from traditional data-independent transforms (e.g., Fourier and wavelet transforms). We partition AC channels into low- and high-frequency bands. The energy of high-frequency channels (shaded by gray color in Fig. 2) is low and they are discarded for dimension reduction without affecting the performance much. The energy of low-frequency channels (shaded by blue color in Fig. 2) is higher. For a tensor of dimension 12, we have one DC and 11 AC components. Typically, we select to 10 leading AC components and discard the rest. Thus, after $E_0^1$, one 12-D tensor becomes a $K_1$-D vector, which is illustrated by dots in subspace $S_1$. The $K_1$-D response vectors are fed into the next stage for another transform.

The channel-wise (c/w) Saab transform [26] exploits the weak correlation property between channels so that the Saab transform can be applied to each channel separately (see the middle part of Fig. 2). The c/w Saab transform offers an improved version of the standard Saab transform with a smaller model size.
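The single-stage Saab transform reviewed above can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions, not the PixelHop++ implementation: the function names `fit_saab` and `saab_forward` are hypothetical, the input is random data standing in for flattened 12-D blocks, and the bias is set to the largest input norm, which is one sufficient choice for keeping all responses non-negative.

```python
import numpy as np

def fit_saab(blocks, num_ac):
    """Fit one Saab stage on flattened blocks of shape (n_samples, dim)."""
    dim = blocks.shape[1]
    dc_kernel = np.ones(dim) / np.sqrt(dim)        # all-ones filter weighted by a constant
    dc = blocks @ dc_kernel                        # DC response of each block
    residual = blocks - np.outer(dc, dc_kernel)    # DC-removed residual
    centered = residual - residual.mean(axis=0)
    # PCA of the residual: rows of vt are principal directions, sorted by energy.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    kernels = np.vstack([dc_kernel, vt[:num_ac]])  # one DC kernel plus leading AC kernels
    # Assumption: a bias equal to the largest input norm bounds every response from below by 0.
    bias = np.max(np.linalg.norm(blocks, axis=1))
    return kernels, bias

def saab_forward(blocks, kernels, bias):
    """Project blocks onto the kept kernels and add the constant bias."""
    return blocks @ kernels.T + bias

rng = np.random.default_rng(0)
blocks = rng.random((500, 12))                     # stand-in for flattened 2x2x3 blocks
kernels, bias = fit_saab(blocks, num_ac=9)         # keep 1 DC + 9 AC = 10 channels
responses = saab_forward(blocks, kernels, bias)
```

In the c/w variant, the same fitting step would be applied to each channel of the previous stage separately instead of to all channels jointly.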

One typical setting used in our experiments is shown below.

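  • Dimension of the input patch ($\tilde{D}_0$): $32 \times 32 \times 3 = 3072$;

  • Dimension of subspace $\tilde{S}_1$ ($\tilde{D}_1$): $16 \times 16 \times 10 = 2560$ (by keeping 10 channels in Hop-1);

  • Dimension of subspace $\tilde{S}_2$ ($\tilde{D}_2$): $8 \times 8 \times 27 = 1728$ (by keeping 27 channels in Hop-2).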

Note that the ratio between the dimensions of the first subspace and the input patch is 83.3% while that between the second and the first subspaces is 67.5%. Overall, the dimension of the core subspace is 56.3% of that of the source space. In the reverse path indicated by red arrows, we need to develop a multi-stage generation process. It should also be emphasized that users can flexibly choose channel numbers in Hop-1 and Hop-2. Thus, TGHop is a non-parametric method.
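The dimension bookkeeping for this setting can be reproduced with a few lines of Python. The sketch assumes $32 \times 32$ RGB patches and $2 \times 2$ non-overlapping windows in both hops, which is consistent with the quoted ratios and the 10- and 27-channel choices; the helper name `hop_dimensions` is hypothetical.

```python
def hop_dimensions(patch_size, in_channels, window, kept_channels):
    """Return the dimension of the input and of each successive subspace."""
    size = patch_size
    dims = [size * size * in_channels]
    for kept in kept_channels:
        size //= window                  # non-overlapping windows shrink each side
        dims.append(size * size * kept)
    return dims

dims = hop_dimensions(patch_size=32, in_channels=3, window=2, kept_channels=[10, 27])
ratios = [round(100 * b / a, 1) for a, b in zip(dims, dims[1:])]
overall = round(100 * dims[-1] / dims[0], 2)
print(dims, ratios, overall)
```

Changing `kept_channels` shows directly how the channel choices in Hop-1 and Hop-2 trade core dimension against retained energy.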

The first-stage Saab transform provides the spectral information on the nearest neighborhood, which is the first hop of the center pixel. By generalizing from one to multiple hops, we can capture the information in the short-, mid- and long-range neighborhoods. This is analogous to increasingly larger receptive fields in deeper layers of CNNs. However, filter weights in CNNs are learned from end-to-end optimization via backpropagation while weights of the Saab filters in different hops are determined by a sequence of PCAs in a feedforward unsupervised manner.

4.3 Core Sample Generation

In the generation path, we begin with sample generation in the core, which is denoted by $S_n$. In the current design, $n = 2$. We first characterize the sample statistics in the core, $S_2$. After two-stage c/w Saab transforms, the sample dimension in $S_2$ is less than 2000. Each sample contains $K_2$ channels of spatial dimension $8 \times 8$. Since there exist correlations between spatial responses in each channel, PCA is adopted for further Spatial Dimension Reduction (SDR). We discard PCA components whose variances are lower than a preset threshold. The same threshold applies to all channels. SDR can help reduce the model size and improve the quality of generated textures. For example, we compare a generated grass texture with and without SDR in Fig. 3. The quality with SDR improves significantly.

Figure 3: Generated grass texture image (a) without and (b) with spatial dimension reduction (SDR).
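A minimal NumPy sketch of SDR as described above: one PCA per channel over the spatial responses, with a single variance threshold shared by all channels. The array layout (samples, height, width, channels), the function names `fit_sdr` and `apply_sdr`, the synthetic low-rank data and the threshold value are all assumptions made for illustration.

```python
import numpy as np

def fit_sdr(core, var_threshold):
    """Fit one PCA per channel and keep components with variance >= threshold."""
    n, h, w, c = core.shape
    models = []
    for k in range(c):
        flat = core[..., k].reshape(n, h * w)
        mean = flat.mean(axis=0)
        _, s, vt = np.linalg.svd(flat - mean, full_matrices=False)
        variances = s ** 2 / (n - 1)
        keep = variances >= var_threshold     # same threshold for every channel
        models.append((mean, vt[keep]))
    return models

def apply_sdr(core, models):
    """Project each channel onto its kept components and concatenate the results."""
    n = core.shape[0]
    parts = [(core[..., k].reshape(n, -1) - mean) @ basis.T
             for k, (mean, basis) in enumerate(models)]
    return np.concatenate(parts, axis=1)

# Synthetic core samples: each 8x8 channel has rank-5 structure plus tiny noise.
rng = np.random.default_rng(0)
n, h, w, c, rank = 200, 8, 8, 3, 5
latent = rng.normal(size=(n, c, rank))
mixing = rng.normal(size=(c, rank, h * w))
core = np.einsum("ncr,crd->ncd", latent, mixing).reshape(n, c, h, w).transpose(0, 2, 3, 1)
core = core + 1e-4 * rng.normal(size=core.shape)

models = fit_sdr(core, var_threshold=1e-3)
z = apply_sdr(core, models)                   # one 1D vector per sample
```

In this toy example each 64-D channel collapses to 5 components, so the concatenated vector has 15 entries instead of 192.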

After SDR, we flatten the PCA responses of each channel and concatenate them into a 1D vector denoted by $\mathbf{z}$. It is a sample in $S_2$. To simplify the distribution characterization of a high-dimensional random vector, we group training samples into clusters and transform random vectors in each cluster into a set of independent random variables. We adopt the K-Means clustering algorithm to cluster training samples into a fixed number of clusters, which are denoted by $C_i$, where $i$ is the cluster index. Rather than modeling probability $P(\mathbf{z})$ directly, we model conditional probability $P(\mathbf{z} \mid \mathbf{z} \in C_i)$ with a fixed cluster index. The probability, $P(\mathbf{z})$, can be written as

$$P(\mathbf{z}) = \sum_{i} P(\mathbf{z} \mid \mathbf{z} \in C_i) \cdot P(\mathbf{z} \in C_i), \tag{3}$$

where $P(\mathbf{z} \in C_i)$ is the percentage of data points in cluster $C_i$. It is abbreviated as $p_i$ (see the right part of Fig. 2).

Typically, a set of independent Gaussian random variables is used for image generation. To do the same, we convert a collection of correlated random vectors into a set of independent Gaussian random variables. To achieve this objective, we transform the random vector of each cluster into a set of independent random variables through independent component analysis (ICA), where non-Gaussianity serves as an indicator of statistical independence. ICA finds applications in noise reduction [44], face recognition [45], and image fusion [46]. Our implementation is detailed below.

  1. Apply PCA to the random vector in each cluster for dimension reduction and data whitening.

  2. Apply FastICA [47], which is conceptually simple, computationally efficient and robust to outliers, to the PCA output.

  3. Compute the cumulative distribution function (CDF) of each ICA component of the random vector in each cluster based on its histogram of training samples.

  4. Match the CDF in Step 3 with the CDF of a Gaussian random variable (see the right part of Fig. 2), where the inverse CDF is obtained by resampling between bins with linear interpolation. To reduce the model size, we quantize N-dimensional CDFs, which have N bins, with vector quantization (VQ) and store the codebook of quantized CDFs.

We encode the cluster probability $p_i$ in Eq. (3) using the length of a segment in $[0, 1]$. All segments are concatenated in order to build the unit interval. The segment index is the cluster index. These segments are called the interval representation as shown in Fig. 4. To draw a sample from the core subspace, we use the uniform random number generator to select a random number from interval $[0, 1]$. This random number indicates the cluster index on the interval representation.

Figure 4: Illustration of the interval representation, where the length of a segment in the unit interval represents the probability of a cluster, $p_i$. A random number is generated in the unit interval to indicate the cluster index.
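The interval representation amounts to sampling from a discrete distribution by locating a uniform random number among cumulative segment boundaries. A minimal sketch follows; the cluster sizes and the function names `build_intervals` and `draw_cluster` are hypothetical.

```python
import numpy as np

def build_intervals(cluster_sizes):
    """Return the right end points of the segments that tile the unit interval."""
    probs = np.asarray(cluster_sizes, dtype=float)
    probs /= probs.sum()                 # segment length = fraction of points in the cluster
    return np.cumsum(probs)

def draw_cluster(edges, u):
    """Map a uniform number u in [0, 1) to the index of the segment containing it."""
    idx = int(np.searchsorted(edges, u, side="right"))
    return min(idx, len(edges) - 1)      # guard against round-off in the last edge

edges = build_intervals([50, 30, 20])    # hypothetical cluster sizes
rng = np.random.default_rng(0)
cluster = draw_cluster(edges, rng.random())
```

Only the segment lengths need to be stored, so this part of the model costs one number per cluster.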

To generate a new sample in the core, we perform the following steps:

  1. Select a random number from the uniform random number generator to determine the cluster index.

  2. Draw a set of samples independently from the Gaussian distribution.

  3. Match histograms of the generated Gaussian samples with the inverse CDFs in the chosen cluster.

  4. Repeat Steps 1-3 if the generated sample of Step 3 has no value larger than a pre-set threshold.

  5. Perform the inverse transform of ICA and the inverse transform of PCA.

  6. Reshape the 1D vector into a 3D tensor, which is the generated sample in the core.

The above procedure is named Independent Components Histogram Matching (ICHM). To conclude, there are two main modules in core sample generation: spatial dimension reduction and independent components histogram matching as shown in Fig. 2.
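The histogram matching at the heart of ICHM can be sketched for a single independent component as follows. This is an illustration under assumptions rather than the authors' code: the inverse CDF is tabulated at evenly spaced quantiles instead of histogram bins, the training component is synthetic exponential data, and the PCA/ICA transforms and their inverses are omitted. The names `fit_inverse_cdf` and `gaussian_to_component` are hypothetical.

```python
import math
import numpy as np

def fit_inverse_cdf(component, num_bins=256):
    """Tabulate the inverse CDF of one component at evenly spaced probabilities."""
    probs = np.linspace(0.0, 1.0, num_bins + 1)
    return probs, np.quantile(component, probs)

def gaussian_to_component(g, probs, quantiles):
    """Map standard normal draws onto the tabulated distribution."""
    # The standard normal CDF turns Gaussian draws into uniform numbers in (0, 1).
    u = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in g])
    # Linear interpolation between tabulated points evaluates the inverse CDF.
    return np.interp(u, probs, quantiles)

rng = np.random.default_rng(0)
train = rng.exponential(size=5000)                   # stand-in for one ICA component
probs, quantiles = fit_inverse_cdf(train)
generated = gaussian_to_component(rng.normal(size=5000), probs, quantiles)
```

Because each component is handled independently, the stored model for a cluster is just one tabulated curve per component, which is what makes quantizing the curves with a shared codebook attractive.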

4.4 Coarse-to-Fine Generation

In this section, we examine generation model $G_{i+1}^{i}$, whose role is to generate a sample in $S_i$ given a sample in $S_{i+1}$. Analysis model, $E_i^{i+1}$, transforms $S_i$ to $S_{i+1}$ through the c/w Saab transform in the forward path. In the reverse path, we perform the inverse c/w Saab transform on generated samples in $S_{i+1}$ to obtain samples in $S_i$. We take generation model $G_2^1$ as an example to explain the generation process from $S_2$ to $S_1$. A generated sample in $S_2$ can be partitioned into $K_1$ groups as shown in the left part of Fig. 5. Each group of channels is composed of one DC channel and several low-frequency AC channels. The $k$th group of channels in $S_2$, whose number is denoted by $K_2^{(k)}$, is derived from the $k$th channel in $S_1$. We apply the inverse c/w Saab transform to each group individually. The inverse c/w Saab transform converts the tensor at the same spatial location across $K_2^{(k)}$ channels (represented by white dots in Fig. 5) in $S_2$ into a block of size $I_1 \times I_1$ (represented by the white parallelogram in Fig. 5) in $S_1$, using the DC and AC components obtained in the fine-to-coarse analysis. After the inverse c/w Saab transform, the Saab coefficients in $S_1$ form a generated sample in $S_1$. The same procedure is repeated between $S_1$ and $S_0$.

Figure 5: Illustration of the generation process.
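The block placement performed by the inverse transform can be sketched for one channel group as follows. The kernels here are a random orthonormal set standing in for the DC and AC kernels recorded during analysis, and the names `forward_stage` and `inverse_stage` are hypothetical. All four kernels of a $2 \times 2$ window are kept so that the round trip is exact; once high-frequency channels are discarded, the inverse returns an approximation.

```python
import numpy as np

def forward_stage(fine, kernels, bias, window):
    """Transform non-overlapping window x window blocks of a single-channel map."""
    H, W = fine.shape
    h, w = H // window, W // window
    blocks = fine.reshape(h, window, w, window).transpose(0, 2, 1, 3).reshape(h, w, -1)
    return blocks @ kernels.T + bias

def inverse_stage(coeffs, kernels, bias, window):
    """Expand responses at each coarse location back into a window x window block."""
    h, w, _ = coeffs.shape
    blocks = (coeffs - bias) @ kernels                # undo bias, then project back
    blocks = blocks.reshape(h, w, window, window)
    # Put each block at its non-overlapping spatial location in the finer map.
    return blocks.transpose(0, 2, 1, 3).reshape(h * window, w * window)

rng = np.random.default_rng(0)
kernels, _ = np.linalg.qr(rng.normal(size=(4, 4)))    # hypothetical orthonormal kernels
fine = rng.random((16, 16))
coeffs = forward_stage(fine, kernels, bias=1.0, window=2)
restored = inverse_stage(coeffs, kernels, bias=1.0, window=2)
```

Applying `inverse_stage` to every channel group of a coarse sample and stacking the results gives the sample in the next finer subspace.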

Examples of several generated textures in core $S_2$, intermediate subspace $S_1$ and source $S_0$ are shown in Fig. 6. The DC channels generated in the core offer gray-scale low-resolution patterns of a generated sample. More local details are added gradually from $S_2$ to $S_1$ and from $S_1$ to $S_0$.

Figure 6: Examples of generated DC maps in the core (first column), generated samples in the intermediate subspace (second column), and the final generated textures in the source space (third column).

5 Experiments

5.1 Experimental Setup

The following hyperparameters (see Fig. 2) are used in our experiments.

  • Input filter window size to Hop-1: ,

  • Input filter window size to Hop-2: ,

  • Selected channel numbers in Hop-1 (): ,

  • Selected channel numbers in Hop-2 (): .

The window size of the analysis filter is the same as the generation window size. All windows are non-overlapping with each other. The actual channel numbers in Hop-1 and Hop-2 are texture-dependent. That is, we examine the energy distribution based on the PCA eigenvalues and choose the knee point where the energy becomes flat.

5.2 An Example: Brick Wall Generation

We show generated brick_wall patches of size $32 \times 32$ and $64 \times 64$ in Figs. 7(a) and (c). We performed two-stage c/w Saab transforms on $32 \times 32$ patches and three-stage c/w Saab transforms on $64 \times 64$ patches, whose core subspace dimensions are 1728 and 4032, respectively. Patches in these figures were synthesized by running the TGHop method for one hundred rounds. Randomness in each round primarily comes from two factors: 1) random cluster selection, and 2) random seed vector generation.

Generated patches retain the basic shape of bricks and the diversity of brick texture. We observe some unseen patterns generated by TGHop, which are highlighted by red squared boxes in Figs. 7(a) and (c). As compared with the smaller generated patches, the larger generated patches were sometimes blurry (e.g., the one in the upper right corner) due to a higher source dimension.

As a non-parametric model, TGHop can choose multiple settings under the same pipeline. For example, it can select different channel numbers in the intermediate and core subspaces to derive different generation results. Four settings are listed in Table 1. The corresponding generation results are shown in Fig. 8. Dimensions decrease faster from (a) to (d). The quality of generated results becomes poorer due to smaller dimensions of the core subspace, $S_2$, and the intermediate subspace, $S_1$.

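Setting   Source space   Intermediate subspace   Core subspace
a         3072           2560                    2048
b         3072           1536                    768
c         3072           1280                    512
d         3072           768                     192
Table 1: The settings of four generation models, given as the dimensions of the source space, the intermediate subspace and the core subspace.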

To generate larger texture images, we first generate 5,000 texture patches and perform image quilting [11] with them. The results after quilting are shown in Figs. 7 (b) and (d). All eight images are of the same size, i.e., . They are obtained using different initial patches in the image quilting process. By comparing the two sets of stitched images, the global structure of the brick wall is better preserved using larger patches (i.e. of size ) while its local detail is a little bit blurry sometimes.

Figure 7: Examples of generated texture patches and stitched images of larger sizes: (a) synthesized patches of the smaller size, (b) images stitched from the patches in (a), (c) synthesized patches of the larger size, and (d) images stitched from the patches in (c). The image in the bottom-left corner is the exemplary texture image and the patches highlighted by red squared boxes are unseen patterns.
Figure 8: Generated patches using settings (a)-(d) of Table 1, where the numbers below the figure indicate the dimensions of the source space, the intermediate subspace and the core subspace, respectively.

5.3 Performance Benchmarking with DL-based Methods

5.3.1 Visual Quality Comparison

The quality of generated texture is usually evaluated by human eyes. A diversity loss function was proposed to measure texture diversity for DL-based methods [25, 22]. Since TGHop does not have a loss function, we show generated results of two DL-based methods and TGHop side by side in Fig. 9 for 10 input texture images collected from [19, 24, 18] or the Internet. The benchmarking DL methods were proposed by Gatys et al. [19] and Ustyuzhaninov et al. [24]. By running their codes, we show their results in the second and third columns of Fig. 9, respectively, for comparison. These results are obtained with default iteration numbers; namely, 2000 in [19] and 4000 in [24]. The results of TGHop are shown in the last three columns. Among them, the first two columns are obtained without spatial dimension reduction (SDR) in two different runs while the last column is obtained with SDR. There is little quality degradation after dimension reduction of the core subspace with SDR. For meshed and cloud textures, the brown fog artifact in [19, 24] is apparent. In contrast, it does not exist in TGHop. More generated images using TGHop are given in Fig. 10. As shown in Figs. 9 and 10, TGHop can generate high quality and visually pleasant texture images.

Figure 9: Comparison of texture images generated by two DL-based methods and TGHop (from left to right): exemplary texture images, texture images generated by [19], by [24], two examples by TGHop without spatial dimension reduction (SDR) and one example by TGHop with SDR.
Figure 10: More texture images generated by TGHop.

5.3.2 Comparison of Generation Time

We compare the generation time of different methods in Table 2. All experiments were conducted on the same machine composed of 12 CPUs (Intel Core i7-5930K CPU at 3.50GHz) and 1 GPU (GeForce GTX TITAN X). The GPU was needed by the two DL-based methods but not by TGHop. We set the iteration number to 1000 for [19] and 100 for [24]. TGHop generated 10K patches for texture quilting. For all three methods, we show the time needed to generate one image of size in Table 2. TGHop generates one texture image in 291.25 seconds while Gatys’ method and Ustyuzhaninov’s method demand 513.98 and 949.64 seconds, respectively. TGHop is thus 1.8 to 3.3 times faster even when its analysis overhead is included.

Methods | Time (seconds) | Factor
Ustyuzhaninov et al. [24] | 949.64 | 4.62x
Gatys et al. [19] | 513.98 | 2.50x
TGHop with analysis overhead | 291.25 | 1.42x
TGHop w/o analysis overhead | 205.50 | 1x
Table 2: Comparison of time needed to generate one texture image.
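
The Factor column of Table 2 can be reproduced from the Time column. The following is a minimal sketch, assuming each factor is the ratio of a method's time to the time of TGHop without analysis overhead; the function name `slowdown_factors` is introduced here for illustration only.

```python
# Times in seconds, copied from Table 2.
times = {
    "Ustyuzhaninov et al. [24]": 949.64,
    "Gatys et al. [19]": 513.98,
    "TGHop with analysis overhead": 291.25,
    "TGHop w/o analysis overhead": 205.50,
}


def slowdown_factors(times, baseline):
    """Divide every method's time by the time of the baseline method."""
    base = times[baseline]
    return {name: t / base for name, t in times.items()}


# Assumption: the baseline is the fastest row of the table.
factors = slowdown_factors(times, "TGHop w/o analysis overhead")
for name, factor in factors.items():
    print(f"{name}: {factor:.2f}x")
```

Rounded to two decimals, the ratios agree with the factors listed in the table.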

We break down the generation time of TGHop into three parts: 1) successive subspace analysis (i.e., the forward path), 2) core and successive subspace generation (i.e., the reverse path) and 3) the quilting process. The time required for each part is shown in Table 3. They demand 85.75, 197.42 and 8.08 seconds, respectively. To generate multiple images from the same exemplary texture, we run the first part only once, which will be shared by all generated texture images, and the second and third parts multiple times (i.e., one run for a new image). As a result, we can view the first part as a common overhead and count the last two parts as the time for single texture image generation. This is equal to 205.5 seconds. The two DL benchmarks do not have such a breakdown and need to go through the whole pipeline to generate one new texture image.

Processes | Time (seconds)
Analysis (Forward Path) | 85.75
Generation (Reverse Path) | 197.42
Quilting | 8.08
Table 3: The time of three processes in our method.
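
Because the analysis step is shared by all images generated from one exemplary texture, its cost is amortized as more images are produced. The sketch below illustrates this with the timings of Table 3; the function names are hypothetical, and it assumes the per-image generation and quilting times stay constant across runs.

```python
# Timings in seconds, copied from Table 3.
ANALYSIS = 85.75     # forward path, run once per exemplary texture
GENERATION = 197.42  # reverse path, run once per generated image
QUILTING = 8.08      # quilting, run once per generated image


def tghop_total_time(num_images):
    """Total time to generate num_images images from one exemplary texture."""
    return ANALYSIS + num_images * (GENERATION + QUILTING)


def tghop_time_per_image(num_images):
    """Average time per image once the analysis overhead is shared."""
    return tghop_total_time(num_images) / num_images


for n in (1, 2, 10, 100):
    print(f"{n:>3} image(s): {tghop_time_per_image(n):.2f} s per image")
```

For a single image the average equals the 291.25 seconds reported with overhead, and it approaches 205.5 seconds as the number of images grows.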

5.4 Comparison of Model Sizes

The model size is measured by the number of parameters. The size of TGHop is calculated below.

  • Two-stage c/w Saab Transforms
    The forward analysis path and the reverse generation path share the same two-stage c/w Saab transforms. For an input RGB patch, the input tensor of size is transformed into a -D tensor in the first-stage transform, leading to a filter size of plus one shared bias. For each of channels, the input tensor of size is transformed into a -D tensor in the second-stage transform. The total parameter number for all channels is plus biases. Thus, the total number of parameters in the two-stage transforms is .

  • Core Sample Generation
    Sample generation in the core contains two modules: spatial dimension reduction (SDR) and independent components histogram matching (ICHM). For the first module, SDR is implemented by PCA transforms, where the input of size and the output is a dimensional vector, yielding the size of each PCA transformation matrix to be . The total number of parameters is , where is the dimension of the concatenated output vector after SDR. For the second module, it has three components:

    1. Interval representation
      parameters are needed for each cluster.

    2. Transform matrices of FastICA
      If the input vector is of dimension and the output dimension of FastICA is for the th cluster, , the total parameter number of all transform matrices is , where is the number of CDFs.

    3. Codebook size of quantized CDFs
      The codebook contains the index, the maximum and the minimum values for each CDF. Furthermore, we have clusters of CDF, where all CDFs in each cluster share the same bin structure of 256 bins. As a result, the total parameter number is .

    By adding all of the above together, the total parameter number in core sample generation is .

Figure 11: Generated brick_wall patches using different cluster numbers in independent component histogram matching: (a) 50 clusters, (b) 80 clusters, (c) 110 clusters and (d) 200 clusters.
Figure 12: Generated brick_wall patches using different threshold values in SDR, shown in panels (a) to (i).

Module | Equation | Num. of Param.
Transform - stage 1 | | 109
Transform - stage 2 | | 801
Core - SDR | | 58,176
Core - ICHM(i) | | 50
Core - ICHM(ii) | | 2,288,862
Core - ICHM(iii) | | 58,754
Total | | 2,406,752
Table 4: The number of parameters of TGHop, under the setting of , , , , and .

The above equations are summarized and an example is given in Table 4 under the experiment setting of , , , and . The model size of TGHop is 2.4M. For comparison, the model sizes of [19] and [24] are 0.852M and 2.055M, respectively. The great majority of TGHop model parameters (2,288,862 of 2,406,752, or about 95%) come from ICHM(ii). Further model size reduction without sacrificing generated texture quality is an interesting extension of this work.
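
The total in Table 4 and the dominance of ICHM(ii) can be checked directly from the per-module counts. This is a small sketch that only sums the numbers listed in the table; it does not re-derive them from the model-size equations.

```python
# Per-module parameter counts, copied from Table 4.
param_counts = {
    "Transform - stage 1": 109,
    "Transform - stage 2": 801,
    "Core - SDR": 58_176,
    "Core - ICHM(i)": 50,
    "Core - ICHM(ii)": 2_288_862,
    "Core - ICHM(iii)": 58_754,
}

total = sum(param_counts.values())
shares = {module: count / total for module, count in param_counts.items()}

print(f"Total: {total:,} parameters")
for module, share in shares.items():
    print(f"{module}: {share:.2%}")
```

The sum matches the reported total, and the FastICA transform matrices in ICHM(ii) account for roughly 95% of it.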

Threshold | Reduced dimension | Number of Parameters
0 | 1408 | 3.72M
0.0005 | 1226 | 3.26M
0.005 | 1030 | 2.74M
0.01 | 909 | 2.41M
0.02 | 718 | 1.88M
0.03 | 553 | 1.43M
0.04 | 399 | 1.00M
0.05 | 289 | 0.69M
0.1 | 102 | 0.19M
Table 5: The reduced dimension, , and the model size as a function of threshold used in SDR.

As compared with [27], SDR is a new module introduced in this work. It helps remove correlations of spatial responses to reduce the model size. We examined the impact of using different thresholds in SDR on texture generation quality and model size with the brick_wall texture. The same threshold is adopted for all channels to select PCA components. The dimension of reduced space, , and the cluster number, , are both controlled by threshold , used in SDR. represents all 64 PCA components are kept in SDR. We can vary the value of to get a different cluster number and the associated model size. The larger the value of , the smaller and and, thus, the smaller the model size as shown in Table 5. The computation given in Table 4 is under the setting of .
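
To illustrate how a single threshold can control the reduced dimension, the following is a minimal sketch of threshold-based PCA component selection. It rests on an assumption that is not confirmed by the text above: a component is kept when its share of the total energy (its eigenvalue divided by the sum of all eigenvalues) is at least the threshold. The function name `kept_components` and the eigenvalue spectrum are hypothetical and serve only to show the trend.

```python
def kept_components(eigenvalues, threshold):
    """Count PCA components whose normalized energy is at least `threshold`.

    Assumption: the selection rule compares each eigenvalue's share of the
    total energy with the threshold. The actual rule used in SDR may differ.
    """
    total = sum(eigenvalues)
    return sum(1 for ev in eigenvalues if ev / total >= threshold)


# Hypothetical spectrum for one channel: 64 eigenvalues with geometric decay.
eigenvalues = [0.8 ** k for k in range(64)]

for threshold in (0.0, 0.01, 0.05, 0.1):
    print(threshold, kept_components(eigenvalues, threshold))
```

Under this assumption, a threshold of zero keeps all 64 components of a channel, and a larger threshold can only keep the same number or fewer, which matches the trend of the reduced dimension in Table 5.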

A proper cluster number is important since too many clusters lead to larger model sizes while too few clusters result in bad generation quality. To give an example, consider the brick_wall texture image of size , where the dimension of is with . We extract 12,769 patches of size (with stride 2) from this image. We conduct experiments with 50, 80, 110 and 200 clusters and show generated patches in Fig. 11. As shown in (a), 50 clusters were too few and we see the artifact of over-saturation in generated patches. By increasing from 50 to 80, the artifact still exists but is less apparent in (b). The quality improves further when as shown in (c). We see little quality improvement when goes from 110 to 200. Furthermore, patches generated using different thresholds are shown in Fig. 12. We see little quality degradation from (a) to (f) while the dimension is reduced from 1408 to 553. Image blur shows up from (g) to (i), indicating that some details were discarded along with the corresponding PCA components.
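
The patch count quoted above is consistent with dense sliding-window cropping. The sketch below counts the patches obtained from a square image with a given stride. The image side of 256 pixels and the patch side of 32 pixels are assumptions made here because they reproduce the count of 12,769; neither size is legible in the text above.

```python
def num_patches(image_size, patch_size, stride):
    """Number of square patches cropped from a square image by a sliding window."""
    positions_per_axis = (image_size - patch_size) // stride + 1
    return positions_per_axis * positions_per_axis


# Assumed sizes (not stated in the surrounding text): 256x256 image, 32x32 patches.
print(num_patches(image_size=256, patch_size=32, stride=2))
```

With these assumed sizes there are 113 window positions along each axis, and 113 x 113 = 12,769 patches in total.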

6 Conclusion and Future Work

An explainable, efficient and lightweight texture generation method, called TGHop, was proposed in this work. Texture can be effectively analyzed using the multi-stage c/w Saab transforms and expressed in the form of joint spatial-spectral representations. The distribution of sample texture patches was carefully studied so that we can generate samples in the core. Based on generated core samples, we can go through the reverse path to increase their spatial dimension. Finally, patches can be stitched to form texture images of a larger size. It was demonstrated by experimental results that TGHop can generate texture images of superior quality with a small model size and at a fast speed.

Future research can be extended in several directions. Controlling the growth of dimensions of intermediate subspaces in the generation process appears to be important. Is it beneficial to introduce more intermediate subspaces between the source and the core? Can we apply the same model for the generation of other images such as human faces, digits, scenes and objects? Is it possible to generalize the framework to image inpainting? How does our generation model compare to GANs? These are all open and interesting questions for further investigation.

Acknowledgment

This research was supported by a gift grant from Mediatek. Computation for the work was supported by the University of Southern California’s Center for High Performance Computing (hpc.usc.edu).

References

  • [1] Mihran Tuceryan and Anil K Jain. Texture analysis. Handbook of pattern recognition and computer vision, pages 235–276, 1993.
  • [2] Tianhorng Chang and C-C Jay Kuo. Texture analysis and classification with tree-structured wavelet transform. IEEE Transactions on image processing, 2(4):429–441, 1993.
  • [3] S Arivazhagan and Lakshmanan Ganesan. Texture classification using wavelet transform. Pattern recognition letters, 24(9-10):1513–1521, 2003.
  • [4] Song Chun Zhu, Yingnian Wu, and David Mumford. Filters, random fields and maximum entropy (frame): Towards a unified theory for texture modeling. International Journal of Computer Vision, 27(2):107–126, 1998.
  • [5] Song Chun Zhu, Ying Nian Wu, and David Mumford. Minimax entropy principle and its application to texture modeling. Neural computation, 9(8):1627–1660, 1997.
  • [6] Kaitai Zhang, Hong-Shuo Chen, Ye Wang, Xiangyang Ji, and C-C Jay Kuo. Texture analysis via hierarchical spatial-spectral correlation (hssc). In 2019 IEEE International Conference on Image Processing (ICIP), pages 4419–4423. IEEE, 2019.
  • [7] Kaitai Zhang, Hong-Shuo Chen, Xinfeng Zhang, Ye Wang, and C-C Jay Kuo. A data-centric approach to unsupervised texture segmentation using principle representative patterns. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1912–1916. IEEE, 2019.
  • [8] Jeremy S De Bonet. Multiresolution sampling procedure for analysis and synthesis of texture images. In Proceedings of the 24th annual conference on Computer graphics and interactive techniques, pages 361–368, 1997.
  • [9] Alexei A Efros and Thomas K Leung. Texture synthesis by non-parametric sampling. In Proceedings of the seventh IEEE international conference on computer vision, volume 2, pages 1033–1038. IEEE, 1999.
  • [10] Li-Yi Wei and Marc Levoy. Fast texture synthesis using tree-structured vector quantization. In Proceedings of the 27th annual conference on Computer graphics and interactive techniques, pages 479–488, 2000.
  • [11] Alexei A Efros and William T Freeman. Image quilting for texture synthesis and transfer. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques, pages 341–346, 2001.
  • [12] Lin Liang, Ce Liu, Ying-Qing Xu, Baining Guo, and Heung-Yeung Shum. Real-time texture synthesis by patch-based sampling. ACM Transactions on Graphics (ToG), 20(3):127–150, 2001.
  • [13] Michael F Cohen, Jonathan Shade, Stefan Hiller, and Oliver Deussen. Wang tiles for image and texture generation. ACM Transactions on Graphics (TOG), 22(3):287–294, 2003.
  • [14] Vivek Kwatra, Arno Schödl, Irfan Essa, Greg Turk, and Aaron Bobick. Graphcut textures: image and video synthesis using graph cuts. ACM Transactions on Graphics (ToG), 22(3):277–286, 2003.
  • [15] Qing Wu and Yizhou Yu. Feature matching and deformation for texture synthesis. ACM Transactions on Graphics (TOG), 23(3):364–367, 2004.
  • [16] Vivek Kwatra, Irfan Essa, Aaron Bobick, and Nipun Kwatra. Texture optimization for example-based synthesis. In ACM SIGGRAPH 2005 Papers, pages 795–802. 2005.
  • [17] David J Heeger and James R Bergen. Pyramid-based texture analysis/synthesis. In Proceedings of the 22nd annual conference on Computer graphics and interactive techniques, pages 229–238, 1995.
  • [18] Javier Portilla and Eero P Simoncelli. A parametric texture model based on joint statistics of complex wavelet coefficients. International journal of computer vision, 40(1):49–70, 2000.
  • [19] Leon Gatys, Alexander S Ecker, and Matthias Bethge. Texture synthesis using convolutional neural networks. In Advances in neural information processing systems, pages 262–270, 2015.
  • [20] Gang Liu, Yann Gousseau, and Gui-Song Xia. Texture synthesis through convolutional neural networks and spectrum constraints. In 2016 23rd International Conference on Pattern Recognition (ICPR), pages 3234–3239. IEEE, 2016.
  • [21] Eric Risser, Pierre Wilmot, and Connelly Barnes. Stable and controllable neural texture synthesis and style transfer using histogram losses. arXiv preprint arXiv:1701.08893, 2017.
  • [22] Yijun Li, Chen Fang, Jimei Yang, Zhaowen Wang, Xin Lu, and Ming-Hsuan Yang. Diversified texture synthesis with feed-forward networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3920–3928, 2017.
  • [23] Yijun Li, Chen Fang, Jimei Yang, Zhaowen Wang, Xin Lu, and Ming-Hsuan Yang. Universal style transfer via feature transforms. In Advances in neural information processing systems, pages 386–396, 2017.
  • [24] Ivan Ustyuzhaninov, Wieland Brendel, Leon A Gatys, and Matthias Bethge. What does it take to generate natural textures? In ICLR (Poster), 2017.
  • [25] Wu Shi and Yu Qiao. Fast texture synthesis via pseudo optimizer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5498–5507, 2020.
  • [26] Yueru Chen, Mozhdeh Rouhsedaghat, Suya You, Raghuveer Rao, and C-C Jay Kuo. Pixelhop++: A small successive-subspace-learning-based (ssl-based) model for image classification. arXiv preprint arXiv:2002.03141, 2020.
  • [27] Xuejing Lei, Ganning Zhao, and C-C Jay Kuo. Nites: A non-parametric interpretable texture synthesis method. In 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pages 1698–1706. IEEE, 2020.
  • [28] C-C Jay Kuo. Understanding convolutional neural networks with a mathematical model. Journal of Visual Communication and Image Representation, 41:406–413, 2016.
  • [29] C-C Jay Kuo. The cnn as a guided multilayer recos transform [lecture notes]. IEEE signal processing magazine, 34(3):81–89, 2017.
  • [30] C-C Jay Kuo, Min Zhang, Siyang Li, Jiali Duan, and Yueru Chen. Interpretable convolutional neural networks via feedforward design. Journal of Visual Communication and Image Representation, 2019.
  • [31] Yueru Chen and C-C Jay Kuo. Pixelhop: A successive subspace learning (ssl) method for object recognition. Journal of Visual Communication and Image Representation, page 102749, 2020.
  • [32] Hong-Shuo Chen, Mozhdeh Rouhsedaghat, Hamza Ghani, Shuowen Hu, Suya You, and C-C Jay Kuo. Defakehop: A light-weight high-performance deepfake detector. In 2021 IEEE International Conference on Multimedia and Expo (ICME), pages 1–6. IEEE, 2021.
  • [33] Kaitai Zhang, Bin Wang, Wei Wang, Fahad Sohrab, Moncef Gabbouj, and C-C Jay Kuo. Anomalyhop: An ssl-based image anomaly localization method. arXiv preprint arXiv:2105.03797, 2021.
  • [34] Pranav Kadam, Min Zhang, Shan Liu, and C-C Jay Kuo. R-pointhop: A green, accurate and unsupervised point cloud registration method. arXiv preprint arXiv:2103.08129, 2021.
  • [35] Xiaofeng Liu, Fangxu Xing, Chao Yang, C-C Jay Kuo, Suma Babu, Georges El Fakhri, Thomas Jenkins, and Jonghye Woo. Voxelhop: Successive subspace learning for als disease classification using structural mri. arXiv preprint arXiv:2101.05131, 2021.
  • [36] Min Zhang, Yifan Wang, Pranav Kadam, Shan Liu, and C-C Jay Kuo. Pointhop++: A lightweight learning model on point sets for 3d classification. In 2020 IEEE International Conference on Image Processing (ICIP), pages 3319–3323. IEEE, 2020.
  • [37] Min Zhang, Haoxuan You, Pranav Kadam, Shan Liu, and C-C Jay Kuo. Pointhop: An explainable machine learning method for point cloud classification. IEEE Transactions on Multimedia, 2020.
  • [38] Min Zhang, Pranav Kadam, Shan Liu, and C-C Jay Kuo. Unsupervised feedforward feature (uff) learning for point cloud classification and segmentation. In 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP), pages 144–147. IEEE, 2020.
  • [39] Pranav Kadam, Min Zhang, Shan Liu, and C-C Jay Kuo. Unsupervised point cloud registration via salient points analysis (spa). In 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP), pages 5–8. IEEE, 2020.
  • [40] Abinaya Manimaran, Thiyagarajan Ramanathan, Suya You, and C-C Jay Kuo. Visualization, discriminability and applications of interpretable saak features. Journal of Visual Communication and Image Representation, 66:102699, 2020.
  • [41] Tzu-Wei Tseng, Kai-Jiun Yang, C-C Jay Kuo, and Shang-Ho Tsai. An interpretable compression and classification system: Theory and applications. IEEE Access, 8:143962–143974, 2020.
  • [42] Mozhdeh Rouhsedaghat, Yifan Wang, Xiou Ge, Shuowen Hu, Suya You, and C-C Jay Kuo. Facehop: A light-weight low-resolution face gender classification method. arXiv preprint arXiv:2007.09510, 2020.
  • [43] Mozhdeh Rouhsedaghat, Masoud Monajatipoor, Zohreh Azizi, and C-C Jay Kuo. Successive subspace learning: An overview. arXiv preprint arXiv:2103.00121, 2021.
  • [44] Aapo Hyvärinen, Patrik O Hoyer, and Erkki Oja. Sparse code shrinkage: Denoising by nonlinear maximum likelihood estimation. In Advances in Neural Information Processing Systems, pages 473–479, 1999.
  • [45] Marian Stewart Bartlett, Javier R Movellan, and Terrence J Sejnowski. Face recognition by independent component analysis. IEEE Transactions on neural networks, 13(6):1450–1464, 2002.
  • [46] Nikolaos Mitianoudis and Tania Stathaki. Pixel-based and region-based image fusion schemes using ica bases. Information fusion, 8(2):131–142, 2007.
  • [47] Aapo Hyvärinen and Erkki Oja. Independent component analysis: algorithms and applications. Neural networks, 13(4-5):411–430, 2000.