Constrained Graphic Layout Generation via Latent Optimization

08/02/2021 ∙ by Kotaro Kikuchi, et al.

It is common in graphic design that humans visually arrange various elements according to their design intent and semantics. For example, a title text almost always appears on top of other elements in a document. In this work, we generate graphic layouts that can flexibly incorporate such design semantics, whether specified implicitly or explicitly by a user. We optimize in the latent space of an off-the-shelf layout generation model, which makes our approach complementary to, and usable with, existing layout generation models. Our approach builds on a generative layout model based on a Transformer architecture, and formulates layout generation as a constrained optimization problem where design constraints enforce element alignment, overlap avoidance, or any other user-specified relationship. We show in the experiments that our approach is capable of generating realistic layouts in both constrained and unconstrained generation tasks with a single model. The code is available at https://github.com/ktrk115/const_layout .


1. Introduction

Visual media contents are organized using design layouts to facilitate the conveying of information. A design layout is the arrangement of the sizes and positions of the elements to be displayed, and is a critical part of graphic design. In general, articles start with a title, followed by headings and the main text, usually in top-to-bottom order. Mobile user interfaces arrange navigation, images, texts, or buttons cleanly in a given display resolution with fluid layouts. The semantic relationships, priority, and reading order of elements are carefully decided by graphic designers while considering the overall visual aesthetics of the design. Inexperienced designers often face difficulty producing high-quality presentations that convey the designated message while maintaining fundamental design considerations such as alignment or overlap. Design constraints can be internal, derived from one's design experience and preference, or external, such as visual media regulations and client requirements. Automatic search for plausible layout candidates, such as the one we propose in this paper, can greatly aid the design process.

Several attempts have been made to automatically generate graphic layouts in the computer graphics community (O’Donovan et al., 2014; O’Donovan et al., 2015). Recent studies (Li et al., 2019; Jyothi et al., 2019; Arroyo et al., 2021) using unconstrained deep generative models have been shown to generate plausible layouts thanks to large-scale datasets of design examples. Some works explicitly introduce design constraints such as alignment or overlap avoidance through additional losses or conditioning (Li et al., 2020; Lee et al., 2020). However, one drawback of integrating constraints into the learning objective is that the model must be re-fitted whenever a user wishes to incorporate a new condition or a new loss. We instead opt to perform the optimization in the latent space of the generative model, which is complementary to and allows the use of existing off-the-shelf models.

In this work, we propose a novel framework, which we call Constrained Layout Generation via Latent Optimization (CLG-LO), that casts constrained layout generation as a constrained optimization problem in the latent space of the model. An overview of the proposed framework is illustrated in Fig. 1. In our approach, we use a Generative Adversarial Network (GAN) trained in the unconstrained setting and model user specifications as a constrained optimization program. We optimize the latent code of the unconstrained model with an iterative algorithm to find a layout that satisfies the specified constraints. Our framework allows the user to use a single pre-trained GAN and incorporate various constraints into the layout generation as needed, eliminating the computationally expensive re-training of the model.

Although our approach can work with off-the-shelf generative layout models, in addition to the CLG-LO framework we also propose a Transformer-based (Vaswani et al., 2017) layout GAN model, which we name LayoutGAN++. Relationships between elements are well captured by the Transformers in both the generator and the discriminator. With the help of representation learning in the discriminator through auxiliary layout reconstruction (Liu et al., 2021), LayoutGAN++ significantly improves upon LayoutGAN (Li et al., 2019) for unconstrained layout generation.

We validate our proposed methods using three public datasets of graphic layouts. We design two constrained generation settings similar to real use cases. In the unconstrained generation task, LayoutGAN++ obtains comparable or better results than the existing methods. Using LayoutGAN++ as the backend model, CLG-LO shows significant improvements in the constrained generation task.

We summarize our contributions as follows:

  • A framework to generate layouts that satisfy given constraints by optimizing latent codes.

  • An architecture and methodology for layout GAN that allows for stable training and generation of high-quality layouts.

  • Extensive experiments and state-of-the-art results using public datasets for unconstrained and constrained layout generation.

2. Related Work

2.1. Layout Generation

There have been several studies on layout generation, both with and without user specification. Classic optimization approaches (O’Donovan et al., 2014; O’Donovan et al., 2015) manually designed energy functions with a large number of constraints that a layout should satisfy. Recent works have utilized neural networks to learn a generative model of layout. LayoutVAE trained two types of Variational Auto-Encoders (VAE) to generate bounding boxes for a given label set (Jyothi et al., 2019). LayoutGAN trained a relational generator by employing a wireframe renderer that rasterizes bounding boxes and allows training with a pixel-based discriminator (Li et al., 2019). Later, LayoutGAN was extended to include attribute conditioning (Li et al., 2020). Zheng et al. (2019) reported a raster layout generator conditioned on given images, keywords, and attributes. READ (Patil et al., 2020) trained a hierarchical auto-encoder to generate document layout structures. Lee et al. (2020) proposed graph-based networks called Neural Design Networks (NDN) that explicitly infer element relations from partial user specification. Very recently, Gupta et al. (2021) described a Transformer-based model to generate layouts in various domains. Also, Arroyo et al. (2021) reported a VAE model that generates layouts using self-attention networks. Apart from graphic design layouts, there has also been research on generating indoor scene layouts (Henderson et al., 2017; Ritchie et al., 2019; Zhang et al., 2020).

Our work considers both unconstrained generation (Gupta et al., 2021; Arroyo et al., 2021) and constrained generation (Li et al., 2020; Lee et al., 2020). We build our unconstrained layout generator based on LayoutGAN (Li et al., 2019), and apply user layout specification as constraints to a learned generator. Unlike NDN (Lee et al., 2020), we only need a single model to generate constrained layouts.

2.2. Latent Space Exploitation

With the recent progress in image synthesis using deep generative models (Karras et al., 2018, 2019), much of the research utilizing latent spaces has been conducted in the image domain. In real image editing, the mainstream approach projects the target image into the latent space and performs non-trivial image editing with user input on the learned manifold (Zhu et al., 2016; Bau et al., 2019; Zhu et al., 2020). Pan et al. (2020) also used the natural image priors learned by a GAN and applied them to various image restoration tasks such as inpainting and colorization in a unified way. Menon et al. (2020) searched the latent space of a generative model of high-resolution facial photos to achieve super-resolution of low-quality photos.

The utilization of latent variables in deep generative models has been less studied in non-image domains. Umetani (2017) proposed an interactive interface that uses a learned auto-encoder to find the shape of a 3D model by adjusting latent variables. Schrum et al. (2020) proposed an interface consisting of interactive evolutionary search and direct manipulation of latent variables for game level design. Chiu et al. (2020) proposed a method to efficiently explore the latent space in a human-in-the-loop fashion using a learned generative model, and validated it in the tasks of generating images, sounds, and 3D models.

Our layout generation approach shares the concept of latent space exploration, and we seek to find a latent representation of layout such that the resulting layout satisfies user-specified constraints.

3. Approach

Figure 2. Overview of our proposed LayoutGAN++ model.

Our goal is to generate a semantically plausible and high-quality design layout from a set of element labels and constraints specified by the user. We first train an unconstrained generative model of layout denoted LayoutGAN++, and later utilize the model for constrained generation tasks.

3.1. LayoutGAN++

In unconstrained generation, we take a set of elements and assign a size and location to each element. We follow LayoutGAN (Li et al., 2019) and formulate our model, which we refer to as LayoutGAN++, in the following. Formally, our generator takes a set of randomly generated codes Z = {z_1, ..., z_N} and a conditional multiset of labels L = {l_1, ..., l_N} as input, and outputs a set of bounding boxes B = {b_1, ..., b_N}, where b_i represents the position and size of an element in normalized coordinates. N is the number of elements in a layout, and the subscript i in z_i, l_i, and b_i refers to the same i-th element. The definition of a label depends on the dataset; e.g., text or table elements in the PubLayNet dataset. Our discriminator takes the generated bounding boxes and conditional labels as input, and outputs a scalar value that quantifies the realism of the layout, while also attempting to reconstruct the given bounding boxes from its internal representation. We show in Fig. 2 the overall architecture of our model.

3.1.1. Generator

Our generator consists of the following:

    z_i \sim \mathcal{N}(0, I)    (1)
    \tilde{h}_i = f(z_i, l_i; \theta_G)    (2)
    \{h'_1, \ldots, h'_N\} = \mathrm{Transformer}(\tilde{h}_1, \ldots, \tilde{h}_N; \theta_G)    (3)
    \hat{b}_i = g(h'_i; \theta_G)    (4)

where f and g are multi-layer perceptrons, \tilde{h}_i and h'_i are hidden representations for each element, and \theta_G denotes the parameters of the generator. We adopt the Transformer block (Vaswani et al., 2017) to learn relational representations among elements, in contrast to LayoutGAN (Li et al., 2019), which utilizes a dot product-based non-local block (Wang et al., 2018) with a residual connection.
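To make the formulation concrete, the following is a minimal PyTorch sketch of a generator in this style. The layer sizes mirror the setup described in Sec. 4.3, but all module and argument names are illustrative assumptions; the released code at https://github.com/ktrk115/const_layout is the reference implementation.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Sketch of Eqs. (1)-(4): per-element MLP -> Transformer -> box decoder."""

    def __init__(self, num_labels, latent_dim=4, dim=256, ff_dim=128,
                 num_heads=4, num_blocks=8):
        super().__init__()
        self.label_emb = nn.Embedding(num_labels, dim)
        # f in Eq. (2): maps (z_i, l_i) to a hidden representation.
        self.fc_in = nn.Sequential(
            nn.Linear(latent_dim + dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, dim_feedforward=ff_dim,
            batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_blocks)  # Eq. (3)
        # g in Eq. (4): decodes each element into a normalized (x, y, w, h) box.
        self.fc_out = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 4), nn.Sigmoid())

    def forward(self, z, labels):
        # z: (batch, N, latent_dim) noise; labels: (batch, N) integer labels.
        h = self.fc_in(torch.cat([z, self.label_emb(labels)], dim=-1))
        h = self.transformer(h)
        return self.fc_out(h)  # boxes in [0, 1] coordinates
```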

3.1.2. Discriminator

Our discriminator has a similar architecture to our generator.

    \tilde{h}_i = f(b_i, l_i; \theta_D)    (5)
    \{h'_c, h'_1, \ldots, h'_N\} = \mathrm{Transformer}(h_c, \tilde{h}_1, \ldots, \tilde{h}_N; \theta_D)    (6)
    \hat{y} = g(h'_c; \theta_D)    (7)

where h_c is a special learnable embedding appended to the hidden element representations, h'_c is the corresponding output for the learnable embedding after the Transformer block, \hat{y} is the quantity that evaluates the realism of the given input, and \theta_D denotes the parameters of the discriminator. We do not employ the wireframe renderer of LayoutGAN (Li et al., 2019), because we find that the raster-domain discriminator becomes unstable with limited dataset size. We compare with LayoutGAN in our experiments.
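A matching PyTorch sketch of the discriminator follows; as above, the names are illustrative assumptions. The special embedding plays the role of a classification token, and its Transformer output h'_c doubles as the internal representation fed to the auxiliary decoder of Sec. 3.1.3.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Sketch of Eqs. (5)-(7): element encoder with a learnable summary token."""

    def __init__(self, num_labels, dim=256, ff_dim=128, num_heads=4,
                 num_blocks=8):
        super().__init__()
        self.label_emb = nn.Embedding(num_labels, dim)
        # f in Eq. (5): maps (b_i, l_i) to a hidden representation.
        self.fc_in = nn.Sequential(
            nn.Linear(4 + dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.h_c = nn.Parameter(torch.randn(1, 1, dim))  # special embedding h_c
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, dim_feedforward=ff_dim,
            batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_blocks)  # Eq. (6)
        self.fc_out = nn.Linear(dim, 1)  # g in Eq. (7)

    def forward(self, boxes, labels):
        h = self.fc_in(torch.cat([boxes, self.label_emb(labels)], dim=-1))
        h_c = self.h_c.expand(h.size(0), -1, -1)
        h = self.transformer(torch.cat([h_c, h], dim=1))
        h_c_out = h[:, 0]  # h'_c: summary used for realism and reconstruction
        return self.fc_out(h_c_out), h_c_out
```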

3.1.3. Auxiliary Decoder

We empirically find that in well-aligned layout domains such as documents, the discriminator is trained to be sensitive to alignment and less sensitive to positional trends, i.e., it only cares whether the elements are aligned, and does not care about unusual layouts such as a header element placed at the bottom. Following the self-supervised learning of Liu et al. (2021), we apply additional regularization so that the discriminator becomes aware of positional trends. We add an auxiliary decoder that reconstructs the bounding boxes given to the discriminator from the internal representation h'_c:

    \{h''_1, \ldots, h''_N\} = \mathrm{Transformer}(h'_c + p_1, \ldots, h'_c + p_N; \theta_R)    (8)
    \hat{b}'_i = g_b(h''_i; \theta_R)    (9)
    \hat{l}'_i = g_l(h''_i; \theta_R)    (10)

where p_i is a learnable positional embedding initialized from a uniform distribution, \hat{b}'_i is a reconstructed bounding box, \hat{l}'_i is a reconstructed label, and \theta_R denotes the parameters of the auxiliary decoder.

3.1.4. Training objective

The objective function of our model is the following:

    \min_{\theta_G} \max_{\theta_D, \theta_R} \; \mathbb{E}_{(B, L) \sim p_{\mathrm{data}}}\left[\log D(B, L)\right] + \mathbb{E}_{Z \sim p_z,\, L \sim p_{\mathrm{data}}}\left[\log\left(1 - D(G(Z, L), L)\right)\right] - \mathcal{L}_{\mathrm{rec}}    (11)

where we denote the reconstruction loss of the auxiliary decoder by \mathcal{L}_{\mathrm{rec}}. The reconstruction loss measures the similarity between two sets of bounding boxes and labels; we employ the mean squared error for bounding boxes and the cross entropy for labels. We compute the reconstruction loss after first sorting the bounding boxes in lexicographic order of the ground-truth positions (Carlier et al., 2020).
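A sketch of this loss follows, assuming boxes are (batch, N, 4) tensors of normalized (x, y, w, h) with top-left origins and label logits are (batch, N, num_labels); the sort key below is one illustrative way to realize the lexicographic ordering.

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(pred_boxes, pred_label_logits, gt_boxes, gt_labels):
    # Sort elements lexicographically by ground-truth position (here
    # y-major, then x) so the set-to-set comparison becomes element-wise;
    # the decoder's positional embeddings define the prediction order.
    key = gt_boxes[..., 1] * 1e4 + gt_boxes[..., 0]
    order = key.argsort(dim=1)
    gt_boxes = gt_boxes.gather(1, order.unsqueeze(-1).expand_as(gt_boxes))
    gt_labels = gt_labels.gather(1, order)
    # Mean squared error for boxes, cross entropy for labels.
    box_loss = F.mse_loss(pred_boxes, gt_boxes)
    label_loss = F.cross_entropy(
        pred_label_logits.flatten(0, 1), gt_labels.flatten())
    return box_loss + label_loss
```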

3.2. Constrained Layout Generation via Latent Optimization (CLG-LO)

Let us consider the case where there are user-specified constraints, such as "an element A must be above an element B". From the perspective of the generator, such constraints restrict the available output space. We formulate generation with user specification as a constrained optimization problem. Given a pre-trained generator G and discriminator D, and a set of constraints \mathcal{C}, we define the constrained minimization problem over the latent codes Z:

    \min_{Z} \; -D(G(Z, L), L) \quad \text{subject to} \quad c(G(Z, L)) = 0 \quad \forall c \in \mathcal{C}    (12)

The intuition is that we seek bounding boxes that look as realistic as possible to the discriminator while satisfying the user-specified constraints. Once the optimal latent codes \hat{Z} are found, we can obtain bounding boxes that satisfy the constraints as follows:

    \hat{B} = G(\hat{Z}, L)    (13)

We use the augmented Lagrangian method (Nocedal and Wright, 2006), one of the widely used algorithms for solving nonlinear optimization problems. In this method, the constrained problem is transformed into an unconstrained problem that optimizes the augmented Lagrangian function, which combines the Lagrangian and penalty functions. Let us rewrite the objective of Eq. (12) as F(Z) = -D(G(Z, L), L) and its constraints as c_i(Z) = c_i(G(Z, L)) for brevity; we then define the following augmented Lagrangian function L_A:

    L_A(Z, \lambda; \mu) = F(Z) + \sum_i \lambda_i c_i(Z) + \frac{\mu}{2} \sum_i c_i(Z)^2    (14)

where \lambda = (\lambda_1, \lambda_2, \ldots) are the Lagrange multipliers and \mu is a penalty parameter weighting the quadratic terms.

In this method, the Lagrange multipliers are updated according to the extent of constraint violation, and the penalty parameter is gradually increased to make the impact of the constraints larger. Let k be the current iteration; the update equations are expressed as:

    \lambda_i^{(k+1)} = \max\left(\lambda_i^{(k)} + \mu^{(k)} c_i(Z^{(k)}),\; 0\right)    (15)
    \mu^{(k+1)} = s \, \mu^{(k)}    (16)

where s > 1 is a predefined hyperparameter.

Algorithm 1 summarizes the procedure of our method. We repeat the main loop until the amount of constraint violation is sufficiently small or the iteration count reaches the maximum number of iterations T. We fix the initial Lagrange multipliers \lambda^{(0)}, the initial penalty parameter \mu^{(0)}, the growth factor s, and T in the experiments. For the inner optimizer, we use either Adam (Kingma and Ba, 2015) with a learning rate of 0.01 or CMA-ES (Hansen, 2016) with an initial sigma value of 0.25, and both run for 200 iterations. We compare in Sec. 4.4 which optimizer yields the better solution.

Input: pre-trained generator G, pre-trained discriminator D, labels L, constraints \mathcal{C}, initial Lagrange multipliers \lambda^{(0)}, initial penalty parameter \mu^{(0)}
Output: bounding boxes \hat{B}
k \leftarrow 0
repeat
      Z^{(k+1)} \leftarrow \arg\min_Z L_A(Z, \lambda^{(k)}; \mu^{(k)}), starting at Z^{(k)}    // inner optimization (Eq. (14))
      Update the Lagrange multipliers by Eq. (15) to obtain \lambda^{(k+1)}
      Update the penalty parameter by Eq. (16) to obtain \mu^{(k+1)}
      k \leftarrow k + 1
until stopping criterion is fulfilled
return \hat{B} = G(Z^{(k)}, L)
ALGORITHM 1. Constrained layout generation via latent code optimization

In practice, optimizing the output value of the discriminator directly may yield an adversarial example, i.e., one that the discriminator considers real but that is perceptually degraded. To avoid this, we clamp the output value of the discriminator at a certain threshold and use the clamped objective in place of F(Z) in Eq. (14).
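A condensed sketch of Algorithm 1 with Adam as the inner optimizer is shown below, reusing the Generator and Discriminator sketches above. Constraint functions map generated boxes to non-negative violation values, and d_thresh stands for the clamping threshold just discussed; all hyperparameter defaults here (growth factor, tolerance, loop counts) are illustrative assumptions.

```python
import torch

def clg_lo(G, D, labels, constraints, d_thresh, latent_dim=4,
           mu0=1.0, growth=2.0, max_outer=5, inner_steps=200, tol=1e-4):
    # Only the latent codes are optimized; the pre-trained GAN stays frozen.
    for p in list(G.parameters()) + list(D.parameters()):
        p.requires_grad_(False)
    z = torch.randn(1, labels.size(1), latent_dim, requires_grad=True)
    lam = torch.zeros(len(constraints))  # Lagrange multipliers
    mu = mu0
    for _ in range(max_outer):
        opt = torch.optim.Adam([z], lr=0.01)
        for _ in range(inner_steps):  # inner minimization of Eq. (14)
            boxes = G(z, labels)
            # Clamp the realism score to avoid adversarial examples.
            f = -D(boxes, labels)[0].clamp(max=d_thresh).mean()
            c = torch.stack([ci(boxes) for ci in constraints])
            loss = f + (lam * c).sum() + 0.5 * mu * (c ** 2).sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
        with torch.no_grad():
            c = torch.stack([ci(G(z, labels)) for ci in constraints])
            lam = (lam + mu * c).clamp(min=0.0)  # Eq. (15)
            mu = growth * mu                     # Eq. (16)
            if c.abs().max() < tol:              # stopping criterion
                break
    return G(z, labels).detach()
```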

4. Experiments

We evaluate the proposed method on both unconstrained and constrained layout generation tasks. We first describe the datasets and evaluation metrics, and then explain the experimental setup for each task.

4.1. Dataset

Dataset                                      # label types   Max. # elements   # train   # val   # test
Rico (Deka et al., 2017; Liu et al., 2018)   13              9                 17,515    1,030   2,061
PubLayNet (Zhong et al., 2019)               5               9                 160,549   8,450   4,226
Magazine (Zheng et al., 2019)                5               33                3,331     196     392
Table 1. Statistics of the datasets used in our experiments and the splits used for evaluation.

We evaluate layout generation on different types of graphic designs. We use three publicly available datasets: Rico (Deka et al., 2017; Liu et al., 2018) provides UI designs collected from mobile apps, PubLayNet (Zhong et al., 2019) compiles a dataset of document images, and Magazine (Zheng et al., 2019) collects magazine pages. Following the previous studies (Li et al., 2019; Lee et al., 2020), we exclude elements whose labels are not in the 13 most frequent labels in the Rico dataset, and exclude layouts with more than 10 elements in both the Rico and PubLayNet datasets. For the PubLayNet dataset, we use 95% of the official training split for training, the rest for validation, and the official validation split for testing. For Rico and Magazine, since there is no official data split, we use 85% of the dataset for training, 5% for validation, and 10% for testing. We summarize the statistics of the datasets in Table 1.

4.2. Evaluation Metrics

We use four metrics to measure the quality of the generated layouts: Fréchet Inception Distance (FID) (Heusel et al., 2017), Maximum Intersection over Union (IoU), Alignment, and Overlap.

Layout variants      Class    Recon    Class+Recon
Real                 -        -        -
Added noise          186.64   37.99    127.57
Vertically flipped   3.37     97.91    100.34
Nearest neighbour    0.29     12.52    11.80
Table 2. Comparison of FID scores computed using feature extractors trained with various objectives. In particular, we compare feature extractors trained with a classification loss (Class), a reconstruction loss (Recon), and a combination of both (Class+Recon). We compute the FID score between real layouts and variants that have added noise, have been vertically flipped, or are nearest neighbors from the validation set.

4.2.1. FID

To compute FID, we need to define the representative features of layouts. We follow the approach of Lee et al. (2020) and train a neural network to classify between real layouts and noise-added layouts, using the intermediate features of the network. One difference from (Lee et al., 2020) is that we incorporate the auxiliary decoder of Sec. 3.1.3 into the training, so that the feature extractor is aware of both alignment and positions. In Table 2, we show a comparison of FIDs across networks learned with different objectives: Class is real/fake classification only, Recon is auxiliary reconstruction only, and Class+Recon is learned with both objectives. The combination of both objectives improves the sensitivity to different layout arrangements.

4.2.2. Maximum IoU

Maximum IoU is defined between two collections of generated layouts and references. We first define an IoU-based similarity between two layouts B_1 and B_2. We consider the optimal matching between B_1 and B_2, and compute the average IoU of the matched bounding boxes. Let \pi be a one-to-one matching, and \Pi_N be the set of possible permutations of size N. Note that we only consider matches between two bounding boxes with the same label, i.e., l_i = l_{\pi(i)}. The similarity with respect to the optimal matching is computed as

    S(B_1, B_2) = \max_{\pi \in \Pi_N} \frac{1}{N} \sum_{i=1}^{N} \mathrm{IoU}\left(b^{(1)}_i, b^{(2)}_{\pi(i)}\right)    (17)

where \mathrm{IoU}(\cdot, \cdot) computes the IoU between two bounding boxes. To evaluate the similarity between generated layouts \mathcal{B} and references \mathcal{B}', we compute the average similarity under the optimal matching \theta between the two collections:

    \mathrm{MaxIoU}(\mathcal{B}, \mathcal{B}') = \max_{\theta} \frac{1}{|\mathcal{B}|} \sum_{B \in \mathcal{B}} S(B, \theta(B))    (18)

where we only consider matches between two layouts with an identical label set, i.e., L_B = L_{\theta(B)}. We use the solver (Crouse, 2016) provided by SciPy (Virtanen et al., 2020) to solve the assignment problems.
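As an illustration, the layout-to-layout similarity of Eq. (17) can be computed with SciPy's assignment solver roughly as follows; the box format and function names are assumptions for the sketch.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    # a, b: (..., 4) boxes as (x, y, w, h) with top-left origins.
    x1 = np.maximum(a[..., 0], b[..., 0])
    y1 = np.maximum(a[..., 1], b[..., 1])
    x2 = np.minimum(a[..., 0] + a[..., 2], b[..., 0] + b[..., 2])
    y2 = np.minimum(a[..., 1] + a[..., 3], b[..., 1] + b[..., 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = a[..., 2] * a[..., 3] + b[..., 2] * b[..., 3] - inter
    return inter / np.maximum(union, 1e-8)

def layout_similarity(boxes1, labels1, boxes2, labels2):
    # Pairwise IoU matrix, zeroed where labels differ so that only
    # same-label boxes can be matched (Eq. (17)).
    scores = iou(boxes1[:, None], boxes2[None, :])
    scores[labels1[:, None] != labels2[None, :]] = 0.0
    rows, cols = linear_sum_assignment(-scores)  # maximize total IoU
    return scores[rows, cols].mean()
```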

4.2.3. Alignment and overlap

We use the Alignment and Overlap metrics from previous work (Li et al., 2020). We modify the original metrics by normalizing them with the number of elements N.

4.3. Unconstrained Layout Generation

4.3.1. Setup

We use LayoutGAN (Li et al., 2019) and NDN (Lee et al., 2020) as baselines. Although LayoutGAN is intended for the unconditional setting, we adapt the model to be conditioned on a label set input. We refer to the model using the wireframe rendering discriminator as LayoutGAN-W and the one using the relation-based discriminator as LayoutGAN-R. NDN first generates the position and size relations between elements, then generates bounding boxes based on the relations, and finally corrects misalignments of the boxes. We denote it NDN-none to match the designation in their paper, as our setting does not specify the relations. We reimplement all the baselines, since official code for them is not publicly available¹. We implement our LayoutGAN++ with PyTorch (Paszke et al., 2019). We train the model using the Adam optimizer for 200,000 iterations with a batch size of 64 and a learning rate of 1e-5, which takes six hours on an NVIDIA GeForce RTX 2080Ti GPU. Our Transformer modules consist of 8 blocks; in each block, we set the input/output dimension to 256, the dimension of the hidden layer to 128, and the number of attention heads to 4.

¹ The authors of LayoutGAN provide only the code for the point layout experiment at https://github.com/JiananLi2016/LayoutGAN-Tensorflow, not for bounding boxes.

4.3.2. Results

Rico
Model                            FID           Max. IoU    Alignment    Overlap
LayoutGAN-W (Li et al., 2019)    162.75±0.28   0.30±0.00   0.71±0.00    174.11±0.22
LayoutGAN-R (Li et al., 2019)    52.01±0.62    0.24±0.00   1.13±0.04    69.37±0.66
NDN-none (Lee et al., 2020)      13.76±0.28    0.35±0.00   0.56±0.03    54.75±0.29
LayoutGAN++                      14.43±0.13    0.36±0.00   0.60±0.12    59.85±0.59
Real data                        4.47          0.65        0.26         50.58

PubLayNet
Model                            FID           Max. IoU    Alignment    Overlap
LayoutGAN-W (Li et al., 2019)    195.38±0.46   0.21±0.00   1.21±0.01    138.77±0.21
LayoutGAN-R (Li et al., 2019)    100.24±0.61   0.24±0.00   0.82±0.01    45.64±0.32
NDN-none (Lee et al., 2020)      35.67±0.35    0.31±0.00   0.35±0.01    16.50±0.29
LayoutGAN++                      20.48±0.29    0.36±0.00   0.19±0.00    22.80±0.32
Real data                        9.54          0.53        0.04         0.22

Magazine
Model                            FID           Max. IoU    Alignment    Overlap
LayoutGAN-W (Li et al., 2019)    159.20±0.87   0.12±0.00   0.74±0.02    188.77±0.93
LayoutGAN-R (Li et al., 2019)    100.66±0.35   0.16±0.00   1.90±0.02    111.85±1.44
NDN-none (Lee et al., 2020)      23.27±0.90    0.22±0.00   1.05±0.03    30.31±0.77
LayoutGAN++                      13.35±0.41    0.26±0.00   0.80±0.02    32.40±0.89
Real data                        12.13         0.35        0.43         25.64

Table 3. Quantitative comparison of unconstrained layout generation on the Rico, PubLayNet, and Magazine datasets. The values of Alignment and Overlap are scaled for visibility. For reference, the FID and Max. IoU computed between the validation and test data, and the Alignment and Overlap computed on the test data, are shown as real data.
Figure 3. Qualitative comparison of unconstrained layout generation on Rico, PubLayNet, and Magazine. Label set indicates the total number of labels and their types for each conditional generation result; the rightmost column shows the real data from which the label set was taken.

We summarize the quantitative comparison in Table 3 and the qualitative comparison in Fig. 3. Since all the compared methods are stochastic, we report the mean and standard deviation of five evaluations with the same trained model. Regarding LayoutGAN (Li et al., 2019), we find that LayoutGAN-W is unstable to train, and we failed to reproduce results as good as those in their paper despite our efforts; similar difficulties are reported in recent studies (Gupta et al., 2021; Arroyo et al., 2021). Our results show that LayoutGAN-R is much more stable to train and outperforms LayoutGAN-W. Our LayoutGAN++ achieves results comparable to or better than the current state-of-the-art method NDN-none (Lee et al., 2020); in particular, results on the Rico dataset are similar, while results on the PubLayNet and Magazine datasets favour our approach.

4.4. Layout Generation with Beautification Constraints

The goal of this setting is to generate a well-aligned layout without overlaps, which can serve as a post-processing step to beautify the results of unconstrained layout generation. We conduct the experiment on the PubLayNet dataset, in which most of the layouts are aligned and have little overlap.

4.4.1. Constraints

Let g_align be the function that computes the Alignment metric; we express the alignment constraint as

    c_{\mathrm{align}}(B) = \max\left(g_{\mathrm{align}}(B) - \varepsilon,\; 0\right)    (19)

where \varepsilon is a threshold parameter fixed in our experiment. We use the Overlap metric as the non-overlapping constraint c_overlap.
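The sketch below shows what such constraint functions could look like in PyTorch. It uses simplified, x-axis-only variants of the Alignment and Overlap metrics of Li et al. (2020), so the exact forms in the released code will differ.

```python
import torch

def alignment_cost(boxes):
    # boxes: (N, 4) normalized (x, y, w, h). Simplified variant: distance
    # of each element to its best-aligned neighbour under left/center/right
    # x-alignment.
    xl = boxes[:, 0]
    xc = boxes[:, 0] + boxes[:, 2] / 2
    xr = boxes[:, 0] + boxes[:, 2]
    dists = []
    for x in (xl, xc, xr):
        d = (x[:, None] - x[None, :]).abs()
        d = d + torch.eye(len(x)) * 1e9  # mask self-distances
        dists.append(d)
    d = torch.stack(dists).min(dim=0).values.min(dim=1).values
    return d.mean()

def overlap_cost(boxes):
    # Total pairwise intersection area, normalized per element.
    x1, y1 = boxes[:, 0], boxes[:, 1]
    x2, y2 = x1 + boxes[:, 2], y1 + boxes[:, 3]
    ix = (torch.minimum(x2[:, None], x2[None, :])
          - torch.maximum(x1[:, None], x1[None, :])).clamp(min=0)
    iy = (torch.minimum(y2[:, None], y2[None, :])
          - torch.maximum(y1[:, None], y1[None, :])).clamp(min=0)
    inter = ix * iy
    inter = inter.masked_fill(torch.eye(len(boxes), dtype=torch.bool), 0.0)
    area = (boxes[:, 2] * boxes[:, 3]).clamp(min=1e-8)
    return (inter.sum(dim=1) / area).mean()

def c_align(boxes, eps):
    # Eq. (19)-style hinge: zero iff the alignment cost is below eps.
    return (alignment_cost(boxes) - eps).clamp(min=0)
```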

4.4.2. Setup

We use a pre-trained LayoutGAN++ model within our proposed CLG-LO framework to perform the constrained task. We follow the same settings as in Section 4.3 for training LayoutGAN++. We compare two different inner optimizers, Adam (Kingma and Ba, 2015) and CMA-ES (Hansen, 2016). The mean runtime for CLG-LO was 13.6 seconds with Adam (SD: 11.2) and 1.45 seconds with CMA-ES (SD: 1.75).

Since there are no directly comparable methods in the literature for this setting, we design a baseline called CAL that uses the constraints as additional losses, following similar work (Li et al., 2020). To instantiate CAL, we train LayoutGAN++ with both the alignment constraint and the non-overlapping constraint added to the generator objective, which encourages generated layouts that satisfy the constraints but does not explicitly enforce them.

4.4.3. Results

Model              FID          Max. IoU    Alignment   Overlap
Base model         20.48±0.29   0.36±0.00   0.19±0.00   22.80±0.32
CAL                13.31±0.17   0.38±0.00   0.16±0.00   14.27±0.19
CLG-LO w/ Adam     21.79±0.38   0.36±0.00   0.16±0.00   1.18±0.04
CLG-LO w/ CMA-ES   22.97±0.38   0.36±0.00   0.14±0.00   0.02±0.00
Table 4. Quantitative results with beautification constraints. Base model refers to the unconstrained LayoutGAN++. The values of Alignment and Overlap are scaled for visibility.
Figure 4. Qualitative results with beautification constraints for CLG-LO w/ CMA-ES. Initial unconstrained generation results are shown in the top row and the optimized results in the bottom row.

We summarize the quantitative comparison in Table 4. The base model is LayoutGAN++ without beautification. We can see that CAL performs better than the base model in terms of Alignment and Overlap, thanks to the added losses. FID and Maximum IoU are also improved, which may be due to the inductive bias expressed by the added losses making the GAN easier to train. Our CLG-LO further improves Alignment and Overlap significantly, with almost no degradation in FID and Maximum IoU. As for the choice of inner optimizer, CMA-ES seems to perform better than Adam. We suspect this is because the augmented Lagrangian function (Eq. (14)) has many local solutions, so a population-based, gradient-free global optimization method, e.g., CMA-ES, is more suitable than a gradient-based method, e.g., Adam.

We show the optimization results of CLG-LO using CMA-ES as the inner optimizer in Fig. 4. Our framework successfully finds aligned and non-overlapping layouts. We set the initial sigma parameter of CMA-ES small to explore around the initial latent code, which keeps the optimized layout from changing significantly from the initial layout.

4.5. Layout Generation with Relational Constraints

In this setting, we consider a scenario where the user specifies the location and size relationships of elements in the layout. We consider three size relations, smaller, larger, and equal, and five location relations, above, below, left, right, and overlap. We also define relations to the canvas, e.g., positioning an element at the top of the canvas. We determine the relations from the ground-truth layout and use a subset of them as constraints. We vary the percentage of relations used as constraints and report the rate of violated constraints.

Figure 5. Qualitative results with relational constraints on the three datasets for our proposed CLG-LO w/ CMA-ES. For each result, we show the constraints on the left, the initial unconstrained generation result in the middle, and the optimized result on the right.
                         Rico                                  PubLayNet                             Magazine
Model                    Max. IoU    Alignment   Viol. (%)     Max. IoU    Alignment   Viol. (%)     Max. IoU    Alignment   Viol. (%)
NDN (Lee et al., 2020)   0.36±0.00   0.56±0.03   12.75±0.27    0.31±0.00   0.36±0.00   17.30±0.54    0.23±0.00   1.04±0.05   14.85±0.44
CLG-LO                   0.36±0.00   0.77±0.09   0.84±0.13     0.36±0.00   0.23±0.01   4.61±0.17     0.26±0.00   0.79±0.03   1.77±0.39
Table 5. Quantitative results with relational constraints when 10% of all the relational constraints are used. Viol. (%) denotes the rate of violated constraints. The values of Alignment are scaled for visibility.
Figure 6. Quantitative results with relational constraints as the percentage of specified relations varies. The different colors correspond to the three datasets. The solid lines denote CLG-LO, and the dashed lines denote NDN. Higher is better for Max. IoU; lower is better for Alignment and Violation. Our proposed CLG-LO approach often outperforms NDN when only a small portion of the relations is specified.

4.5.1. Constraints

The size constraint c_size is defined as the sum of the cost functions of all specified size relations. For example, suppose the user specifies that the i-th element has to be larger than the j-th element; the cost function of the larger relation is then defined by:

    c_{\mathrm{lg}}(b_i, b_j) = \max\left(a(b_j) - a(b_i) + \tau,\; 0\right)    (20)

where a(\cdot) is a function that calculates the area of a given bounding box, and \tau is a tolerance parameter shared across the size relations and fixed in our experiment.

We define the location constraint c_loc in the same way. For example, suppose the user specifies that the i-th element has to be above the j-th element; the cost function of the above relation is then defined by:

    c_{\mathrm{ab}}(b_i, b_j) = \max\left(\mathrm{bottom}(b_i) - \mathrm{top}(b_j),\; 0\right)    (21)

where \mathrm{top}(\cdot) and \mathrm{bottom}(\cdot) are functions that return the top and bottom coordinates of a given bounding box, respectively.
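Concretely, such cost terms can be written as hinge penalties on box geometry, as in the sketch below; the function names and the exact hinge forms are illustrative assumptions consistent with Eqs. (20) and (21).

```python
import torch

def area(b):
    # b: (..., 4) box as (x, y, w, h).
    return b[..., 2] * b[..., 3]

def cost_larger(boxes, i, j, tau):
    # Element i should be larger than element j, up to tolerance tau.
    return (area(boxes[j]) - area(boxes[i]) + tau).clamp(min=0)

def cost_above(boxes, i, j):
    # Element i should lie above element j (y grows downward):
    # bottom(i) <= top(j), penalizing any excess.
    bottom_i = boxes[i, 1] + boxes[i, 3]
    top_j = boxes[j, 1]
    return (bottom_i - top_j).clamp(min=0)
```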

4.5.2. Setup

We compare our CLG-LO against NDN (Lee et al., 2020). In CLG-LO, we use CMA-ES for the inner optimizer, as it worked well in the experiments with beautification constraints. The rest of the settings follow the experiment with beautification constraints, but for a fair comparison, we did not use the beautification constraints themselves. The mean runtime for CLG-LO was 1.96 seconds (SD: 3.48).

4.5.3. Results

We show the qualitative results in Fig. 5 and the quantitative comparison in Table 5. We report the results of a setting that uses 10% of all relations in Table 5, which we believe is representative of a realistic usage scenario. A typical example using roughly 10% of the relations is the upper-left one in Fig. 5. Our CLG-LO performed comparably to or better than NDN, and in particular showed a significant improvement in the constraint violation metric. This is to be expected, because NDN does not guarantee that the inferred result satisfies the constraints, whereas our method tries to find a solution that satisfies as many of the constraints as possible through iterative optimization.

We also show in Fig. 6 the results of varying the percentage of relations used. We find that NDN performs better as the number of relations used increases, which is reasonable since its layout generation module is trained with the complete relational graph of the ground-truth layout. On the other hand, our CLG-LO performs worse as the number of relations used increases, because it becomes harder to find a solution that satisfies all the constraints. A practical remedy when no solution is found could be to store a layout at each iteration of the main loop in Algorithm 1 and let the user choose one based on the trade-off between constraint satisfaction and layout quality. We note, however, that our method performs best in realistic scenarios where the number of user-specified relations is small.

5. Conclusions and Discussion

In this paper, we proposed a novel framework called Constrained Layout Generation via Latent Optimization (CLG-LO), which performs constrained layout generation by optimizing the latent codes of a pre-trained GAN. While existing works treat constraints as either additional objectives or conditioning, requiring re-training when unexpected constraints are involved, our framework can flexibly incorporate a variety of constraints using a single unconstrained GAN. While our approach is applicable to most generative layout models, we also presented a new layout generation model called LayoutGAN++ that outperforms existing approaches in unconditioned generation. Experimental results on both unconstrained and constrained generation tasks using three public datasets support the effectiveness of the proposed methods.

While our approach significantly outperforms existing approaches in many cases, we have no guarantees on its convergence, given the non-convexity and complexity of the optimization problem: the objective and constraint functions in Eq. (12) involve a complex nonlinear neural network. When the number of constraints becomes large (Fig. 6), the optimizer can have trouble finding a good solution and underperform existing approaches. In general, however, most users will not specify a very large number of constraints, and in those situations our approach significantly outperforms existing approaches. We believe this effect can be mitigated by improving the optimization approach itself, using piece-wise convex approximations, or improving the initialization of the optimization variables. It may also be practical to design an interaction that asks the user to remove or change difficult constraints.

Our optimization-based approach allows us to flexibly change not only the constraint function but also the objective function. For example, if we wish to limit the amount of change, we can add the distance between the boxes before and after the optimization as a penalty to the objective function. Our approach can also be applied to any model that can generate diverse plausible layouts by manipulating latent variables. Note that when used with VAE-based models (Jyothi et al., 2019; Arroyo et al., 2021; Lee et al., 2020), which do not have an explicit function to measure the quality of the generated layout, the problem becomes one of constraint satisfaction. Our approach still works in such cases, but if the quality of the outcome is problematic, it may be necessary to train an additional measurement network akin to a discriminator.

There are many open directions for improvement, such as incorporating models that approximate human perception as constraints (Bylinskii et al., 2017; Zhao et al., 2018) in order to generate more aesthetically pleasing results. Exploring latent codes while considering the diversity of layouts is another exciting direction (Pugh et al., 2016), allowing for efficient design exploration with a variety of alternatives. It is also worth investigating whether our proposed CLG-LO approach can be applied to generation problems other than layout design.

Acknowledgements.
This work is partially supported by Waseda University Leading Graduate Program for Embodiment Informatics.

References

  • Arroyo et al. (2021) Diego Martin Arroyo, Janis Postels, and Federico Tombari. 2021. Variational Transformer Networks for Layout Generation. arXiv:2104.02416
  • Bau et al. (2019) David Bau, Hendrik Strobelt, William Peebles, Jonas Wulff, Bolei Zhou, Jun-Yan Zhu, and Antonio Torralba. 2019. Semantic Photo Manipulation with a Generative Image Prior. ACM Trans. Graph. 38, 4, Article 59 (2019), 11 pages.
  • Bylinskii et al. (2017) Zoya Bylinskii, Nam Wook Kim, Peter O’Donovan, Sami Alsheikh, Spandan Madan, Hanspeter Pfister, Fredo Durand, Bryan Russell, and Aaron Hertzmann. 2017. Learning Visual Importance for Graphic Designs and Data Visualizations. ACM Symp. User Inter. Soft. Tech. (2017).
  • Carlier et al. (2020) Alexandre Carlier, Martin Danelljan, Alexandre Alahi, and Radu Timofte. 2020. DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation. In Adv. Neural Inform. Process. Syst.
  • Chiu et al. (2020) Chia-Hsing Chiu, Yuki Koyama, Yu-Chi Lai, Takeo Igarashi, and Yonghao Yue. 2020. Human-in-the-Loop Differential Subspace Search in High-Dimensional Latent Space. ACM Trans. Graph. (2020).
  • Crouse (2016) David F. Crouse. 2016. On Implementing 2D Rectangular Assignment Algorithms. IEEE Trans. Aerospace Electron. Systems (2016).
  • Deka et al. (2017) Biplab Deka, Zifeng Huang, Chad Franzen, Joshua Hibschman, Daniel Afergan, Yang Li, Jeffrey Nichols, and Ranjitha Kumar. 2017. Rico: A Mobile App Dataset for Building Data-Driven Design Applications. In ACM Symp. User Inter. Soft. Tech.
  • Gupta et al. (2021) Kamal Gupta, Vijay Mahadevan, Alessandro Achille, Justin Lazarow, Larry S. Davis, and Abhinav Shrivastava. 2021. Multimodal Attention for Layout Synthesis in Diverse Domains. https://openreview.net/forum?id=L2LEB4vd9Qw
  • Hansen (2016) Nikolaus Hansen. 2016. The CMA Evolution Strategy: A Tutorial. arXiv:1604.00772
  • Henderson et al. (2017) Paul Henderson, Kartic Subr, and Vittorio Ferrari. 2017. Automatic Generation of Constrained Furniture Layouts. arXiv preprint arXiv:1711.10939 (2017).
  • Heusel et al. (2017) Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. In Adv. Neural Inform. Process. Syst.
  • Jyothi et al. (2019) Akash Abdu Jyothi, Thibaut Durand, Jiawei He, Leonid Sigal, and Greg Mori. 2019. LayoutVAE: Stochastic Scene Layout Generation From a Label Set. In Int. Conf. Comput. Vis.
  • Karras et al. (2018) Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. 2018. Progressive Growing of GANs for Improved Quality, Stability, and Variation. In Int. Conf. Learn. Represent.
  • Karras et al. (2019) Tero Karras, Samuli Laine, and Timo Aila. 2019. A Style-Based Generator Architecture for Generative Adversarial Networks. In IEEE Conf. Comput. Vis. Pattern Recog.
  • Kingma and Ba (2015) Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In Int. Conf. Learn. Represent.
  • Lee et al. (2020) Hsin-Ying Lee, Lu Jiang, Irfan Essa, Phuong B. Le, Haifeng Gong, Ming-Hsuan Yang, and Weilong Yang. 2020. Neural Design Network: Graphic Layout Generation with Constraints. In Eur. Conf. Comput. Vis.
  • Li et al. (2019) Jianan Li, Jimei Yang, Aaron Hertzmann, Jianming Zhang, and Tingfa Xu. 2019. LayoutGAN: Synthesizing Graphic Layouts with Vector-Wireframe Adversarial Networks. IEEE Trans. Pattern Anal. Mach. Intell. (2019).
  • Li et al. (2020) Jianan Li, Jimei Yang, Jianming Zhang, Chang Liu, Christina Wang, and Tingfa Xu. 2020. Attribute-conditioned Layout GAN for Automatic Graphic Design. IEEE Trans. Vis. Comput. Graph. (2020).
  • Liu et al. (2021) Bingchen Liu, Yizhe Zhu, Kunpeng Song, and Ahmed Elgammal. 2021. Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis. In Int. Conf. Learn. Represent.
  • Liu et al. (2018) Thomas F. Liu, Mark Craft, Jason Situ, Ersin Yumer, Radomir Mech, and Ranjitha Kumar. 2018. Learning Design Semantics for Mobile Apps. In ACM Symp. User Inter. Soft. Tech.
  • Menon et al. (2020) Sachit Menon, Alex Damian, McCourt Hu, Nikhil Ravi, and Cynthia Rudin. 2020. PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models. In IEEE Conf. Comput. Vis. Pattern Recog.
  • Nocedal and Wright (2006) Jorge Nocedal and Stephen J. Wright. 2006. Numerical Optimization. Springer, Chapter 17.
  • O’Donovan et al. (2015) Peter O’Donovan, Aseem Agarwala, and Aaron Hertzmann. 2015. DesignScape: Design with interactive layout suggestions. In CHI.
  • O’Donovan et al. (2014) Peter O’Donovan, Aseem Agarwala, and Aaron Hertzmann. 2014. Learning layouts for single-page graphic designs. IEEE Trans. Vis. Comput. Graph. (2014).
  • Pan et al. (2020) Xingang Pan, Xiaohang Zhan, Bo Dai, Dahua Lin, Chen Change Loy, and Ping Luo. 2020. Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation. In Eur. Conf. Comput. Vis., Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.). 262–277.
  • Paszke et al. (2019) Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Adv. Neural Inform. Process. Syst.
  • Patil et al. (2020) Akshay Gadi Patil, Omri Ben-Eliezer, Or Perel, and Hadar Averbuch-Elor. 2020. READ: Recursive autoencoders for document layout generation. In IEEE Conf. Comput. Vis. Pattern Recog. Worksh.
  • Pugh et al. (2016) Justin K. Pugh, Lisa B. Soros, and Kenneth O. Stanley. 2016. Quality Diversity: A New Frontier for Evolutionary Computation. Frontiers in Robotics and AI (2016).
  • Ritchie et al. (2019) Daniel Ritchie, Kai Wang, and Yu-an Lin. 2019. Fast and flexible indoor scene synthesis via deep convolutional generative models. In IEEE Conf. Comput. Vis. Pattern Recog.
  • Schrum et al. (2020) Jacob Schrum, Jake Gutierrez, Vanessa Volz, Jialin Liu, Simon Lucas, and Sebastian Risi. 2020. Interactive Evolution and Exploration within Latent Level-Design Space of Generative Adversarial Networks. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference (GECCO ’20). 148–156.
  • Umetani (2017) Nobuyuki Umetani. 2017. Exploring Generative 3D Shapes Using Autoencoder Networks. In SIGGRAPH Asia 2017 Technical Briefs (SA ’17). Article 24, 4 pages.
  • Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All You Need. In Adv. Neural Inform. Process. Syst.
  • Virtanen et al. (2020) Pauli Virtanen, Ralf Gommers, Travis E Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, et al. 2020. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods (2020).
  • Wang et al. (2018) Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. 2018. Non-local Neural Networks. In IEEE Conf. Comput. Vis. Pattern Recog.
  • Zhang et al. (2020) Zaiwei Zhang, Zhenpei Yang, Chongyang Ma, Linjie Luo, Alexander Huth, Etienne Vouga, and Qixing Huang. 2020. Deep generative modeling for scene synthesis via hybrid representations. ACM Trans. Graph. 39, 2 (2020), 1–21.
  • Zhao et al. (2018) Nanxuan Zhao, Ying Cao, and Rynson W.H. Lau. 2018. What Characterizes Personalities of Graphic Designs? ACM Trans. Graph. (2018).
  • Zheng et al. (2019) Xinru Zheng, Xiaotian Qiao, Ying Cao, and Rynson W.H. Lau. 2019. Content-aware Generative Modeling of Graphic Design Layouts. ACM Trans. Graph. (2019).
  • Zhong et al. (2019) Xu Zhong, Jianbin Tang, and Antonio Jimeno Yepes. 2019. PubLayNet: Largest Dataset Ever for Document Layout Analysis. In IEEE Conf. Doc. Anal. Recog.
  • Zhu et al. (2020) Jiapeng Zhu, Yujun Shen, Deli Zhao, and Bolei Zhou. 2020. In-Domain GAN Inversion for Real Image Editing. In Eur. Conf. Comput. Vis., Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.). 592–608.
  • Zhu et al. (2016) Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, and Alexei A. Efros. 2016. Generative Visual Manipulation on the Natural Image Manifold. In Eur. Conf. Comput. Vis.