Automatic Generation of Constrained Furniture Layouts

11/29/2017 ∙ by Paul Henderson, et al.

Efficient authoring of vast virtual environments hinges on algorithms that are able to automatically generate content while also being controllable. We propose a method to automatically generate furniture layouts for indoor environments. Our method is simple, efficient, human-interpretable and amenable to a wide variety of constraints. We model the composition of rooms into classes of objects and learn joint (co-occurrence) statistics from a database of training layouts. We generate new layouts by performing a sequence of conditional sampling steps, exploiting the statistics learned from the database. The generated layouts are specified as 3D object models, along with their positions and orientations. We incorporate constraints using a general mechanism -- rejection sampling -- which provides great flexibility at the cost of extra computation. We demonstrate the versatility of our method by accommodating a wide variety of constraints.


1 Introduction

Large-scale virtual environments are an important feature of a plethora of applications, from massively multiplayer online games and movies to simulators used to train self-driving cars. Manual authoring of environments at these scales is impractical, and there is a need for algorithms that can automatically generate realistic virtual environments. To be considered useful, automatic algorithms must be able to accommodate constraints stipulated by domain experts. In this paper, we address the problem of automatically generating furniture layouts for indoor scenes. We propose a simple algorithm to automatically populate empty rooms with realistic arrangements of 3D models of furniture and other objects, using occurrence and placement statistics learned from a training database of layouts of CAD models. Our method supports a wide variety of a priori constraints. It can be used in conjunction with methods such as [MSK10, MVLS14] that place rooms within buildings, to automatically generate complete indoor virtual environments.

The computer graphics literature is rich with methods that enable 3D content creation, from landscapes [CGG17] and urban sprawls [PM01] to individual objects such as furniture [LXC17], buildings [MWH06, NGDA16], etc. These methods involve varying degrees of user-interaction to achieve realism and/or aesthetic appeal. Procedural approaches rely on parametric controllability while methods that are posed as optimization rely on constraint-driven controllability. A third class of methods adopts a data-driven approach to generate or edit models based on features learned from training examples [FKS04, EVC15, LXC17].

We propose a new data-driven, probabilistic, generative model for 3D room layouts (Sec. 2). Our model learns statistics from a database [SYZ17] containing over 300,000 rooms designed by humans (Sec. 3). We categorize the roughly 2500 3D models from the database and learn conditional statistics across these categories. The output space of this learned model is a high-dimensional combination of discrete and continuous variables. Our model is based on directed, acyclic dependencies between categories, which allows easy and very efficient ancestral sampling. Each sample results in a furniture layout. We demonstrate the flexibility of this approach by incorporating user-specified constraints such as accessibility, sizes and shapes of rooms, locations of doors and windows, and constraints on the location of furniture items such as sofas and television screens (Sec. 4). Finally, we present a user study showing that layouts generated by our model are realistic (equal quality to those in the training set), and compare favorably to the state-of-the-art method [WSCR18] (Sec. 5).

1.1 Related work


Table 1: Comparison of constraints supported by prior methods and ours, across seven criteria: room type, room size, room shape, traversability, object existence, object placement, and fully-automatic operation (our method supports all seven). Legend: 1 = [WSCR18]; 2 = [QZH18]; 3 = [FCW17]; 4 = [LZM17]; 5 = [SKLTZ16]; 6 = [YYT11]; 7 = [MSL11]. 'Fully automatic' methods support generation without user specification of any furniture classes.

Priors for understanding structure in indoor scenes: Choi et al. [CCPS13] performed 3D object detection and layout estimation by inferring spatial relations between objects using a discriminative, energy-based formulation. They do not present a generative model over layouts. Zhao and Zhu [ZZ11, ZZ13] built a probabilistic grammar model, using specifically engineered production rules, over cube-based 3D structures constituting parts of rooms. This grammar generates arrangements of coarse blocks and does not produce layouts of entire rooms. Similarly, treating objects as cuboids, Del Pero et al. [DPBF12] proposed a generative model over room size and layout using learned distributions for the dimensions of the cuboids. The model does not learn inter-object relationships such as co-occurrence or relative locations. Although the importance of such relationships was discussed in follow-up work [DPBF13], inter-object relationships were not incorporated in the generative model.

Figure 1: Elements of our generative model (nodes: room type; furniture; room structure; small objects; ceiling objects; wall objects) with arrows representing conditional dependencies. The directional dependencies enable ancestral sampling: given a sample $x$ generated at node $X$, we can draw a sample from its child node $Y$ according to $P(Y \mid X = x)$. Different categories of object (middle row) are sampled conditional on the room type; the room structure (i.e. sizes of the cells) is defined by the furniture it contains.

Generating realistic furniture layouts: A common approach is to adopt an energy-based formulation [MSL11, YYT11, SKLTZ16, HPSC16, QZH18] with potentials between objects to impose constraints and preferences. The method of Handa et al. [HPSC16] generates room layouts by optimizing a pairwise energy term using simulated annealing, with random initialization. Sadeghipour Kermani et al. [SKLTZ16] propose a method for generating bedrooms that separates the sampling of classes and counts from the spatial arrangement. Liang et al. [LZM17] also propose a two-step method, demonstrated on five room types. Objects are first selected based on statistics learned from a database, without considering inter-object relationships. Their locations are then chosen by MCMC sampling from a pairwise energy model. None of these methods reason about smaller objects positioned on the furniture, nor objects mounted on the walls or ceiling. Moreover, both simulated annealing and MCMC are slow, and not guaranteed to converge to a valid layout.

Very recently, Qi et al. [QZH18] represented indoor scenes using a probabilistic grammar, with a conditional Gibbs distribution as a prior on its parse graphs. The conditioning parameter is learned from a large database. Their approach requires considerable manual modeling, including specification of potential functions and grouping relations between objects such as chairs and tables. Novel layouts are generated using MCMC sampling with simulated annealing; this takes around 2400s per layout. Wang et al. [WSCR18] exploit the power of convolutional neural networks (CNNs). Their method generates layouts by using three different CNNs to decide whether to add furniture, what furniture to add, and where to place it. This approach avoids a costly MCMC optimization process, but still takes several minutes on a GPU to sample a single room, and several days to train the models. Both of these approaches are trained on the same dataset that we use in this paper. While they are fully automatic, neither supports user-specified constraints as input.

Other methods suggest new layouts based on partial input, such as the object classes and an initial arrangement [MSL11, YYT11, FCW17]. Fu et al. [FCW17] exploit knowledge of how human agents interact with indoor environments to synthesize entire room layouts from a room shape and a few classes of objects that are to be present. This method is the fastest among prior works, taking approximately 2s to generate a layout.

Finally, some works model small areas of rooms centered around one item of furniture, instead of complete layouts. Fisher et al. [FRS12] propose a method that learns, from a dataset of 130 layouts, to embellish parts of rooms by adding relevant objects around (e.g. chairs around a table) as well as on (e.g. books on a table) furniture. It does not model entire layouts of rooms containing complex arrangements of furniture. Ma et al. [MLZ16] likewise generate small areas of rooms by embellishing a focal object, but they decide the classes and locations of objects to add by reasoning over human actions that could take place in the scene.

Figure 2: Cell structure for furniture layout. Thick black lines represent the walls enclosing the room, gray boxes represent furniture objects, dashed blue lines delimit cells and blue arrows indicate dimensions that expand/contract to fit their contents. There is one cell along each edge of the room, one at each corner, and for larger rooms, one in the interior. Other cell structures can capture prior knowledge about the functional layout of specific room types, e.g. a conversation area in a living room. Objects are padded with free space on each side (purple). Objects in cells that are larger than the sum of their contents, due to constraints from neighboring cells, are distributed evenly (orange springs), with the exception that objects in corner/edge cells always remain flush with walls.

Summary and motivation: Although many algorithms have been proposed to generate furniture layouts, none of them is fully automatic, considers inter-object relationships, and is amenable to a diverse range of user-specified constraints (Table 1). Most automatic methods are limited to simple constraints such as room type, shape, size and the presence of a particular object. While other constraints could be applied by rejection sampling, this is prohibitively expensive, as these models are far slower than ours to sample from, and slower still for the methods requiring no user interaction. Adding constraints to energy-based models is possible by introducing new potentials, but this requires careful design and engineering. Moreover, it would make sampling even more costly. Thus, it is not practical to incorporate complex constraints in any previous approach to layout generation.

1.2 Contributions

We make the following contributions:

  1. We propose a generative model for room layouts, based on a directed graphical model that captures high-order relationships between objects;

  2. Our generative model enables efficient, ancestral sampling of objects and their attributes; and

  3. Our method is fully automatic and allows specification of general a priori constraints using rejection sampling.

2 Automatic generation of room layouts

We begin by sampling a room type (e.g. kitchen, living room), then sequentially sample furniture instances, conditioned on the room type and instances already sampled. We partition rooms into cells and sample objects and their positions within these cells (Fig. 2) such that geometric intersections will not occur. We begin with furniture objects (Sec. 2.2), which define the structure of the room. Then, in an embellishment step (Sec. 2.3), we sample instances of other categories, given the furniture items and their locations. The overall sampling process is illustrated in Fig. 1.
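
The overall generative process can be sketched in a few lines of Python. This is a minimal illustration of the ancestral sampling order only, not the actual implementation: the pmf and the two helper samplers are hypothetical stand-ins for the parameters and procedures learned in Sec. 3.

```python
import random

# Illustrative pmf over room types; the real one is learned from SUNCG.
ROOM_TYPE_PMF = {"bedroom": 0.35, "kitchen": 0.25, "living room": 0.40}

def sample_furniture(room_type):                 # Sec. 2.2 (stub)
    return [("double bed", "north wall")] if room_type == "bedroom" else []

def sample_embellishments(room_type, furniture):  # Sec. 2.3 (stub)
    return [("ceiling lamp", "grid")]

def sample_layout():
    # Room type has no ancestors in the graphical model, so draw it first.
    room_type = random.choices(list(ROOM_TYPE_PMF),
                               weights=list(ROOM_TYPE_PMF.values()))[0]
    # Furniture is sampled conditioned on room type; it fixes the cell sizes.
    furniture = sample_furniture(room_type)
    # Ceiling/wall/small objects are sampled conditioned on both.
    extras = sample_embellishments(room_type, furniture)
    return room_type, furniture + extras

print(sample_layout())
```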

The parameters used for sampling instances of objects, for cell assignments, and for positioning within a cell are learned from the dataset (Sec. 3). These are simple parametric models, which are human-interpretable and modifiable.

Figure 3: We introduce a set of categories (left) along with a manually-specified mapping from models to categories. The SUNCG database already specifies a mapping from models to object classes. We use both mappings.

2.1 Classes and categorization

The dataset contains CAD models, each assigned an object class such as “television”, “bathtub”, “armchair”, etc. There is a total of around 2500 CAD models and 170 object classes in the collection. We introduce an additional set of labels called categories: furniture, small objects such as books and laptops, wall objects such as picture frames, and ceiling objects such as lamps. We manually specified a second mapping from models to categories (Fig. 3). Our sampling strategy for models differs between the different categories, as discussed in the next sections.

2.2 Sampling furniture

We place furniture instances by sampling counts of singletons (i.e. individual objects), and of motifs and abutments, which are spatially-related groups learned from the training data. For each instance, we sample a cell it is to be placed in (e.g. ‘against the east wall’), and its orientation and padding. After all furniture counts and instance parameters have been sampled, we position the resulting objects deterministically. The room structure is finalized when cells (and hence the room) are sized to accommodate the objects and their paddings.

Singletons: We sample singletons using algorithm 1, where functions SampleNumInstances, SampleCell, SampleOrientation and SamplePadding sample from distributions whose parameters are learned from training data. The arguments to these functions signify what the underlying distributions are conditional on. At each iteration (line 6 of algorithm 1), cells expand to fit the sampled objects, ensuring no intersections between objects.

1: function SampleFurniture(T)  ▷ T is the room type
2:     for each object class c do
3:         n_c ← 0
4:         for each model m with class(m) = c do
5:             k ← SampleNumInstances(m, T, n_c)
6:             for i = 1, …, k do  ▷ instances of m
7:                 cell_i ← SampleCell(m)
8:                 θ_i ← SampleOrientation(m, cell_i)
9:                 pad_i ← SamplePadding(m)
10:            end for
11:            n_c ← n_c + k  ▷ accumulate count
12:        end for
13:    end for
14: end function

Output, for each CAD model m: (i) the number k of instances that we place; (ii) the parameters (cell, orientation, padding) of each instance of m.
Algorithm 1: Sampling singleton furniture instances

Motifs: Motifs are groups of items that are present together in many examples of the dataset, such as a table with chairs around it. We sample counts and instance parameters for each motif following lines 5–10 of algorithm 1, but in this case $m$ represents a motif rather than a singleton CAD model. Then, we set the relative offsets and orientations between items within a motif as observed in a randomly selected instance of the motif in the training database. This non-parametric strategy for determining relative placement yields coherent, visually-pleasing groups of furniture, and eliminates the need for expensive collision/intersection tests between meshes.
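
A sketch of this non-parametric instantiation step, assuming a hypothetical `MOTIF_OCCURRENCES` table standing in for the occurrences mined from the training set (Sec. 3):

```python
import random

MOTIF_OCCURRENCES = {
    "table+2chairs": [
        # (model, offset (dx, dy) from the base object, orientation in degrees)
        [("table_01", (0.0, 0.0), 0), ("chair_07", (-0.8, 0.0), 90),
         ("chair_07", (0.8, 0.0), -90)],
        [("table_02", (0.0, 0.0), 0), ("chair_03", (0.0, -0.9), 0),
         ("chair_03", (0.0, 0.9), 180)],
    ],
}

def instantiate_motif(name, base_position):
    # Replay one whole observed occurrence: this keeps the group coherent
    # and avoids mesh-level collision tests between its members.
    occurrence = random.choice(MOTIF_OCCURRENCES[name])
    bx, by = base_position
    return [(model, (bx + dx, by + dy), angle)
            for model, (dx, dy), angle in occurrence]

print(instantiate_motif("table+2chairs", (2.0, 3.0)))
```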

Abutments: Abutments are groups of items that appear in rows, abutting one another, with variations in their sequence, e.g. a row of cabinets in a kitchen along with a dishwasher, refrigerator and/or washing machine. Again, we sample counts and instance parameters for each abutment using lines 5–10 of algorithm 1, where $m$ now represents a class of abutment. The furniture items within an abutment are modeled as a Markov chain with a terminal state; for each instance of the abutment, we sample from this Markov chain to obtain a specific sequence and number of CAD models. The transition probabilities of the Markov chains are learned during training.
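
The sampling step can be sketched as follows; the transition table and state names are illustrative, not learned values:

```python
import random

# Markov chain over abutting items; "END" is the terminal state.
TRANSITIONS = {
    "START":      {"cabinet": 0.8, "fridge": 0.2},
    "cabinet":    {"cabinet": 0.5, "dishwasher": 0.2, "fridge": 0.1, "END": 0.2},
    "dishwasher": {"cabinet": 0.7, "END": 0.3},
    "fridge":     {"cabinet": 0.6, "END": 0.4},
}

def sample_abutment(max_len=12):
    state, row = "START", []
    while len(row) < max_len:
        nxt = random.choices(list(TRANSITIONS[state]),
                             weights=list(TRANSITIONS[state].values()))[0]
        if nxt == "END":
            break
        row.append(nxt)
        state = nxt
    return row  # a variable-length sequence of abutting CAD models

print(sample_abutment())
```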

2.3 Embellishment

Ceiling objects: Given a room type $T$, we draw a single CAD model $m$ at random according to a discrete probability mass function (pmf) over models in this category. The number of instances of $m$ is determined in the same fashion as for furniture (SampleNumInstances). This number is rounded up so that it can be factorized into a product of two integers, and the objects are positioned on a grid.
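
One simple rounding-and-placement rule consistent with this description (the exact grid-dimension choice is our assumption) is:

```python
import math

def ceiling_grid(n, room_w, room_d):
    # Round the sampled count n up to rows * cols and lay instances out
    # evenly on that grid over the ceiling plane.
    rows = math.ceil(math.sqrt(n))
    cols = math.ceil(n / rows)
    return [((c + 0.5) * room_w / cols, (r + 0.5) * room_d / rows)
            for r in range(rows) for c in range(cols)]

print(ceiling_grid(5, 4.0, 3.0))  # 5 rounds up to 6 lamps on a 3x2 grid
```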

Wall objects: For each CAD model $m$ that is a wall object, we determine the number of instances in similar fashion to furniture (lines 2–5 of algorithm 1). Each instance is then assigned to a wall uniformly at random, and its position on the wall combines a Normal distribution along the vertical axis with Uniform distributions along the remaining directions. The parameters of the Normal distribution are learned, conditioned on $m$. If this results in a geometric conflict (intersection with other wall objects, doors, etc.), we reject the sampled location and repeat the process until there are no conflicts.
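
A sketch of this placement-with-resampling loop; the one-dimensional overlap test and all parameter values are illustrative assumptions:

```python
import random

def place_wall_object(width, mu_h, sigma_h, wall_lengths, placed):
    while True:
        wall = random.randrange(len(wall_lengths))      # uniform over walls
        x = random.uniform(0.0, wall_lengths[wall] - width)  # uniform along wall
        h = random.gauss(mu_h, sigma_h)                 # learned Normal (height)
        # Crude conflict test: re-draw if overlapping an object on this wall.
        if not any(w == wall and abs(x - px) < width for w, px, _ in placed):
            return wall, x, h

placed = []
placed.append(place_wall_object(0.6, 1.5, 0.1, [4.0, 3.0, 4.0, 3.0], placed))
print(placed)
```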

Small objects: For each furniture instance with CAD model $m$, we sample small objects non-parametrically, conditioned on $m$ and the room type $T$. We choose a random instance of $m$ in a room of type $T$ from the database, and replicate the configuration of small objects associated with that instance.

2.4 Algorithm summary

To summarize, we first randomly sample the room type $T$ from a discrete distribution over the 9 room types found in the database. The distribution (pmf) of $T$ is learned during training. Then, we sample furniture items conditioned on $T$: first singletons, then motifs and finally abutments. The numbers and instances of each item are determined by parameters learned during training. Once all furniture items are sampled and assigned to cells, we use a deterministic placement algorithm that calculates their final positions in the room. Then, we sample ceiling objects and wall objects conditioned on $T$ and the furniture placed. Finally, we sample small objects conditional on the furniture in the room and $T$.

3 Training

Figure 4: Motif discovery. Left: We use DPMM clustering on tuples of object occurrences, and identify motifs as those clusters for which the displacements from a base object in the tuple are within some threshold. Right: Examples of motifs that we automatically discover in SUNCG. Each colour corresponds to a different object in the pattern; we overlay 200 occurrences of each pattern to illustrate its variability. The red objects are the base objects of the patterns.

Dataset: We use a large dataset of ground-truth room layouts to learn parameters that are then used for automatically generating layouts. SUNCG [SYZ17] is a new dataset of 45000 apartment layouts, created by humans, and separately verified as reasonable by humans. Each apartment has an average of 8.1 rooms; the majority are annotated with the room type. The apartments are designed with an online tool, and contain objects of 170 classes, represented by around 2500 CAD models. There are 4.5M object instances; each consists of a reference to a CAD model, and its location and orientation in the room.

Number of instances: We model the number of instances of each CAD model as conditional on the model $m$, the room type $T$, and the number $n_c$ of furniture instances already sampled for the class $c$ of $m$. The distribution (pmf) over count bins is calculated as a normalized histogram over all scenes of type $T$ in the database. Further, a Poisson distribution is fitted to the counts observed in scenes where the count exceeds four. SampleNumInstances (line 5 of algorithm 1) is implemented in two steps. First, we draw an indicator variable according to the pmf. If this variable is less than or equal to 4, then we return it as the number of instances. Otherwise, we return a sample from the Poisson distribution.
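
A sketch of this two-step sampler, with an illustrative histogram over count bins 0–4 plus an overflow bin, and a Poisson tail for large counts:

```python
import math
import random

def sample_poisson(lam):
    # Knuth's method; adequate for the small rates that occur here.
    l, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= l:
            return k
        k += 1

def sample_num_instances(pmf, lam):
    bins = list(range(len(pmf)))            # 0, 1, 2, 3, 4, overflow
    k = random.choices(bins, weights=pmf)[0]
    if k <= 4:
        return k
    # Overflow bin drawn: fall back on the Poisson fitted to large counts.
    return sample_poisson(lam)

# Illustrative parameters, not learned values.
print(sample_num_instances([0.30, 0.35, 0.20, 0.08, 0.05, 0.02], 6.0))
```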

Instance attributes: For each model $m$, during training, we calculate a pmf over 9 cells (4 corners, 4 edges and internal) by normalizing the histogram of occurrences. We implement SampleCell (line 7 of algorithm 1) by returning a cell according to this pmf. For models in internal cells, we count the number of occurrences where they are aligned (positively or negatively) with respect to any axis and the number of “non-aligned” instances, and use this to learn a pmf. We implement SampleOrientation (line 8 of algorithm 1) by sampling an indicator variable according to the pmf for orientations. If this variable indicates non-alignment, we sample an orientation uniformly at random. Finally, we model padding around CAD models as a 4D diagonal-covariance Normal distribution conditioned on the CAD model $m$. The dimensions correspond to padding on each side of the object: in-front-of, behind, to-the-left-of and to-the-right-of. SamplePadding (line 9 of algorithm 1) returns a sample from this 4D Normal distribution. The knowledge learnt during training is interpretable: Fig. 11b,c show values from the pmfs, indicating typical placements of objects, while Fig. 11a shows typical locations where we place various classes. In both cases, these agree well with human intuition on interior design.
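
SamplePadding then reduces to four independent 1D Normals (diagonal covariance); the per-model parameters below are illustrative, not learned:

```python
import random

# Means and stddevs (metres) for front/behind/left/right padding.
PADDING_PARAMS = {"double_bed": ([0.8, 0.05, 0.4, 0.4],
                                 [0.2, 0.02, 0.15, 0.15])}

def sample_padding(model):
    mu, sigma = PADDING_PARAMS[model]
    # Diagonal covariance => each side is an independent 1D Normal;
    # clamp at zero since negative padding is meaningless.
    return [max(0.0, random.gauss(m, s)) for m, s in zip(mu, sigma)]

print(sample_padding("double_bed"))
```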

Figure 5: Statistics of abutments observed in the training database are recorded in a transition matrix, along with a terminal state (blue circle). We synthesize abutments by generating Markov chains using the learned transition probabilities.

Motif discovery: We search the training set for all joint occurrences of a given tuple of classes (e.g. table, chair, chair) within a room. For every occurrence of one of these classes – designated as the base object – we calculate displacements of the centres of the other objects relative to the base object. We model these displacements as points in a joint displacement space and cluster them with a Dirichlet process mixture model (DPMM) [Ras00], fitted by variational inference. We use Gaussian clusters with diagonal covariance, and fit one DPMM per tuple of classes. We calculate the area inside the cluster's contour for the location of each element in the motif; if all of these areas are below a threshold, the cluster is accepted as a motif. We store the CAD models, relative locations, and orientations for every occurrence assigned to the cluster; one of these will be selected when instantiating the pattern. Some examples of motifs we discover are given in Fig. 4.
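
This clustering step maps naturally onto scikit-learn's variational DPMM; the synthetic data and acceptance thresholds below are illustrative only:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# e.g. chair centres relative to a table: two recurring arrangements + noise.
displacements = np.vstack([rng.normal([0.8, 0.0], 0.05, (200, 2)),
                           rng.normal([-0.8, 0.0], 0.05, (200, 2)),
                           rng.uniform(-2, 2, (40, 2))])

dpmm = BayesianGaussianMixture(
    n_components=10,                          # truncation level
    covariance_type="diag",
    weight_concentration_prior_type="dirichlet_process",
).fit(displacements)

# Accept only tight, well-populated clusters as motifs.
for k in range(10):
    if dpmm.weights_[k] > 0.05 and np.prod(dpmm.covariances_[k]) < 1e-2:
        print("motif cluster at offset", dpmm.means_[k].round(2))
```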

Abutment discovery: We discover abutments in two stages. First, we gather sets $S_i$ of sequences of CAD models, where each set will ultimately become an abutment pattern; each sequence represents an abutting series of instances in the training set. Then, for each set $S_i$, we calculate the transition probabilities for the corresponding Markov chain $M_i$, as maximum-likelihood estimates given the CAD-model sequences in $S_i$. More precisely, we collect the sets $S_i$ in a collection $\mathcal{C}$, initialising $\mathcal{C}$ to be empty. While doing so, we maintain the invariant that no two sets in $\mathcal{C}$ contain sequences that share CAD models. For each room in the training set, we find all pairs of objects that abut, based on their rotated bounding-boxes touching at an edge. These pairs are combined transitively to form full sequences of objects, each being a row of abutting objects of some orientation. For each object-sequence (ignoring those with just two objects), we check whether any of its CAD models appears in any sequence in a set already created. If so, we add the object-sequence to that set; if not, we create a new set storing just this sequence, and add it to $\mathcal{C}$. In the first case, we also check that adding the sequence has not broken the invariant that sets do not share CAD models; if it has, we merge sets until the invariant holds again. At the end of this process, each $S_i$ contains many sequences of CAD models, each of which we treat as a sample from the Markov chain $M_i$. It is then straightforward to learn the transition probabilities of $M_i$ by maximising the likelihood of all the sequences in $S_i$. Some examples of abutments we discover are given in Fig. 5.
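
Maximum-likelihood estimation of one chain $M_i$ amounts to counting and normalising transitions over the sequences in $S_i$; a sketch with illustrative sequences:

```python
from collections import Counter, defaultdict

S_i = [["cabinet", "cabinet", "dishwasher", "cabinet"],
       ["cabinet", "fridge", "cabinet"],
       ["cabinet", "cabinet", "cabinet", "fridge"]]

counts = defaultdict(Counter)
for seq in S_i:
    states = ["START"] + seq + ["END"]       # pad with start/terminal states
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1

# Normalising the transition counts gives the ML transition probabilities.
transitions = {a: {b: n / sum(c.values()) for b, n in c.items()}
               for a, c in counts.items()}
print(transitions["cabinet"])
```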

4 Constraints

Our generative model accommodates diverse constraints using rejection sampling as a general mechanism, i.e. we sample layouts until we obtain one that satisfies all constraints. We demonstrate the versatility of our generative model using some example constraints. Incorporating other constraints can be achieved similarly as long as a given layout can be verified to satisfy them. Since our sampling process is very fast (tens of milliseconds per room), any inefficiency due to rejection sampling is outweighed by its ability to serve as a common mechanism to impose a wide range of constraints (Tab. 2). In some special cases, we can avoid rejection sampling by allowing users to explicitly manipulate parameters of distributions learned (Fig. 9).
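
The mechanism itself is a simple loop; here `sample_layout` and the constraint predicates are placeholders for our sampler and the checks described below:

```python
def sample_constrained(sample_layout, constraints, max_tries=100_000):
    # Draw layouts until one satisfies every constraint predicate.
    for _ in range(max_tries):
        layout = sample_layout()
        if all(ok(layout) for ok in constraints):
            return layout
    raise RuntimeError("constraints too restrictive (or unsatisfiable)")

# e.g. layout = sample_constrained(sample_layout,
#                                  [is_traversable, within_size_tolerance])
```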

Room type and size: As the room type $T$ has no ancestors in our model, it can directly be assigned a constrained value, avoiding rejection sampling entirely. Since room size is a continuous value, and the probability of any sample satisfying a size constraint exactly is zero, we allow a small tolerance on each dimension (2% in all our examples).

Traversability: A layout is traversable if there exists a path between all points in free space (regions with no furniture), and from all points in free space to all doors in the room. To verify this, we first rasterise an orthographic projection of the furniture onto the floor plane at a fixed resolution, and identify free space as the complement of this footprint. We calculate the standable region (areas where people can stand or pass) via morphological erosion of the free space using a circular kernel of radius 0.25m, and also add the regions at doors to it. Similarly, we calculate the regions that require access, again using morphological erosion but with a larger kernel. Then we verify traversability by checking that every access region is reachable from the doors via some path through the standable region.
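
A sketch of this check using `scipy.ndimage`; the raster resolution, the test scene, and the exact acceptance rule (a single standable component containing all doors) are our assumptions:

```python
import numpy as np
from scipy import ndimage

def is_traversable(occupied, door_mask, res=0.05, radius=0.25):
    # Erode free space by a person-sized disc to get the standable region.
    r = int(round(radius / res))
    yy, xx = np.ogrid[-r:r + 1, -r:r + 1]
    disc = (xx ** 2 + yy ** 2) <= r ** 2
    standable = ndimage.binary_erosion(~occupied, structure=disc) | door_mask
    # Traversable: one connected standable component that contains all doors.
    labels, n = ndimage.label(standable)
    door_ids = set(labels[door_mask]) - {0}
    return n == 1 and door_ids == {1}

occupied = np.zeros((80, 60), dtype=bool)
occupied[30:50, 20:40] = True                 # a block of furniture
doors = np.zeros_like(occupied)
doors[0:8, 25:35] = True                      # a doorway on the north wall
print(is_traversable(occupied, doors))
```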

Gap placement and clearance: Ensuring there is a gap at a particular location allows users to augment layouts with their own 3D models that are not part of our system. In order to make rejection sampling efficient, rather than just discarding layouts until one that satisfies the constraint is found, we directly place a ‘gap instance’ in a suitable cell, ensuring that no object will occupy the relevant space. Note that some rejections will still occur, as cell locations are not known precisely until all furniture items are placed. We also allow users to specify the clearance around particular objects, regardless of where in the room they are placed, e.g. to allow additional space next to a bed that must be used by a mobility-impaired individual. This is implemented efficiently by directly adjusting the parameters of the relevant padding distribution.
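
A sketch of the clearance case, reusing the illustrative `PADDING_PARAMS` structure from Sec. 3: the learned mean padding for one side of a class is simply raised to the requested clearance.

```python
def require_clearance(padding_params, model, side, metres):
    # Override the learned mean so `side` gets at least `metres` of space.
    # Side order matches the 4D padding Normal: front, behind, left, right.
    mu, sigma = padding_params[model]
    idx = {"front": 0, "behind": 1, "left": 2, "right": 3}[side]
    mu = list(mu)
    mu[idx] = max(mu[idx], metres)
    padding_params[model] = (mu, sigma)

# e.g. extra space beside a bed for a mobility-impaired user:
# require_clearance(PADDING_PARAMS, "double_bed", "left", 1.2)
```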

Object placement: We allow users to place instances of CAD models known to our system, at specific locations – e.g. a bed against a particular wall. Similarly to placing gaps, we ensure that a suitable instance is placed in the relevant cell, thereby greatly reducing the chances of rejection.

Doors, windows and refurnishing: We model door and window specification using a combination of gap-placement at the edges of rooms, and traversability (the relevant area is included in the standable region). This allows us to support refurnishing existing rooms from SUNCG – that is, generating new furniture while retaining the existing size/shape, doors, and windows. This is valuable for generating complete, realistic rooms without any user input.

5 Results

In this section, we present qualitative results to highlight the samples (with and without constraints) generated using our model, and quantitative results measuring performance. We also assess the quality of our generated layouts via a simple user study. All rendered images were obtained using path-tracing [Jak10]. The execution times reported in this paper were obtained using our unoptimized and sequential Python implementation, on an Intel Xeon E5-2620v3 processor, using less than 1GB of RAM.

Figure 6: Samples from our model, without constraints applied

5.1 Generating layouts

Unconstrained output: In Figure 6, we show some output examples from our generative model, without any constraints specified. Our model produces results without objects intersecting, and mimics the diversity found in the training dataset – both in terms of the types of rooms and the objects in them. The co-occurrence and relative placements of objects are also realistic and natural. Unconstrained layout samples are generated in 0.04s on average.

Examples with constraints: Figure 7 shows examples of layouts where the room size and the placement of one object were specified by the user. Note that the other sampled objects in the room are automatically chosen, and placed harmoniously. For example, in the first image, the constraint was “place a bed near the top right corner”. Our method automatically places nightstands on either side of the bed.

Figure 8 shows sample layouts where the shapes of the rooms and the locations of doors were specified as constraints. Note that the doors are unobstructed in the generated layouts.

Figure 9 shows examples of layouts where a user has specified particular clearances to be respected around specific object classes. The bar plots in the first column (solid blue) show the ranges of clearances learned during training on the left (L), front (F), right (R) and back (B) of the models sampled from four chosen classes (sofa, double bed, sink and bath). The pink squares on the bar plots depict user modifications of the learned parameters (dashed blue rectangles). For each specified constraint (rows), four sample outputs are visualized (columns), and the impact of the user specification is shown using pink arrows. In this particular example, the constraints are imposed by directly editing learned parameters rather than using rejection sampling, which leads to faster runtime.

Figure 7: Samples from our model, with constraints. The sizes of the rooms and the locations/classes of objects shown in pink boxes are constrained
Figure 8: Samples from our model, with constraints. The sizes of the rooms, their shapes and door locations are constrained.
Figure 9: Samples from our model, with user-specified clearance constraints. The left column shows the default (blue) and user-specified (pink) padding ranges in meters for each side (left/front/right/back) of the indicated object; the remaining columns show samples drawn from our model with the constraint applied, with the specified padding regions indicated
Figure 10: Samples from our model, applying constraints that are not satisfied by any layout in the training set. In each case, we constrain the room type, size, and placement of one object (indicated by a pink box), choosing a combination of constraints that is not satisfied by any layout in SUNCG. Our model is able to sample rooms fulfilling the constraints, despite not having seen such examples at training time.

Constraints producing uncharacteristic layouts: One benefit of training a constrainable generative model is that we can generate rooms fulfilling constraints that are never fulfilled in the training dataset, or only very rarely. We demonstrate this by using random sets of reasonable constraints and identifying those sets of constraints which are not jointly satisfied by any room in the SUNCG dataset. Then, we use our model to sample a room that does satisfy the constraints. Examples are given in Fig. 10.

Runtime with constraints: Table 2 shows the impact of applying different constraints on the running time of our method. Our single-threaded Python implementation takes just 0.04s to sample an unconstrained room layout. Even with complex constraints applied, the sampling time remains practical.

Constraint Time per sample (s)
unconstrained 0.04
room type 0.04
object class exclusion 0.04
clearance 0.04
traversability 0.05
object placement 1.4
gap placement 1.8
room size 6.8
size + doors + windows 112
Table 2: Average time taken to sample a complete layout from our model, with different types of constraint applied. The timings are for an unoptimised Python implementation running on a single thread.

5.2 User study

We assessed the realism of layouts generated using our model via a user study comparing its output to human-designed rooms from the SUNCG database. We presented 1400 pairs of images to eight non-expert users and asked them to identify the image with a more realistic, or natural, layout of objects. In each case, one image was a ground-truth (human-designed) layout from SUNCG, and the other was a sample from our model; the order of the two images was randomised for each pair. The goal here is that our samples are indistinguishable from human-designed layouts, i.e. of equal perceived quality to them, so users prefer ours 50% of the time.

Table 3: Percentage of image-pairs where users preferred (i.e. deemed more realistic) a layout sampled from our model, as opposed to a ground-truth layout from SUNCG (‘Ours pref.’). Higher is better, with 50% indicating that our samples are indistinguishable from ground-truth. Ranges are the 95% confidence interval [ET86]. (a) Unconstrained layouts, broken down by viewpoint (overhead vs. 1st person); (b) constrained layouts, broken down by constraint set (size + object vs. size + door).

Unconstrained: We sampled several hundred random layouts from our model without constraints, and a similar number of ground-truth layouts from SUNCG. We presented images in the form of either overhead renderings or first-person camera views from inside the room. The observed user preferences are given in Table 3(a); we see that our layouts are equivalent in perceived quality to the human-designed layouts in the training set. Specifically, in first-person views, users slightly preferred our layouts; in overhead views, our layouts are indistinguishable from ground-truth up to statistical significance.

Constrained: We assessed room layouts generated by our model with constraints as we did layouts without constraints, but using only overhead renderings. We considered two representative settings for constrained generation: (i) fixing the room size and placement of one object; and (ii) fixing the room size and locations of doors and windows (implying gap placement and traversability constraints). In both cases, we generated several hundred random examples. For (i), we randomly generated arbitrary, but meaningful, pairs of constraints and sampled one layout fulfilling each. For (ii), we randomly selected rooms from SUNCG, and used their size and door/window locations as constraints for our model, again sampling one layout for each. In the second case, our model refurnishes existing rooms. In both cases, we compare our samples against arbitrary ground-truth rooms, which typically do not fulfill the same constraints, i.e. we test the realism of our samples and not whether constraints are fulfilled (which is guaranteed by rejection). Results are given in Table 3(b). With room size and the placement of one object constrained, our layouts are indistinguishable from ground truth up to statistical significance. With room size and the positions of doors constrained, users preferred human-designed layouts.

Table 4: Comparison with Wang et al. [WSCR18]: percentage of image-pairs where users preferred a layout sampled from our model, as opposed to one generated by [WSCR18], broken down by room type (bedroom vs. living room).

Comparison with [WSCR18]: We compared randomly generated unconstrained samples from our model with those generated by the state-of-the-art CNN-based method of [WSCR18]. We presented users with 550 pairs of images, each showing one layout generated by our method and one by that of [WSCR18]. We restricted the room types to those supported by both our method and theirs, i.e. bedroom and living room. The users were again asked to identify the image with a more realistic arrangement of objects. Results from this experiment are shown in Table 4. We see that our layouts are often preferred to those of [WSCR18]. Moreover, our model is interpretable and fast to train, whereas theirs is a non-interpretable black-box model trained over several days.

6 Discussion

Comparison with prior works: Probabilistic generative methods for room layouts are challenging to sample from: often the sampling is not guaranteed to converge to a valid layout, especially when many objects are present, e.g. in the model proposed by Handa et al. [HPSC16]. This particular model also requires that the number of objects, and the size of the room, be specified manually. Our model compares favorably with the very recent work of Wang et al. [WSCR18], which learns millions of parameters over days of training: for over 70% of the pairs presented, users preferred our layouts to theirs. In addition to accommodating constraints easily, our model has the further advantage that its parameters are learned over semantically meaningful concepts (categories such as furniture), allowing direct modulation of the learned parameters, as shown in Figure 9. Although we manually specified padding constraints, they could be calculated from alternatives such as human-centric affordances [QZH18].

Inter-object relationships: We explicitly discover and encode relationships across classes of objects using patterns such as motifs and abutments. These patterns capture higher order relationships (not just pairwise); in the case of abutments, they are able to model sequences of variable lengths which may not be present in the training database. Figure 12 shows unnatural layouts generated when inter-object relationships due to motifs and abutments are not modeled. Additionally, implicit relationships are captured between different CAD models of the same class in a given layout, through the conditioning on the number of objects already placed (line 5 of algorithm 1). For example, the generative process may not favor a large item from a class if multiple small items from that class have already been sampled.

Novelty of samples: Large generative models run the risk of over-fitting their training set, memorizing the training data instead of generalizing to produce novel samples. Fig. 10 shows that our model is able to generate constrained layouts even when the constraints are not fulfilled by any room in the training set. Thus, it is not simply memorizing the training data, but is creating new arrangements. As a second demonstration of this, we directly searched for the most similar layouts in the training set, to layouts generated by our model. The retrieved layouts are typically quite different from our samples in arrangement and exact object counts, which confirms that our model is generalizing.

Figure 11: Many parameters that are learned during training are human-interpretable. (a) Heat-maps showing locations where our model places different objects; clockwise from top-left: shower, cabinet, sofa, double bed, dining table and toilet. (b) Furniture classes with the highest probability of being at the edge of a room rather than the interior (toilet 0.88, range oven 0.86, sink 0.83) and the lowest (chair 0.32, tripod 0.32, armchair 0.29). (c) Furniture classes with the highest probability of being at an angle that is a multiple of 90° (kitchen cabinet 0.99, dishwasher 0.99, single bed 0.99) and the lowest (office chair 0.74, armchair 0.65, tripod 0.49).

Efficient implementation of constraints: For many of the constraints listed in Section 4, rejection sampling can be avoided using alternative implementations. For example, space constraints may be tailored at the class level by biasing the 4D Normal distribution learned for padding. Figure 9 shows direct editability of learned parameters. Example layouts produced by the modified distribution are shown on the right, along with the effects of the user manipulation on the corresponding objects. Another example of a constraint that may be implemented efficiently is the specification of certain object classes (or CAD models) as not desirable. Rather than relying on rejection sampling, these classes (or models) may be pre-emptively avoided during instantiation.

Interpretability: Since our model learns parameters associated with semantically meaningful categories and positions, the learned results can be interpreted and manipulated. Figure 11a visualizes heat-maps of where the model places a few chosen object classes. For each class, we sampled 2500 rooms, and plotted (black dots) where objects of the indicated class were placed (normalizing the room bounds to a unit square). The model has learnt to place different classes meaningfully – for example, showers are almost always at the corner of a bathroom, dining tables are often at the center of a room, and toilets are always against a wall. Figure 11b and Figure 11c list the highest and lowest probability entries for object positioning and orientation. We obtained these numbers by averaging over the corresponding probabilities for all CAD models in the stated classes. The numbers align with our expectation that chairs and tripods are not typically placed along the edges of rooms, and that they are less likely to be aligned with the walls than beds, kitchen cabinets or dishwashers.

Figure 12: Samples from our model, but without motifs and abutments. Left: the kitchen cabinets and appliances are scattered, rather than placed adjacent to one another (as enabled by abutments). Centre: the chairs are scattered, rather than placed around the dining table (as enabled by motifs). Right: the two night-stands (lower left) are not at the expected location near the bed (as enabled by motifs)

Room shape: In our implementation we decompose non-rectangular rooms into rectangular regions and apply our cell structure on each region. Other strategies to partition rooms into cells may be adopted as long as they are kept consistent across training and sample generation. However, the choice of partitioning strategy may impact the quality of results.

Multiple, simultaneous constraints: Another advantage of rejection sampling as a general mechanism to impose constraints is that support for multiple constraints is trivial to implement. However, this flexibility is accompanied by a penalty in terms of runtime: assuming the constraints are roughly independent, the expected time to generate a sample satisfying all of them scales with the product of the individual rejection factors (the ratios by which each constraint alone slows sampling).
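
As a worked example (our arithmetic, under the independence assumption): if constraint $i$ alone is satisfied with probability $p_i$ per draw, the expected number of draws before all are satisfied is $1/\prod_i p_i$, so

$$\mathbb{E}[\text{time}] \approx \frac{t_0}{\prod_i p_i},$$

where $t_0 \approx 0.04$s is the unconstrained sampling time. From Table 2, the room-size constraint alone implies an acceptance probability of roughly $0.04/6.8 \approx 0.006$, and the combined ‘size + doors + windows’ timing of 112s corresponds to an overall acceptance probability of about $0.04/112 \approx 3.6 \times 10^{-4}$.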

Limitation: a posteriori editing: Our model is designed for fast generation of layouts with pre-specified constraints, but our formulation does not facilitate editing an existing (previously sampled) layout. While we can handle an a priori specification such as “I want a new layout such that this television set is placed against the east wall”, we cannot handle a posteriori edits such as “In the previously generated layout, move the television to the east wall”.

7 Conclusion

We have presented an efficient, probabilistic, data-driven, generative model for indoor furniture layouts. The algorithm used to generate layouts is simple and the parameters learned from training data are human-interpretable. We demonstrated that our model is able to accommodate a variety of constraints using rejection sampling as well as editing of learned parameters. We presented qualitative and quantitative results through rendered layouts, performance measurements and a user study. These showed that our layouts are realistic, and preferred by users over the state-of-the-art method of [WSCR18].

References

  • [CCPS13] Choi W., Chao Y.-W., Pantofaru C., Savarese S.: Understanding indoor scenes using 3D geometric phrases. In CVPR (2013).
  • [CGG17] Cordonnier G., Galin E., Gain J., Benes B., Guérin E., Peytavie A., Cani M.-P.: Authoring landscapes by combining ecosystem and terrain erosion simulation. ACM Trans. Graph. 36, 4 (2017).
  • [DPBF12] Del Pero L., Bowdish J., Fried D., Kermgard B., Hartley E., Barnard K.: Bayesian geometric modeling of indoor scenes. In CVPR (2012).
  • [DPBF13] Del Pero L., Bowdish J., Fried D., Kermgard B., Hartley E., Barnard K.: Understanding bayesian rooms using composite 3D object models. In CVPR (2013).
  • [DRC17] Dosovitskiy A., Ros G., Codevilla F., Lopez A., Koltun V.: CARLA: An open urban driving simulator. In Proceedings of the 1st Annual Conference on Robot Learning (2017), pp. 1–16.
  • [ET86] Efron B., Tibshirani R.: Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science (1986), 54–75.
  • [EVC15] Emilien A., Vimont U., Cani M.-P., Poulin P., Benes B.: Worldbrush: Interactive example-based synthesis of procedural virtual worlds. ACM Trans. Graph. 34, 4 (2015).
  • [FCW17] Fu Q., Chen X., Wang X., Wen S., Zhou B., Fu H.: Adaptive synthesis of indoor scenes via activity-associated object relation graphs. ACM Trans. Graph. 36, 6 (2017).
  • [FKS04] Funkhouser T., Kazhdan M., Shilane P., Min P., Kiefer W., Tal A., Rusinkiewicz S., Dobkin D.: Modeling by example. In ACM Transactions on Graphics (TOG) (2004), vol. 23, ACM, pp. 652–663.
  • [FRS12] Fisher M., Ritchie D., Savva M., Funkhouser T., Hanrahan P.: Example-based synthesis of 3d object arrangements. In SIGGRAPH (2012).
  • [HPSC16] Handa A., Patraucean V., Stent S., Cipolla R.: SceneNet: An annotated model generator for indoor scene understanding. In ICRA (2016).
  • [Jak10] Jakob W.: Mitsuba renderer, 2010. http://www.mitsuba-renderer.org.
  • [LXC17] Li J., Xu K., Chaudhuri S., Yumer E., Zhang H., Guibas L.: GRASS: Generative recursive autoencoders for shape structures. ACM Transactions on Graphics (Proc. SIGGRAPH 2017) 36, 4 (2017).
  • [LZM17] Liang Y., Zhang S.-H., Martin R. R.: Automatic Data-Driven Room Design Generation. In Next Generation Computer Animation Techniques (2017), Chang J., Zhang J. J., Magnenat Thalmann N., Hu S.-M., Tong R., Wang W., (Eds.), Springer International Publishing.
  • [MLZ16] Ma R., Li H., Zou C., Liao Z., Tong X., Zhang H.: Action-driven 3D indoor scene evolution. ACM TOG 35, 6 (2016), 173:1–173:13.
  • [MSK10] Merrell P., Schkufza E., Koltun V.: Computer-generated residential building layouts. ACM TOG 29, 6 (2010), 181:1–181:12.
  • [MSL11] Merrell P., Schkufza E., Li Z., Agrawala M., Koltun V.: Interactive furniture layout using interior design guidelines. In SIGGRAPH (2011).
  • [MVLS14] Ma C., Vining N., Lefebvre S., Sheffer A.: Game level layout from design specification. Computer Graphics Forum 33, 2 (2014), 95–104.
  • [MWH06] Müller P., Wonka P., Haegler S., Ulmer A., Van Gool L.: Procedural modeling of buildings. In ACM SIGGRAPH 2006 Papers (New York, NY, USA, 2006), SIGGRAPH ’06, ACM, pp. 614–623.
  • [NGDA16] Nishida G., Garcia-Dorado I., Aliaga D. G., Benes B., Bousseau A.: Interactive sketching of urban procedural models. ACM Trans. Graph. 35, 4 (2016).
  • [PM01] Parish Y. I. H., Müller P.: Procedural modeling of cities. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques (New York, NY, USA, 2001), SIGGRAPH ’01, ACM, pp. 301–308.
  • [QZH18] Qi S., Zhu Y., Huang S., Jiang C., Zhu S.-C.: Human-centric indoor scene synthesis using stochastic grammar. In CVPR (2018).
  • [Ras00] Rasmussen C. E.: The infinite Gaussian mixture model. In NIPS (2000).
  • [SKLTZ16] Sadeghipour Kermani Z., Liao Z., Tan P., Zhang H.: Learning 3D scene synthesis from annotated RGB-D images. Computer Graphics Forum 35, 5 (2016), 197–206.
  • [SYZ17] Song S., Yu F., Zeng A., Chang A. X., Savva M., Funkhouser T.: Semantic scene completion from a single depth image. In CVPR (2017).
  • [WSCR18] Wang K., Savva M., Chang A. X., Ritchie D.: Deep convolutional priors for indoor scene synthesis. ACM TOG (2018).
  • [YYT11] Yu L.-F., Yeung S.-K., Tang C.-K., Terzopoulos D., Chan T. F., Osher S. J.: Make it home: Automatic optimization of furniture arrangement. In SIGGRAPH (2011).
  • [ZZ11] Zhao Y., Zhu S.-C.: Image parsing with stochastic scene grammar. In NIPS (2011).
  • [ZZ13] Zhao Y., Zhu S.-C.: Scene parsing by integrating function, geometry and appearance models. In CVPR (2013).