Retail is a multi-trillion-dollar business worldwide, with the global fashion industry valued at $3 trillion [FashionUnited (2016)]. Approximately $1.6 trillion of retail purchasing in 2015 was done via e-commerce, with growth rates in the double digits [Lindner (2015)]. Thus, enabling better online apparel shopping experiences has the potential for enormous economic impact. Given the worldwide demand for fashion and its impact on the apparel industry, technology-based solutions have recently been proposed, a few of which are already in use by commercial vendors. For example, several computer-aided design (CAD) software systems have been developed specifically for the apparel industry. The apparel CAD industry has focused predominantly on sized cloth-pattern development, pattern-design CAD software, 3D draping preview, and automatic generation of 2D clothing patterns from 3D body scanners or other measurement devices. Leading apparel CAD companies include Gerber Technology, Lectra, Optitex, Assyst, StyleCAD, Marvelous Designer, Clo-3D, and Avametric. Unfortunately, developing these systems requires careful and lengthy design by an apparel expert.
More recently, there have been efforts to develop virtual try-on systems, such as triMirror, that allow users to visualize what a garment might look like on themselves before purchasing. These methods enable 3D visualization of various garments, fast animation of dynamic cloth, and a quick preview of how the cloth drapes on avatars as they move around. However, the capabilities of these new systems are limited, especially in terms of ease of use and applicability. Many virtual try-on systems use simple, fast image-based or texture-based techniques for a fixed number of avatar poses. They typically do not account for how fabric materials behave under different conditions (e.g., changing weather, varying poses, or weight fluctuation). Furthermore, almost all of these virtual try-on systems assume either that the user selects one of a predefined set of avatars or that accurate measurements of the user's own body have been captured via 3D scans.
In this work, we consider the problem of recovering detailed models of garments from a single-view image. Such a capability enables users to virtually try on garments given only a single photograph of themselves wearing clothing. Instead of representing the clothed human as a single mesh [Chen et al. (2013), Li et al. (2012)], we define a separate mesh for a person’s clothing, allowing us to model the rich physical interactions between clothing and the human body. This approach also helps capture occluded wrinkles in clothing that are caused by various sources, including garment design that incorporates pleats, cloth material properties that influence the drape of the fabric, and the underlying human body pose and shape. Figure 1 illustrates some results generated by our system. In addition to virtual try-on applications, broader impacts in graphics include improving the accuracy of clothing models for animated characters, with the potential to further increase the visual realism of digital human models that already incorporate body-dependent priors for hair [Chai et al. (2012)], face [Cao et al. (2013)], skin [Nagano et al. (2015)], and eyeballs [Bérard et al. (2014)].
With limited input from a single-view image, we constrain the problem's solution space by exploiting three important priors. The first prior is a statistical human body distribution model constructed from a (naked) human body data set. This statistical model is used to extract and match the human body shape and pose in a given input image. The second prior is a collection of sewing patterns for various common garment types, such as skirts, pants, shorts, t-shirts, tank tops, and dresses, drawn from a database of garment templates. Finally, the third prior is the set of possible configurations and dynamical states of garments, governed by their respective constitutive laws and modeled by a physically based cloth simulation. Simulation provides additional 3D physical constraints that a 2D image lacks.
Our method proceeds as follows. To construct an accurate body model, the user indicates 14 joint positions on the image and provides a rough sketch outlining the human body silhouette. (This step can also be automated using image processing and body templates for standard unoccluded poses.) From this information, we use a statistical human model to automatically generate a human body mesh for the image. To estimate the clothing model, we first compute a semantic parse of the garments in the image to identify and localize the depicted clothing items. This semantic segmentation is computed automatically using a data-driven method for clothing recognition [Yamaguchi et al. (2013)]. We then use the semantic parse to extract garment sizing information, such as waist girth and skirt length, which we use to map the depicted garments onto the existing garment templates and to adjust their sewing patterns. We also analyze the segmented garments to identify the location and density of wrinkles and folds, which are necessary for estimating the material properties of the garments for virtual try-on.
Once we have obtained both the body and clothing models, we perform an image-guided parameter identification process, which optimizes the garment template parameters based on the reconstructed human body and image information. We fit our 3D garment template’s surface mesh onto the human body to obtain the initial 3D garment, then jointly optimize the material parameters, the body shape, and the pose to obtain the final result. The flow chart of the overall process is shown in Fig. 2. Our main contributions include:
An image-guided garment parameter selection method that makes the generation of virtual garments with diverse styles and sizes a simple and natural task (Section 5);
A joint material-pose optimization framework that can reconstruct both body and cloth models with material properties from a single image (Section 6);
Application to virtual try-on and character animation (Section 7).
2 Related Work
Our work is built on previous efforts in cloth modeling, human pose/shape recovery, garment capture from single-view images, and semantic parsing.
Cloth Modeling: Cloth simulation is a traditional research problem in computer graphics. Early work on cloth simulation includes [Weil (1986), Ng and Grimsdale (1996), Baraff and Witkin (1998), House and Breen (2000)]. More recently, a number of methods were proposed to solve the complicated problems presented in cloth simulation, including collision detection [Govindaraju et al. (2007), Tang et al. (2009), Curtis et al. (2008)], collision handling, friction handling [Bridson et al. (2002)], strain limiting [Goldenthal et al. (2007), English and Bridson (2008), Thomaszewski et al. (2009), Wang et al. (2010)] and remeshing [Narain et al. (2012)].
Realistic wrinkle simulation is an important problem in cloth modeling. Volino and Magnenat-Thalmann (1999) introduced a geometry-based wrinkle synthesis method. Rohmer et al. (2010) presented a method to augment a coarse cloth mesh with wrinkles. Physically based cloth wrinkle simulation depends on an accurate model of the underlying constitutive law; different bending and stretching energy models for wrinkle simulation have been proposed [Bridson et al. (2003)].
Garment modeling is built upon cloth simulation, but it must also take into account the design and sewing pattern of the garment. Some methods start from the 2D design pattern [Protopsaltou et al. (2002), Decaudin et al. (2006), Berthouzoz et al. (2013)] or 2D sketches [Turquin et al. (2007), Robson et al. (2011)]. Other methods explore garment resizing and transfer from 3D template garments [Wang et al. (2005), Meng et al. (2012), Sumner and Popović (2004)]. In contrast, our work synthesizes different ideas and extends these methods to process a 2D input image and fluidly transfer the results to the simulation of 3D garments. We can also edit the 2D sewing patterns using information extracted from a single-view image, which guides the generation of garments of various sizes and styles.
Human Pose and Shape Recovery: Human pose and shape recovery from a single-view image has been extensively studied in computer vision and computer graphics. Taylor (2000) presented an algorithm that recovers an articulated-body skeleton from a single-view image with limited user input. Agarwal et al. (2006) proposed a learning-based method to recover human body poses from monocular images. Ye et al. (2014) applied a template-based method for real-time human pose and shape estimation from a single RGBD image. We refer readers to the survey on human motion recovery and analysis by Moeslund et al. (2006).
Human pose and shape recovery in computer graphics focuses primarily on accurately reconstructing muscle and on producing watertight 3D human body meshes. A realistic 3D human body mesh is the basis for character animation, and it is also required for recovering clothing with rich details. For human body mesh generation, we follow previous data-driven methods, most of which are PCA-based. These techniques use a set of bases to generate a variety of human bodies of different poses and shapes. Seo and Thalmann (2003) presented a method to construct human body meshes of different shapes. Following this work, Anguelov et al. (2005) introduced the SCAPE model, which can produce human body meshes of different poses and shapes. Using the SCAPE model, Balan et al. (2007) presented a method to recover detailed human shape and pose from images. Hasler et al. (2009) encode both human body shapes and poses using PCA and semantic parameters. Building upon these models, Zhou et al. (2010) proposed a method to recover the human body pose and shape from a single-view image.
Clothing Capture: In the last decade, many methods have been proposed for capturing clothing from images or videos. These methods can be divided into two categories: marker-based and markerless. Most marker-based clothing capture methods require markers to be pre-printed on the surface of the cloth, and different kinds of markers have been used [Scholz and Magnor (2006), Hasler et al. (2006), Tanie et al. (2005), Scholz et al. (2005), White et al. (2007)]. Markerless methods, which do not require pre-printed markers, fall into several categories: single-view [Zhou et al. (2013), Jeong et al. (2015)], depth-camera-based [Chen et al. (2015)], and multi-view methods [Popa et al. (2009)]. Another method, based on cloth contact friction design, was proposed by Casati et al. (2016).
These methods have some limitations, however, including the inability to capture fine garment details and material properties, the loss of the original garment design, and the complexity of the capture process. In contrast, our method retrieves the 2D design pattern with individual measurements obtained directly from a single image. Using a joint human pose and clothing optimization method, our algorithm recovers realistic garment models with details (e.g., wrinkles and folds) and material properties.
Semantic Parsing: Semantic parsing is a well-studied problem in computer vision, where the goal is to assign a semantic label to every pixel in an image. Most prior work has focused on parsing general scene images [Long et al. (2015), Farabet et al. (2013), Pinheiro and Collobert (2014)]. We work on the somewhat more constrained problem of parsing clothing in an image. To obtain a semantic parse of the clothing depicted in an image, we make use of the data-driven approach by Yamaguchi et al. (2013). This method automatically estimates the human body pose from a 2D image, extracts a visual representation of the clothing the person is wearing, and then visually matches the outfit to a large database of clothing items to compute a clothing parse of the query image.
3 Problem Statement and Assumptions
In this section, we give the formal definition of the problem. The input to our system is an RGB image $\Omega$. We assume the image comprises three parts: the background region $\Omega_b$, the foreground naked human body parts $\Omega_h$, and the foreground garment $\Omega_g$, where $\Omega = \Omega_b \cup \Omega_h \cup \Omega_g$. In addition, we assume that both the human body and the garment are in a statically stable physical state. Although this assumption precludes images of a fast-moving human, it is crucial for our joint optimization algorithm.
Problem: Given $\Omega_h$ and $\Omega_g$, recover
– the garment, described by a set of parameters $\{C, G, U, V\}$,
– along with a set of parameters $\{\theta, S\}$ that encode the human body pose and shape obtained from the image.
Garment: For the clothing parameters, $C$ is the set of material parameters, including stretching stiffness and bending stiffness coefficients; $U$ is the 2D triangle mesh representing the garment's pattern pieces; and $V$ is the 3D triangle mesh representation of the garment. For each triangle of the 3D garment mesh $V$, there is a corresponding triangle in the 2D space $U$. For each mesh vertex $v \in V$, such as one lying on a stitching seam in the garment, there may be multiple corresponding 2D vertices $u \in U$. The parameter $G$ is the set of parameters that defines the dimensions of the 2D pattern pieces. We adopt garment sizing parameters based on classic sewing patterns [Barnfield (2012)], shown in panels (a), (c), and (e) of the sewing-pattern figure, with the corresponding parameters defined in panels (b), (d), and (f), respectively.
For example, we define seven sizing parameters $G$ for pants: the first four define the waist, bottom, knee, and ankle girth, and the last three indicate the total, back-upper, and front-upper lengths. For each basic garment category, we can manually define this set of parameters $G$. By manipulating the values of $G$, garments of different styles and sizes can be modeled: capri pants vs. full-length pants, or tight-fitting vs. loose and flowy silhouettes. We use the material model developed by Wang et al. (2011); the material parameters $C$ are its 18 bending and 24 stretching parameters.
Human Body: For the human body parameters, $\theta$ is the set of joint angles that together parameterize the body pose, and $S$ is the set of semantic parameters that describe the body shape. We follow the PCA encoding of the human body shape presented in [Hasler et al. (2009)]. The semantic parameters include gender, height, weight, muscle percentage, breast girth, waist girth, hip girth, thigh girth, calf girth, shoulder height, and leg length.
Table 3 provides a list of formal definitions for the notation used in this paper.
4 Data Preparation
This section describes the data preprocessing step. We begin with the data representations for the garment and the human body, followed by a brief description of each preprocessing module.
4.1 Data Representations
The garment template database is a set of $N$ garment templates. Each garment template consists of a 2D triangle mesh $U$ representing the sewing pattern, a 3D mesh $V$, a set of dimension parameters $G$ for each pattern piece, a skeleton, and a set of material parameters $C$. The human body template database consists of naked human body meshes with point-to-point correspondence. We use several human body datasets, including the SCAPE dataset [Anguelov et al. (2005)], the SPRING dataset [Yang et al. (2014)], the TOSCA dataset [Bronstein et al. (2008), Young et al. (2007), Bronstein et al. (2006)], and the dataset from [Hasler et al. (2009)]. Our garment templates are defined in the same metric system as the human templates so that the garments can be scaled during the initial registration. Each garment template and human template is rigged on a common skeleton with the same set of joints.
Parameterized Garment Model: Given the garment template database, each vertex $u$ of the 2D garment pattern mesh is computed from $\bar{u}$, the vertex position of the 2D garment pattern template, and the 2D pattern sizing parameters $g_i$ in the set $G$, where $w_i$ is the weight associated with the vertex for parameter $g_i$.
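The exact blending formula is not reproduced in this text. As an illustration only, the sketch below assumes a linear model $u = \bar{u} + \sum_i w_i\,\Delta g_i$, in which each sizing parameter's change from its template value displaces the vertex along a per-vertex 2D weight vector. `pattern_vertex` and its arguments are hypothetical names, not the authors' code.

```python
from typing import Sequence, Tuple

Vec2 = Tuple[float, float]


def pattern_vertex(template: Vec2,
                   weights: Sequence[Vec2],
                   sizing_offsets: Sequence[float]) -> Vec2:
    """Hypothetical linear sizing model: u = u_bar + sum_i w_i * delta_g_i.

    template       -- template position u_bar of the 2D pattern vertex
    weights        -- one 2D weight vector w_i per sizing parameter (assumed form)
    sizing_offsets -- change of each sizing parameter from its template value
    """
    if len(weights) != len(sizing_offsets):
        raise ValueError("one weight vector per sizing parameter is required")
    x, y = template
    for (wx, wy), dg in zip(weights, sizing_offsets):
        x += wx * dg  # horizontal displacement contributed by this parameter
        y += wy * dg  # vertical displacement contributed by this parameter
    return (x, y)
```

For example, a hem vertex whose weights tie it to a length parameter (straight down) and a girth parameter (sideways) moves by both offsets at once; with all offsets at zero it stays at the template position.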
Parameterized Human Model: Given the body database, we extract a statistical shape model for human bodies. Under this model, each world-space vertex position $x$ on the human body is parameterized as
$$x = \sum_{b \in B(x)} w_b\, T_b(\theta)\,\big(\bar{x} + \Phi S\big),$$
which is a composition of a linear blend skinning model [Kavan et al. (2010)] and an active shape model [Zhao et al. (2003)]. Here $w_b$ and $B(x)$ are the sets of weights and bones associated with the vertex $x$, $T_b(\theta)$ is the transformation matrix of bone $b$, and $\bar{x}$ and $\Phi$ are the mean shape and active shape basis at the rest pose, respectively. The basis $\Phi$ is calculated by running PCA [Hasler et al. (2009)] on the body database.
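A minimal per-vertex sketch of this composition follows: an active shape model produces the rest-pose position, and linear blend skinning poses it. To keep it short, it assumes that the shape coefficients multiply the basis directly (any mapping from semantic parameters to basis coefficients is omitted) and that the per-bone $3\times4$ transforms have already been computed from the joint angles by forward kinematics. The function names are hypothetical.

```python
from typing import List, Sequence

Mat34 = Sequence[Sequence[float]]  # 3x4 affine bone transform [R | t]


def rest_shape_vertex(mean: Sequence[float],
                      basis: Sequence[Sequence[float]],
                      shape: Sequence[float]) -> List[float]:
    """Active shape model for one vertex: x_rest = mean + sum_k basis[k] * shape[k].

    basis[k] is the k-th basis direction restricted to this vertex (3 values).
    """
    x = list(mean)
    for direction, s in zip(basis, shape):
        for c in range(3):
            x[c] += direction[c] * s
    return x


def skin_vertex(rest: Sequence[float],
                weights: Sequence[float],
                transforms: Sequence[Mat34]) -> List[float]:
    """Linear blend skinning: x = sum_b w_b * T_b @ [x_rest, 1]."""
    out = [0.0, 0.0, 0.0]
    for w, T in zip(weights, transforms):
        for r in range(3):
            out[r] += w * (T[r][0] * rest[0] + T[r][1] * rest[1]
                           + T[r][2] * rest[2] + T[r][3])
    return out
```

In practice, the vertex weights are sparse (a few bones per vertex), so `weights` and `transforms` would list only the bones in $B(x)$.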
4.2 Preprocessing
Our preprocessing step consists of: a) human body reconstruction to recover the human body shape and pose from the input image, b) garment parsing to estimate the locations and types of garments depicted in the image, and c) parameter estimation to compute the sizing and fine features of the parsed garments.
Human Body Reconstruction: Our human body recovery relies on limited user input. The user helps us identify the 14 human body joints and the human body silhouette. From the identified joints, a human body skeleton is recovered using the method presented in [Taylor (2000)]; the semantic parameters are then optimized to match the silhouette. In this step, we ignore the camera scaling factor.
Garment Parsing: We provide two options for garment parsing. The first uses the automatic computer vision technique presented in [Yamaguchi et al. (2013)]. This approach combines global pretrained parse models with local models learned from nearest neighbors and transferred parse masks to estimate the types of garments and their locations on the person. The second option requires assistance from the user. Given the input image, we extract the clothing regions by performing a two-stage image segmentation guided by a user sketch. In the first stage, a coarse region boundary is extracted using a graph-cut algorithm [Li et al. (2004)]. Then, the region is refined via re-clustering [Levin et al. (2008)].
Image Information Extraction: Given the garment segmentation, the next step is to convert it to a pixel-level garment silhouette and to compute the average wrinkle density. Rather than using a separate wrinkle density for each part of the garment, we use the average wrinkle density, which encodes the overall material properties of the garment. We extract the average wrinkle density from the garment images using an improved implementation of [Popa et al. (2009)]. We first detect edges using Holistically-Nested Edge Detection [Xie and Tu (2015)] and then smooth the edges by fitting them to low-curvature quadrics. Edges that were split during detection are merged when they have nearby endpoints and similar orientations. Finally, we form 2D folds by matching parallel edges; edges that are not part of a pair are unlikely to contribute to a fold and are discarded. The average wrinkle density is the average number of wrinkles per unit area.
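The fold-forming stage (pairing parallel edges and counting pairs per unit area) can be sketched as follows. The sketch assumes that edges have already been detected, smoothed, and merged, and that they arrive as 2D line segments. The greedy pairing rule and the thresholds `max_angle` and `max_gap` are hypothetical choices for illustration, not values from the paper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def _orientation(seg: Segment) -> float:
    (x0, y0), (x1, y1) = seg
    return math.atan2(y1 - y0, x1 - x0) % math.pi  # undirected angle in [0, pi)


def _midpoint(seg: Segment) -> Point:
    (x0, y0), (x1, y1) = seg
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def count_folds(segments: List[Segment],
                max_angle: float = math.radians(15),
                max_gap: float = 20.0) -> int:
    """Greedily pair roughly parallel, nearby segments; each pair counts as one fold."""
    used = [False] * len(segments)
    folds = 0
    for i, si in enumerate(segments):
        if used[i]:
            continue
        for j in range(i + 1, len(segments)):
            if used[j]:
                continue
            d_angle = abs(_orientation(si) - _orientation(segments[j]))
            d_angle = min(d_angle, math.pi - d_angle)  # angles wrap around at pi
            (mx0, my0), (mx1, my1) = _midpoint(si), _midpoint(segments[j])
            if d_angle <= max_angle and math.hypot(mx1 - mx0, my1 - my0) <= max_gap:
                used[i] = used[j] = True
                folds += 1
                break
    return folds  # unpaired segments are simply discarded


def average_wrinkle_density(segments: List[Segment], garment_area: float) -> float:
    """Number of folds per unit garment area (e.g., per pixel of the silhouette)."""
    return count_folds(segments) / garment_area
```

A greedy pass is the simplest matching rule. An optimal assignment (e.g., minimum-cost matching) would be more robust when many near-parallel edges compete.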
4.3 Initial Garment Registration
Our initial clothing registration step aims to dress our template garment onto a human body mesh of any pose or shape. We optimize the vertex positions of the template clothing's 3D mesh $V$ based on the human body mesh parameters $\{\theta, S\}$. In this step, we do not yet consider how well the clothing fits the human body, and the 2D triangle mesh $U$ is kept fixed. We follow the method proposed in [Brouet et al. (2012)] for registering a template garment to a human body mesh with a different shape. However, their method cannot fit clothing to human meshes in varying poses, so we extend their approach with two additional steps.
The first step aligns the joints of the template garment skeleton with the joints of the human body mesh skeleton, as shown in Fig. 4. Each joint of the garment has one corresponding joint of the human body mesh. We denote the number of joints of the garment as $N_j$, the position of the $i$-th garment joint as $j^g_i$, and that of the corresponding body joint as $j^h_i$. This step applies a rigid body transformation matrix $T_r$ to the joints of the garment, where $T_r$ minimizes the objective function
$$\sum_{i=1}^{N_j} \big\lVert T_r\, j^g_i - j^h_i \big\rVert^2.$$
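This joint alignment is an instance of the classic absolute-orientation (rigid least-squares) problem. One standard closed-form solution is Horn's unit-quaternion method. The sketch below builds Horn's $4\times4$ matrix from the centered cross-covariance of the garment joints (`src`) and body joints (`dst`), then extracts its dominant eigenvector by shifted power iteration so that it needs no linear-algebra library. The function names are ours, and the paper does not state which solver it uses.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _centroid(pts: Sequence[Vec3]) -> List[float]:
    n = len(pts)
    return [sum(p[k] for p in pts) / n for k in range(3)]


def rigid_align(src: Sequence[Vec3], dst: Sequence[Vec3], iters: int = 2000):
    """Return (R, t) minimizing sum_i |R src_i + t - dst_i|^2 (Horn's quaternion method)."""
    cs, cd = _centroid(src), _centroid(dst)
    S = [[0.0] * 3 for _ in range(3)]  # S[a][b] = sum over points of src_a * dst_b (centered)
    for p, q in zip(src, dst):
        a = [p[k] - cs[k] for k in range(3)]
        b = [q[k] - cd[k] for k in range(3)]
        for r in range(3):
            for c in range(3):
                S[r][c] += a[r] * b[c]
    (xx, xy, xz), (yx, yy, yz), (zx, zy, zz) = S
    N = [[xx + yy + zz, yz - zy, zx - xz, xy - yx],
         [yz - zy, xx - yy - zz, xy + yx, zx + xz],
         [zx - xz, xy + yx, -xx + yy - zz, yz + zy],
         [xy - yx, zx + xz, yz + zy, -xx - yy + zz]]
    # Shift by the Frobenius norm so all eigenvalues are >= 0; power iteration then
    # converges to the eigenvector of the largest eigenvalue (the optimal quaternion).
    shift = math.sqrt(sum(v * v for row in N for v in row))
    q = [1.0, 0.1, 0.2, 0.3]
    for _ in range(iters):
        q = [sum(N[r][c] * q[c] for c in range(4)) + shift * q[r] for r in range(4)]
        norm = math.sqrt(sum(v * v for v in q))
        if norm == 0.0:  # degenerate input (all points coincide): use the identity
            break
        q = [v / norm for v in q]
    w, x, y, z = q
    R = [[1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
         [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
         [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)]]
    t = [cd[r] - sum(R[r][c] * cs[c] for c in range(3)) for r in range(3)]
    return R, t


def transform_point(R, t, p: Sequence[float]) -> List[float]:
    return [sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3)]
```

Power iteration keeps the sketch dependency-free. With a linear-algebra library, a symmetric eigensolver (or the SVD-based Kabsch method) would be the usual choice.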
Next, we fit this transformed 3D template garment mesh onto the human body mesh, whose pose is described by the parameter $\theta$, the vector of joint angles, and deform our template garment accordingly. We denote by $\theta_g$ the vector of joint angles of the template garment mesh and set each entry to the value of the corresponding joint angle of the human body mesh, $\theta_g = \theta$. We then compute the 3D garment template mesh $V$ so that it matches the pose of the underlying human body mesh according to this set of joint angles:
$$v = \sum_{b} w_b(v)\, T_b(\theta_g)\, \bar{v},$$
where $w_b(v)$ is the weight of bone $b$ on vertex $v$, $T_b(\theta_g)$ is the transformation matrix of bone $b$, and $\bar{v}$ is the position of $v$ after the rigid alignment. An example result is shown in panel (c) of the registration figure.
The final step removes collisions between the garment surface mesh and the human body mesh. This step is similar to the ICP algorithm proposed by Li et al. (2008). We introduce two constraints: rigidity and non-interpenetration. The deformation of the clothing should be as-rigid-as-possible [Igarashi et al. (2005)]. After this step, we have an initial registered garment with a 3D mesh $V$ that matches the underlying human pose and is free of interpenetrations with the human body. We show our initial garment registration results in Fig. 5.
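The registration step above resolves collisions with an ICP-style deformation under rigidity and non-interpenetration constraints. The sketch below illustrates only a crude version of the non-interpenetration part. Each garment vertex is tested against the tangent plane of its nearest body vertex and, if it lies behind that plane, is pushed out along the body normal. It assumes unit-length outward body normals and omits the as-rigid-as-possible term entirely, so it is a stand-in for intuition, not the method described. `push_out` and `eps` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def push_out(garment: Sequence[Vec3], body: Sequence[Vec3],
             normals: Sequence[Vec3], eps: float = 1e-3) -> List[Vec3]:
    """Move garment vertices lying behind the nearest body vertex's tangent plane
    to a distance eps in front of it (no rigidity regularization)."""
    fixed = []
    for g in garment:
        # Nearest body vertex by squared distance (brute force, O(n * m)).
        k = min(range(len(body)),
                key=lambda i: sum((g[c] - body[i][c]) ** 2 for c in range(3)))
        b, n = body[k], normals[k]
        # Signed distance of g along the (assumed unit, outward) body normal.
        depth = sum((g[c] - b[c]) * n[c] for c in range(3))
        if depth < eps:
            g = tuple(g[c] + (eps - depth) * n[c] for c in range(3))
        fixed.append(g)
    return fixed
```

Because each vertex moves independently, this can distort the mesh. The rigidity constraint in the actual step exists precisely to spread such corrections smoothly over the garment.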
5 Image-Guided Parameter Identification
In this section, we explain the step-by-step process of extracting garment material and sizing parameters from an image.
Starting from the 2D triangle mesh $U$ of the pattern pieces, we select garment parameters based on the sizing and detailed information estimated from the source image. In this step, we adjust the garment material and sizing parameters while keeping the 3D mesh $V$ (computed in Sec. 4.3) fixed, to obtain the garment that best matches the one shown in the image. We need two specific pieces of information from the image: the pixel-level garment silhouette and the average wrinkle density of the clothing. For example, for a skirt, we need to estimate the waist girth and the length of the skirt from the image.
Using this information, we initialize the garment sizing parameters $G$. Based on the wrinkle information computed from the image, we then optimize both the fabric material parameters $C$ and the sizing parameters $G$ of the garment pattern.
5.2 Garment Types, Patterns, and Parameters
For basic garment types, such as skirts, pants, t-shirts, and tank tops, we use one template pattern for each. We modify the classic sewing pattern according to the parameters $G$. By adjusting the garment parameters $G$ and fabric material parameters $C$, we recover basic garments of different styles and sizes. The classic circle-skirt sewing pattern is shown in panel (a) of the sewing-pattern figure, and our parametric space, morphed from this circle sewing pattern, is shown in panel (b). The skirt pattern has four parameters to optimize. The ratio between two of them is constrained by the waist girth and skirt length extracted from the image; the other two are constrained by the wrinkle density. With different garment parameters, skirts can vary from long to short and from stiff to soft, and can incorporate more or fewer pleats, enabling us to model a wide variety of skirts from a single design template.
Similarly, for pants, the classic sewing pattern and our template pattern pieces are shown in panels (c) and (d) of the sewing-pattern figure. The pants template has seven dimension parameters: the first four describe the waist, bottom, knee, and ankle girth, and the last three represent the total, back-upper, and front-upper lengths. The classic t-shirt sewing pattern is shown in panel (e), and our parametric t-shirt pattern is shown in panel (f). Its six garment parameters describe the neckline radius, the sleeve width, the shoulder width, the bottom length, the total length, and the sleeve length.
Different sewing patterns result in very different garments. Traditional sewing techniques form skirt wrinkles by cutting the circular portion of the pattern. To simulate this process in a generally applicable way, we instead modify the corresponding skirt pattern parameter, which achieves the same effect. In addition to the differences created by sewing patterns, professional garment designers also take advantage of cloth material properties to produce different styles of clothing. We tune the bending and stretching stiffness coefficients in $C$ to simulate this cloth selection process.
5.3 From Wrinkle Density to Material Property
One of the key insights in this work is that fabric materials can be identified from wrinkles and folds, because fabrics of different stiffness produce different wrinkle and folding patterns. We characterize the wrinkles and folds by their local curvatures. The first step is to map the average wrinkle density (computed in Sec. 4.2) to a target average local curvature.
We recover the garment material parameters $C$ by minimizing the difference between the average local curvature $\bar{\kappa}(C)$ of our recovered garment and the average local curvature $\bar{\kappa}^*$ of the reference garment:
$$E_\kappa(C) = \big(\bar{\kappa}(C) - \bar{\kappa}^*\big)^2.$$
The reference value $\bar{\kappa}^*$ is computed by linear interpolation. We first approximate the average local curvature thresholds for sharp wrinkles and smooth folds: the threshold for one sharp wrinkle is up to $\kappa_{\text{sharp}}$, and that for smooth folds is close to $\kappa_{\text{smooth}}$. Whether a garment shows sharp wrinkles or large folds is determined by the density of the extracted 2D wrinkles, which, based on our observation, ranges from $d_{\min}$ to $d_{\max}$. Given the average wrinkle density $d$, the interpolation process is
$$\bar{\kappa}^* = f_L(d) = \kappa_{\text{smooth}} + \frac{d - d_{\min}}{d_{\max} - d_{\min}}\,\big(\kappa_{\text{sharp}} - \kappa_{\text{smooth}}\big),$$
where $f_L$ is the linear interpolation function.
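The numeric thresholds are not reproduced in this text, so the sketch below takes them as arguments. `target_curvature` linearly maps the average wrinkle density onto the curvature range, with denser wrinkles mapping toward the sharp-wrinkle threshold. Two details are our assumptions: clamping densities outside the observed range, and using a squared difference in `curvature_objective`. Both function names are hypothetical.

```python
def target_curvature(density: float, d_min: float, d_max: float,
                     kappa_smooth: float, kappa_sharp: float) -> float:
    """Linearly map wrinkle density in [d_min, d_max] to a target average curvature."""
    if d_max <= d_min:
        raise ValueError("d_max must exceed d_min")
    s = (density - d_min) / (d_max - d_min)
    s = min(max(s, 0.0), 1.0)  # assumed: clamp densities outside the observed range
    return kappa_smooth + s * (kappa_sharp - kappa_smooth)


def curvature_objective(mean_kappa: float, target: float) -> float:
    """Squared difference between simulated and target average curvature."""
    return (mean_kappa - target) ** 2
```

The numbers used in any example call (such as a density range of 0 to 1) are placeholders, not the paper's observed values.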
The local curvature at each vertex is computed from the bending between the two neighboring triangles sharing an edge. For each vertex of the two triangles that share an edge $e$, we follow Wang et al. (2011) and Bridson et al. (2003) and compute the local curvature as
$$\kappa = \frac{2\sin(\varphi/2)}{h_1 + h_2},$$
where $h_1$ and $h_2$ are the heights of the two triangles that share the edge $e$, and $\varphi$ is the supplementary angle to the dihedral angle between the two triangles. The corresponding bending force for each vertex is computed from this curvature following Bridson et al. (2003), scaled by the bending stiffness coefficient $k$.
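The sketch below computes this hinge curvature, assuming the form $\kappa = 2\sin(\varphi/2)/(h_1+h_2)$, where $\varphi$ is the angle between the two face normals (zero for a flat hinge) and $h_1$, $h_2$ are the triangle heights measured from the shared edge. For simplicity it evaluates one value per edge and averages over edges, rather than accumulating per vertex. The function names are hypothetical.

```python
import math


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _norm(a):
    return math.sqrt(sum(c * c for c in a))


def hinge_curvature(x0, x1, x2, x3) -> float:
    """Curvature across edge (x0, x1) shared by triangles (x0, x1, x2) and (x1, x0, x3)."""
    e = _norm(_sub(x1, x0))
    n1 = _cross(_sub(x1, x0), _sub(x2, x0))  # |n1| = 2 * area of the first triangle
    n2 = _cross(_sub(x0, x1), _sub(x3, x1))  # second face, consistently oriented
    h1, h2 = _norm(n1) / e, _norm(n2) / e    # heights over the shared edge
    phi = math.atan2(_norm(_cross(n1, n2)), sum(a * b for a, b in zip(n1, n2)))
    return 2.0 * math.sin(phi / 2.0) / (h1 + h2)


def average_curvature(hinges) -> float:
    """Mean hinge curvature over a list of (x0, x1, x2, x3) tuples."""
    values = [hinge_curvature(*h) for h in hinges]
    return sum(values) / len(values)
```

`atan2` of the cross and dot products gives a numerically stable unsigned angle, including for nearly flat hinges.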
Stretching also affects the formation of wrinkles. Each triangle in the 2D template mesh is represented as $t_U = (u_0, u_1, u_2)$, and each triangle in the 3D garment mesh as $t_V = (v_0, v_1, v_2)$. The stretching forces are computed by differentiating the stretching energy $W$, which depends on the stretching stiffness parameter $k_s$, the deformation gradient $F$, and the Green strain tensor $E$, with respect to the vertex 3D positions:
$$F = [\,v_1 - v_0,\; v_2 - v_0\,]\,[\,u_1 - u_0,\; u_2 - u_0\,]^{-1}, \qquad E = \tfrac{1}{2}\big(F^\top F - I\big), \qquad f_i = -\frac{\partial W}{\partial v_i}.$$
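A sketch of the per-triangle quantities follows: the $3\times2$ deformation gradient from a 2D pattern triangle to its 3D counterpart, and the Green strain. The energy function is a hypothetical isotropic quadratic in $E$ with a single stiffness. It stands in for illustration and is not the anisotropic, data-driven model of Wang et al. (2011) that the method actually uses. Forces would follow by differentiating the energy with respect to the 3D vertices.

```python
from typing import List, Sequence

Vec = Sequence[float]


def deformation_gradient(u: Sequence[Vec], v: Sequence[Vec]) -> List[List[float]]:
    """3x2 deformation gradient F = Dv @ inv(Du) for one triangle.

    u: three 2D pattern (rest) vertices; v: the matching three 3D vertices.
    """
    du = [[u[1][0] - u[0][0], u[2][0] - u[0][0]],
          [u[1][1] - u[0][1], u[2][1] - u[0][1]]]
    det = du[0][0] * du[1][1] - du[0][1] * du[1][0]
    if det == 0.0:
        raise ValueError("degenerate pattern triangle")
    inv = [[du[1][1] / det, -du[0][1] / det],
           [-du[1][0] / det, du[0][0] / det]]
    dv = [[v[1][r] - v[0][r], v[2][r] - v[0][r]] for r in range(3)]  # 3x2 edge matrix
    return [[sum(dv[r][k] * inv[k][c] for k in range(2)) for c in range(2)]
            for r in range(3)]


def green_strain(F: Sequence[Vec]) -> List[List[float]]:
    """E = (F^T F - I) / 2, a symmetric 2x2 tensor."""
    return [[0.5 * (sum(F[k][a] * F[k][b] for k in range(3)) - (1.0 if a == b else 0.0))
             for b in range(2)] for a in range(2)]


def stretch_energy(F: Sequence[Vec], k_s: float, rest_area: float) -> float:
    """Hypothetical isotropic energy: rest_area * k_s / 2 * ||E||_F^2."""
    E = green_strain(F)
    return rest_area * 0.5 * k_s * sum(E[a][b] ** 2 for a in range(2) for b in range(2))
```

An undeformed triangle yields $E = 0$ and zero energy. Stretching one edge to twice its rest length gives $E_{00} = 1.5$.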
The sizing and style of the garment, described by the parameters $G$ obtained from the parsed garment, are matched by minimizing the difference between the 2D silhouette polygon $P$ of our recovered garment and the reference garment silhouette $P^*$:
$$E_s(G) = D(P, P^*).$$
The distance $D$ between two polygons is computed by summing the distances from each point of polygon $P$ to the other polygon $P^*$. To compute the 2D silhouette $P$, we first project the simulated 3D garment mesh $V$ onto the 2D image with the projection matrix $M$ and then compute the 2D polygon enclosing the projected points:
$$P = f_P(M V),$$
where $f_P$ is the method that converts the projected points to a 2D polygon. We ignore the camera scaling factor in this step because the input is a single-view image; the recovered clothing and human body shape can be scaled in a postprocessing step.
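A sketch of the silhouette term rests on three assumptions: a $3\times4$ projection matrix applied with a homogeneous divide, a convex hull (Andrew's monotone chain) standing in for the unspecified polygon-construction method $f_P$, and the one-sided point-to-polygon sum described in the text. A convex hull overestimates non-convex silhouettes, such as the gap between pant legs, so a concave-hull or rasterize-and-trace step would be closer in practice. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def project(M: Sequence[Sequence[float]], X: Sequence[Sequence[float]]) -> List[Point]:
    """Apply a 3x4 projection matrix to 3D points, with a homogeneous divide."""
    out = []
    for x, y, z in X:
        h = [M[r][0] * x + M[r][1] * y + M[r][2] * z + M[r][3] for r in range(3)]
        out.append((h[0] / h[2], h[1] / h[2]))
    return out


def convex_hull(pts: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain: counter-clockwise hull without a repeated endpoint."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return list(pts)

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def _point_segment(p: Point, a: Point, b: Point) -> float:
    ax, ay = b[0] - a[0], b[1] - a[1]
    L2 = ax * ax + ay * ay
    t = 0.0 if L2 == 0 else max(0.0, min(1.0, ((p[0] - a[0]) * ax + (p[1] - a[1]) * ay) / L2))
    return math.hypot(p[0] - (a[0] + t * ax), p[1] - (a[1] + t * ay))


def polygon_distance(P: Sequence[Point], Q: Sequence[Point]) -> float:
    """Sum over the vertices of P of their distance to the boundary of polygon Q."""
    edges = [(Q[i], Q[(i + 1) % len(Q)]) for i in range(len(Q))]
    return sum(min(_point_segment(p, a, b) for a, b in edges) for p in P)
```

Because the sum is one-sided, a symmetric variant (adding the distance from $P^*$ back to $P$) may be preferable when the recovered silhouette is much smaller than the reference.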
Combining these two objectives, the objective (energy) function is expressed as
$$E(C, G) = E_\kappa + E_s.$$
5.4 Optimization-based Parameter Estimation
The optimization is an iterative process (given in Algorithm 1) that alternates between updates of the garment sizing parameters $G$ and the material parameters $C$. We found that the value of the objective function is more sensitive to the cloth material parameters $C$ than to the garment parameters $G$, so we spend more iterations optimizing $C$ while keeping $G$ fixed. The optimization of $C$ is coupled with the cloth dynamics. The underlying cloth simulator is based on the method proposed in [Narain et al. (2012)]. We drape the initially fitted garment onto the human body mesh; the garment is in a dynamic state and subject to gravity. We couple our parameter estimation with this physically based simulation process. Before the simulation, we change the cloth material parameters so that, in the static state, the average local curvature matches the target value. In other words, our optimizer minimizes the curvature objective by changing the bending stiffness parameters and stretching stiffness parameters in $C$.
We apply the L-BFGS method [Liu and Nocedal (1989)] for our material parameter optimization. When the clothing reaches a static state, the optimizer switches to optimizing the parameters $G$. The optimizer for $G$ is not coupled with the garment simulation; the objective function is evaluated once the clothing reaches the static state. We adopt the Particle Swarm Optimization (PSO) method [Kennedy (2010)] for this parameter optimization. PSO is known to be able to escape local minima, which makes it well suited to non-convex optimization problems such as this one.
We use $N_p$ particles for the parameter estimation process. The alternating process usually converges after four steps. One example result of the garment parameter estimation process is shown in Fig. 6.
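For readers unfamiliar with PSO, the following is a minimal global-best particle swarm over a box-constrained parameter vector. Each particle's velocity blends inertia (`w`), a pull toward its own best position (`c1`), and a pull toward the swarm's best (`c2`). The parameter values are common textbook choices, not the paper's settings. The box could encode, for example, plausible ranges of the sizing parameters.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso(f: Callable[[List[float]], float],
        lower: Sequence[float], upper: Sequence[float],
        n_particles: int = 20, iters: int = 100,
        w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
        seed: int = 0) -> Tuple[List[float], float]:
    """Minimize f over the box [lower, upper] with a basic global-best swarm."""
    rng = random.Random(seed)
    dim = len(lower)
    pos = [[rng.uniform(lower[d], upper[d]) for d in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])  # own best
                             + c2 * r2 * (g_pos[d] - pos[i][d]))       # swarm best
                # Keep particles inside the feasible box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lower[d]), upper[d])
            val = f(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val
```

In the garment setting, each evaluation of `f` would require a full cloth simulation to a static state, so the particle count and iteration budget dominate the running time.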
We constrain the cloth material parameter space. We use the "Gray Interlock" material presented in [Wang et al. (2011)], which is composed of cotton and polyester, as the "softest" material, meaning it bends most easily. Based on our experiments, we multiply the bending parameters of this material by a constant factor to obtain the "stiffest" material. Our solution space is bounded by these two materials, and we initialize our optimization with the "softest" material parameters.
6 Joint Material-Pose Optimization
6.1 Optimal Parameter Selection
The parameter identification step provides us with the initial recovered garment, described by the set of material and sizing parameters $\{C, G\}$. Many realistic garment wrinkles and folds, however, are formed by the underlying pose of the human body, especially wrinkles located around human joints. Therefore, in this step, we further refine our results by optimizing both the pose parameters $\theta$ of the human body and the material properties $C$ of the cloth. The optimization objective for this step is
$$E(C, \theta) = E_\kappa + E_s, \qquad (13)$$
where the curvature term $E_\kappa$ and the silhouette term $E_s$ are now evaluated on the garment simulated on the body in pose $\theta$.
The optimization process (shown in Algorithm 2) is similar to the garment parameter identification step, alternating between updates of the pose parameters $\theta$ and the material parameters $C$. We again use Particle Swarm Optimization [Kennedy (2010)].
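The alternating structure (optimize one block of variables with the other fixed, then swap) can be sketched as block-coordinate descent. To stay short and deterministic, this sketch uses a simple compass (pattern) search as the inner solver in place of the PSO and L-BFGS solvers named in the text, and it is demonstrated on a toy coupled objective rather than a cloth simulation. `alternate` and `compass_search` are hypothetical names.

```python
from typing import Callable, List


def compass_search(f: Callable[[List[float]], float], x: List[float],
                   step: float = 0.5, tol: float = 1e-6) -> List[float]:
    """Derivative-free local search: try +/- step per coordinate, halve step on failure."""
    x, fx = x[:], f(x)
    while step > tol:
        improved = False
        for d in range(len(x)):
            for s in (step, -step):
                trial = x[:]
                trial[d] += s
                ft = f(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
        if not improved:
            step *= 0.5
    return x


def alternate(E: Callable[[List[float], List[float]], float],
              a: List[float], b: List[float], rounds: int = 4):
    """Minimize E(a, b) by alternating: a with b fixed, then b with the new a fixed."""
    for _ in range(rounds):
        a = compass_search(lambda v: E(v, b), a)  # e.g., pose with material fixed
        b = compass_search(lambda v: E(a, v), b)  # e.g., material with pose fixed
    return a, b
```

On a coupled quadratic such as $E(a,b) = (a-1)^2 + (b+2)^2 + \tfrac12(a-b)^2$, each round shrinks the error by a constant factor, which is consistent with the small number of alternating steps reported for the garment problem.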
The objective function (Eqn. 13) is more sensitive to the pose parameters $\theta$ than to the material parameters $C$. We constrain the optimization space of $\theta$ by restricting joint rotations to the three principal axes. An example of our joint material-pose optimization is shown in Fig. 7.
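Restricting rotations to the principal axes means that each joint is searched over (at most) three scalar angles, one per axis, rather than over arbitrary axis-angle rotations. The sketch below builds such a joint rotation. The composition order $R = R_z R_y R_x$ is one assumed convention, since the text does not specify it. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def axis_rotation(axis: int, angle: float) -> List[List[float]]:
    """Rotation matrix about the x (0), y (1), or z (2) axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    if axis == 0:
        return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]
    if axis == 1:
        return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    if axis == 2:
        return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    raise ValueError("axis must be 0, 1, or 2")


def joint_rotation(angles_xyz: Sequence[float]) -> List[List[float]]:
    """Compose per-axis rotations as R = Rz @ Ry @ Rx (assumed convention)."""
    R = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]
    for axis in (0, 1, 2):
        A = axis_rotation(axis, angles_xyz[axis])
        R = [[sum(A[r][k] * R[k][c] for k in range(3)) for c in range(3)]
             for r in range(3)]
    return R
```

The three angles per joint can then be placed directly in the PSO search vector, with per-axis bounds reflecting plausible joint limits.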
6.2 Application to Image-Based Virtual Try-On
This joint material-pose optimization method can be applied directly to image-based virtual try-on. We first recover the pose and shape of the human body from the single-view image. Then we dress the recovered human body with garments reconstructed from other images. We perform the initial garment registration step (Sec. 4.3) to fit the 3D surface mesh $V$ of each garment onto the recovered human body.
Existing state-of-the-art virtual try-on rooms require a depth camera for tracking and overlay the fitted garment on the tracked human body [Ye et al. (2014)]. Our algorithm, in contrast, can dress a human body recovered from a single 2D image in an optimized virtual outfit recovered from other images. We provide the optimized design pattern together with a 3D view of the garment fitted to the human body.
The fitting step requires iterative optimization over both the garment parameters and the human-body pose. As in a real fitting process, we vary the sizing of the outfits for human bodies of different sizes and shapes. When editing in parameter space using the methods introduced in the previous section, we ensure that the recovered garment fits the human body while minimizing distortion of the original design. For each basic garment, we use one template pattern and the corresponding set of parameters. To preserve the garment design, we do not change the material properties of the fabric when virtually fitting the recovered garment to a new mannequin.
7 Results and Discussion
We have implemented our algorithm in C++ and demonstrated the effectiveness of our approach throughout the paper. In this section, we show example results, performance, and comparisons to other garment recovery methods.
7.1 Garment Recovery Results
We show several examples of garment recovery from a single-view image. In Fig. 8 and Fig. 9, we show that our method can recover garments of different styles and materials. Fig. 10 demonstrates the effectiveness of our method for the recovery of partially occluded garments. It also shows that our recovered garment can be applied to human bodies in different poses.
Image-Based Garment Virtual Try-On: We show examples of our image-based garment virtual try-on method (Sec. 6.2) in Fig. 1 and Fig. 11. We can effectively render new outfits onto people from only a single input image.
Evaluation: We evaluate the accuracy of the recovered sizing parameters and local curvature using synthetic scenes. Each synthetic scene has two lighting conditions, mid-day and sunset (shown in Fig. 12). We fix both the extrinsic and the intrinsic camera parameters for scene rendering, and the garments are in static equilibrium. Because the sizing and material parameters of these synthetic garments are known exactly, these ten test cases let us validate the accuracy and reliability of our method for T-shirts and pants under different body poses and lighting conditions. Real garments, by contrast, would require noisy measurements and/or data fitting to derive estimated parameters to serve as the ground truth, and such estimates are unlikely to be accurate. The evaluation result, after eliminating the camera scaling factor, is shown in Table 7.1.
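Single-view recovery generally determines absolute size only up to a global scale. One plausible way to eliminate the camera scaling factor before comparing against ground truth, assumed here purely for illustration, is a least-squares fit of a single scale $s$ minimizing $\sum_i (s\,r_i - g_i)^2$, which gives $s = \sum_i r_i g_i / \sum_i r_i^2$. The function names are hypothetical, and this normalization only makes sense for length-like sizing parameters.

```python
from typing import List, Sequence


def best_scale(recovered: Sequence[float], ground_truth: Sequence[float]) -> float:
    """Least-squares scale s minimizing sum_i (s * r_i - g_i)^2."""
    if len(recovered) != len(ground_truth):
        raise ValueError("parameter vectors must have the same length")
    num = sum(r * g for r, g in zip(recovered, ground_truth))
    den = sum(r * r for r in recovered)
    if den == 0.0:
        raise ValueError("recovered parameters must not all be zero")
    return num / den


def remove_scale(recovered: Sequence[float], ground_truth: Sequence[float]) -> List[float]:
    """Rescale the recovered sizing parameters onto the ground-truth scale."""
    s = best_scale(recovered, ground_truth)
    return [s * r for r in recovered]


print(remove_scale([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # [2.0, 4.0, 6.0]
```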
We found that the lighting conditions mainly affect the body silhouette approximation and the garment folding parsing, while the body skeleton approximation is affected by the pose. Overall, we achieve an accuracy of up to for recovering the sizing parameters and for recovering the material parameters for T-shirts and pants under different body poses and lighting conditions. The accuracy is computed by averaging the per-parameter accuracy of the recovered values relative to the ground truth.
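One reading of this averaged accuracy, assumed here for illustration, is a per-parameter relative accuracy $\max(0, 1 - |\hat p_i - p_i| / |p_i|)$ averaged over all parameters. The function name and the flooring at zero are hypothetical choices, not a definition taken from the evaluation itself.

```python
from typing import Sequence


def mean_parameter_accuracy(recovered: Sequence[float],
                            ground_truth: Sequence[float]) -> float:
    """Average over parameters of max(0, 1 - |recovered - truth| / |truth|)."""
    if len(recovered) != len(ground_truth) or not ground_truth:
        raise ValueError("need two non-empty vectors of equal length")
    per_param = []
    for r, g in zip(recovered, ground_truth):
        if g == 0.0:
            raise ValueError("ground-truth values must be non-zero for relative accuracy")
        per_param.append(max(0.0, 1.0 - abs(r - g) / abs(g)))
    return sum(per_param) / len(per_param)


# Both parameters are off by 10% of their true value, so the average is about 0.9.
print(mean_parameter_accuracy([9.0, 22.0], [10.0, 20.0]))
```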
We evaluate the accuracy of the recovered material properties by measuring the difference between the ground truth and the recovered garment in both mean curvature and material parameters, since the accuracy of mean-curvature recovery correlates with the accuracy of the material-parameter estimation. As shown in Table 7.1, we achieve an accuracy of up to and , respectively, for the recovery of mean curvatures and different material parameters for the skirt.
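To make the mean-curvature comparison concrete, the sketch below assumes both garments are resampled as height fields $z = f(x, y)$ on the same regular grid with spacing $h$ (a simplification, since the garments are triangle meshes) and evaluates the standard formula $H = \frac{(1+f_y^2) f_{xx} - 2 f_x f_y f_{xy} + (1+f_x^2) f_{yy}}{2(1+f_x^2+f_y^2)^{3/2}}$ with central differences. The error measure, a mean absolute difference over interior nodes, is likewise an illustrative assumption.

```python
from typing import List

Grid = List[List[float]]  # f[i][j]: height at node (i, j); i indexes x, j indexes y


def mean_curvature(f: Grid, i: int, j: int, h: float) -> float:
    """Mean curvature of z = f(x, y) at interior node (i, j) via central differences."""
    fx = (f[i + 1][j] - f[i - 1][j]) / (2 * h)
    fy = (f[i][j + 1] - f[i][j - 1]) / (2 * h)
    fxx = (f[i + 1][j] - 2 * f[i][j] + f[i - 1][j]) / (h * h)
    fyy = (f[i][j + 1] - 2 * f[i][j] + f[i][j - 1]) / (h * h)
    fxy = (f[i + 1][j + 1] - f[i + 1][j - 1] - f[i - 1][j + 1] + f[i - 1][j - 1]) / (4 * h * h)
    num = (1 + fy * fy) * fxx - 2 * fx * fy * fxy + (1 + fx * fx) * fyy
    return num / (2 * (1 + fx * fx + fy * fy) ** 1.5)


def mean_curvature_error(f_rec: Grid, f_gt: Grid, h: float) -> float:
    """Mean absolute curvature difference over the interior nodes of two equal-size grids."""
    n, m = len(f_gt), len(f_gt[0])
    diffs = [abs(mean_curvature(f_rec, i, j, h) - mean_curvature(f_gt, i, j, h))
             for i in range(1, n - 1) for j in range(1, m - 1)]
    return sum(diffs) / len(diffs)
```

As a sanity check, the paraboloid $z = (x^2 + y^2)/2$ has mean curvature 1 at its apex, which the finite-difference formula reproduces exactly for this quadratic surface.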
7.2 Comparison with Other Related Work
We compare our results with the multi-view reconstruction method CMP-MVS [Jancosek and Pajdla (2011)] combined with the structure-from-motion framework [Wu (2011), Wu (2013)]. For a fair comparison, we apply smoothing [Taubin (1995)] to their results. Fig. 13 and Fig. 14 show that the garment recovered using our method is clean and comparable in visual quality to the garments recovered using multi-view methods. In addition, we are able to estimate the material properties from a single-view image for virtual try-on applications.
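The smoothing of [Taubin (1995)] alternates a shrinking Laplacian step ($\lambda > 0$) with an inflating step ($\mu < -\lambda$), removing noise without the steady shrinkage of plain Laplacian smoothing. The sketch below uses the uniform (umbrella) Laplacian on vertices with explicit neighbor lists. The values λ = 0.5, μ = -0.53, and the iteration count are illustrative assumptions, not the settings used for Fig. 13 and Fig. 14.

```python
import math
from typing import List, Sequence

Point = List[float]


def laplacian_step(points: Sequence[Point], neighbors: Sequence[Sequence[int]],
                   factor: float) -> List[Point]:
    """Move each vertex by factor * (average of its neighbors - vertex)."""
    out = []
    for p, nbrs in zip(points, neighbors):
        if not nbrs:  # isolated vertex: leave it unchanged
            out.append(list(p))
            continue
        avg = [sum(points[k][d] for k in nbrs) / len(nbrs) for d in range(len(p))]
        out.append([p[d] + factor * (avg[d] - p[d]) for d in range(len(p))])
    return out


def taubin_smooth(points: Sequence[Point], neighbors: Sequence[Sequence[int]],
                  lam: float = 0.5, mu: float = -0.53, iterations: int = 10) -> List[Point]:
    """Taubin lambda|mu smoothing: a shrinking step (lam > 0), then an inflating step (mu < -lam)."""
    result = [list(p) for p in points]
    for _ in range(iterations):
        result = laplacian_step(result, neighbors, lam)
        result = laplacian_step(result, neighbors, mu)
    return result


def mean_radius(points: Sequence[Point]) -> float:
    """Average distance from the origin, used here to measure shrinkage."""
    return sum(math.hypot(*p) for p in points) / len(points)
```

On a closed regular 32-gon of unit radius, ten λ|μ iterations keep the mean radius within about half a percent of 1, whereas twenty plain Laplacian steps with factor 0.5 shrink it to roughly 0.82.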
We further compare our results against two recent methods: one using 3D depth information and an expert-designed 3D database [Chen et al. (2015)], and the other using a large database of manually labeled garment images [Jeong et al. (2015)]. Our method, which requires neither depth information, an expert-designed 3D database, nor a large manually labeled garment image database, achieves accuracy comparable to [Chen et al. (2015)] (see Fig. 15) and higher visual quality than [Jeong et al. (2015)] (see Fig. 16). In addition, our method is able to recover material and estimate sizing parameters directly from a given image.
We run our method on a desktop with an Intel(R) Core(TM) i7 CPU at 3.20 GHz. For each garment, our pipeline takes on average 4 to 6 hours. The garment parameter identification (Sec. 5) and joint material-pose optimization (Sec. 6.1) take around of the entire process. The preprocessing step (Sec. 4.2) takes around . The performance depends largely on the complexity of the garment, the image quality, and how much the garment is occluded.
Estimation of Material Parameters: Our material recovery method depends on the existence of wrinkles and folds in the garment. In cases where no or very few wrinkles or folds are present, other image features, such as textures and shading, would be required to identify the material properties. In most garments, such as shirts, skirts, or dresses, wrinkles and folds are common (especially around the joints or from the garment stylization) and can be highly informative with regard to garment material properties. Based on this observation, we are able to estimate material parameters as well as recover garment geometry from single-view images. This capability is one of our main objectives, and it is the key feature differentiating our work from existing techniques.
Accuracy of Geometry Reconstruction: In general, recovery from single-view images is expected to yield less accurate results than standard 3D reconstruction and/or the most recent 3D multi-view methods. Our method uses accurate physics-based cloth simulation to assist the recovery process and achieves comparable visual quality, with a focus on capturing plausible wrinkles and folds, as well as the material parameters required for virtual try-on, using only photographs.
However, it is important to note that high visual quality does not always guarantee geometric accuracy in 3D garment recovery. At the same time, for some applications such as virtual try-on, rapid design, and prototyping, it is unclear if a high degree of geometric accuracy is required; it is also unknown how much error tolerance is needed for the comfortable fitting of garments. These are important considerations for further investigation in application to fashion design and e-commerce.
The current implementation of our approach depends on two databases: a database of commonly available garment templates and a database of human-body models.
The range of garments we can recover is, to some extent, limited by the available garment templates. Our parameter identification method can only generate garments that are “morphable” from the garment template, i.e. homeomorphic to it. For example, since we use only one template for each garment type, we cannot yet model variations in some clothing details, e.g. multi-layered skirts or collars on shirts. For garments that are not morphable from the template, our method recovers the morphable garment closest to the actual one. With a more extensive set of templates, we could model more variations in style and cut, with richer garment details.
Another limitation is the human body shape recovery. Our reduced human body shape is described by a set of semantic parameters . The range of shapes spanned by these semantic parameters is limited, though it is sufficient to cover most common human body shapes, as shown in our images. The known artifacts of linear human shape blending can also affect the results. Beyond body shape recovery, our method is also limited by state-of-the-art 3D human pose recovery methods; manual intervention is needed when these methods fail to output a reasonably accurate 3D human pose.
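Linear shape blending computes each body vertex as a mean shape plus a weighted sum of per-vertex offsets. The minimal sketch below assumes the semantic parameters have already been mapped to blend weights (that mapping is not shown, and all names are hypothetical). Because every offset is combined linearly, any shape change that does not vary linearly with the weights cannot be represented exactly, which is one source of the artifacts mentioned above.

```python
from typing import List, Sequence

Vertex = List[float]


def blend_shape(mean: Sequence[Vertex], deltas: Sequence[Sequence[Vertex]],
                weights: Sequence[float]) -> List[Vertex]:
    """Linear shape blending: v_i = mean_i + sum_k weights[k] * deltas[k][i]."""
    if len(deltas) != len(weights):
        raise ValueError("one weight per blend shape is required")
    out = []
    for i, base in enumerate(mean):
        v = list(base)
        for delta, w in zip(deltas, weights):
            for d in range(len(v)):
                v[d] += w * delta[i][d]
        out.append(v)
    return out
```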
8 Conclusion and Future Work
In this paper we present an algorithm for highly detailed garment recovery from a single-view image. Our approach recovers a 3D mesh of the garment together with the 2D design pattern, fine wrinkles and folds, and material parameters. The recovered garment can be re-targeted to other human bodies of different shapes, sizes, and poses for virtual try-on and character animation.
In addition to addressing some limitations mentioned above, there are many possible future research directions. First, we plan to develop a parallelized implementation of our system on a GPU or a many-core CPU for fast garment recovery. Both the underlying cloth simulator and the optimization process can be significantly accelerated. We also plan to extend our approach to enable fabric material transfer from videos for interactive virtual try-on. Furthermore, we hope to explore perceptual metrics, similar in spirit to [Sigal et al. (2015)].
- Agarwal and Triggs (2006) Agarwal, A. and Triggs, B. 2006. Recovering 3d human pose from monocular images. Pattern Analysis and Machine Intelligence, IEEE Transactions on 28, 1, 44–58.
- AliExpress (2015) AliExpress. 2015. http://www.aliexpress.com.
- Anguelov et al. (2005) Anguelov, D., Srinivasan, P., Koller, D., Thrun, S., Rodgers, J., and Davis, J. 2005. Scape: shape completion and animation of people. In ACM Transactions on Graphics (TOG). Vol. 24. ACM, 408–416.
- Anthropologie (2015) Anthropologie. 2015. http://www.anthropologie.com.
- Balan et al. (2007) Balan, A. O., Sigal, L., Black, M. J., Davis, J. E., and Haussecker, H. W. 2007. Detailed human shape and pose from images. In 2007 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1–8.
- Baraff and Witkin (1998) Baraff, D. and Witkin, A. 1998. Large steps in cloth simulation. In Proceedings of the 25th annual conference on Computer graphics and interactive techniques. ACM, 43–54.
- Barnfield (2012) Barnfield, J. 2012. The pattern making primer : all you need to know about designing, adapting and customizing sewing patterns. Barron’s Educational Series, Inc, Hauppauge, N.Y.
- Bérard et al. (2014) Bérard, P., Bradley, D., Nitti, M., Beeler, T., and Gross, M. 2014. High-quality capture of eyes. ACM Transactions on Graphics (TOG) 33, 6, 223.
- Berthouzoz et al. (2013) Berthouzoz, F., Garg, A., Kaufman, D. M., Grinspun, E., and Agrawala, M. 2013. Parsing sewing patterns into 3d garments. ACM Transactions on Graphics (TOG) 32, 4, 85.
- Boden (2015) Boden. 2015. http://www.bodenusa.com.
- Bridson et al. (2002) Bridson, R., Fedkiw, R., and Anderson, J. 2002. Robust treatment of collisions, contact and friction for cloth animation. In ACM Transactions on Graphics (ToG). Vol. 21. ACM, 594–603.
- Bridson et al. (2003) Bridson, R., Marino, S., and Fedkiw, R. 2003. Simulation of clothing with folds and wrinkles. In Proceedings of the 2003 ACM SIGGRAPH/Eurographics symposium on Computer animation. Eurographics Association, 28–36.
- Bronstein et al. (2006) Bronstein, A. M., Bronstein, M. M., and Kimmel, R. 2006. Efficient computation of isometry-invariant distances between surfaces. SIAM Journal on Scientific Computing 28, 5, 1812–1836.
- Bronstein et al. (2008) Bronstein, A. M., Bronstein, M. M., and Kimmel, R. 2008. Numerical geometry of non-rigid shapes. Springer Science & Business Media.
- Brouet et al. (2012) Brouet, R., Sheffer, A., Boissieux, L., and Cani, M.-P. 2012. Design preserving garment transfer. ACM Trans. Graph. 31, 4, 36.
- Cao et al. (2013) Cao, C., Weng, Y., Lin, S., and Zhou, K. 2013. 3d shape regression for real-time facial animation. ACM Transactions on Graphics (TOG) 32, 4, 41.
- Casati et al. (2016) Casati, R., Daviet, G., and Bertails-Descoubes, F. 2016. Inverse elastic cloth design with contact and friction. Ph.D. thesis, Inria Grenoble Rhône-Alpes, Université de Grenoble.
- Chai et al. (2012) Chai, M., Wang, L., Weng, Y., Yu, Y., Guo, B., and Zhou, K. 2012. Single-view hair modeling for portrait manipulation. ACM Trans. Graph. 31, 4 (July), 116:1–116:8.
- Chen et al. (2013) Chen, X., Guo, Y., Zhou, B., and Zhao, Q. 2013. Deformable model for estimating clothed and naked human shapes from a single image. The Visual Computer 29, 11, 1187–1196.
- Chen et al. (2015) Chen, X., Zhou, B., Lu, F., Wang, L., Bi, L., and Tan, P. 2015. Garment modeling with a depth camera. ACM Transactions on Graphics (TOG) 34, 6, 203.
- Curtis et al. (2008) Curtis, S., Tamstorf, R., and Manocha, D. 2008. Fast collision detection for deformable models using representative-triangles. In Proceedings of the 2008 symposium on Interactive 3D graphics and games. ACM, 61–69.
- Decaudin et al. (2006) Decaudin, P., Julius, D., Wither, J., Boissieux, L., Sheffer, A., and Cani, M.-P. 2006. Virtual garments: A fully geometric approach for clothing design. In Computer Graphics Forum. Vol. 25. Wiley Online Library, 625–634.
- English and Bridson (2008) English, E. and Bridson, R. 2008. Animating developable surfaces using nonconforming elements. In ACM Transactions on Graphics (TOG). Vol. 27. ACM, 66.
- Farabet et al. (2013) Farabet, C., Couprie, C., Najman, L., and LeCun, Y. 2013. Learning hierarchical features for scene labeling. In Pattern Analysis and Machine Intelligence.
- FashionableShoes (2013) FashionableShoes. 2013. http://bestfashionableshoess.blogspot.com.
- FashionUnited (2016) FashionUnited. 2016. Global fashion industry statistics - international apparel.
- Goldenthal et al. (2007) Goldenthal, R., Harmon, D., Fattal, R., Bercovier, M., and Grinspun, E. 2007. Efficient simulation of inextensible cloth. ACM Transactions on Graphics (TOG) 26, 3, 49.
- Govindaraju et al. (2007) Govindaraju, N. K., Kabul, I., Lin, M. C., and Manocha, D. 2007. Fast continuous collision detection among deformable models using graphics processors. Computers & Graphics 31, 1, 5–14.
- Hasler et al. (2006) Hasler, N., Asbach, M., Rosenhahn, B., Ohm, J.-R., and Seidel, H.-P. 2006. Physically based tracking of cloth. In Proc. of the International Workshop on Vision, Modeling, and Visualization, VMV. 49–56.
- Hasler et al. (2009) Hasler, N., Stoll, C., Sunkel, M., Rosenhahn, B., and Seidel, H.-P. 2009. A statistical model of human pose and body shape. In Computer Graphics Forum. Vol. 28. Wiley Online Library, 337–346.
- Hillsweddingdress (2015) Hillsweddingdress. 2015. http://hillsweddingdress.xyz.
- House and Breen (2000) House, D. H. and Breen, D. E. 2000. Cloth modeling and animation. AK Peters.
- Igarashi et al. (2005) Igarashi, T., Moscovich, T., and Hughes, J. F. 2005. As-rigid-as-possible shape manipulation. In ACM transactions on Graphics (TOG). Vol. 24. ACM, 1134–1141.
- Jancosek and Pajdla (2011) Jancosek, M. and Pajdla, T. 2011. Multi-view reconstruction preserving weakly-supported surfaces. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. 3121–3128.
- Jeong et al. (2015) Jeong, M.-H., Han, D.-H., and Ko, H.-S. 2015. Garment capture from a photograph. Computer Animation and Virtual Worlds 26, 3-4, 291–300.
- Kavan et al. (2010) Kavan, L., Sloan, P.-P., and O’Sullivan, C. 2010. Fast and efficient skinning of animated meshes. In Computer Graphics Forum. Vol. 29. Wiley Online Library, 327–336.
- Kennedy (2010) Kennedy, J. 2010. Particle swarm optimization. In Encyclopedia of Machine Learning. Springer, 760–766.
- Levin et al. (2008) Levin, A., Lischinski, D., and Weiss, Y. 2008. A closed-form solution to natural image matting. Pattern Analysis and Machine Intelligence, IEEE Transactions on 30, 2, 228–242.
- Li et al. (2012) Li, H., Luo, L., Vlasic, D., Peers, P., Popović, J., Pauly, M., and Rusinkiewicz, S. 2012. Temporally coherent completion of dynamic shapes. ACM Transactions on Graphics 31, 1 (January).
- Li et al. (2008) Li, H., Sumner, R. W., and Pauly, M. 2008. Global correspondence optimization for non-rigid registration of depth scans. In Computer graphics forum. Vol. 27. Wiley Online Library, 1421–1430.
- Li et al. (2004) Li, Y., Sun, J., Tang, C.-K., and Shum, H.-Y. 2004. Lazy snapping. In ACM Transactions on Graphics (ToG). Vol. 23. ACM, 303–308.
- Lindner (2015) Lindner, M. 2015. Global e-commerce sales set to grow 25% in 2015. https://www.internetretailer.com/2015/07/29/global-e-commerce-set-grow-25-2015.
- Liu and Nocedal (1989) Liu, D. C. and Nocedal, J. 1989. On the limited memory bfgs method for large scale optimization. Mathematical programming 45, 1-3, 503–528.
- Long et al. (2015) Long, J., Shelhamer, E., and Darrell, T. 2015. Fully convolutional networks for semantic segmentation. CVPR (to appear).
- Meng et al. (2012) Meng, Y., Wang, C. C., and Jin, X. 2012. Flexible shape control for automatic resizing of apparel products. Computer-Aided Design 44, 1, 68–76.
- ModCloth (2015) ModCloth. 2015. http://www.modcloth.com.
- Moeslund et al. (2006) Moeslund, T. B., Hilton, A., and Krüger, V. 2006. A survey of advances in vision-based human motion capture and analysis. Computer vision and image understanding 104, 2, 90–126.
- Nagano et al. (2015) Nagano, K., Fyffe, G., Alexander, O., Barbiç, J., Li, H., Ghosh, A., and Debevec, P. 2015. Skin microstructure deformation with displacement map convolution. ACM Trans. Graph. 34, 4 (July), 109:1–109:10.
- Narain et al. (2012) Narain, R., Samii, A., and O’Brien, J. F. 2012. Adaptive anisotropic remeshing for cloth simulation. ACM Transactions on Graphics (TOG) 31, 6, 152.
- Ng and Grimsdale (1996) Ng, H. N. and Grimsdale, R. L. 1996. Computer graphics techniques for modeling cloth. Computer Graphics and Applications, IEEE 16, 5, 28–41.
- Pinheiro and Collobert (2014) Pinheiro, P. H. and Collobert, R. 2014. Recurrent convolutional neural networks for scene labeling. In ICML.
- Popa et al. (2009) Popa, T., Zhou, Q., Bradley, D., Kraevoy, V., Fu, H., Sheffer, A., and Heidrich, W. 2009. Wrinkling captured garments using space-time data-driven deformation. In Computer Graphics Forum. Vol. 28. Wiley Online Library, 427–435.
- Protopsaltou et al. (2002) Protopsaltou, D., Luible, C., Arevalo, M., and Magnenat-Thalmann, N. 2002. A body and garment creation method for an Internet based virtual fitting room. Springer.
- RedBubble (2015) RedBubble. 2015. http://www.redbubble.com.
- Robson et al. (2011) Robson, C., Maharik, R., Sheffer, A., and Carr, N. 2011. Context-aware garment modeling from sketches. Computers & Graphics 35, 3, 604–613.
- Rohmer et al. (2010) Rohmer, D., Popa, T., Cani, M.-P., Hahmann, S., and Sheffer, A. 2010. Animation wrinkling: augmenting coarse cloth simulations with realistic-looking wrinkles. In ACM Transactions on Graphics (TOG). Vol. 29. ACM, 157.
- Saaclothes (2015) Saaclothes. 2015. http://www.saaclothes.com.
- Scholz and Magnor (2006) Scholz, V. and Magnor, M. 2006. Texture replacement of garments in monocular video sequences. In Proceedings of the 17th Eurographics conference on Rendering Techniques. Eurographics Association, 305–312.
- Scholz et al. (2005) Scholz, V., Stich, T., Keckeisen, M., Wacker, M., and Magnor, M. 2005. Garment motion capture using color-coded patterns. In Computer Graphics Forum. Vol. 24. Wiley Online Library, 439–447.
- Seo and Magnenat-Thalmann (2003) Seo, H. and Magnenat-Thalmann, N. 2003. An automatic modeling of human bodies from sizing parameters. In Proceedings of the 2003 symposium on Interactive 3D graphics. ACM, 19–26.
- Sigal et al. (2015) Sigal, L., Mahler, M., Diaz, S., McIntosh, K., Carter, E., Richards, T., and Hodgins, J. 2015. A perceptual control space for garment simulation. ACM Transactions on Graphics (TOG) 34, 4, 117.
- Sumner and Popović (2004) Sumner, R. W. and Popović, J. 2004. Deformation transfer for triangle meshes. ACM Transactions on Graphics (TOG) 23, 3, 399–405.
- Tang et al. (2009) Tang, M., Curtis, S., Yoon, S.-E., and Manocha, D. 2009. Iccd: Interactive continuous collision detection between deformable models using connectivity-based culling. Visualization and Computer Graphics, IEEE Transactions on 15, 4, 544–557.
- Tanie et al. (2005) Tanie, H., Yamane, K., and Nakamura, Y. 2005. High marker density motion capture by retroreflective mesh suit. In Robotics and Automation, 2005. ICRA 2005. Proceedings of the 2005 IEEE International Conference on. IEEE, 2884–2889.
- Taubin (1995) Taubin, G. 1995. A signal processing approach to fair surface design. In Proceedings of the 22Nd Annual Conference on Computer Graphics and Interactive Techniques. SIGGRAPH ’95. ACM, New York, NY, USA, 351–358.
- Taylor (2000) Taylor, C. J. 2000. Reconstruction of articulated objects from point correspondences in a single uncalibrated image. In Computer Vision and Pattern Recognition, 2000. Proceedings. IEEE Conference on. Vol. 1. IEEE, 677–684.
- Thomaszewski et al. (2009) Thomaszewski, B., Pabst, S., and Strasser, W. 2009. Continuum-based strain limiting. In Computer Graphics Forum. Vol. 28. Wiley Online Library, 569–576.
- Turquin et al. (2007) Turquin, E., Wither, J., Boissieux, L., Cani, M.-P., and Hughes, J. F. 2007. A sketch-based interface for clothing virtual characters. IEEE Computer Graphics and Applications 1, 72–81.
- Volino and Magnenat-Thalmann (1999) Volino, P. and Magnenat-Thalmann, N. 1999. Fast geometrical wrinkles on animated surfaces. In Seventh International Conference in Central Europe on Computer Graphics and Visualization (Winter School on Computer Graphics).
- Wang et al. (2005) Wang, C. C., Wang, Y., and Yuen, M. M. 2005. Design automation for customized apparel products. Computer-Aided Design 37, 7, 675–691.
- Wang et al. (2010) Wang, H., O’Brien, J., and Ramamoorthi, R. 2010. Multi-resolution isotropic strain limiting. In ACM Transactions on Graphics (TOG). Vol. 29. ACM, 156.
- Wang et al. (2011) Wang, H., O’Brien, J. F., and Ramamoorthi, R. 2011. Data-driven elastic models for cloth: modeling and measurement. ACM Transactions on Graphics (TOG) 30, 4, 71.
- Weil (1986) Weil, J. 1986. The synthesis of cloth objects. ACM Siggraph Computer Graphics 20, 4, 49–54.
- White et al. (2007) White, R., Crane, K., and Forsyth, D. A. 2007. Capturing and animating occluded cloth. In ACM Transactions on Graphics (TOG). Vol. 26. ACM, 34.
- Wu (2011) Wu, C. 2011. VisualSFM: A visual structure from motion system. http://homes.cs.washington.edu/~ccwu/vsfm.
- Wu (2013) Wu, C. 2013. Towards linear-time incremental structure from motion. In 3D Vision - 3DV 2013, 2013 International Conference on. 127–134.
- Xie and Tu (2015) Xie, S. and Tu, Z. 2015. Holistically-nested edge detection. In Proceedings of the IEEE International Conference on Computer Vision. 1395–1403.
- Yamaguchi et al. (2013) Yamaguchi, K., Kiapour, M. H., and Berg, T. 2013. Paper doll parsing: Retrieving similar styles to parse clothing items. In Computer Vision (ICCV), 2013 IEEE International Conference on. IEEE, 3519–3526.
- Yang et al. (2014) Yang, Y., Yu, Y., Zhou, Y., Du, S., Davis, J., and Yang, R. 2014. Semantic parametric reshaping of human body models. In 3D Vision (3DV), 2014 2nd International Conference on. Vol. 2. IEEE, 41–48.
- Ye et al. (2014) Ye, M., Wang, H., Deng, N., Yang, X., and Yang, R. 2014. Real-time human pose and shape estimation for virtual try-on using a single commodity depth camera. IEEE transactions on visualization and computer graphics 20, 4, 550–559.
- Young et al. (2007) Young, S., Adelstein, B., and Ellis, S. 2007. Calculus of nonrigid surfaces for geometry and texture manipulation. Visualization and Computer Graphics, IEEE Transactions on 13, 5, 902–913.
- Zhao et al. (2003) Zhao, W., Chellappa, R., Phillips, P. J., and Rosenfeld, A. 2003. Face recognition: A literature survey. ACM computing surveys (CSUR) 35, 4, 399–458.
- Zhou et al. (2013) Zhou, B., Chen, X., Fu, Q., Guo, K., and Tan, P. 2013. Garment modeling from a single image. In Computer Graphics Forum. Vol. 32. Wiley Online Library, 85–91.
- Zhou et al. (2010) Zhou, S., Fu, H., Liu, L., Cohen-Or, D., and Han, X. 2010. Parametric reshaping of human bodies in images. In ACM Transactions on Graphics (TOG). Vol. 29. ACM, 126.