
Glasgow's Stereo Image Database of Garments

by   Gerardo Aragon-Camarasa, et al.

To provide insight into cloth perception and manipulation with an active binocular robotic vision system, we have compiled and released a database of 80 stereo-pair colour images with corresponding horizontal and vertical disparity maps and mask annotations, for 3D garment point cloud rendering. The stereo-image garment database is part of research conducted under the EU-FP7 Clothes Perception and Manipulation (CloPeMa) project and belongs to a wider database collection released through CloPeMa. This database is based on 16 different off-the-shelf garments. Each garment has been imaged in five different pose configurations on the project's binocular robot head. A full copy of the database is made available for scientific research only.



1 Introduction

The CloPeMa project is advancing the state of the art in clothes perception and manipulation by delivering a novel robotic system that accomplishes automatic sorting and folding of a laundry heap. To this end, CloPeMa is using a prototype robot composed mainly of off-the-shelf components, including an active binocular vision robot head. This active binocular robot head, which is inspired by the system developed in Aragon-Camarasa et al. (2010), has been designed by the Computer Vision and Graphics Group (CV&G) at the University of Glasgow. This robotic head, as created for CloPeMa, is not only able to provide high-resolution intensity images of the robot’s workspace, as required for intensity based computer vision algorithms, but is also capable of automatic vergence and gaze control, hand-eye calibration and 2.5D reconstruction of areas-of-interest. Data captured by this robotic head can be used in a wide variety of applications such as garment spreading and flattening Sun et al. (2013), automatic visual inspection and exploration of cluttered scenes Aragon-Camarasa et al. (2010), selection of better grasping points or more detailed feature extraction and classification. In order to provide a first insight into the type and quality of data produced by the binocular robot head in the CloPeMa robot system, we have compiled and released a freely available database of stereo-pair images of garments. The aim of this dataset is to serve as a benchmark tool for algorithms for recognition, segmentation and various range image properties of non-rigid objects. For instance, it will be used to improve the Vector-Pascal Cockshott et al. (2012) Glasgow parallel stereo matcher and its GPU implementations. This dataset is the first high resolution stereo-pair garment image dataset that is released for research purposes and potentially allows for a variety of research applications. The Glasgow’s Stereo Image Database of Garments can be downloaded from:

This database comprises images of 16 different off-the-shelf garments selected from the official CloPeMa cloth heap, defined in Molfino et al. (2012). The CloPeMa heap features a wide variety of textile materials with different texture, colour and reflectance characteristics in order to give a realistic sample of the real world clothing variety. For the released database, the chosen garments were imaged in five possible pose configurations: flat on the table, folded in half, completely folded, randomly wrinkled and hanging over the robot’s arm. These configurations are an approximation of the most representative pose configurations a robot may encounter while sorting and folding clothes. Each of the five selected configurations was imaged under software-controlled capture synchronisation. With 16 garments and five configurations each, the database yields a total of 80 stereo-pairs of garment images. For completeness, the horizontal and vertical disparities without mask of the Glasgow’s Stereo Image Database of Garments can also be downloaded.

The 80 stereo-pairs in the database have all been processed using the Glasgow stereo matcher Cyganek and Siebert (2011), in order to compute the horizontal and vertical disparities. A new version of the Glasgow stereo matcher has been integrated in CloPeMa’s robot system as a ROS node within the CloPeMa robot head package collection. Additionally, the dataset’s image pairs are accompanied by mask annotations for the left as well as the right image. The camera calibration, which has been computed using CloPeMa’s integrated OpenCV-compatible robot head calibration system, is also released as part of the database. This enables the research community to use the database for 3D garment point cloud projection. Specifically for this purpose, Matlab-based reconstruction software is also distributed within this database.

It must be noted that the above algorithms and methods have been integrated as part of a collection of ROS nodes distributed in the official CloPeMa package collection. Specifically, the CloPeMa active robot head system software includes ROS nodes for directing the robot’s gaze under program control, automatic vergence, acquiring synchronised stereo-pair images, camera and hand-eye calibration routines, stereo image processing algorithms (including a GPU stereo matcher based on the Glasgow Stereo Matcher), real-time SIFT feature extraction and interactive user interfaces for gaze control and calibration routines. The robot head ROS packages can be downloaded from:

2 Database Acquisition

The CloPeMa robotic test-bed is equipped with two Yaskawa robotic arms mounted on a computer-controllable, tailor-made Yaskawa turn table, two RGB-D sensors for wide vision mounted on the wrists of the robotic arms, two prototype grippers designed by the University of Genoa Le et al. (2013) and an active binocular robot head for foveated vision designed by the University of Glasgow. Figure 1 shows the University of Glasgow robotic infrastructure. The database that is the subject of this report has been captured using the active binocular robot head. This robot head comprises two Nikon DSLR cameras (D5100) that are capable of capturing images at 16 megapixels at different zoom settings (manually selected; 35mm was used for this database). These are mounted on two pan and tilt units (PTU-D46) with their corresponding controllers, as depicted in Figure 2. The cameras are separated by a pre-defined baseline of 30 centimetres for optimal stereo capturing within the robot’s workspace.

Figure 1: CloPeMa test-bed at the University of Glasgow.
Figure 2: CloPeMa robot head.

Garments were placed on a planar surface at an average distance of 1.8 metres from the binocular robot head. For each garment, five different garment pose configurations were captured, as shown in Figure 3. Figure 4 shows an example of the 35-mm zoom setting. The cameras of the robotic head were converged at the centre point in the left and right cameras prior to capturing the stereo images. For this purpose, the vergence algorithm reported in Aragon-Camarasa et al. (2010) was used and integrated as a ROS node as described in Section 1.

(a) Spread (b) Half-way folded
(c) Folded (d) Wrinkled
(e) Hanging
Figure 3: Garment states captured.
Figure 4: Garment zoom setting at 35-mm. A standard Nikon 18-55mm VR lens was used for capturing the database.

For each image, a manually segmented mask of the same image resolution has been provided for annotation purposes. In the database creation, the mask was applied as part of the vertical and horizontal disparities computation using the Glasgow stereo matcher Cyganek and Siebert (2011). GIMP 2.8 was used to segment and annotate the stereo-pair images.

The underlying objective of the stereo matching algorithm is to locate, for each pixel in one image of a stereo pair, the corresponding location in the other image of the pair. The correspondence problem is solved by constructing a displacement field (also termed parallax or disparity map) that maps points in the left image to the corresponding location in the right image. These displacement fields are expressed as two disparity maps that store the horizontal and vertical displacements mapping pixels in the left image to the corresponding location in the right image. Computed disparities can then be used to reconstruct highly detailed point clouds and/or range images. Range image preview examples are shown in Figure 5. It should be noted that point clouds and range images are not included in the database, as the file size of each stereo-pair sample is roughly in the order of 1GB; however, source code to recover the 3D geometry from the disparity maps is included in the database.
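As a minimal sketch of how a pair of horizontal and vertical disparity maps encodes correspondences, the NumPy snippet below maps every left-image pixel to its location in the right image. It assumes that a disparity is the right-image coordinate minus the left-image coordinate, in pixels; this sign convention is an assumption and should be checked against the reconstruction code distributed with the database. The function name `right_image_coords` is illustrative only.

```python
import numpy as np

def right_image_coords(disp_h, disp_v):
    """Map each left-image pixel to its corresponding right-image location.

    disp_h, disp_v: 2-D arrays of horizontal and vertical disparities in
    pixels, indexed as [row, column] of the left image.
    ASSUMPTION: disparity = right coordinate - left coordinate.
    """
    disp_h = np.asarray(disp_h, dtype=float)
    disp_v = np.asarray(disp_v, dtype=float)
    if disp_h.shape != disp_v.shape:
        raise ValueError("disparity maps must have the same shape")
    # Pixel grid of the left image: rows are y, columns are x.
    rows, cols = np.indices(disp_h.shape)
    x_right = cols + disp_h
    y_right = rows + disp_v
    return x_right, y_right

# Tiny synthetic maps (made-up values, not taken from the database).
disp_h = np.full((2, 3), -4.0)
disp_v = np.full((2, 3), 0.5)
x_r, y_r = right_image_coords(disp_h, disp_v)
```

Each pair `(x_r[i, j], y_r[i, j])` is then the sub-pixel position in the right image that matches left pixel `(j, i)`.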

Figure 5: Examples of range images computed at different zoom settings.

3 Database File Description and Organisation

The database is first divided according to the captured garments. These are organised and stored in folders using a numeric index from 1 to 16. In each of these folders, garment pose configurations are organised in folders which follow the naming format XX_S, where XX denotes the garment class (and the number of the folder where the images are stored) and S denotes the garment pose configuration. S can take the following classification indices, which correspond to how the garment was captured:

  • 0 - Cloth is spread on the table (Figure 3(a)).

  • 1 - Cloth is half-way folded (Figure 3(b)).

  • 2 - Cloth is completely folded (Figure 3(c)).

  • 3 - Cloth is wrinkled (Figure 3(d)).

  • 4 - The robot is holding the cloth in the air and close to the table (Figure 3(e)).
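The naming scheme above can be decoded with a few lines of Python. The sketch below is illustrative: the helper `parse_sample_folder` and the short pose labels are hypothetical names, and it assumes that XX is a decimal garment index (with or without zero padding) separated from S by a single underscore.

```python
from typing import NamedTuple

# Pose indices as listed above; the short labels are illustrative.
POSES = {
    0: "spread",
    1: "half-way folded",
    2: "folded",
    3: "wrinkled",
    4: "hanging",
}

class GarmentSample(NamedTuple):
    garment: int
    pose: int
    pose_name: str

def parse_sample_folder(name: str) -> GarmentSample:
    """Split a folder name of the form XX_S into garment and pose indices."""
    garment_str, pose_str = name.split("_")
    garment, pose = int(garment_str), int(pose_str)
    if not 1 <= garment <= 16:
        raise ValueError(f"garment index out of range: {garment}")
    if pose not in POSES:
        raise ValueError(f"unknown pose index: {pose}")
    return GarmentSample(garment, pose, POSES[pose])

# Enumerate every (garment, pose) combination in the database.
all_samples = [(g, s) for g in range(1, 17) for s in POSES]
sample = parse_sample_folder("07_3")
```

Enumerating 16 garments over five poses gives the 80 stereo-pairs of the database.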

Within the above folders, the following is stored (see also Figure 6):

  • Stereo-pair images (left and right camera images) are stored as 16Mpixel colour TIFF image files (4928 x 3264 x 24 BPP).

  • Annotated image masks for the stereo-pair are stored as black and white TIFF files (4928 x 3264 x 8 BPP).

  • Horizontal (dispMH) and vertical (dispMV) disparity maps and a confidence matching map (dispMConfidence) are stored as text files, in ASCII format, as matrices of 4928 by 3264 floating point values. These maps are compressed in 7zip format.

  • A JPEG compressed preview of the garment range image.

Figure 6: Example of the file organisation of the stereo database.
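Once a disparity or confidence map has been extracted from its 7zip archive, it can be read as a plain text matrix. The following is a minimal sketch using NumPy; it assumes whitespace-separated values with one matrix row per line, which is not confirmed here, and the helper name `load_ascii_matrix` and the demo file name are hypothetical. The orientation of the loaded array (rows versus columns) should be checked against the image size.

```python
import os
import tempfile
import numpy as np

def load_ascii_matrix(path, expected_size=4928 * 3264):
    """Load an ASCII matrix of floating point values from a text file.

    ASSUMPTION: values are whitespace-separated, one matrix row per line.
    expected_size is the number of values a full-resolution map should hold;
    pass None to skip the check.
    """
    data = np.loadtxt(path, dtype=np.float32, ndmin=2)
    if expected_size is not None and data.size != expected_size:
        raise ValueError(f"expected {expected_size} values, got {data.size}")
    return data

# Demonstration on a small synthetic file (not a real database file).
demo = np.array([[0.5, 1.0, 1.5], [2.0, 2.5, 3.0]], dtype=np.float32)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "dispMH_demo.txt")
    np.savetxt(demo_path, demo)
    loaded = load_ascii_matrix(demo_path, expected_size=demo.size)
```

Loading as 32-bit floats keeps a full-resolution map at roughly 64 MB in memory, compared with about 128 MB for the 64-bit default.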

Camera calibration parameters are stored as XML files for each of the captured garments. Calibration files are saved as calL.xml and calR.xml for the left and right cameras, respectively, as shown in Figure 6. These XML files can be easily read using OpenCV I/O XML functions. The companion source code provides an example of how to load these calibration files. Calibration parameters in each file include:

  • Camera matrix, $K$, as a 3 by 3 matrix that stores the focal length and principal point in pixels.

  • Distortion coefficients, $D$, as a 1 by 4 vector. The Glasgow stereo matcher and stereo reconstruction do not use this information; however, these coefficients are included for completeness.

  • Projection matrix, $P$, as a 3 by 4 matrix. This matrix is defined for the left (Equation 1) and right (Equation 2) cameras as follows:

    $$P_L = K_L \, [\, I \mid \mathbf{0} \,] \qquad (1)$$

    $$P_R = K_R \, [\, R \mid t \,] \qquad (2)$$

    where $I$ is a 3 by 3 identity matrix, and $R$ and $t$ are the rotation and translation matrices that transform the right camera reference frame into the left camera reference frame. $P_L$ and $P_R$ are used to recover the 3D structure of the captured scene.

  • Fundamental matrix, $F$, as a 3 by 3 matrix that relates corresponding points between the stereo-pair. The same numeric matrix is defined in both files.
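To illustrate how the left and right projection matrices can be used to recover 3D structure from a pair of corresponding pixels, the sketch below applies standard linear (DLT) triangulation with NumPy. This is one common technique and is not necessarily the method used by the Matlab reconstruction software in the database. All calibration values in the example are made up for the synthetic check and do not come from calL.xml or calR.xml.

```python
import numpy as np

def triangulate_dlt(P_left, P_right, xy_left, xy_right):
    """Linear (DLT) triangulation of a single correspondence.

    P_left, P_right: 3x4 projection matrices.
    xy_left, xy_right: matching pixel coordinates (x, y) in each image.
    Returns the 3D point in the frame the projection matrices refer to.
    """
    x_l, y_l = xy_left
    x_r, y_r = xy_right
    # Each view contributes two linear constraints on the homogeneous point.
    A = np.stack([
        x_l * P_left[2] - P_left[0],
        y_l * P_left[2] - P_left[1],
        x_r * P_right[2] - P_right[0],
        y_r * P_right[2] - P_right[1],
    ])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point with a 3x4 projection matrix to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Synthetic check with made-up calibration values (ASSUMPTION, illustrative).
K = np.array([[7000.0, 0.0, 2464.0],
              [0.0, 7000.0, 1632.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([[-0.3], [0.0], [0.0]])
P_left = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P_right = K @ np.hstack([R, t])

X_true = np.array([0.1, -0.05, 1.8])
X_est = triangulate_dlt(P_left, P_right,
                        project(P_left, X_true), project(P_right, X_true))
```

With real data, the right-image coordinates would come from adding the horizontal and vertical disparities to the left-image pixel position, and the loop over pixels would normally be restricted to the garment mask.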


We would like to thank the European Community’s Seventh Framework Programme (FP7/2007-2013) for supporting this research work under grant agreement no. 288553, CloPeMa.



  • Aragon-Camarasa et al. (2010) Aragon-Camarasa, G., Fattah, H., Siebert, J. P., Mar. 2010. Towards a unified visual framework in a binocular active robot vision system. Robotics and Autonomous Systems 58 (3), 276–286.
  • Cockshott et al. (2012) Cockshott, W., Oehler, S., Camarasa, G. A., Siebert, J., Xu, T., 2012. A parallel stereo vision algorithm. In: Many-Core Applications Research Community Symposium 2012.
  • Cyganek and Siebert (2011) Cyganek, B., Siebert, J. P., 2011. An introduction to 3D computer vision techniques and algorithms. Wiley.
  • Le et al. (2013) Le, T.-H.-L., Jilich, M., Landini, A., Zoppi, M., Zlatanov, D., Molfino, R., 2013. On the development of a specialized flexible gripper for garment handling. Journal of automation and control engineering 1 (3), 255–259.
  • Molfino et al. (2012) Molfino, R., Zoppi, M., Jilich, M., Hong Loan, L. T., Cannata, G., Maiolino, P., Denei, S., Malassiotis, S., Triantafilou, D., Gorpas, D., Hlavac, V., Donner, M., Aragon-Camarasa, G., Siebert, J. P., 2012. D1.1 scenarios and detailed specification of m12 demonstration. Tech. rep., EU-FP7 Clothes Perception and Manipulation (CloPeMa) project under grant agreement no. 288553.
  • Sun et al. (2013) Sun, L., Aragon-Camarasa, G., Cockshott, P., Rogers, S., Siebert, J., August 2013. A heuristic-based approach for flattening wrinkled clothes. In: Towards Autonomous Robotic Systems, TAROS 2013 (in press). LNCS Springer.