GIMP-ML: Python Plugins for using Computer Vision Models in GIMP

This paper introduces GIMP-ML, a set of Python plugins for the widely popular GNU Image Manipulation Program (GIMP). It brings recent advances in computer vision to the conventional image editing pipeline in an open-source setting. Deep learning applications such as monocular depth estimation, semantic segmentation, mask generative adversarial networks, image super-resolution, de-noising and coloring have been incorporated into GIMP through Python-based plugins. Operations on images such as edge detection and color clustering have also been added. GIMP-ML relies on standard Python packages such as numpy, scikit-image, pillow, pytorch, open-cv and scipy. Several image manipulation techniques using these plugins have been compiled and demonstrated in a YouTube playlist (https://www.youtube.com/playlist?list=PLo9r5wFmpD5dLWTyo6NOiD6BJjhfEOM5t) to illustrate use cases for machine learning based image modification. GIMP-ML also aims to bring the benefits of deep learning networks for computer vision tasks to routine image processing workflows. The code and installation procedure for configuring these plugins are available at https://github.com/kritiksoman/GIMP-ML.

1 Introduction

Image editing has conventionally been performed manually by users or graphics designers using various image processing tools or software. Such tools provide a plethora of image editing and transformation functions and are available under open-source, commercial or proprietary licenses. Image processing workflows vary in complexity and sometimes require significant effort from the user even for simple modifications to images.

GNU Image Manipulation Program (GIMP) is a popular free and open-source image editor that has been widely used on Linux-based platforms as well as on other operating systems. It provides several features for image editing and manipulation and has a simple user interface. It also supports plugins, which can be developed independently and integrated with the local GIMP installation on a computer. Using plugins, one can realize custom workflows or sets of operations that can be applied to an image.

Recently, machine learning techniques have completely changed the landscape of image understanding, and many applications which were previously not possible have now become the new baseline. This has been significantly facilitated by recent advances in deep learning and the application of the resulting models to tasks in the computer vision domain. However, these deep learning models have been made available to users through independent deep learning frameworks such as Keras, TensorFlow and PyTorch, among others. It may also be noted here that since these networks have a “large” architecture, their training is done on compute-intensive platforms (using GPUs) and the resulting models have a high memory footprint. Since the use of these models requires the user to code, graphics designers and users involved in conventional image editing workflows using image processing tools have often not been able to directly leverage the benefits of deep learning models. As such, a framework that enables the use of deep learning models in image editing tasks through commonly available image processing tools would potentially benefit both the deep learning / computer vision community and graphics designers and common users of such software.

The motivation for this paper is to bridge the gap between cutting-edge research in deep learning (computer vision) and manual image editing, specifically for the case of GIMP. A pilot implementation of plugins for GIMP, collectively termed “GIMP-ML” (GIMP - Machine Learning), has been presented for various tasks such as background blurring, image coloring, face parsing, generative portrait modification, monocular depth based relighting, motion deblurring and generating super-resolution images. It is expected that the image editing process will become highly automated in the future as the semantic understanding of images improves, facilitated by advances in artificial intelligence.

Figure 1: GIMP-ML Plugins Menu

The rest of this paper is organized as follows. Section 2 presents the key dependencies for GIMP-ML. This is followed by implementation details in Section 3. Various applications of GIMP-ML are illustrated in Section 4, which also includes links to demonstration videos on YouTube. Finally, conclusions and future work are presented in Section 5.

2 Dependencies

The Python package dependencies involved in the development of GIMP-ML are as follows:

  1. NumPy: The base N-dimensional array package, numpy [1], has been used for converting a GIMP layer to a tensor for use in PyTorch.

  2. SciPy: The fundamental library for scientific computing, scipy [2], has been used for performing basic computing operations.

  3. Scikit-image: The scikit-image [3] package has been used for realizing basic image processing operations for the plugins.

  4. OpenCV: The opencv-python [4] package provides OpenCV libraries in Python. It has been used for edge detection.

  5. Pre-Trained Models: The pretrainedmodels package includes a set of pre-trained models for PyTorch [5], of which InceptionResNetV2 has been used for the applications presented in this paper.

  6. Matplotlib: The matplotlib [6] package is used for generating multiple types of plots. The colormaps and image normalization functions available in this package have been used in this work.

  7. Torch & Torchvision: The torch [5] and torchvision [7] packages have been used to incorporate the deep learning framework through Pytorch.
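The role of NumPy described in item 1 can be sketched as follows. This is a minimal illustration and not the GIMP-ML source: the function name `layer_bytes_to_chw` and its parameters are hypothetical, and the raw byte buffer is assumed to have already been read from a GIMP layer in row-major order with interleaved 8-bit channels.

```python
import numpy as np


def layer_bytes_to_chw(raw, width, height, bpp):
    """Turn an interleaved 8-bit pixel buffer into a float32 CHW array.

    raw : bytes of length width * height * bpp (assumed row-major)
    bpp : bytes per pixel, e.g. 3 for RGB (hypothetical parameter name)
    """
    if len(raw) != width * height * bpp:
        raise ValueError("buffer size does not match the layer dimensions")
    # View the buffer as a height x width x channels array without copying.
    hwc = np.frombuffer(raw, dtype=np.uint8).reshape(height, width, bpp)
    # Scale to [0, 1] and move channels first, the layout PyTorch models expect.
    chw = hwc.astype(np.float32) / 255.0
    return np.ascontiguousarray(chw.transpose(2, 0, 1))


# A 2x1 RGB image: one red pixel followed by one blue pixel.
demo = layer_bytes_to_chw(bytes(bytearray([255, 0, 0, 0, 0, 255])), 2, 1, 3)
print(demo.shape)
```

An array produced this way could then be passed to `torch.from_numpy` to obtain a tensor for a model.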

3 Implementation Details

The GIMP-ML plugins have been developed in Python 2.7, which is supported in GIMP 2.10. A separate virtual environment has been created and added to the gimp-python path. It contains all the Python packages used by the plugins. The plugins use the CPU by default and switch to the GPU for prediction when one is available. Currently, all plugins assume that the input layer does not have an alpha channel. The plugins take advantage of layers in GIMP for various workflows. As a consequence, image manipulation in the following applications is non-destructive in nature.
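The device fallback and the no-alpha assumption can be illustrated with a short sketch. Both helper names, `pick_device` and `drop_alpha`, are hypothetical. Stripping the alpha channel is only one possible guard and is not something the plugins are stated to do. The code is written for Python 3, but avoids features that Python 2.7 lacks.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def drop_alpha(pixels):
    """Strip a trailing alpha channel from a list of pixel tuples.

    Gray+alpha (2 channels) becomes gray and RGBA (4 channels) becomes RGB;
    pixels without alpha are returned unchanged.
    """
    return [p[:-1] if len(p) in (2, 4) else p for p in pixels]


print(pick_device())
print(drop_alpha([(10, 20, 30, 255), (1, 2, 3, 0)]))
```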

4 Applications

This section describes applications of GIMP-ML, which include background blurring, image coloring, face parsing, generative portrait modification, monocular depth based relighting, motion deblurring and generating super-resolution images. Demo videos of all the applications are available in the YouTube playlist:
https://www.youtube.com/playlist?list=PLo9r5wFmpD5dLWTyo6NOiD6BJjhfEOM5t.

4.1 Background Blurring

We used the PyTorch Deeplabv3 [8] model trained on the Pascal VOC dataset [9]. It has 20 classes, namely, person, bird, cat, cow, dog, horse, sheep, aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, and tv/monitor. These objects can be directly segmented in images. The segmentation map can then be used to selectively perform operations on regions of the image, such as blurring or hue/saturation changes. A demonstration video for background blurring is available at https://youtu.be/U1CieWi--gc
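The idea of using a segmentation map to blur only the background can be sketched with NumPy alone. This is an illustrative sketch, not the plugin's code: the label map is assumed to come from a segmentation model, the index 15 for "person" assumes the usual Pascal VOC label ordering with background at 0, and a simple mean filter stands in for whatever blur the user would apply in GIMP.

```python
import numpy as np

PERSON = 15  # index of "person" in the usual Pascal VOC label order (assumption)


def box_blur(img, radius):
    """Blur an HxWxC float image with a (2*radius+1)^2 mean filter."""
    k = 2 * radius + 1
    padded = np.pad(img, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    out = np.zeros_like(img)
    # Sum all shifted copies of the image that fall inside the window.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def blur_background(img, labels, keep=PERSON, radius=2):
    """Keep pixels labelled `keep` sharp and blur everything else."""
    mask = (labels == keep)[..., None].astype(img.dtype)
    return mask * img + (1.0 - mask) * box_blur(img, radius)


rng = np.random.RandomState(0)
image = rng.rand(8, 8, 3)
labels = np.zeros((8, 8), dtype=np.int64)
labels[2:6, 2:6] = PERSON  # pretend the model found a person here
result = blur_background(image, labels)
```

In GIMP the same effect is obtained non-destructively by keeping the mask and the blurred copy on separate layers.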

4.2 Image Coloring

Conversion of grayscale images to RGB [10] using deep learning (https://github.com/richzhang/colorization) has also been included in GIMP-ML. The input image should be in grayscale mode. This can be set from the menu Image->Mode->Grayscale. The demo for image coloring is available at https://youtu.be/HVwISLRow_0

4.3 Face Parsing

For segmenting portrait images, we used BiSeNet [11] trained on the CelebAMask-HQ dataset (https://github.com/zllrunning/face-parsing.PyTorch). It can segment 19 classes, such as skin, nose, eyes, eyebrows, ears, mouth, lip, hair, hat, eyeglass, earring, necklace, neck, and cloth. The segmentation map can then be used to selectively manipulate various facial features. Hair color manipulation using this plugin is demonstrated at https://youtu.be/thS8VqPvuhE

4.4 Generative Portrait Modification

With the facegen plugin, facial features in a portrait photo can be segmented, modified and then newly generated. Trained on the CelebAMask-HQ dataset [12], this model (https://github.com/switchablenorms/CelebAMask-HQ) relies on the facial segmentation map generated as described in the previous subsection. The mask can be duplicated into another layer and manipulated using the Color Picker Tool and Paintbrush Tool. The input image, original mask and modified mask can then be fed into Mask-GAN to generate the desired image (as shown in Fig. 2). A drawback of such a model is that it does not preserve unmodified facial features. This can, however, be addressed by manually erasing unwanted facial feature changes from the generated layer, thereby exposing the original image in the layer underneath. This is a valuable workflow since professional image editors spend a large amount of time perfecting portrait shots and want to retain the original facial features. The demo for generative portrait modification is available at https://youtu.be/kXYsWvOB4uk

Figure 2: Menu for Generative Portrait Modification

4.5 Monocular Depth based Relighting

Disparity maps can be generated from stereo images or predicted from single images using deep learning methods. Recently, self-supervised monocular depth estimation has been proposed in [13]. This method (https://github.com/nianticlabs/monodepth2) has been ported to GIMP-ML using the model that was trained on the KITTI dataset [14].

Using this model, the disparity map of street images can be desaturated, inverted and colorized to create a layer representing light falling from the sky. In the demo video (https://youtu.be/q9Ny5XqIUKk), a daytime image of a street has been converted to night time using this approach.
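The invert-and-colorize steps can be sketched in NumPy as below. This is a hedged illustration rather than the workflow used in the video: the disparity map is assumed to be available as a single-channel array (so desaturation is already done), the bluish tint is a hypothetical choice, and a multiply blend is assumed for combining the light layer with the image, since the blend mode is not specified in the text.

```python
import numpy as np


def light_layer(disparity, color=(0.6, 0.7, 1.0)):
    """Build an RGB light layer from a single-channel disparity map.

    Steps: normalise to [0, 1], invert so that distant regions (low
    disparity) become bright, then tint with `color` (a hypothetical
    bluish night-sky value).
    """
    d = disparity.astype(np.float64)
    span = d.max() - d.min()
    norm = (d - d.min()) / span if span > 0 else np.zeros_like(d)
    inverted = 1.0 - norm
    return inverted[..., None] * np.asarray(color, dtype=np.float64)


def relight(img, disparity, color=(0.6, 0.7, 1.0)):
    """Multiply-blend the light layer over an HxWx3 image in [0, 1]."""
    return np.clip(img * light_layer(disparity, color), 0.0, 1.0)


disparity = np.array([[0.0, 0.5], [0.5, 1.0]])  # top-left is farthest away
street = np.full((2, 2, 3), 0.8)
night = relight(street, disparity)
```

With this blend, nearby surfaces (high disparity) are darkened the most, while distant regions keep more of the tinted light.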

4.6 Motion Deblurring

GAN based motion deblurring from [15] was also ported (https://github.com/TAMU-VITA/DeblurGANv2). The video demo is available at https://youtu.be/adgHtu4chyU.

4.7 Image Super resolution

The model in [16] for image super resolution (https://github.com/twtygqyy/pytorch-SRResNet) was also implemented. Using this plugin, the input image layer can be upscaled to up to 4x its original size. The demo is available at https://youtu.be/HeBgWcXFQpI.

5 Conclusions and Future Work

This paper presented GIMP-ML, a set of Python plugins that enable the use of deep learning models in GIMP via PyTorch for various applications. It has been shown that several manual and time-consuming image processing tasks can be simplified by the use of deep learning models, which makes it convenient for users of image processing software to perform such tasks. GIMP 2.10 currently relies on Python 2.7, which has been deprecated as of 1 January 2020. The next version of GIMP is expected to use Python 3, and the GIMP-ML codebase will be updated to support it. Further, deep learning models suffer from the data bias problem and only work well when the test image is from the same distribution as the data on which the model was trained. In the future, the framework will be enhanced to handle such scenarios.

References

  • [1] Stéfan van der Walt, S Chris Colbert, and Gael Varoquaux. The numpy array: a structure for efficient numerical computation. Computing in Science & Engineering, 13(2):22–30, 2011.
  • [2] Eric Jones, Travis Oliphant, and Pearu Peterson. Scipy: Open source scientific tools for python. 2001.
  • [3] Stefan Van der Walt, Johannes L Schönberger, Juan Nunez-Iglesias, François Boulogne, Joshua D Warner, Neil Yager, Emmanuelle Gouillart, and Tony Yu. scikit-image: image processing in python. PeerJ, 2:e453, 2014.
  • [4] Alexander Mordvintsev and K Abid. Opencv-python tutorials documentation. Retrieved from https://media.readthedocs.org/pdf/opencv-python-tutroals/latest/opencv-python-tutroals.pdf, 2014.
  • [5] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pages 8024–8035, 2019.
  • [6] John D Hunter. Matplotlib: A 2d graphics environment. Computing in science & engineering, 9(3):90–95, 2007.
  • [7] Sébastien Marcel and Yann Rodriguez. Torchvision the machine-vision package of torch. In Proceedings of the 18th ACM international conference on Multimedia, pages 1485–1488, 2010.
  • [8] Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam. Rethinking atrous convolution for semantic image segmentation, 2017.
  • [9] Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The pascal visual object classes (voc) challenge. International journal of computer vision, 88(2):303–338, 2010.
  • [10] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution, 2016.
  • [11] Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, and Nong Sang. Bisenet: Bilateral segmentation network for real-time semantic segmentation, 2018.
  • [12] Cheng-Han Lee, Ziwei Liu, Lingyun Wu, and Ping Luo. Maskgan: towards diverse and interactive facial image manipulation, 2019.
  • [13] Clément Godard, Oisin Mac Aodha, Michael Firman, and Gabriel J Brostow. Digging into self-supervised monocular depth estimation, 2019.
  • [14] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? the kitti vision benchmark suite. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 3354–3361. IEEE, 2012.
  • [15] Orest Kupyn, Tetiana Martyniuk, Junru Wu, and Zhangyang Wang. Deblurgan-v2: Deblurring (orders-of-magnitude) faster and better, 2019.
  • [16] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network, 2017.