Recent years have witnessed many breakthroughs in AI (He et al., 2016; LeCun et al., 2015; Silver et al., 2016), especially in computer vision (Krizhevsky et al., 2012), speech recognition (Amodei et al., 2016) and machine translation (Johnson et al., 2017). Deep learning models have surpassed human-level performance in many fields, such as image recognition (He et al., 2015) and skin cancer diagnosis (Esteva et al., 2017). Face recognition has been widely deployed in smart phones (such as iPhone X Face ID, https://support.apple.com/en-us/HT208109) and security entrances. Recommendation systems (such as those of Alibaba, Amazon and ByteDance) help people easily find the information they want. Visual search systems allow us to find products simply by taking a picture with a cellphone (Zhang et al., 2018; Yang et al., 2017).
However, building an effective AI system is quite challenging (Sculley et al., 2015). First, developers must collect, clean and annotate raw data to ensure satisfactory performance, which is time-consuming and costly. Second, machine learning experts must formulate the problem and develop the corresponding computational models. Third, programmers must train the models, fine-tune hyper-parameters, and develop an SDK or API for later use. Bad-case analysis is also required if the performance of the baseline model is far from satisfactory. Last but not least, the above procedure must be iterated again and again to meet rapidly changing requirements (see Figure 1). The whole development procedure may fail if any step mentioned above fails.
Facing so many difficulties, cloud services (such as Amazon Web Services (AWS, https://aws.amazon.com/), Google Cloud (https://cloud.google.com/), AliYun (https://www.aliyun.com/) and Baidu Yun (https://cloud.baidu.com/)) are becoming increasingly popular on the market. Nevertheless, these platforms are developed for commercial production. Researchers have only limited access to the existing APIs and cannot know the inner design of the systems, so it is difficult for them to bridge the gap between research models and production applications.
To solve the problems mentioned above, in this paper we construct an AI cloud platform termed EXtensive Cloud (XCloud) with common recognition abilities for both research and production. XCloud is freely accessible and open-sourced on GitHub (https://github.com/lucasxlu/XCloud.git) to help researchers build production applications with their proposed models.
In this section, we give a detailed description of the design and implementation of XCloud. XCloud is composed of three modules, namely, computer vision (CV), data mining (DM) and research (R). We briefly introduce the services provided by each module.
2.1.1. Computer Vision
In the CV module, we implement and train several models to solve the following common vision problems.
Plant recognition is popular among plant enthusiasts and botanists. It can be treated as a fine-grained visual classification problem, since samples from different categories often have quite similar appearance. We train ResNet18 (He et al., 2016) to recognize over 998 plant categories.
Plant disease recognition can provide efficient and effective tools for intelligent agriculture. Farmers can learn the disease category and take relevant measures to avoid huge losses. ResNet50 (He et al., 2016) is trained to recognize over 60 plant diseases.
Face analysis models predict several facial attributes from a given portrait image. We take HMTNet (Xu et al., 2019a) as the backbone model. HMTNet is a multi-task deep model with a fully convolutional architecture that predicts facial beauty score, gender and race simultaneously with a single model. Details can be found in (Xu et al., 2019a).
Food recognition is popular among health-diet keepers and is widely used in New Retailing fields. DenseNet169 (Huang et al., 2017) is adopted to train the food recognition model.
Skin lesion analysis has gained increasing attention in medical AI. We train DenseNet121 (Huang et al., 2017) to recognize 198 common skin diseases.
Pornography image recognition models provide helpful tools for filtering sensitive images on the Internet. We integrate this feature into XCloud by training DenseNet121 (Huang et al., 2017) to recognize pornography images.
Garbage classification has recently been a hot topic in China (http://www.xinhuanet.com/english/2019-07/03/c_138195992.htm), as it is an environment-friendly behavior. However, most people cannot tell different types of garbage apart. By leveraging computer vision and image recognition technology, we can easily classify diverse garbage. The dataset is collected from HUAWEI Cloud (https://developer.huaweicloud.com/competition/competitions/1000007620/introduction). We split 20% of the images into the test set and keep the remaining 80% as the training set. We train ResNet152 (He et al., 2016), achieving 90.12% accuracy on this dataset.
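The 80/20 split described above can be sketched with the standard library (a generic sketch for illustration, not the exact script used to prepare the HUAWEI Cloud data):

```python
import random

def train_test_split(samples, test_ratio=0.2, seed=42):
    """Shuffle the sample list and split off `test_ratio` of it as the test set.

    The fixed seed keeps the split reproducible across runs.
    """
    rng = random.Random(seed)
    shuffled = samples[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)
```

For 1,000 images this yields an 800/200 partition; in practice the split is done per class to keep category distributions balanced.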
2.1.2. Data Mining
In the data mining module, we provide a useful toolkit (Xu et al., 2019b) related to an emerging research topic, online knowledge quality evaluation (e.g., Zhihu Live, https://www.zhihu.com/lives/). This API automatically calculates a Zhihu Live's score within a range of 0 to 5, which provides useful information for customers.
2.1.3. Research

In this module, we provide the source code for training and testing the machine learning models mentioned above. Researchers can use the provided code to train their own models. Furthermore, we also reimplement several computer vision models (such as image quality assessment (Kang et al., 2014; Bosse et al., 2016; Talebi and Milanfar, 2018; Kang et al., 2015), facial beauty analysis (Xu et al., 2018, 2019a), and face recognition (Liu et al., 2017; Wen et al., 2016)), which makes it easy for users to integrate these features into XCloud APIs.
2.2. Performance Metric
The performance of the above models is listed in Table 1. We adopt accuracy as the metric for the classification services (such as plant recognition, plant disease recognition, food recognition, skin lesion analysis and pornography image recognition), Pearson Correlation (PC) as the metric for the facial beauty prediction task, and Mean Absolute Error (MAE) as the metric for the Zhihu Live quality evaluation task.
PC is defined as

$$PC = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i - \bar{x})^2}\sqrt{\sum_{i=1}^{N}(y_i - \bar{y})^2}}$$

where $x_i$ and $y_i$ represent the predicted score and groundtruth score, respectively, $N$ denotes the number of data samples, and $\bar{x}$ and $\bar{y}$ stand for the means of $x_i$ and $y_i$, respectively. A larger PC value represents better performance of the computational model.
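Both evaluation metrics are straightforward to implement; below is a minimal pure-Python version for illustration (not the evaluation code used by XCloud):

```python
import math

def pearson_correlation(pred, gt):
    """Pearson Correlation between predicted and groundtruth scores."""
    n = len(pred)
    mx, my = sum(pred) / n, sum(gt) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(pred, gt))
    sx = math.sqrt(sum((x - mx) ** 2 for x in pred))
    sy = math.sqrt(sum((y - my) ** 2 for y in gt))
    return cov / (sx * sy)

def mean_absolute_error(pred, gt):
    """MAE between predicted and groundtruth scores."""
    return sum(abs(x - y) for x, y in zip(pred, gt)) / len(pred)
```

A perfectly linear prediction yields PC = 1, and a perfect prediction yields MAE = 0.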
| Service | Model | Dataset | Performance | Output |
|---|---|---|---|---|
| Plant Recognition | ResNet18 (He et al., 2016) | FGVC5 Flowers (https://sites.google.com/view/fgvc5/competitions/fgvcx/flowers) | Acc=0.8909 | Plant category and confidence |
| Plant Disease Recognition | ResNet50 (He et al., 2016) | PDD2018 Challenge (https://challenger.ai/dataset/pdd2018) | Acc=0.8700 | Plant disease category and confidence |
| Face Analysis | HMTNet (Xu et al., 2019a) | SCUT-FBP5500 (Liang et al., 2018) | PC=0.8783 | Facial beauty score |
| Food Recognition | DenseNet161 (Huang et al., 2017) | iFood (https://sites.google.com/view/fgvc5/competitions/fgvcx/ifood) | Acc=0.6689 | Food category and confidence |
| Garbage Classification | ResNet152 (He et al., 2016) | HUAWEI Cloud | Acc=0.9012 | Garbage category and confidence |
| Insect Pest Recognition | DenseNet121 (Huang et al., 2017) | IP102 (Wu et al., 2019) | Acc=0.6106 | Insect pest category and confidence |
| Skin Disease Recognition | DenseNet121 (Huang et al., 2017) | SD198 (Sun et al., 2016) | Acc=0.6455 | Skin disease category and confidence |
| Porn Image Recognition | DenseNet121 (Huang et al., 2017) | nsfw_data_scraper (https://github.com/alexkimxyz/nsfw_data_scraper.git) | Acc=0.9313 | Image category and confidence |
| Zhihu Live Rating | MTNet (Xu et al., 2019b) | ZhihuLiveDB (Xu et al., 2019b) | MAE=0.2250 | Zhihu Live score within [0, 5] |
2.3. Design of RESTful API
Encapsulating services as RESTful APIs is regarded as standard practice in building cloud platforms. With RESTful APIs, related services can be easily integrated into terminal devices such as PC web, WeChat mini programs, Android/iOS apps, and HTML5, without compatibility problems. The RESTful APIs provided are listed in Table 2.
| URL | Description | Method | Parameters |
|---|---|---|---|
| cv/mcloud/skin | skin disease recognition | POST | imgraw/imgurl |
| cv/fbp | facial beauty prediction | POST | imgraw/imgurl |
| cv/nsfw | pornography image recognition | POST | imgraw/imgurl |
| cv/pdr | plant disease recognition | POST | imgraw/imgurl |
| dm/zhihuliveeval | Zhihu Live rating | GET | Zhihu Live ID |
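Calling these endpoints is straightforward from any HTTP client. The sketch below builds a request for the facial beauty prediction API; the host, port, and the exact encoding expected for imgraw are assumptions for illustration, not documented XCloud behavior:

```python
import base64

API_BASE = "http://localhost:8000"   # assumed local deployment of XCloud

def build_fbp_request(img_path=None, img_url=None):
    """Build (url, form data) for the cv/fbp endpoint.

    The field names imgraw/imgurl follow Table 2; base64 encoding of the
    raw image is an assumption made for this sketch.
    """
    endpoint = API_BASE + "/cv/fbp"
    if img_url is not None:
        return endpoint, {"imgurl": img_url}
    with open(img_path, "rb") as f:
        return endpoint, {"imgraw": base64.b64encode(f.read()).decode()}

# Sending the request (not executed here):
# import requests
# url, data = build_fbp_request(img_url="http://example.com/face.jpg")
# print(requests.post(url, data=data).json())
```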
2.4. Backend Support
The backend of XCloud is developed with Django (https://www.djangoproject.com/). We follow the MVC design pattern (Leff and Rayfield, 2001), in which the model, view and controller are developed separately and can be easily extended in later development work. In order to record user information produced on XCloud, we construct two relational tables in MySQL, listed in Table 3 and Table 4, to store the relevant information.
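As an illustrative analogue of such a user-record table (the actual MySQL schemas live in Tables 3 and 4; all field names below are assumptions, and SQLite stands in for MySQL here):

```python
import sqlite3

# Hypothetical schema sketch only -- XCloud's real tables are defined in
# Tables 3 and 4 of the paper; field names here are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE api_call_record (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        username TEXT NOT NULL,
        api_name TEXT NOT NULL,
        called_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
conn.execute("INSERT INTO api_call_record (username, api_name) VALUES (?, ?)",
             ("alice", "cv/fbp"))
rows = conn.execute("SELECT username, api_name FROM api_call_record").fetchall()
```

In Django such a table would normally be declared as a model class, so the ORM handles the SQL shown above.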
In addition, we provide a simple and easy-to-use script to convert original PyTorch models to TensorRT (https://developer.nvidia.com/tensorrt) models for faster inference. TensorRT is a platform for high-performance deep learning inference; it includes an inference optimizer and runtime that deliver low latency and high throughput for deep learning inference applications. With TensorRT, we are able to run DenseNet169 (Huang et al., 2017) at 97.63 FPS on two 2080 Ti GPUs, significantly faster than the native PyTorch inference engine (29.45 FPS).
2.5. Extensibility

As the name XCloud (EXtensive Cloud) suggests, it is quite easy to integrate new abilities. Apart from using the existing AI technology provided by XCloud, developers can easily build their own AI applications by referring to the model training code contained in the research module (https://github.com/lucasxlu/XCloud/tree/master/research). Developers only need to prepare and clean their dataset; after training their own models, the new AI interface is integrated into XCloud by simply writing a new controller class and adding a new Django view.
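The extension pattern can be illustrated framework-free; the registry and function names below are hypothetical stand-ins for XCloud's actual Django views and URL routing, shown only to convey the idea:

```python
# Minimal route registry: adding a new service is one decorator away,
# mirroring how a new Django view plus URL entry extends XCloud.
SERVICES = {}

def register(route):
    """Register a view function under a URL-like route."""
    def decorator(fn):
        SERVICES[route] = fn
        return fn
    return decorator

@register("cv/mymodel")
def my_model_view(payload):
    # A real view would decode payload["imgraw"]/["imgurl"] and run the
    # newly trained model; here we return a fixed demo response.
    return {"status": 0, "results": ["demo"]}

def dispatch(route, payload):
    """Look up and invoke the registered view for a route."""
    return SERVICES[route](payload)
```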
2.6. API Stress Testing
Performance and stability play key roles in production-level services. In order to ensure the stability of XCloud, Nginx (http://nginx.org/) is adopted for load balancing. In addition, we use JMeter (https://jmeter.apache.org/) to stress-test all APIs provided by XCloud. The results of the stress testing can be found in Table 5.
| API | AVG_LATENCY (ms) | P99 (ms) | ERROR |
|---|---|---|---|
From Table 5 we can conclude that the performance and stability of XCloud are quite satisfactory under the current software and hardware conditions: the test environment, with 2080 Ti GPUs and an Intel Xeon CPU, is enough to support 20 QPS (queries per second). We believe the performance could be further improved with stronger hardware. By deploying XCloud on your machine and starting the server, you will see the homepage shown in Figure 4.
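The P99 column in Table 5 can be reproduced from raw latency samples. The sketch below uses the nearest-rank method; JMeter's exact percentile interpolation may differ slightly:

```python
import math

def p99_latency(samples_ms):
    """99th-percentile latency (nearest-rank) over a list of samples in ms."""
    s = sorted(samples_ms)
    # nearest-rank: the smallest value with at least 99% of samples at or below it
    idx = max(0, math.ceil(0.99 * len(s)) - 1)
    return s[idx]
```

For example, over latencies 1..100 ms the P99 is the 99th sorted sample.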
3. Conclusion and Future Work
In this paper, we construct an AI cloud platform with high performance and stability that provides common AI services in the form of RESTful APIs, to ease the development of AI projects. In future work, we will integrate more services into XCloud and develop better models with more advanced performance.
- Amodei, D., et al. (2016) Deep Speech 2: end-to-end speech recognition in English and Mandarin. In International Conference on Machine Learning, pp. 173–182.
- Bosse, S., et al. (2016) A deep neural network for image quality assessment. In 2016 IEEE International Conference on Image Processing (ICIP), pp. 3773–3777.
- Esteva, A., et al. (2017) Dermatologist-level classification of skin cancer with deep neural networks. Nature 542 (7639), pp. 115.
- He, K., et al. (2015) Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034.
- He, K., et al. (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
- Huang, G., et al. (2017) Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708.
- Johnson, M., et al. (2017) Google's multilingual neural machine translation system: enabling zero-shot translation. Transactions of the Association for Computational Linguistics 5, pp. 339–351.
- Kang, L., et al. (2014) Convolutional neural networks for no-reference image quality assessment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1733–1740.
- Kang, L., et al. (2015) Simultaneous estimation of image quality and distortion via multi-task convolutional neural networks. In 2015 IEEE International Conference on Image Processing (ICIP), pp. 2791–2795.
- Krizhevsky, A., et al. (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105.
- LeCun, Y., et al. (2015) Deep learning. Nature 521 (7553), pp. 436–444.
- Leff, A. and Rayfield, J. (2001) Web-application development using the Model/View/Controller design pattern. In Proceedings of the Fifth IEEE International Enterprise Distributed Object Computing Conference, pp. 118–127.
- Liang, L., et al. (2018) SCUT-FBP5500: a diverse benchmark dataset for multi-paradigm facial beauty prediction. In 2018 24th International Conference on Pattern Recognition (ICPR), pp. 1598–1603.
- Liu, W., et al. (2017) SphereFace: deep hypersphere embedding for face recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Paszke, A., et al. (2019) PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pp. 8024–8035.
- Sculley, D., et al. (2015) Hidden technical debt in machine learning systems. In Advances in Neural Information Processing Systems, pp. 2503–2511.
- Silver, D., et al. (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529 (7587), pp. 484–489.
- Sun, X., et al. (2016) A benchmark for automatic visual classification of clinical skin disease images. In European Conference on Computer Vision, pp. 206–222.
- Talebi, H. and Milanfar, P. (2018) NIMA: neural image assessment. IEEE Transactions on Image Processing 27 (8), pp. 3998–4011.
- Wen, Y., et al. (2016) A discriminative feature learning approach for deep face recognition. In European Conference on Computer Vision, pp. 499–515.
- Wu, X., et al. (2019) IP102: a large-scale benchmark dataset for insect pest recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8787–8796.
- Xu, L., et al. (2019a) Hierarchical multi-task network for race, gender and facial attractiveness recognition. In 2019 IEEE International Conference on Image Processing (ICIP), pp. 3861–3865.
- Xu, L., et al. (2019b) Data-driven approach for quality evaluation on knowledge sharing platform. arXiv preprint arXiv:1903.00384.
- Xu, L., et al. (2018) CRNet: classification and regression neural network for facial beauty prediction. In Pacific Rim Conference on Multimedia, pp. 661–671.
- Yang, F., et al. (2017) Visual search at eBay. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 2101–2110.
- Zhang, Y., et al. (2018) Visual search at Alibaba. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 993–1001.