In recent years, we have witnessed the success of numerous data-driven machine-learning-based applications. This has prompted the database community to investigate the opportunities for integrating machine learning techniques in the design of database systems and applications. A branch of machine learning, called deep learning [44, 29], has attracted worldwide interest in recent years due to its excellent performance in multiple areas including speech recognition, image classification and natural language processing (NLP). The foundation of deep learning was established about twenty years ago in the form of neural networks. Its recent resurgence is mainly fueled by three factors: (1) immense computing power, which reduces the time to train and deploy new models, e.g., Graphics Processing Units (GPUs) enable training systems to run much faster than those in the 1990s; (2) massive (labeled) training datasets (e.g., ImageNet), which enable a more comprehensive knowledge of the domain to be acquired; and (3) new deep learning models (e.g., AlexNet), which improve the ability to capture data regularities.
Database researchers have been working on system optimization and large-scale data-driven applications since the 1970s, which are closely related to the first two factors. It is therefore natural to think about the relationships between databases and deep learning. First, are there any insights that the database community can offer to deep learning? It has been shown that larger training datasets and a deeper model structure improve the accuracy of deep learning models. However, the side effect is that training becomes more costly. Approaches have been proposed to accelerate training from both the system perspective [9, 32, 15, 62, 1] and the theory perspective [90, 20]. Since the database community has rich experience with system optimization, it would be opportune to discuss the applicability of database techniques for optimizing deep learning systems. For example, distributed computing and memory management are key database technologies that are also central to deep learning.
Second, are there any deep learning techniques that can be adapted for database problems? Deep learning emerged from the machine learning and computer vision communities, and has been successfully applied to other domains such as NLP. However, few studies have applied deep learning techniques to traditional database problems. This is partially because traditional database problems (like indexing, transaction and storage management) involve less uncertainty, whereas deep learning is good at predicting over uncertain events. Nevertheless, there are problems in databases, like knowledge fusion and crowdsourcing, which are probabilistic in nature. It is possible to apply deep learning techniques in these areas. We will discuss specific problems like querying interface, knowledge fusion, etc. in this paper.
A previous version of this paper appeared in SIGMOD Record. In this version, we extend it to include recent developments in this field and references to recent work.
The rest of this paper is organized as follows: Section 2 provides background information about deep learning models and training algorithms; Section 3 discusses the application of database techniques for optimizing deep learning systems. Section 4 describes research problems in databases where deep learning techniques may help to improve performance. Some final thoughts are presented in Section 5.
Deep learning refers to a set of machine learning models which try to learn high-level abstractions (or representations) of raw data through multiple feature transformation layers. Large training datasets and deep, complex structures enhance the ability of deep learning models to learn effective representations for tasks of interest. There are three popular categories of deep learning models according to the types of connections between layers, namely feedforward models (directed connection), energy models (undirected connection) and recurrent neural networks (recurrent connection). Feedforward models, including the Convolutional Neural Network (CNN), propagate input features through each layer to extract high-level features. CNN is the state-of-the-art model for many computer vision tasks. Energy models, including the Deep Belief Network (DBN), are typically used to pre-train other models, e.g., feedforward models. The Recurrent Neural Network (RNN) is widely used for modeling sequential data. Machine translation and language modeling are popular applications of RNN.
Before deploying a deep learning model, the model parameters involved in the transformation layers need to be trained. Training is a numeric optimization procedure that finds parameter values minimizing the discrepancy (loss function) between the expected output and the real output. Stochastic Gradient Descent (SGD) is the most widely used training algorithm. As shown in Figure 1, SGD initializes the parameters with random values, and then iteratively refines them based on the gradients of the loss function with respect to the parameters. There are three commonly used algorithms for gradient computation corresponding to the three model categories above: Back Propagation (BP), Contrastive Divergence (CD) and Back Propagation Through Time (BPTT). By regarding the layers of a neural net as nodes of a graph, these algorithms can be evaluated by traversing the graph in certain sequences. For instance, the BP algorithm is illustrated in Figure 2, where a simple feedforward model is trained by traversing along the solid arrows to compute the data (feature) of each layer, and along the dashed arrows to compute the gradient of each layer and each parameter.
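The SGD loop described above can be sketched in a few lines. The following is only an illustration under our own assumptions: the model is a single linear layer with a squared loss (not a model from this paper), and the function name `sgd_linear`, the learning rate and the epoch count are hypothetical choices. The forward step corresponds to computing the layer output, and the backward step to computing the gradient of each parameter.

```python
import random


def sgd_linear(data, lr=0.05, epochs=200, seed=0):
    """Fit y = w * x + b with plain SGD on the loss 0.5 * (pred - y) ** 2.

    Illustrative only: a one-layer model stands in for a deep network.
    """
    rng = random.Random(seed)
    samples = list(data)  # copy so that shuffling does not touch the caller's list
    # Step 1: initialize the parameters with random values.
    w, b = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            # Forward pass: compute the output of the (single) layer.
            pred = w * x + b
            # Backward pass: gradient of the loss w.r.t. the output,
            # then w.r.t. each parameter via the chain rule.
            grad_pred = pred - y
            grad_w = grad_pred * x
            grad_b = grad_pred
            # Step 2: refine the parameters along the negative gradient.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Usage: recover the parameters of y = 2x + 1 from noiseless samples.
points = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(-10, 11)]
w_fit, b_fit = sgd_linear(points)
```

For a deep model, the backward pass would apply the same chain rule layer by layer, in the reverse order of the forward traversal.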
3 Databases to Deep Learning
In this section, we discuss the optimization techniques used in deep learning systems, and research opportunities from the perspective of databases.
3.1 Stand-alone Training
Currently, the most effective approach for improving the training speed of deep learning models is to use Nvidia GPUs with the cuDNN library. Researchers are also working on other hardware, e.g., FPGA. Besides exploiting advancements in hardware technology, operation scheduling and memory management are two important components to consider.
3.1.1 Operation Scheduling
Training algorithms of deep learning models typically involve expensive linear algebra operations, as shown in Figure 3, where the matrices involved can be very large. Operation scheduling first detects the data dependencies among operations and then places the operations without dependencies onto executors, e.g., CUDA streams and CPU threads. Taking the operations in Figure 3 as an example, those that have no dependencies on each other could be computed in parallel. The first step could be done statically based on the dataflow graph, or dynamically by analyzing the order of read and write operations. Databases face similar problems in optimizing transaction execution and query plans, and those solutions should be considered for deep learning systems. For instance, databases use cost models to estimate the cost of query plans. For deep learning, we may also create a cost model to find an optimal operation placement strategy for the second step of operation scheduling, given fixed computing resources including executors and memory.
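To make the first step of operation scheduling concrete, the sketch below groups operations of a static dataflow graph into levels such that all operations in one level are mutually independent and could be dispatched to different executors. It is a minimal illustration, not the scheduler of any particular system; the function `schedule_levels` and the operation names are hypothetical, and the graph is assumed to list every operation as a key.

```python
def schedule_levels(deps):
    """Group operations into levels of mutually independent operations.

    deps maps each operation name to the set of operations whose outputs
    it reads. Every operation must appear as a key (an assumption of this
    sketch). Operations in the same level have no dependencies on each
    other and may run in parallel.
    """
    remaining = {op: set(d) for op, d in deps.items()}
    levels = []
    while remaining:
        # An operation is ready once all of its inputs have been produced.
        ready = sorted(op for op, d in remaining.items() if not d)
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependencies")
        levels.append(ready)
        for op in ready:
            del remaining[op]
        for d in remaining.values():
            d.difference_update(ready)
    return levels


# Usage: two independent matrix products feed an addition, then an activation.
graph = {
    "matmul_a": set(),
    "matmul_b": set(),
    "add": {"matmul_a", "matmul_b"},
    "relu": {"add"},
}
plan = schedule_levels(graph)
```

A cost model, as suggested above, would then decide which executor each operation of a level is placed on; this sketch deliberately leaves that second step out.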
Recent developments: Mirhoseini et al. propose to optimize the placement of operations on heterogeneous hardware devices (e.g., CPU and GPU) using reinforcement learning. Jia et al. [35, 33] go beyond simple operation parallelism to consider parallelism from multiple dimensions together, including data samples and channels, operations, attributes and parameters. In addition, operation substitution has been studied, which replaces the original operations with new ones that retain the semantics but lead to better overall efficiency. Operation fusing is one example. A cost-based search algorithm is introduced to find optimized computation graphs. Similar fusing techniques are applied in open-source libraries including TensorFlow.
3.1.2 Memory Management
Deep learning models are becoming larger and larger, and already occupy a huge amount of memory. For example, the VGG model cannot be trained on normal GPU cards due to memory size constraints. Many approaches have been proposed to reduce memory consumption. Shorter data representations, e.g., 16-bit float, are now supported by CUDA. Memory sharing is an effective approach for saving memory. Taking Figure 3 as an example, the input and output of the function share the same variable and thus the same memory space. Such operations are called ‘in-place’ operations. Recently, two approaches were proposed to trade computation time for memory. Swapping memory between GPU and CPU resolves the problem of small GPU memory and large model size by manually swapping variables out to the CPU and then swapping them back. Another approach drops some variables to free memory and recomputes them when necessary based on the static dataflow graph.
Memory management is a hot topic in the database community, with a significant amount of research on in-memory databases [72, 91], including locality, paging and cache optimization. In particular, paging strategies could be useful for deciding when to swap and which variable to swap. In addition, failure recovery in databases is similar in spirit to the dropping and recomputing approach, hence the logging techniques in databases could be considered. If all operations (and their execution times) are logged, we can then do runtime analysis without the static dataflow graph. Other techniques, including garbage collection and memory pools, would also be useful for deep learning systems, especially for GPU memory management.
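As one illustration of how a paging strategy could drive swapping, the sketch below simulates a fixed-capacity GPU memory pool that evicts the least recently used (LRU) variable to CPU memory when space runs out. LRU is our own choice of policy for the example, and the class `LruSwapPool` and its fields are hypothetical; real systems would also account for transfer cost and future access patterns.

```python
from collections import OrderedDict


class LruSwapPool:
    """Simulated GPU memory pool with LRU swap-out to CPU memory."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.gpu = OrderedDict()  # name -> size, least recently used first
        self.cpu = {}             # variables currently swapped out
        self.used = 0
        self.swap_outs = []       # log of evicted variables, in order

    def access(self, name, size):
        """Make the variable resident on the GPU, evicting others if needed."""
        if name in self.gpu:
            self.gpu.move_to_end(name)  # mark as most recently used
            return
        if size > self.capacity:
            raise ValueError("variable larger than the whole pool")
        self.cpu.pop(name, None)  # swap in if it was swapped out earlier
        while self.used + size > self.capacity:
            victim, victim_size = self.gpu.popitem(last=False)
            self.cpu[victim] = victim_size
            self.used -= victim_size
            self.swap_outs.append(victim)
        self.gpu[name] = size
        self.used += size


# Usage: a pool of 4 units and three variables of 2 units each.
pool = LruSwapPool(capacity=4)
for var in ["a", "b", "a", "c", "b"]:
    pool.access(var, 2)
```

The `swap_outs` list plays the role of a simple log; as noted above, such a log could support runtime analysis without a static dataflow graph.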
Recent developments: The recomputing technique has been adopted in PyTorch. Wang et al. combine recomputing and swapping to optimize the memory of convolutional neural networks. Zhang et al. propose a smart memory pool and an automatic swapping strategy for deep neural networks to replace the manual swapping in [13, 78]. Cai et al. propose to slice the model to reduce memory and computational resource consumption.
3.2 Distributed Training
Distributed training is a natural solution for accelerating the training of deep learning models. The parameter server architecture is typically used, in which the workers compute parameter gradients and the servers update the parameter values after receiving gradients from the workers. There are two basic parallelism schemes for distributed training, namely, data parallelism and model parallelism. In data parallelism, each worker is assigned a data partition and a model replica, while in model parallelism, each worker is assigned a partition of the model and the whole dataset. The database community has a long history of working on distributed environments, ranging from parallel databases and peer-to-peer systems to cloud computing. We will discuss some research problems relevant to databases arising from distributed training in the following paragraphs.
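The interaction between workers and a parameter server under data parallelism can be sketched as follows. This is a single-process simulation under our own assumptions: the model has one scalar parameter, workers are executed sequentially rather than in parallel, and the names `worker_gradient` and `train_data_parallel` are hypothetical.

```python
def worker_gradient(w, partition):
    """Gradient of the mean of 0.5 * (w * x - y) ** 2 over one data partition."""
    return sum((w * x - y) * x for x, y in partition) / len(partition)


def train_data_parallel(data, num_workers, lr=0.1, steps=100):
    """Synchronous data-parallel training of y = w * x with a simulated server."""
    # Each worker is assigned a data partition (and implicitly a model replica).
    partitions = [data[i::num_workers] for i in range(num_workers)]
    w = 0.0  # parameter value held by the server
    for _ in range(steps):
        # Workers compute gradients on their own partitions.
        grads = [worker_gradient(w, part) for part in partitions]
        # The server aggregates the gradients and updates the parameter.
        w -= lr * sum(grads) / len(grads)
    return w


# Usage: four samples of y = 3x split across two workers.
samples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]
w_trained = train_data_parallel(samples, num_workers=2)
```

In model parallelism, by contrast, the parameters themselves would be partitioned across workers, and intermediate layer outputs rather than gradients of a full replica would be exchanged.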
3.2.1 Communication and Synchronization
Given that deep learning models have a large set of parameters, the communication overhead between workers and servers is likely to be the bottleneck of a training system, especially when the workers are running on GPUs, which decrease the computation time. In addition, for large clusters, the synchronization cost between workers is also significant. Consequently, it is important to investigate efficient communication protocols for both single-node multi-GPU training and training over a large cluster. Possible research directions include: a) compressing the parameters and gradients for transfer; b) organizing servers in an optimized topology to reduce the communication burden of each single node, e.g., tree structure and AllReduce structure (all-to-all connection); c) using more efficient networking hardware like RDMA.
Recent developments: Gradient compression has been shown to be effective in reducing the communication cost [37, 23, 73, 50]. Besides, Jiang et al. propose a decentralized SGD algorithm which has a similar convergence rate to mini-batch SGD but eliminates the parameter server. As a result, the traffic bottleneck at the parameter server is resolved. A more popular solution to resolve the bottleneck and improve the communication efficiency is to replace the parameter server architecture with all-reduce communication. Various all-reduce implementations [22, 31, 57] have been applied to train large-scale networks over thousands of GPUs.
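One common form of gradient compression is sparsification, where only the largest-magnitude gradient entries are transferred and the rest are kept locally to be added to later gradients. The sketch below is a generic illustration of this idea and is not the exact method of any of the cited works; the function name `topk_sparsify` is hypothetical.

```python
def topk_sparsify(grad, k):
    """Keep the k largest-magnitude entries of a gradient vector.

    Returns (sparse, residual): sparse maps index -> value for the entries
    to transmit, and residual holds the untransmitted entries so that a
    worker can accumulate them into its next gradient.
    """
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    sparse = {i: grad[i] for i in order[:k]}
    residual = [0.0 if i in sparse else g for i, g in enumerate(grad)]
    return sparse, residual


# Usage: transmit 2 of 4 entries and keep the rest as a local residual.
to_send, kept = topk_sparsify([0.1, -2.0, 0.3, 1.5], k=2)
```

With `k` much smaller than the number of parameters, the transferred volume per step shrinks roughly in proportion, at the cost of delaying the small updates.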
| 1. operation scheduling | ✓ | x | ✓ | - | - | x |
| 2. memory management | d+a+p | i | d+s | p | p | - |
| 3. parallelism | d+m | d | d+m | d+m | - | d+m |
-: unknown
1. x: not available; ✓: available
2. d: dynamic; a: swap; p: memory pool; i: in-place operation; s: static
3. d: data parallelism; m: model parallelism
4. s: synchronous; a: asynchronous; h: hybrid
3.2.2 Concurrency and Consistency
Concurrency and consistency are traditional research problems in databases. For distributed training of deep learning models, they also matter. Currently, both declarative programming (e.g., Theano and TensorFlow) and imperative programming (e.g., Caffe and SINGA) have been adopted in existing systems for concurrency implementation. Most deep learning systems use threads and locks directly. Other concurrency implementation methods like the actor model (good at failure recovery), co-routines and communicating sequential processes have not been explored.
Sequential consistency (from synchronous training) and eventual consistency (from asynchronous training) are typically used for distributed deep learning. Both approaches have scalability issues. Recently, there have been studies on training convex models (deep learning models are non-linear and non-convex) using a value-bounded consistency model. Researchers are starting to investigate the influence of consistency models on distributed training [25, 26, 6]. There remains much research to be done on how to provide flexible consistency models for distributed training, and how each consistency model affects the scalability of the system, including communication overhead.
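To illustrate what a flexible consistency model between the two extremes might look like, the sketch below uses a staleness bound: a worker may start its next iteration only if it is at most a given number of iterations ahead of the slowest worker. Note that this is a staleness-bounded rule chosen by us for simplicity, not the value-bounded model mentioned above, and the function name `can_proceed` is hypothetical.

```python
def can_proceed(worker_clock, all_clocks, staleness):
    """Return True if the worker may start its next iteration.

    worker_clock: number of iterations this worker has completed.
    all_clocks:   completed iteration counts of all workers.
    staleness:    maximum allowed lead over the slowest worker.
                  0 gives synchronous training; float("inf") gives
                  fully asynchronous training.
    """
    return worker_clock - min(all_clocks) <= staleness


# Usage: one worker is an iteration ahead of the slowest one.
clocks = [3, 2, 3]
sync_ok = can_proceed(3, clocks, staleness=0)
bounded_ok = can_proceed(3, clocks, staleness=1)
async_ok = can_proceed(3, clocks, staleness=float("inf"))
```

A single parameter thus spans the range from sequential to eventual consistency, which is one way to study how the consistency model affects scalability and communication overhead.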
Recent developments: In recent papers and benchmark testing, synchronous training is preferred to asynchronous training because the former is more stable in terms of convergence. With warm-up, layer-wise adaptive rate scaling for the learning rate, label smoothing, etc., synchronous SGD can scale to over 2000 GPUs [88, 31] without sacrificing accuracy. Typically, they increase the batch size gradually from a few thousand to tens of thousands. FlexPS is a system that supports such training schemes involving multiple stages.
3.2.3 Fault Tolerance
Database systems have good durability via logging (e.g., command log) and checkpointing. Current deep learning systems recover the training from crashes mainly based on checkpointing files. However, frequent checkpointing would incur vast overhead. In contrast with database systems, which enforce strict consistency in transactions, the SGD algorithm used by deep learning training systems can tolerate a certain degree of inconsistency. Therefore, logging is not a must. How to exploit the SGD properties and system architectures to implement fault tolerance efficiently is an interesting problem. Considering that distributed training replicates the model status, it is possible to recover from a replica instead of from checkpointing files. Robust frameworks (or concurrency models), like the actor model, could be adopted to implement this kind of failure recovery.
3.3 Optimization Techniques in Existing Systems
A summary of existing systems in terms of the above-mentioned optimization aspects is listed in Table 1. Many researchers have done ad hoc optimization using Caffe, including memory swapping and communication optimization. However, the official version is not well optimized. Similarly, Torch itself provides limited support for distributed training. Mxnet has optimizations for both memory and operation scheduling. Theano is typically used for stand-alone training. TensorFlow has the potential for the aforementioned static optimization based on the dataflow graph.
We are optimizing the Apache incubator SINGA system starting from version 1.0. For stand-alone training, cost models are explored for runtime operation scheduling. Memory optimization including dropping, swapping and garbage collection with a memory pool will be implemented. OpenCL is supported to run SINGA on a wide range of hardware including GPU, FPGA and ARM. For distributed training, SINGA (V0.3) has done much work on flexible parallelism and consistency, hence the focus would be on optimization of communication and fault tolerance, which are missing in almost all systems.
4 Deep Learning to Databases
Deep learning applications, such as computer vision and NLP, may appear very different from database applications. However, the core idea of deep learning, known as feature (or representation) learning, is applicable to a wide range of applications. Intuitively, once we have effective representations for entities, e.g., images, words, table rows or columns, we can compute entity similarity, perform clustering, train prediction models, and retrieve data with different modalities [82, 81] etc. We shall highlight a few deep learning models that could be adapted for database applications below.
4.1 Query Interface
Natural language query interfaces have been attempted for decades, because of their great desirability, particularly for non-expert database users. However, it is challenging for database systems to interpret (or understand) the semantics of natural language queries. Recently, deep learning models have achieved state-of-the-art performance for NLP tasks. Moreover, RNN has been shown to be able to learn structured output [71, 74]. As one solution, we can apply RNN models for parsing natural language queries to generate SQL queries, and refine them using existing database approaches. The challenge is that a large number of (labeled) training samples is required to train the model. One possible solution is to train a baseline model with a small dataset, and gradually refine it with users’ feedback. For instance, users could help correct the generated SQL query, and this feedback essentially serves as labeled data for subsequent training.
Recent developments: Multiple annotated datasets that consist of text query and SQL query pairs have been created using templates [94, 3] and user feedback. The solutions [94, 30, 18] generally extend the sequence-to-sequence model to encode the text query and then generate the SQL query via the decoder. Domain knowledge like the SQL grammar is exploited.
4.2 Query Plans
Query plan optimization is a traditional database problem. Most current database systems use complex heuristics and cost models to generate the query plan. According to prior work, each query plan of a parametric SQL query template has an optimality region. As long as the parameters of the SQL query are within this region, the optimal query plan does not change. In other words, query plans are insensitive to small variations of the input parameters. Therefore, we can train a query planner which learns from a set of pairs of SQL queries and optimal plans to generate (similar) plans for new (similar) queries. To elaborate, we can learn an RNN model that accepts the SQL query elements and meta-data (like buffer size and primary key) as input, and generates a tree structure representing the query plan. Reinforcement learning (as in AlphaGo) could also be incorporated to train the model on-line using the execution time and memory footprint as the reward. Note that approaches purely based on deep learning models may not be very effective. First, the query plan is generated based on probability, and is therefore likely to contain grammar errors. Second, the training dataset may not be comprehensive enough to include all query patterns, e.g., some predicates could be missing in the training datasets. To solve these problems, a better approach would be to combine database solutions and deep learning, e.g., using some heuristics to check and correct grammar errors.
Recent developments: Recently, there has been an increasing trend in applying deep learning techniques for optimizing database systems, including query optimization by deciding the join order [41, 54], query performance prediction, cardinality estimation for join queries [51, 70] and database configuration tuning, etc. Deep reinforcement learning is the key model for supporting these optimizations. Kraska et al. propose a learned index that uses neural networks to map the key to the location of the record. SageDB goes further by providing a vision to build a database system that can optimize towards a specific application. It exploits the data and workload distribution of the application to learn models for data access and query plan optimization.
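The idea of a learned index, i.e., a model that maps a key to the position of its record, can be illustrated with a deliberately simplified sketch. We assume a single least-squares linear model in place of a neural network, plus a recorded maximum prediction error that bounds a local binary search; this is not the structure proposed by Kraska et al., and the class `LinearLearnedIndex` is hypothetical.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index: a linear model predicts a key's position."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        # Least-squares fit of position as a linear function of the key.
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Largest prediction error over the indexed keys bounds the search.
        self.max_err = max(
            abs(self._predict(k) - p) for p, k in enumerate(self.keys)
        )

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the position of key in the sorted array, or None."""
        guess = self._predict(key)
        lo = max(0, guess - self.max_err)
        hi = max(lo, min(len(self.keys), guess + self.max_err + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


# Usage: index a small sorted key set and look up a present and an absent key.
index = LinearLearnedIndex([3, 8, 15, 16, 23, 42, 108])
pos_present = index.lookup(23)
pos_absent = index.lookup(7)
```

The better the model fits the key distribution, the smaller `max_err` becomes and the fewer positions the final search has to inspect, which is where exploiting the data distribution pays off.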
4.3 Crowdsourcing and Knowledge Bases
Many crowdsourcing and knowledge base applications involve entity extraction, disambiguation and fusion problems, where the entity could be a row of a database, a node in a graph, etc. With the advancements of deep learning models in NLP, it is opportune to consider deep learning for these problems. In particular, we can learn representations for entities and then do entity relationship reasoning and similarity calculation using the learned representations.
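Once entities have vector representations, similarity calculation reduces to a vector operation. The sketch below uses cosine similarity over hand-written vectors; the vectors, the entity names and the helper `most_similar` are hypothetical stand-ins for representations that a trained model would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(query, entities):
    """Return the name of the entity whose representation is closest to query."""
    return max(entities, key=lambda name: cosine(query, entities[name]))


# Usage: hypothetical 3-dimensional representations of three table rows.
rows = {
    "row_a": [1.0, 0.0, 0.0],
    "row_b": [0.9, 0.1, 0.0],
    "row_c": [0.0, 1.0, 0.0],
}
best = most_similar([0.1, 1.0, 0.0], rows)
```

For entity disambiguation or fusion, pairs whose similarity exceeds a chosen threshold would be treated as candidates referring to the same entity.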
4.4 Spatial and Temporal Data
Spatial and temporal data are common data types in database systems, and are commonly used for trend analysis, progression modeling and predictive analytics. Spatial data is typically processed by mapping moving objects into rectangular blocks. If we regard each block as a pixel of one image, then deep learning models, e.g., CNN, could be exploited to extract the spatial locality between nearby blocks. For instance, if we have the real-time location data (e.g., GPS data) of moving objects, we could learn a CNN model to capture the density relationships of nearby areas for predicting the traffic congestion at a future time point. When temporal data is modeled as features over a time matrix, deep learning models, e.g., RNN, can be designed to model time dependency and predict the occurrence at a future time point. A particular example would be disease progression modeling based on historical medical records, where doctors would want to estimate the onset of a certain severity of a known disease. In fact, most healthcare data is time-series data, and thus deep learning can make a great contribution to healthcare data analysis [45, 52].
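The mapping from moving-object locations to an image-like grid can be sketched as follows. The function `density_grid`, its parameters and the sample coordinates are hypothetical; the resulting matrix of counts is the kind of input a CNN could consume, with each block acting as one pixel.

```python
def density_grid(points, min_x, min_y, cell, rows, cols):
    """Count the points falling into each block of a rows x cols grid.

    points: iterable of (x, y) locations, e.g., projected GPS coordinates.
    cell:   side length of one square block.
    Points outside the grid are ignored.
    """
    grid = [[0] * cols for _ in range(rows)]
    for x, y in points:
        col = int((x - min_x) // cell)
        row = int((y - min_y) // cell)
        if 0 <= row < rows and 0 <= col < cols:
            grid[row][col] += 1
    return grid


# Usage: five locations on a 2 x 3 grid of unit blocks; the last one is outside.
locations = [(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (2.9, 1.9), (5.0, 5.0)]
image = density_grid(locations, min_x=0.0, min_y=0.0, cell=1.0, rows=2, cols=3)
```

Stacking such grids for consecutive time windows yields a sequence of images, which is one way to combine the spatial and temporal modeling discussed above.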
In this paper, we have discussed databases and deep learning. Databases have many techniques for optimizing system performance, while deep learning is good at learning effective representations for data-driven applications. We note that these two “different” areas share some common techniques for improving system performance, such as memory optimization and parallelism. We have discussed some possible improvements for deep learning systems using database techniques, and research problems in applying deep learning techniques to database applications. To make database systems more autonomic, with the ability to learn and optimize, and with the ability to support complex analytics and predictions beyond data aggregation, we foresee a seamless integration of ML/DL and database technologies. With the deployment of 5G mobile networks, we foresee the distribution of databases, training and inference to edge devices, which will lead to further integration and adaptation of technologies. Let us not miss the opportunity to contribute to addressing the challenges ahead!
We would like to thank Divesh Srivastava for his valuable comments. This work was supported by the National Research Foundation, Prime Minister’s Office, Singapore, under its Competitive Research Programme (CRP Award No. NRF-CRP8-2011-08), and Singapore Ministry of Education Academic Research Fund Tier 3 under MOE’s official grant number MOE2017-T3-1-007. Meihui Zhang was supported by China Thousand Talents Program for Young Professionals (3070011181811).
-  M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015.
-  F. Bastien, P. Lamblin, R. Pascanu, J. Bergstra, I. J. Goodfellow, A. Bergeron, N. Bouchard, and Y. Bengio. Theano: new features and speed improvements. Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop, 2012.
-  R. Cai, B. Xu, Z. Zhang, X. Yang, Z. Li, and Z. Liang. An encoder-decoder framework translating natural language to database queries. In IJCAI, 2018.
-  S. Cai, G. Chen, B. C. Ooi, and J. Gao. Model slicing for supporting complex analytics with elastic inference cost and resource constraints. CoRR, abs/1904.01831, 2019.
-  S. Cai, Y. Shu, W. Wang, and B. C. Ooi. Isbnet: Instance-aware selective branching network. CoRR, abs/1905.04849, 2019.
-  J. Chen, R. Monga, S. Bengio, and R. Józefowicz. Revisiting distributed synchronous SGD. CoRR, abs/1604.00981, 2016.
-  T. Chen, M. Li, Y. Li, M. Lin, N. Wang, M. Wang, T. Xiao, B. Xu, C. Zhang, and Z. Zhang. Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. CoRR, abs/1512.01274, 2015.
-  T. Chen, B. Xu, C. Zhang, and C. Guestrin. Training deep nets with sublinear memory cost. CoRR, abs/1604.06174, 2016.
-  A. Coates, B. Huval, T. Wang, D. J. Wu, B. C. Catanzaro, and A. Y. Ng. Deep learning with COTS HPC systems. In ICML, pages 1337–1345, 2013.
-  C. A. Coleman, D. Narayanan, D. Kang, T. J. Zhao, J. Zhang, L. Nardi, P. Bailis, K. Olukotun, C. Ré, and M. A. Zaharia. Dawnbench: An end-to-end deep learning benchmark and competition. In NIPS ML Systems Workshop, 2017.
-  R. Collobert, K. Kavukcuoglu, and C. Farabet. Torch7: A matlab-like environment for machine learning. In BigLearn, NIPS Workshop, 2011.
-  M. Courbariaux, Y. Bengio, and J.-P. David. Low precision arithmetic for deep learning. CoRR, abs/1412.7024, 2014.
-  H. Cui, H. Zhang, G. R. Ganger, P. B. Gibbons, and E. P. Xing. Geeps: Scalable deep learning on distributed gpus with a gpu-specialized parameter server. In EuroSys, page 4. ACM, 2016.
-  J. Dai, M. Zhang, G. Chen, J. Fan, K. Y. Ngiam, and B. C. Ooi. Fine-grained concept linking using neural networks in healthcare. In SIGMOD, pages 51–66. ACM, 2018.
-  J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, Q. V. Le, M. Z. Mao, M. Ranzato, A. W. Senior, P. A. Tucker, K. Yang, and A. Y. Ng. Large scale distributed deep networks. In NIPS, pages 1232–1240, 2012.
-  X. L. Dong, E. Gabrilovich, G. Heitz, W. Horn, K. Murphy, S. Sun, and W. Zhang. From data fusion to knowledge fusion. PVLDB, 7(10):881–892, 2014.
-  M. Ebraheem, S. Thirumuruganathan, S. R. Joty, M. Ouzzani, and N. Tang. Deeper - deep entity resolution. CoRR, abs/1710.00597, 2017.
-  C. Finegan-Dollak, J. K. Kummerfeld, L. Zhang, K. Ramanathan, S. Sadasivam, R. Zhang, and D. Radev. Improving text-to-SQL evaluation methodology. In ACL, pages 351–360, Melbourne, Australia, July 2018.
-  M. Francis-Landau, G. Durrett, and D. Klein. Capturing semantic similarity for entity linking with convolutional neural networks. In NAACL-HLT, pages 1256–1261, San Diego, California, June 2016.
-  J. Gao, H. V. Jagadish, and B. C. Ooi. Active sampler: Light-weight accelerator for complex data analytics at scale. CoRR, abs/1512.03880, 2015.
-  Y. Goldberg. A primer on neural network models for natural language processing. CoRR, abs/1510.00726, 2015.
-  P. Goyal, P. Dollár, R. B. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He. Accurate, large minibatch SGD: training imagenet in 1 hour. CoRR, abs/1706.02677, 2017.
-  D. Grubic, L. K. Tam, D. Alistarh, and C. Zhang. Synchronous multi-gpu deep learning with low-precision communication: An experimental study. In EDBT, pages 145–156. OpenProceedings, 2018.
-  C. Guo, C. S. Jensen, and B. Yang. Towards total traffic awareness. ACM SIGMOD Record, 43(3):18–23, 2014.
-  S. Gupta, W. Zhang, and F. Wang. Model accuracy and runtime tradeoff in distributed deep learning: A systematic study. In ICDM, pages 171–180. IEEE, 2016.
-  S. Hadjis, C. Zhang, I. Mitliagkas, and C. Ré. Omnivore: An optimizer for multi-device deep learning on cpus and gpus. CoRR, abs/1606.04487, 2016.
-  J. R. Haritsa. The picasso database query optimizer visualizer. Proceedings of the VLDB Endowment, 3(1-2):1517–1520, 2010.
-  Y. Huang, T. Jin, Y. Wu, Z. Cai, X. Yan, F. Yang, J. Li, Y. Guo, and J. Cheng. Flexps: Flexible parallelism control in parameter server architecture. PVLDB, 11:566–579, 2018.
-  I. Goodfellow, Y. Bengio, and A. Courville. Deep learning. Book in preparation for MIT Press, 2016.
-  S. Iyer, I. Konstas, A. Cheung, J. Krishnamurthy, and L. Zettlemoyer. Learning a neural semantic parser from user feedback. CoRR, abs/1704.08760, 2017.
-  X. Jia, S. Song, W. He, Y. Wang, H. Rong, F. Zhou, L. Xie, Z. Guo, Y. Yang, L. Yu, T. Chen, G. Hu, S. Shi, and X. Chu. Highly scalable deep learning training system with mixed-precision: Training imagenet in four minutes. CoRR, abs/1807.11205, 2018.
-  Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv:1408.5093, 2014.
-  Z. Jia, S. Lin, C. R. Qi, and A. Aiken. Exploring hidden dimensions in parallelizing convolutional neural networks. CoRR, abs/1802.04924, 2018.
-  Z. Jia, J. O. Thomas, T. Warszawski, M. Gao, M. A. Zaharia, and A. H. Aiken. Optimizing dnn computation with relaxed graph substitutions. In SysML, 2019.
-  Z. Jia, M. Zaharia, and A. Aiken. Beyond data and model parallelism for deep neural networks. In SysML, 2019.
-  J. Jiang, B. Cui, C. Zhang, and L. Yu. Heterogeneity-aware distributed parameter servers. In SIGMOD, pages 463–478. ACM, 2017.
-  J. Jiang, F. Fu, T. Yang, and B. Cui. Sketchml: Accelerating distributed machine learning with data sketches. In SIGMOD, pages 1269–1284, New York, NY, USA, 2018. ACM.
-  R. Jiang, X. Song, Z. Fan, T. Xia, Q. Chen, S. Miyazawa, and R. Shibasaki. DeepUrbanMomentum: An online deep-learning system for short-term urban mobility prediction. In AAAI, 2018.
-  T. Kraska, M. Alizadeh, A. Beutel, E. H. hsin Chi, A. Kristo, G. Leclerc, S. Madden, H. Mao, and V. Nathan. Sagedb: A learned database system. In CIDR, 2019.
-  T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis. The case for learned index structures. In SIGMOD, pages 489–504, New York, NY, USA, 2018. ACM.
-  S. Krishnan, Z. Yang, K. Goldberg, J. M. Hellerstein, and I. Stoica. Learning to optimize join queries with deep reinforcement learning. CoRR, abs/1808.03196, 2018.
-  A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, pages 1097–1105, 2012.
-  G. Lacey, G. W. Taylor, and S. Areibi. Deep learning on fpgas: Past, present, and future. CoRR, abs/1602.04283, 2016.
-  Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
-  C. Lee, Z. Luo, K. Y. Ngiam, M. Zhang, K. Zheng, G. Chen, B. C. Ooi, and W. L. J. Yip. Big Healthcare Data Analytics: Challenges and Applications, pages 11–41. Springer International Publishing, Cham, 2017.
-  M. L. Lee, M. Kitsuregawa, B. C. Ooi, K.-L. Tan, and A. Mondal. Towards self-tuning data placement in parallel database systems. In ACM SIGMOD Record, volume 29, pages 225–236. ACM, 2000.
-  F. Li and H. Jagadish. Constructing an interactive natural language interface for relational databases. PVLDB, 8(1):73–84, 2014.
-  F. Li, B. C. Ooi, M. T. Özsu, and S. Wu. Distributed data management using mapreduce. ACM Comput. Surv., 46(3):31:1–31:42, 2014.
-  Y. Li, K. Fu, Z. Wang, C. Shahabi, J. Ye, and Y. Liu. Multi-task representation learning for travel time estimation. In KDD, 2018.
-  Y. Lin, S. Han, H. Mao, Y. Wang, and W. J. Dally. Deep gradient compression: Reducing the communication bandwidth for distributed training. CoRR, abs/1712.01887, 2017.
-  H. Liu, M. Xu, Z. Yu, V. Corvinelli, and C. Zuzarte. Cardinality estimation using neural networks. In CCSE, CASCON ’15, pages 53–59, Riverton, NJ, USA, 2015. IBM Corp.
-  Z. Luo, S. Cai, J. Gao, M. Zhang, K. Y. Ngiam, G. Chen, and W. Lee. Adaptive lightweight regularization tool for complex analytics. In ICDE, pages 485–496, 2018.
-  Y. Ma, X. Zhu, S. Zhang, R. Yang, W. Wang, and D. Manocha. Trafficpredict: Trajectory prediction for heterogeneous traffic-agents. CoRR, abs/1811.02146, 2019.
-  R. Marcus and O. Papaemmanouil. Deep reinforcement learning for join order enumeration. CoRR, abs/1803.00055, 2018.
-  R. Marcus and O. Papaemmanouil. Plan-structured deep neural network models for query performance prediction. CoRR, abs/1902.00132, 2019.
-  R. Marcus and O. Papaemmanouil. Towards a hands-free query optimizer through deep learning. In CIDR, 2019.
-  H. Mikami, H. Suganuma, P. U.-Chupala, Y. Tanaka, and Y. Kageyama. Imagenet/resnet-50 training in 224 seconds. CoRR, abs/1811.05233, 2018.
-  A. Mirhoseini, H. Pham, Q. Le, M. Norouzi, S. Bengio, B. Steiner, Y. Zhou, N. Kumar, R. Larsen, and J. Dean. Device placement optimization with reinforcement learning. 2017.
-  D. R. Mould. Models for disease progression: New approaches and uses. Clinical Pharmacology & Therapeutics, 92(1):125–131, 2012.
-  S. Mudgal, H. Li, T. Rekatsinas, A. Doan, Y. Park, G. Krishnan, R. Deep, E. Arcaute, and V. Raghavendra. Deep learning for entity matching: A design space exploration. In SIGMOD, pages 19–34, New York, NY, USA, 2018. ACM.
-  B. C. Ooi, K. Tan, Q. T. Tran, J. W. L. Yip, G. Chen, Z. J. Ling, T. Nguyen, A. K. H. Tung, and M. Zhang. Contextual crowd intelligence. SIGKDD Explorations, 16(1):39–46, 2014.
-  B. C. Ooi, K.-L. Tan, S. Wang, W. Wang, Q. Cai, G. Chen, J. Gao, Z. Luo, A. K. H. Tung, Y. Wang, Z. Xie, M. Zhang, and K. Zheng. SINGA: A distributed deep learning platform. In ACM Multimedia, 2015.
-  A. Paszke, S. Gross, S. Chintala, and G. Chanan. PyTorch: Tensors and dynamic neural networks in Python with strong GPU acceleration, 6, 2017.
-  G. Pleiss, D. Chen, G. Huang, T. Li, L. van der Maaten, and K. Q. Weinberger. Memory-efficient implementation of densenets. CoRR, abs/1707.06990, 2017.
-  C. Ré, D. Agrawal, M. Balazinska, M. I. Cafarella, M. I. Jordan, T. Kraska, and R. Ramakrishnan. Machine learning and databases: The sound of things to come or a cacophony of hype? In SIGMOD, pages 283–284, 2015.
-  F. Seide, H. Fu, J. Droppo, G. Li, and D. Yu. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech dnns. In INTERSPEECH, pages 1058–1062, 2014.
-  D. Silver et al. Mastering the game of go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
-  K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
-  R. Socher, D. Chen, C. D. Manning, and A. Ng. Reasoning with neural tensor networks for knowledge base completion. In NIPS, pages 926–934, 2013.
-  J. Sun and G. Li. An end-to-end learning-based cost estimator. arXiv:1906.02560, 2019.
-  I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In NIPS, pages 3104–3112, 2014.
-  K.-L. Tan, Q. Cai, B. C. Ooi, W.-F. Wong, C. Yao, and H. Zhang. In-memory databases: Challenges and opportunities from software and hardware perspectives. ACM SIGMOD Record, 44(2):35–40, 2015.
-  H. Tang, C. Yu, X. Lian, T. Zhang, and J. Liu. DoubleSqueeze: Parallel stochastic gradient descent with double-pass error-compensated compression. In ICML, pages 6155–6165, 2019.
-  O. Vinyals, L. Kaiser, T. Koo, S. Petrov, I. Sutskever, and G. Hinton. Grammar as a foreign language. arXiv:1412.7449, 2014.
-  D. Vorona, A. Kipf, T. Neumann, and A. Kemper. DeepSPACE: Approximate geospatial query processing with deep learning. arXiv:1906.06085, 2019.
-  Q. H. Vu, M. Lupu, and B. C. Ooi. Peer-to-peer computing. Springer, 2010.
-  D. Wang, J. Zhang, W. Cao, J. Li, and Y. Zheng. When will you arrive? estimating travel time based on deep neural networks. In AAAI, 2018.
-  L. Wang, J. Ye, Y. Zhao, W. Wu, A. Li, S. L. Song, Z. Xu, and T. Kraska. Superneurons: Dynamic gpu memory management for training deep neural networks. In ACM SIGPLAN Notices, volume 53, pages 41–53. ACM, 2018.
-  P. Wang, Y. Fu, J. Zhang, P. Wang, Y. Zheng, and C. C. Aggarwal. You are how you drive: Peer and temporal-aware representation learning for driving behavior analysis. In KDD, 2018.
-  W. Wang, G. Chen, T. T. A. Dinh, J. Gao, B. C. Ooi, K.-L. Tan, and S. Wang. SINGA: Putting deep learning in the hands of multimedia users. In ACM Multimedia, 2015.
-  W. Wang, B. C. Ooi, X. Yang, D. Zhang, and Y. Zhuang. Effective multi-modal retrieval based on stacked auto-encoders. PVLDB, 7(8):649–660, 2014.
-  W. Wang, X. Yang, B. C. Ooi, D. Zhang, and Y. Zhuang. Effective deep learning-based multi-modal retrieval. The VLDB Journal, pages 1–23, 2015.
-  W. Wang, M. Zhang, G. Chen, H. V. Jagadish, B. C. Ooi, and K.-L. Tan. Database meets deep learning: Challenges and opportunities. SIGMOD Rec., 45(2):17–22, Sept. 2016.
-  Z. Wang, K. Fu, and J. Ye. Learning to estimate the travel time. In KDD, 2018.
-  J. Wei, W. Dai, A. Qiao, Q. Ho, H. Cui, G. R. Ganger, P. B. Gibbons, G. A. Gibson, and E. P. Xing. Managed communication and consistency for fast data-parallel iterative analytics. In SoCC, pages 381–394, 2015.
-  R. Wu, S. Yan, Y. Shan, Q. Dang, and G. Sun. Deep image: Scaling up image recognition. CoRR, abs/1501.02876, 2015.
-  T. Wu, L. Chen, P. Hui, C. J. Zhang, and W. Li. Hear the whole story: Towards the diversity of opinion in crowdsourcing markets. PVLDB, 8(5):485–496, 2015.
-  M. Yamazaki, A. Kasagi, A. Tabuchi, T. Honda, M. Miwa, N. Fukumoto, T. Tabaru, A. Ike, and K. Nakashima. Yet another accelerated sgd: Resnet-50 training on imagenet in 74.7 seconds. CoRR, abs/1903.12650, 2019.
-  C. Yao, D. Agrawal, G. Chen, Q. Lin, B. C. Ooi, W. F. Wong, and M. Zhang. Exploiting single-threaded model in multi-core in-memory systems. IEEE Trans. Knowl. Data Eng., 2016.
-  M. D. Zeiler. Adadelta: An adaptive learning rate method. arXiv:1212.5701, 2012.
-  H. Zhang, G. Chen, B. C. Ooi, K. Tan, and M. Zhang. In-memory big data management and processing: A survey. IEEE Trans. Knowl. Data Eng., 27(7):1920–1948, 2015.
-  J. Zhang, Y. Liu, K. Zhou, G. Li, Z. Xiao, B. Cheng, J. Xing, Y. Wang, T. Cheng, L. Liu, M. Ran, and Z. Li. An end-to-end automatic cloud database tuning system using deep reinforcement learning. In SIGMOD, 2019.
-  J. Zhang, S. Yeung, Y. Shu, B. He, and W. Wang. Efficient memory management for gpu-based deep learning systems. CoRR, abs/1903.06631, 2019.
-  V. Zhong, C. Xiong, and R. Socher. Seq2sql: Generating structured queries from natural language using reinforcement learning. CoRR, abs/1709.00103, 2017.