A Hybrid Approach for Learning Program Representations
Learning neural program embeddings is key to utilizing deep neural networks in programming language research. Precise and efficient program representations enable the application of deep models to a wide range of program analysis tasks. Existing works predominantly learn to embed programs from their source code; as a result, they do not capture a deep, precise representation of program semantics. Models that learn from runtime information, on the other hand, depend heavily on the quality of program executions, which adds uncertainty to the training process. This paper tackles these weaknesses of prior works by introducing a new deep neural network, LIGER, which learns program representations from a mixture of symbolic and concrete execution traces. We have evaluated LIGER on COSET, a recently proposed benchmark for evaluating neural program embeddings. Results show that LIGER is significantly more accurate than Gated Graph Neural Network and code2vec in classifying program semantics, while requiring on average almost ten times fewer executions. We also extend LIGER to predict a method's name from the vector representation of its body. By learning on the same set of functions (more than 170K in total), LIGER significantly outperforms code2seq, the previous state-of-the-art.
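To make the hybrid idea concrete, below is a minimal, hypothetical sketch of how a model might fuse symbolic and concrete execution traces into a single program embedding. This is not LIGER's actual architecture (the abstract does not specify it); the class name, the per-statement alignment of symbolic tokens with bucketed concrete values, and all dimensions are illustrative assumptions.

    # Hypothetical sketch, not the paper's architecture: each executed
    # statement contributes a symbolic-token id and a bucketed concrete
    # value id; the two views are embedded, fused, and summarized by a GRU.
    import torch
    import torch.nn as nn

    class HybridTraceEncoder(nn.Module):
        def __init__(self, vocab_size, num_value_bins, num_classes, dim=128):
            super().__init__()
            self.sym_embed = nn.Embedding(vocab_size, dim)      # symbolic statement tokens
            self.val_embed = nn.Embedding(num_value_bins, dim)  # bucketed concrete values
            self.fuse = nn.Linear(2 * dim, dim)                 # blend the two views
            self.rnn = nn.GRU(dim, dim, batch_first=True)       # run over the trace
            self.classify = nn.Linear(dim, num_classes)         # e.g. semantics label

        def forward(self, sym_ids, val_ids):
            # sym_ids, val_ids: (batch, trace_len), aligned per executed statement
            s = self.sym_embed(sym_ids)
            v = self.val_embed(val_ids)
            h = torch.tanh(self.fuse(torch.cat([s, v], dim=-1)))
            _, last = self.rnn(h)          # final hidden state summarizes the trace
            return self.classify(last.squeeze(0))

    # Toy usage: a batch of two traces, each five statements long.
    model = HybridTraceEncoder(vocab_size=1000, num_value_bins=64, num_classes=10)
    sym = torch.randint(0, 1000, (2, 5))
    val = torch.randint(0, 64, (2, 5))
    logits = model(sym, val)               # shape (2, 10)

The design point the sketch illustrates is that symbolic tokens carry program structure while concrete values carry observed runtime behavior, so fusing them per statement lets a single sequence model consume both signals at once.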