Finding Reusable Machine Learning Components to Build Programming Language Processing Pipelines

08/11/2022
by Patrick Flynn, et al.

Programming Language Processing (PLP) using machine learning has improved substantially in the past few years, and a growing number of people are interested in exploring this promising field. However, it is challenging for new researchers and developers to find the right components to construct their own machine learning pipelines, given the diversity of PLP tasks to be solved, the large number of datasets and models being released, and the complexity of the compilers and tools involved. To improve the findability, accessibility, interoperability, and reusability (FAIRness) of machine learning components, we collect and analyze a set of representative papers in the domain of machine learning-based PLP. We then identify and characterize key concepts, including PLP tasks, model architectures, and supportive tools. Finally, we show example use cases that leverage these reusable components to construct machine learning pipelines for a set of PLP tasks.


