Active Learning

What is Active Learning?

Active learning is a form of machine learning, often described as semi-supervised, in which the algorithm chooses which data points it learns from. Instead of passively receiving a fixed set of labeled examples, the program actively queries an authoritative source (often called the oracle or teacher), such as a human annotator or an existing labeled dataset, to obtain the correct label for the inputs it selects.

The goal of this iterative approach is to reach good model performance with fewer labeled examples, which is especially valuable when you don’t have a large labeled dataset for traditional supervised learning.
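As a rough illustration of this iterative loop, here is a minimal Python sketch under simplifying assumptions: a toy one-dimensional task where every label is 0 below an unknown cut-off and 1 at or above it, a function standing in for the labeled dataset or annotator, and a "model" that is simply the midpoint between the largest known 0 and the smallest known 1. The names `fit_threshold`, `active_learning_loop`, and `teacher`, and the cut-off value, are hypothetical. In each round the learner queries the unlabeled point it is least certain about and refits.

```python
import random


def fit_threshold(labeled):
    """Estimate the cut-off from (x, y) pairs with y in {0, 1}.

    Toy model (assumption): labels are 0 below an unknown cut-off in [0, 1]
    and 1 at or above it, so the best guess is the midpoint between the
    largest known 0 and the smallest known 1.
    """
    negatives = [x for x, y in labeled if y == 0]
    positives = [x for x, y in labeled if y == 1]
    lo = max(negatives) if negatives else 0.0
    hi = min(positives) if positives else 1.0
    return (lo + hi) / 2


def active_learning_loop(pool, oracle, budget, seed_size=2, rng=None):
    """Pool-based uncertainty sampling under a fixed labeling budget."""
    rng = rng or random.Random(0)
    unlabeled = list(pool)
    rng.shuffle(unlabeled)
    # Start from a small random seed set labeled by the oracle.
    labeled = [(x, oracle(x)) for x in unlabeled[:seed_size]]
    unlabeled = unlabeled[seed_size:]
    for _ in range(budget - seed_size):
        threshold = fit_threshold(labeled)
        # Query the point the current model is least sure about,
        # i.e. the unlabeled point closest to its decision threshold.
        x = min(unlabeled, key=lambda u: abs(u - threshold))
        unlabeled.remove(x)
        labeled.append((x, oracle(x)))
    return fit_threshold(labeled), labeled


def teacher(x):
    """Stands in for the authority source; the cut-off 0.637 is made up."""
    return int(x >= 0.637)


pool = [i / 1000 for i in range(1000)]
estimate, labeled = active_learning_loop(pool, teacher, budget=15)
print(len(labeled), estimate)
```

With a budget of 15 labels out of a pool of 1,000 points, the estimated cut-off lands within about 0.001 of the true one, because each query roughly halves the range that can still contain it. Labeling points at random would typically need many more labels to reach the same precision.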

One of the most popular applications of active learning is in Natural Language Processing (NLP), a field where labeling is especially labor-intensive. Active learning can achieve results comparable to those of fully supervised learning while requiring only a fraction of the human labeling effort.

How does Active Learning Work in Practice?

Many specific query strategies exist for scoring how informative an unlabeled instance is, such as least confidence, margin sampling, and entropy sampling. Independently of the strategy used, there are three broad scenarios in which an active learner can query the teacher for labels:
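To make the three named strategies concrete, here is a small Python sketch using their usual definitions over a model's predicted class probabilities. The function names and the probability values are illustrative assumptions, not taken from any particular library.

```python
import math


def least_confidence(probs):
    """1 minus the top class probability; higher means more uncertain."""
    return 1.0 - max(probs)


def margin(probs):
    """Gap between the two most likely classes; smaller means more uncertain."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2


def entropy(probs):
    """Shannon entropy (in nats); higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical class-probability outputs of a model for three unlabeled items.
predictions = {
    "a": [0.90, 0.05, 0.05],
    "b": [0.40, 0.35, 0.25],
    "c": [0.50, 0.48, 0.02],
}

by_lc = max(predictions, key=lambda k: least_confidence(predictions[k]))  # "b"
by_margin = min(predictions, key=lambda k: margin(predictions[k]))  # "c"
by_entropy = max(predictions, key=lambda k: entropy(predictions[k]))  # "b"
```

The strategies need not agree: margin sampling prefers item "c", whose top two classes are nearly tied, while least confidence and entropy prefer item "b", whose probability mass is spread more evenly.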

  • Membership Query Synthesis: The learner generates its own synthetic instances to be labeled, rather than sampling them from the underlying natural distribution. For example, if the dataset consists of pictures of humans and animals, the learner could send the teacher a clipped image of a leg and ask whether this appendage belongs to an animal or a human. This is particularly useful when the dataset is small.
  • Stream-Based Selective Sampling: Unlabeled data points arrive one at a time, and the learner evaluates each item's informativeness against its query criteria. For each data point, the learner decides on its own whether to query the teacher for the label or to skip the query and rely on its current model.
  • Pool-Based Sampling: Every instance in a large pool of unlabeled data is assigned an informativeness score, which reflects how poorly the learner currently “understands” that instance (for example, how uncertain its prediction is). The system then selects the most informative instances and queries the teacher for their labels.
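The first two scenarios can be sketched in a toy one-dimensional setting (an assumption for illustration: inputs lie in [0, 1], and the label is 1 at or above a hidden cut-off and 0 below it). The hypothetical `synthesize_queries` invents every query point itself by bisection, so it needs no data at all. The hypothetical `stream_selective_sampling` sees points one at a time and queries only those whose label it cannot already infer from earlier answers, a simple disagreement-based rule chosen here for illustration.

```python
import random

# Toy setting (assumption): the true label is 1 at or above a hidden
# cut-off and 0 below it; the value 0.637 is made up.
TRUE_CUTOFF = 0.637


def teacher(x):
    """Stands in for the authority source (annotator or labeled dataset)."""
    return int(x >= TRUE_CUTOFF)


def synthesize_queries(oracle, n_queries, lo=0.0, hi=1.0):
    """Membership query synthesis: the learner generates every query itself.

    Each query is the midpoint of the interval still known to contain the
    cut-off, so no pool or stream of real data is needed.
    """
    for _ in range(n_queries):
        x = (lo + hi) / 2  # a synthetic instance chosen by the learner
        if oracle(x) == 1:
            hi = x  # cut-off is at or below x
        else:
            lo = x  # cut-off is above x
    return (lo + hi) / 2


def stream_selective_sampling(stream, oracle):
    """Stream-based selective sampling: decide point by point whether to query.

    A point goes to the oracle only if it lies strictly inside (lo, hi), the
    interval that may still contain the cut-off; outside it, the label is
    already implied by earlier answers, so the query is skipped.
    """
    lo, hi = 0.0, 1.0
    queried = 0
    for x in stream:
        if not lo < x < hi:
            continue  # label can be inferred: no query spent
        queried += 1
        if oracle(x) == 1:
            hi = x
        else:
            lo = x
    return (lo + hi) / 2, queried


synth_estimate = synthesize_queries(teacher, n_queries=10)

rng = random.Random(42)
stream = [rng.random() for _ in range(1000)]
stream_estimate, n_queried = stream_selective_sampling(stream, teacher)
print(synth_estimate, stream_estimate, n_queried)
```

After 10 synthesized queries the cut-off is pinned to an interval of width 1/1024, and in the stream the learner ends up querying only a small fraction of the 1,000 points it sees.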