ALiPy
ALiPy is a Python package for experimenting with different active learning settings and algorithms.
Supervised machine learning methods usually require a large set of labeled examples for model training. In many real applications, however, unlabeled data are plentiful while labeled data are limited, and acquiring labels is costly. Active learning (AL) reduces the labeling cost by iteratively selecting the most valuable data and querying their labels from the annotator. This article introduces ALiPy, a Python toolbox for active learning. ALiPy provides a module-based implementation of the active learning framework, which allows users to conveniently evaluate, compare, and analyze the performance of active learning methods. Multiple options are available for each component of the learning framework, including data processing, active selection, label querying, and results visualization. In addition to implementing more than 20 state-of-the-art active learning algorithms, ALiPy lets users easily configure and implement their own approaches under different active learning settings, such as AL for multi-label data, AL with noisy annotators, and AL with different costs. The toolbox is well documented, open source on GitHub, and can be installed through PyPI.
Active learning is one of the main approaches to learning with limited labeled data. It tries to reduce the human effort spent on data annotation by actively querying the labels of the most important examples (Settles (2009)).
ALiPy is a Python toolbox for active learning that is suitable for a range of users. On one hand, the whole process of active learning is already implemented: users can run an experiment, from data pre-processing to result visualization, in a few lines of code. More than 20 commonly used active learning methods are also implemented in the toolbox, giving users many choices. Table 1 summarizes the main approaches implemented in ALiPy. On the other hand, ALiPy gives users a high degree of freedom to implement their own ideas about active learning. The active learning process is decomposed into multiple components, each implemented by a separate module, so the toolbox is loosely coupled and lets users freely configure and modify any part of the process. Furthermore, in addition to the traditional active learning setting, ALiPy supports several novel settings. For example, the data examples can be multi-labeled, the oracle can be noisy, and the annotation can be cost-sensitive.
Table 1: Main approaches implemented in ALiPy.

Setting | Implemented strategies |
---|---|
AL with Instance Selection | Uncertainty (Lewis and Gale (1994)), Query By Committee (Abe and Mamitsuka (1998)), Expected Error Reduction (Roy and McCallum (2001)), Random, Graph Density (Ebert et al. (2012)), BMDR (Wang and Ye (2013)), QUIRE (Huang et al. (2010)), LAL (Konyushkova et al. (2017)), SPAL (Tang and Huang (2019)) |
AL for Multi-Label Data | AUDI (Huang and Zhou (2013)), QUIRE (Huang et al. (2014)), MMC (Yang et al. (2009)), Adaptive (Li and Guo (2013)), Random |
AL by Querying Features | AFASMC (Huang et al. (2018)), Stability (Chakraborty et al. (2013)), Random |
AL with Different Costs | HALC (Yan and Huang (2018)), Random, Cost performance |
AL with Noisy Oracles | CEAL (Huang et al. (2017)), IEthresh (Donmez et al. (2009)), Repeated (Sheng et al. (2008)), Random |
AL with Novel Query Types | AURO (Huang et al. (2015)) |
AL for Large Scale Tasks | Subsampling |
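Of the instance-selection strategies listed in the table, uncertainty sampling is the simplest to illustrate. The following is a minimal sketch of its least-confident variant in plain Python, not ALiPy's implementation: the function names and the example probabilities are hypothetical, and the predicted class probabilities are assumed to come from whatever model is being trained.

```python
def least_confident_scores(probabilities):
    """Uncertainty of each example: 1 minus its largest predicted class probability."""
    return [1.0 - max(p) for p in probabilities]


def select_most_uncertain(unlabeled_indexes, probabilities, batch_size=1):
    """Return the `batch_size` unlabeled indexes whose predictions are least confident."""
    scores = least_confident_scores(probabilities)
    ranked = sorted(zip(unlabeled_indexes, scores), key=lambda pair: pair[1], reverse=True)
    return [index for index, _ in ranked[:batch_size]]


# Hypothetical predicted class probabilities for four unlabeled examples.
unlabeled = [3, 7, 8, 12]
predicted = [
    [0.90, 0.05, 0.05],  # confident prediction, low query value
    [0.40, 0.35, 0.25],
    [0.55, 0.40, 0.05],
    [0.34, 0.33, 0.33],  # nearly uniform prediction, high query value
]
queried = select_most_uncertain(unlabeled, predicted, batch_size=2)
print(queried)
```

The example with the flattest predicted distribution is ranked first, which is the intuition behind querying "the most valuable" data in this setting.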
As illustrated in Figure 1, we decompose the active learning implementation into multiple components. To facilitate the implementation of different active learning methods under different settings, we develop ALiPy based on multiple modules, each corresponding to a component of the active learning process.
Below is the list of modules in ALiPy.
alipy.data_manipulate: It provides the basic functions for data pre-processing and partitioning. Both cross validation and hold-out testing are supported.
alipy.query_strategy: It consists of 25 commonly used query strategies.
alipy.index.IndexCollection: It helps to manage the indexes of labeled and unlabeled examples.
alipy.metric: It provides multiple criteria to evaluate the model performance.
alipy.experiment.state and alipy.experiment.state_io: They help to save the intermediate results after each query and can recover the program from breakpoints.
alipy.experiment.stopping_criteria: It implements some commonly used stopping criteria.
alipy.oracle: It supports different oracle settings. One can configure multiple oracles with noisy annotations and different costs.
alipy.experiment.experiment_analyser: It provides functions for gathering, processing and visualizing the experimental results.
alipy.utils.multi_thread: It provides a parallel implementation of k-fold experiments.
The above modules are designed and implemented independently. In this way, the code in one part places no constraints on the code in another. Each module can also be replaced by a user's own implementation, without inheriting from any ALiPy class. The modules in ALiPy do not influence each other and thus can be substituted freely.
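To make the decomposition concrete, here is a minimal, library-free sketch of a pool-based active learning loop in which the query strategy and the stopping criterion are interchangeable objects. It does not use ALiPy: the class names, the method names `select`, `is_stop` and `update`, and the toy labels are assumptions made for illustration. The strategy is matched by its method signature only, which mirrors the "without inheriting" idea described above.

```python
import random


class RandomStrategy:
    """Any object exposing select(labeled, unlabeled, batch_size) can serve as a strategy."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def select(self, labeled, unlabeled, batch_size=1):
        # Sort first so the sample is reproducible for a given seed.
        return self._rng.sample(sorted(unlabeled), batch_size)


class MaxQueries:
    """Stopping criterion: stop once a fixed number of labels has been queried."""

    def __init__(self, budget):
        self.budget = budget
        self.used = 0

    def is_stop(self):
        return self.used >= self.budget

    def update(self, n_queried):
        self.used += n_queried


def run_active_learning(labels, initial_labeled, strategy, stopping, batch_size=1):
    """Generic pool-based loop; `labels` stands in for the oracle's answers."""
    known = {i: labels[i] for i in initial_labeled}
    unlabeled = set(range(len(labels))) - set(known)
    history = []
    while unlabeled and not stopping.is_stop():
        size = min(batch_size, len(unlabeled))
        batch = strategy.select(set(known), unlabeled, batch_size=size)
        for index in batch:
            known[index] = labels[index]  # query the oracle for this example
        unlabeled.difference_update(batch)
        stopping.update(len(batch))
        # A real experiment would retrain the model and record a metric here.
        history.append(len(known))
    return known, unlabeled, history


true_labels = [0, 1, 0, 1, 1, 0, 0, 1, 0, 1]
known, unlabeled, history = run_active_learning(
    true_labels, initial_labeled=[0, 1], strategy=RandomStrategy(seed=42), stopping=MaxQueries(5)
)
print(len(known), len(unlabeled), history)
```

Swapping `RandomStrategy` for another object with the same `select` method changes the selection behavior without touching the loop, the index bookkeeping, or the stopping logic.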
Each module is also designed to be flexible, so that the toolbox can adapt to different settings. For example, the data split function accepts either the shape of the data matrix or a list of example names. In the oracle class, one can further specify the cost of each label and query instance-label pairs in the multi-label setting. In the analyser class, the experimental results may be unaligned, as happens in the cost-sensitive setting; in that case, interpolation is performed automatically when plotting the learning curves.
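The interpolation step can be illustrated as follows. In a cost-sensitive experiment, each fold records performance at different accumulated costs, so the curves must be mapped onto a shared cost grid before they can be averaged. This is a minimal sketch assuming linear interpolation with clamping at the endpoints; the function names and numbers are hypothetical, and the source does not specify which interpolation scheme ALiPy uses.

```python
from bisect import bisect_right


def interpolate_curve(costs, scores, grid):
    """Linearly interpolate one (cost, score) curve onto a shared cost grid.

    `costs` must be strictly increasing. Grid points outside the observed
    cost range are clamped to the nearest endpoint score.
    """
    result = []
    for x in grid:
        if x <= costs[0]:
            result.append(scores[0])
        elif x >= costs[-1]:
            result.append(scores[-1])
        else:
            right = bisect_right(costs, x)  # first recorded cost strictly greater than x
            left = right - 1
            weight = (x - costs[left]) / (costs[right] - costs[left])
            result.append(scores[left] + weight * (scores[right] - scores[left]))
    return result


def average_curves(curves, grid):
    """Align every fold on `grid`, then average the scores point by point."""
    aligned = [interpolate_curve(costs, scores, grid) for costs, scores in curves]
    return [sum(column) / len(column) for column in zip(*aligned)]


# Two hypothetical folds whose queries accumulated cost at different points.
fold_a = ([0, 4, 10], [0.50, 0.70, 0.90])
fold_b = ([0, 5, 10], [0.40, 0.60, 0.80])
grid = [0, 5, 10]
mean_curve = average_curves([fold_a, fold_b], grid)
print(mean_curve)
```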
For more details, please refer to the documentation at http://parnec.nuaa.edu.cn/huangsj/alipy and the Git repository at https://github.com/NUAA-AL/ALiPy.
ALiPy can be used in several ways, depending on the user's needs.
For users who are less familiar with active learning and simply want to apply a method to a dataset, ALiPy provides the class alipy.experiment.AlExperiment, which encapsulates the various tools and implements the main loop of active learning. With this class, users can run experiments in only a few lines of code, without background knowledge of active learning.
For users who want to experimentally evaluate the performance of existing active learning methods, ALiPy provides implementations of more than 20 state-of-the-art methods, along with detailed instructions and plentiful code examples.
For users who want to implement their own ideas and perform active learning experiments, ALiPy provides a module-based structure that lets them modify any part of the active learning process. More importantly, several novel settings are supported to make such implementations more convenient. We also provide detailed API references and usage examples for each module and setting to help users get started quickly. Note that ALiPy does not force users to use any of its tool classes: they are designed independently and can be substituted by users' own implementations without inheriting anything.
For details, please refer to the documentation and code examples available on the ALiPy homepage and GitHub.
References
Ebert et al. In IEEE Conference on Computer Vision and Pattern Recognition, pages 3626–3633, 2012.
Huang et al. In Proceedings of the 25th International Joint Conference on Artificial Intelligence, pages 946–952, 2015.
Lewis and Gale. A sequential algorithm for training text classifiers. In Proceedings of the 17th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, pages 3–12, 1994.
Roy and McCallum. Toward optimal active learning through sampling estimation of error reduction. In Proceedings of the 18th International Conference on Machine Learning, pages 441–448, 2001.