What is Automated Machine Learning (AutoML)?
Automated machine learning (AutoML) is the process of automating, end to end, the application of machine learning to real-world problems. In a typical machine learning application, practitioners have a dataset consisting of input data points to train on. The raw data may not be in a form that every algorithm can use out of the box. An expert may have to apply appropriate data pre-processing, feature engineering, feature extraction, and feature selection methods to make the dataset amenable to machine learning.
Following those preprocessing steps, practitioners must then perform algorithm selection and hyperparameter optimization to maximize the predictive performance of their final machine learning model. As many of these steps are often beyond the abilities of non-experts, AutoML was proposed as an artificial intelligence-based solution to the ever-growing challenge of applying machine learning.
Automating the end-to-end process of applying machine learning offers several advantages: simpler solutions, faster creation of those solutions, and models that often outperform hand-designed ones. However, AutoML is not a silver bullet. It can introduce additional parameters of its own, called hyperhyperparameters, which may themselves take some expertise to set. Even so, it makes applying machine learning easier for non-experts. Source: Wiki
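To make the algorithm selection and hyperparameter optimization steps described above concrete, here is a minimal sketch of what an AutoML loop automates. It is only an illustration under stated assumptions: the two toy model families, the random-search strategy, the function names, and the synthetic data are all made up for this example and do not come from any particular AutoML library.

```python
import random

def fit_linear(xs, ys, l2):
    """Closed-form 1-D ridge regression; returns a predict function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / (sxx + l2)
    return lambda x: my + slope * (x - mx)

def fit_knn(xs, ys, k):
    """k-nearest-neighbour regression; returns a predict function."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

FITTERS = {"linear": fit_linear, "knn": fit_knn}

def mse(model, xs, ys):
    """Mean squared error of a predict function on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def sample_candidate(rng):
    """Draw one (algorithm, hyperparameters) configuration at random."""
    if rng.random() < 0.5:
        return "linear", {"l2": 10 ** rng.uniform(-3, 2)}
    return "knn", {"k": rng.randint(1, 15)}

def auto_select(train, valid, n_trials=30, seed=0):
    """Random search: keep the configuration with the lowest validation MSE."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        name, params = sample_candidate(rng)
        model = FITTERS[name](*train, **params)
        score = mse(model, *valid)
        if best is None or score < best[0]:
            best = (score, name, params)
    return best

# Synthetic, clearly nonlinear data: y = x^2 plus a little noise.
data_rng = random.Random(1)
xs = [data_rng.uniform(-3, 3) for _ in range(200)]
ys = [x * x + data_rng.gauss(0, 0.1) for x in xs]
train, valid = (xs[:150], ys[:150]), (xs[150:], ys[150:])

score, name, params = auto_select(train, valid)
print(name, params, round(score, 4))
```

Because a straight line cannot follow a parabola, the search should settle on the nearest-neighbour model. Real AutoML systems replace the random search with smarter strategies and search over far richer pipelines, but the select-fit-validate loop is the same.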
Various stages of the AutoML process
Automated machine learning can target various stages of the machine learning process:
- Automated data preparation and ingestion (from raw data and miscellaneous formats)
- Automated column type detection; e.g., boolean, discrete numerical, continuous numerical, or text
- Automated column intent detection; e.g., target/label, stratification field, numerical feature, categorical text feature, or free text feature
- Automated task detection; e.g., binary classification, regression, clustering, or ranking
- Automated feature engineering
- Automated model selection
- Hyperparameter optimization of the learning algorithm and featurization
- Automated pipeline selection under time, memory, and complexity constraints
- Automated selection of evaluation metrics / validation procedures
- Automated problem checking
  - Leakage detection
  - Misconfiguration detection
- Automated analysis of results obtained
- User interfaces and visualizations for automated machine learning
Source: Wiki
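Two of the stages listed above, column type detection and task detection, can be sketched with a few simple heuristics. The rules below are assumptions chosen for illustration (for example, treating any two-valued target as binary classification). Production AutoML tools use more careful logic, and the function names are hypothetical.

```python
def detect_column_type(values):
    """Guess a coarse column type from raw string values."""
    cleaned = [v.strip().lower() for v in values if v.strip() != ""]
    if not cleaned:
        return "empty"
    if set(cleaned) <= {"true", "false", "yes", "no", "0", "1"}:
        return "boolean"
    try:
        numbers = [float(v) for v in cleaned]
    except ValueError:
        # At least one value is not a number, so treat the column as text.
        return "text"
    if all(n.is_integer() for n in numbers):
        return "discrete numerical"
    return "continuous numerical"

def detect_task(target_values=None):
    """Guess the learning task from the target column (None means no target)."""
    if target_values is None:
        return "clustering"
    distinct = {v.strip().lower() for v in target_values if v.strip() != ""}
    col_type = detect_column_type(target_values)
    if col_type == "boolean" or len(distinct) == 2:
        return "binary classification"
    if col_type == "continuous numerical":
        return "regression"
    return "multiclass classification"

print(detect_column_type(["3", "7", "12"]))
print(detect_column_type(["3.5", "7", "1e-2"]))
print(detect_task(["yes", "no", "yes"]))
print(detect_task(["1.5", "2.25", "3.0"]))
```

Heuristics like these are cheap to run on a sample of the data, which is why they usually sit at the very start of an automated pipeline.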
The Need for AutoML
Demand for machine learning systems has soared over the past few years, driven by the success of ML in a wide range of applications. Yet even with this clear sign that machine learning can give certain businesses a boost, many companies struggle to deploy ML models.
First, they need to assemble a team of seasoned data scientists, who command premium salaries. Second, even with a great team, deciding which model is best for a given problem often requires more experience than knowledge.
The success of machine learning across so many applications has led to an ever-growing demand for ML systems that non-experts can use off the shelf. AutoML aims to automate as many steps in an ML pipeline as possible, with a minimum of human effort and without compromising the model's performance.
The advantages of AutoML can be summed up in three major points:
- It increases productivity by automating repetitive tasks. This lets a data scientist focus more on the problem than on the models.
- Automating the ML pipeline also helps avoid errors that can creep in when steps are done manually.
- Ultimately, AutoML is a step towards democratizing machine learning by making the power of ML accessible to everyone.
A List of Different AutoML Frameworks
Auto-Keras is an open source software library for automated machine learning (AutoML). It is developed by DATA Lab at Texas A&M University and community contributors.
The ultimate goal of AutoML is to provide easily accessible deep learning tools to domain experts with limited data science or machine learning background. Auto-Keras provides functions to automatically search for architecture and hyperparameters of deep learning models.
To install the package, please use pip as follows:
pip install autokeras
Note: currently, Auto-Keras is only compatible with Python 3.6.
Here is a short example of using the package.
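import autokeras as ak
clf = ak.ImageClassifier()
clf.fit(x_train, y_train)  # search for and train a model; x_train, y_train, x_test are assumed to be loaded already
results = clf.predict(x_test)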
auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.
auto-sklearn frees a machine learning user from algorithm selection and hyperparameter tuning.
It leverages recent advances in Bayesian optimization, meta-learning and ensemble construction.
Find the documentation here
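The ensemble construction step mentioned above can be pictured as greedy ensemble selection: start with an empty ensemble and repeatedly add (with replacement) whichever already-trained model most improves the averaged validation predictions. The following is a minimal standalone sketch of that general idea, not auto-sklearn's implementation. The function name, the three hypothetical models, and their validation predictions are invented for illustration.

```python
def ensemble_selection(predictions, targets, rounds=10):
    """Greedy ensemble selection with replacement.

    predictions: dict mapping model name -> list of validation predictions
    targets:     list of true validation values
    Returns a dict mapping model name -> weight (share of the picks).
    """
    def mse(pred):
        return sum((p - t) ** 2 for p, t in zip(pred, targets)) / len(targets)

    running_sum = [0.0] * len(targets)  # summed predictions of the picks so far
    picks = []
    for step in range(1, rounds + 1):
        best_name, best_loss = None, float("inf")
        for name, pred in predictions.items():
            # Loss of the ensemble average if this model were added next.
            candidate = [(s + p) / step for s, p in zip(running_sum, pred)]
            loss = mse(candidate)
            if loss < best_loss:
                best_name, best_loss = name, loss
        picks.append(best_name)
        running_sum = [s + p for s, p in zip(running_sum, predictions[best_name])]
    return {name: picks.count(name) / rounds for name in set(picks)}

# Hypothetical validation predictions from three already-trained models.
val_targets = [1.0, 2.0, 3.0, 4.0]
val_predictions = {
    "model_a": [2.0, 3.0, 4.0, 5.0],   # consistently 1.0 too high
    "model_b": [0.0, 1.0, 2.0, 3.0],   # consistently 1.0 too low
    "model_c": [1.5, 2.5, 3.5, 4.5],   # consistently 0.5 too high
}
weights = ensemble_selection(val_predictions, val_targets)
print(weights)
```

Here the weighted blend ends up closer to the targets than any single model, because the errors of the chosen models partly cancel. Selecting with replacement is what lets a strong model receive a larger weight.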
Tree-Based Pipeline Optimization Tool (TPOT)
The Tree-Based Pipeline Optimization Tool (TPOT) was one of the very first AutoML methods and open-source software packages developed for the data science community.
TPOT was developed by Dr. Randal Olson while a postdoctoral student with Dr. Jason H. Moore at the Computational Genetics Laboratory of the University of Pennsylvania and is still being extended and supported by this team.
The goal of TPOT is to automate the building of ML pipelines by combining a flexible expression tree representation of pipelines with stochastic search algorithms such as genetic programming. TPOT makes use of the Python-based scikit-learn library as its ML menu.
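As a rough illustration of searching pipeline space with a stochastic, evolution-style algorithm, the sketch below evolves short sequences of preprocessing steps that feed a least-squares line. It is deliberately much simpler than TPOT: it uses flat step sequences instead of expression trees, mutation without crossover, and a handful of made-up operators. All names and the synthetic data are assumptions for this example.

```python
import random

# Hypothetical preprocessing operators: each maps a list of floats to a new list.
OPS = {
    "identity": lambda xs: xs,
    "square": lambda xs: [x * x for x in xs],
    "abs": lambda xs: [abs(x) for x in xs],
    "negate": lambda xs: [-x for x in xs],
}

def transform(pipeline, xs):
    """Apply the pipeline's steps in order."""
    for op in pipeline:
        xs = OPS[op](xs)
    return xs

def fitness(pipeline, train, valid):
    """Validation MSE of a line fitted on the transformed inputs (lower is better)."""
    (xtr, ytr), (xva, yva) = train, valid
    ftr, fva = transform(pipeline, xtr), transform(pipeline, xva)
    n = len(ftr)
    mf, my = sum(ftr) / n, sum(ytr) / n
    sxx = sum((f - mf) ** 2 for f in ftr)
    sxy = sum((f - mf) * (y - my) for f, y in zip(ftr, ytr))
    slope = sxy / sxx if sxx else 0.0
    return sum((my + slope * (f - mf) - y) ** 2 for f, y in zip(fva, yva)) / len(yva)

def mutate(pipeline, rng):
    """Replace, insert, or delete one step (pipelines keep 1 to 4 steps)."""
    steps = list(pipeline)
    choice = rng.choice(["replace", "insert", "delete"])
    if choice == "replace":
        steps[rng.randrange(len(steps))] = rng.choice(list(OPS))
    elif choice == "insert" and len(steps) < 4:
        steps.insert(rng.randrange(len(steps) + 1), rng.choice(list(OPS)))
    elif choice == "delete" and len(steps) > 1:
        del steps[rng.randrange(len(steps))]
    return tuple(steps)

def evolve(train, valid, generations=15, population_size=12, seed=0):
    """Each generation: mutate every pipeline, then keep the fittest."""
    rng = random.Random(seed)
    population = [("identity",)] * population_size
    for _ in range(generations):
        children = [mutate(p, rng) for p in population]
        pool = sorted(population + children, key=lambda p: fitness(p, train, valid))
        population = pool[:population_size]
    return population[0]

# Synthetic data where the right preprocessing step matters: y = x^2 plus noise.
data_rng = random.Random(1)
xs = [data_rng.uniform(-3, 3) for _ in range(200)]
ys = [x * x + data_rng.gauss(0, 0.1) for x in xs]
train, valid = (xs[:150], ys[:150]), (xs[150:], ys[150:])

best = evolve(train, valid)
print(best, round(fitness(best, train, valid), 4))
```

The search starts from a pipeline that does nothing and should discover that squaring the input makes the linear model fit well. The same principle, applied to trees of scikit-learn operators, is what lets a tool like TPOT find useful pipelines without manual trial and error.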
AutoML is a function in H2O that automates the process of building a large number of models, with the goal of finding the “best” model without any prior knowledge or effort by the Data Scientist.
More information and code examples are available in the AutoML User Guide.
Cloud AutoML is a suite of machine learning products that enables developers with limited machine learning expertise to train high-quality models specific to their business needs.
It relies on Google’s state-of-the-art transfer learning and neural architecture search technology.
MLBox is a powerful Automated Machine Learning Python library. It provides the following features:
- Fast reading and distributed data preprocessing/cleaning/formatting
- Highly robust feature selection and leak detection
- Accurate hyper-parameter optimization in high-dimensional space
- State-of-the-art predictive models for classification and regression (Deep Learning, Stacking, LightGBM,…)
- Prediction with model interpretation
For more details, please refer to the official documentation
TransmogrifAI is an AutoML library written in Scala that runs on top of Apache Spark. It was developed with a focus on accelerating machine learning developer productivity through machine learning automation, and an API that enforces compile-time type-safety, modularity, and reuse.
Through automation, it achieves accuracies close to hand-tuned models with almost 100x reduction in time.
Ludwig is a toolbox built on top of TensorFlow that allows users to train and test deep learning models without the need to write code.
The core design principles we baked into the toolbox are:
- No coding required: no coding skills are required to train a model and use it for obtaining predictions.
- Generality: a new data type-based approach to deep learning model design that makes the tool usable across many different use cases.
- Flexibility: experienced users have extensive control over model building and training, while newcomers will find it easy to use.
- Extensibility: easy to add new model architectures and new feature data types.
- Understandability: deep learning model internals are often considered black boxes, but we provide standard visualizations to understand their performance and compare their predictions.
- Open Source: Apache License 2.0
Ludwig has been developed and tested with Python 3 in mind. If you don’t have Python 3 installed, install it by running:
sudo apt install python3 # on ubuntu
brew install python3 # on mac
In order to install Ludwig just run:
pip install ludwig
Complete info can be found here: http://ludwig.ai
The Future of AutoML
Essentially, the purpose of AutoML is to automate repetitive tasks such as pipeline creation and hyperparameter tuning so that data scientists can spend more of their time on the business problem at hand.
AutoML also aims to make the technology available to everyone rather than a select few. AutoML and data scientists can work together to accelerate the ML process so that the real effectiveness of machine learning can be put to use.
Whether AutoML becomes a success depends mainly on its adoption and on the advances made in this field. Either way, it is clear that AutoML is a big part of the future of machine learning.
- Feurer, Matthias, Aaron Klein, Katharina Eggensperger, Jost Tobias Springenberg, Manuel Blum, and Frank Hutter. 2015. “Efficient and Robust Automated Machine Learning.” NIPS 2015. https://ml.informatik.uni-freiburg.de/papers/15-NIPS-auto-sklearn-preprint.pdf
- Balaji, Adithya and Alexander Allen. 2018. “Benchmarking Automatic Machine Learning Frameworks.” https://arxiv.org/pdf/1808.06492.pdf.
- Bergstra, James, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. 2011. “Algorithms for Hyper-Parameter Optimization.” NIPS 2011. https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf