BigML, a cloud-based machine learning platform that allows users to quickly build predictive models using small or big data, has announced the availability of the BigML Winter Release. The release features faster predictive models with MTree, a new development mode that allows users to run unlimited tasks up to 16MB for free, and new prediction strategies. The BigML Winter Release also includes new BigML API features, including dataset transformations, multi datasets, weighted models and an additional language called Flatline, which enables a paradigm that BigML calls Programmatic Machine Learning.
The BigML platform makes it possible to build visual predictive models using sources (raw data) or datasets (structured version of a source) which can then be used to generate predictions. Machine learning and predictive models can be used for a wide variety of practical applications such as predicting customer behavior and future sales, targeted advertising, predicting healthcare outcomes, analyzing warehouse inventory levels to optimize purchasing of goods, and many other applications.
According to the BigML Winter Release blog post, more than 600,000 active predictive models have been created using the BigML platform, half of which were created in the last several weeks. ProgrammableWeb reached out to Francisco J Martin, BigML co-founder and CEO, who explained that the recent big increase in active predictive models was due to several factors. One is a dramatic increase in the number of organizations that have realized that cloud-based predictive analytics is the fastest way to get insights from their data. Another factor is the desire of organizations to solve many more problems using machine learning after solving a single problem. The BigML API has also played a role in the increase of active predictive models.
"While users of all skill sets can utilize BigML, we find that more advanced users leverage our REST API to automate many tasks for their machine learning projects," Martin tells ProgrammableWeb.
At the time of writing, the BigML platform has more than 6,000 users. Martin says the company has identified two main user profiles:
- Data practitioners and business analysts who do not want to waste time with R or money with SAS, and find BigML an easy and robust way to get new insights without the infrastructure chores and costs.
- Developers who want to build predictive applications without having to navigate the long learning curve of machine learning.
What is machine learning?
According to Wikipedia, machine learning is a branch of artificial intelligence concerned with "the construction and study of systems that can learn from data." A definition of machine learning can also be found on the BigML FAQ page:
"Machine learning is a method of programming a computer in which, instead of a human programmer explicitly encoding the desired behavior, the behavior is "learned" through observation by the machine of data related to the desired behavior."
The number of machine learning platforms and APIs has been rapidly rising in recent years. BigML is one of several Machine Learning as a Service (MLaaS) platforms that ProgrammableWeb has written about recently. Other MLaaS platforms and APIs include the Google Prediction API, DatumBox API, Swift IQ and Algorithms.io.
The BigML Winter Release includes many new features and improvements including (but not limited to):
Faster predictive models with MTree
Building a predictive model now takes only one-eighth of the time it previously took. Fast, real-time predictions can also be generated using the BigML PredictServer, a dedicated machine image that can be deployed to create fast and reliable predictions using BigML models and ensembles. BigML PredictServer is available via the AWS Marketplace and can be used for real-time scoring and very large batch predictions (millions and upward).
New development mode
BigML users can now run unlimited tasks up to 16MB for free. The new development mode was created as a framework for users who want to practice, teach and learn machine learning or predictive analytics. Development mode has a few limitations on the number of models, the number of terms in text analysis and the number of nodes in a tree. However, all other features are the same as in production mode. Visit the BigML website for specific details about development mode and production mode.
New prediction strategies
There is now a second strategy, called "proportional," to deal with missing values in input data (the first strategy is called "last prediction"). Users can select the proportional strategy when their input data contains missing values. The proportional strategy option will evaluate all subtrees of a missing split and recombine their predictions based on the proportion of data in each subtree.
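To make the difference between the two strategies concrete, here is a minimal Python sketch on a toy regression tree. It is not BigML's implementation: the Node class, the field names and the count-weighted average used to recombine subtree predictions are illustrative assumptions, as is the reading of "last prediction" as returning the prediction of the last node reached before the missing value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node; not a BigML data structure."""
    count: int                      # training rows that reached this node
    prediction: float               # mean target value at this node
    field: Optional[str] = None     # field tested by the split (None for a leaf)
    threshold: float = 0.0          # rows with value <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def predict_last(node, row):
    """Assumed 'last prediction': stop at the node whose split value is missing."""
    while node.field is not None:
        value = row.get(node.field)
        if value is None:
            return node.prediction
        node = node.left if value <= node.threshold else node.right
    return node.prediction


def predict_proportional(node, row):
    """Assumed 'proportional': follow both branches of a missing split and
    average their predictions, weighted by the rows in each subtree."""
    if node.field is None:
        return node.prediction
    value = row.get(node.field)
    if value is None:
        total = node.left.count + node.right.count
        return (node.left.count * predict_proportional(node.left, row)
                + node.right.count * predict_proportional(node.right, row)) / total
    child = node.left if value <= node.threshold else node.right
    return predict_proportional(child, row)


# Toy tree: split on "age", then on "income" in the right branch.
tree = Node(count=100, prediction=16.0, field="age", threshold=30,
            left=Node(count=40, prediction=10.0),
            right=Node(count=60, prediction=20.0, field="income", threshold=50,
                       left=Node(count=40, prediction=15.0),
                       right=Node(count=20, prediction=30.0)))

row = {"income": 80}                      # "age" is missing
print(predict_last(tree, row))            # 16.0
print(predict_proportional(tree, row))    # 22.0
```

In this toy case the proportional strategy still uses the known income value inside the right subtree, which is why its answer differs from the root-level prediction.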
New BigML API features
Many new features have been added to the BigML API and dozens of pre-built functions are already available. New API features and functions include (but are not limited to):
BigML has introduced a new algorithm that includes three ways to create weighted models and cope with the problem of imbalanced datasets. Users can establish weight criteria and create models that take those criteria into account at build time. The three types of weight criteria users can establish are Weight Field, Objective Weights and Automatic Balancing. Detailed information about each type of weight criteria can be found in the BigML API documentation.
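The article does not describe how these criteria are computed, so the following Python sketch only illustrates the general idea behind balancing an imbalanced dataset: give each class a weight inversely proportional to its frequency so that every class carries the same total weight. The function name and the exact formula are assumptions, not BigML's algorithm.

```python
from collections import Counter


def balancing_weights(labels):
    """Hypothetical helper: weight each class by (largest class size / class size)."""
    counts = Counter(labels)
    largest = max(counts.values())
    return {label: largest / n for label, n in counts.items()}


# 90 ordinary rows and 10 rare ones: the rare class gets nine times the weight.
labels = ["ok"] * 90 + ["fraud"] * 10
weights = balancing_weights(labels)
print(weights)  # {'ok': 1.0, 'fraud': 9.0}

# After weighting, both classes contribute the same total weight.
totals = {label: weights[label] * n for label, n in Counter(labels).items()}
print(totals)   # {'ok': 90.0, 'fraud': 90.0}
```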
A new API feature introduced in the 2014 Winter Release is the ability to create a dataset using multiple datasets as input. The functionality was added to the API to handle use cases such as the need to combine multiple sources of data into a single dataset or create a web application that collects data in batches. Detailed information about multi datasets can be found in the BigML API documentation.
The BigML API now allows new datasets to be derived from an existing dataset. An existing dataset can now be sampled, filtered, extended with new fields, or concatenated to other datasets in order to create another new dataset. A dataset can actually be sampled, filtered and extended simultaneously with only one API request.
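As a rough illustration of what sampling, filtering and extending a dataset in a single step means, here is a small Python sketch that works on an in-memory list of rows. It does not call the BigML API; the derive_dataset function and its parameters are hypothetical names chosen for this example.

```python
import random


def derive_dataset(rows, sample_rate=1.0, keep=None, new_fields=None, seed=0):
    """Hypothetical helper: sample, filter and extend rows in one pass."""
    rng = random.Random(seed)              # seeded so the sample is repeatable
    derived = []
    for row in rows:
        if rng.random() >= sample_rate:    # sampling: keep about sample_rate of rows
            continue
        if keep is not None and not keep(row):   # filtering
            continue
        new_row = dict(row)                # the original dataset is left untouched
        for name, compute in (new_fields or {}).items():
            new_row[name] = compute(row)   # extending with a derived field
        derived.append(new_row)
    return derived


orders = [
    {"price": 10, "qty": 2},
    {"price": 5, "qty": 0},
    {"price": 3, "qty": 4},
]

derived = derive_dataset(
    orders,
    keep=lambda r: r["qty"] > 0,
    new_fields={"total": lambda r: r["price"] * r["qty"]},
)
print(derived)
# [{'price': 10, 'qty': 2, 'total': 20}, {'price': 3, 'qty': 4, 'total': 12}]
```

Concatenating datasets would, in the same spirit, amount to passing the combined rows of several lists to one call.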
Programmatic Machine Learning
The ability to programmatically transform a dataset by combining a high-level language with a cloud-based API is a new paradigm that BigML calls Programmatic Machine Learning. BigML has developed a new Lisp-like language, named Flatline, that has two syntactic variants (JSON and Lisp). Flatline is an additional language that now comes with the BigML API and can be used to transform REST resources programmatically. Martin explains to ProgrammableWeb how Flatline and Programmatic Machine Learning work:
"You can not only create, read, update and delete resources at a high level, but you can also programmatically transform part of those resources ‘on the fly.’ Things like filtering or sampling a dataset, transforming a feature or creating a new feature using a combination of features are now only one API request away. These are the basic components that you can use to compound complex structures—kind of like using Legos."
He explains further:
"Bear in mind that we are talking about resources that require asynchronous processing. There’s a bunch of REST APIs to deal with simple resources (photos, recipes, etc) but only a few asynchronous REST APIs that deal with massive computation, in a scalable and cost-effective way."
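In practice, working with an asynchronous resource usually means creating it and then checking back until the work is done. The Python sketch below shows that polling pattern in general terms. The get_status callable and the status strings are hypothetical stand-ins, not the BigML API's actual status values.

```python
import time


def wait_until_finished(get_status, interval=0.0, max_polls=100):
    """Poll a hypothetical status callable until the resource is ready.

    Returns the number of polls it took; raises if the resource fails
    or is still not ready after max_polls checks.
    """
    for attempt in range(1, max_polls + 1):
        status = get_status()
        if status == "finished":
            return attempt
        if status == "failed":
            raise RuntimeError("resource failed while processing")
        time.sleep(interval)   # a real client would wait between requests
    raise TimeoutError("resource not ready after %d polls" % max_polls)


# Simulated resource that is ready on the third check.
statuses = iter(["queued", "in progress", "finished"])
polls = wait_until_finished(lambda: next(statuses))
print(polls)  # 3
```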
Traditionally, most machine learning tasks have required some human intervention at nearly every step of the process, from selecting the right features when creating a model to re-creating the model with other features or parameters. Martin says there are benefits to performing these steps programmatically:
"Representing each step in the process (dataset, model, evaluation) as a RESTful resource you can analyze the output programmatically and then trigger new processes (also programmatically) that can automatically continue or stop based on some performance measures. As everything runs on the cloud you don’t need to be concerned with storage, memory, CPUs, etc.—just in programming your best strategies. Note that just a few years ago, it would have been unthinkable and/or prohibitively expensive to get access to this kind of computational power."
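The kind of loop Martin describes, in which a program inspects an evaluation and decides whether to trigger another round or stop, can be sketched in a few lines of Python. This is only an illustration: the evaluate callable stands in for whatever would create a model and an evaluation through an API, and the feature names and scores in toy_evaluate are made up for the example.

```python
def greedy_feature_search(candidates, evaluate, min_gain=0.01):
    """Add one feature at a time and stop when the score stops improving.

    `evaluate` is a hypothetical callable that takes a list of feature
    names and returns a performance score (higher is better).
    """
    selected, best = [], 0.0
    remaining = list(candidates)
    while remaining:
        # Try each remaining feature alongside the ones already selected.
        score, feature = max((evaluate(selected + [f]), f) for f in remaining)
        if score - best < min_gain:    # performance measure says: stop
            break
        selected.append(feature)       # otherwise trigger another round
        remaining.remove(feature)
        best = score
    return selected, best


# Made-up scores standing in for real model evaluations.
GAINS = {"age": 0.2, "income": 0.1, "zip": 0.005}


def toy_evaluate(features):
    return round(0.5 + sum(GAINS[f] for f in features), 3)


selected, best = greedy_feature_search(["zip", "income", "age"], toy_evaluate)
print(selected, best)  # ['age', 'income'] 0.8
```

The stopping rule here is a simple minimum gain threshold; any other performance measure could be substituted without changing the structure of the loop.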
By Janet Wagner. Janet is a data journalist and full stack developer based in Toledo, Ohio. Her focus revolves around APIs, open data, data visualization, and data-driven journalism. Follow her on Twitter, Google+, and LinkedIn.