One of the newest machine learning services, MonkeyLearn is positioning itself as developer-friendly and easy to integrate, drawing lessons about what developers need most from the wave of predictive analytics APIs released over the past year or so. ProgrammableWeb spoke with co-founder and CEO Martín Alcalá Rubí about how developers can easily get started with using predictive analytics APIs in their apps.
Launched earlier this month, MonkeyLearn is a new text-mining machine learning tool focused on helping developers easily integrate predictive analytics into their applications and products. After signing up, developers select an ontology module that has already classified key concepts and technical language into a concept map/knowledge base. For greater specificity, developers can also create their own knowledge bases, but the precreated modules are aimed at helping developers get up and running straight away.
Alcalá Rubí explains:
MonkeyLearn has created several pretrained public modules so developers have an easy and fast way to use text-mining tools (like sentiment analysis). These public modules have already been trained with text data to solve particular text-mining tasks as accurately and precisely as possible. Each of the existing public modules is ready to be integrated via a Web API into platforms, applications and websites.
On the other hand, developers can also create their own text-mining module from scratch, customized for their needs. Developers can design their own category tree and upload sample data to train their algorithm; MonkeyLearn then categorizes text accordingly.
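The custom-module workflow just described, designing a category tree and supplying labeled samples, can be sketched as plain data structures. The shape below is purely illustrative and is not MonkeyLearn's actual upload format:

```python
# Illustrative sketch of a custom text-classification setup: a nested
# category tree plus labeled training samples. Hypothetical structure,
# NOT MonkeyLearn's documented format.

category_tree = {
    "Support Ticket": {
        "Billing": {},
        "Technical Issue": {
            "Hardware": {},
            "Software": {},
        },
    },
}

training_samples = [
    ("I was charged twice this month", "Billing"),
    ("The app crashes when I open settings", "Software"),
    ("My router's power light won't turn on", "Hardware"),
]

def leaf_categories(tree):
    """Collect the leaf category names from a nested category tree."""
    leaves = []
    for name, children in tree.items():
        if children:
            leaves.extend(leaf_categories(children))
        else:
            leaves.append(name)
    return leaves

# Sanity check before training: every sample is labeled with a leaf category.
leaves = set(leaf_categories(category_tree))
assert all(label in leaves for _, label in training_samples)
```

A check like this catches mislabeled samples before they are uploaded, which is cheaper than discovering them during the testing phase.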
Developers then test the module against examples of the sort of data they may want to mine: for sentiment analysis, to drive a recommendations engine, or to identify affinities that can help determine what relevant content to display to the user. During the testing phase, developers can refine the knowledge base module further to increase the accuracy of the predictive analysis.
Once the developer is happy with the results, a simple API is available to integrate the machine learning algorithm into any application or solution. A freemium model means that developers can start using the tool immediately and begin paying once they see their own business goals improve (such as higher conversions or greater user engagement).
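Integrating a trained module like this typically amounts to a single authenticated HTTP request. Here is a minimal sketch in Python using only the standard library; the endpoint URL, module ID, payload field name and token header are placeholders, not MonkeyLearn's documented values:

```python
import json
import urllib.request

API_TOKEN = "YOUR_API_TOKEN"  # placeholder credential
# Hypothetical endpoint; consult the provider's API docs for the real URL.
ENDPOINT = "https://api.example.com/v1/classifiers/{module_id}/classify"

def build_classify_request(module_id, texts):
    """Build an authenticated POST request submitting texts for
    classification. Returns the Request object without sending it."""
    body = json.dumps({"text_list": texts}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT.format(module_id=module_id),
        data=body,
        headers={
            "Authorization": "Token " + API_TOKEN,
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_classify_request("sentiment-demo", ["Great product, works as advertised"])
# response = urllib.request.urlopen(req)  # uncomment to actually call the API
```

The point of the sketch is the shape of the integration: one token, one endpoint, one JSON body, which is what makes a "minutes, not weeks" integration plausible.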
Defining the Ontology
One of the greatest difficulties developers and businesses have with using predictive analytics is in developing the concept map that a machine learning algorithm needs to use in order to understand the context around any given block of text or user behavior.
Building this knowledge base is often a lot of work and is the second biggest bottleneck for many businesses wanting to integrate machine learning technologies into their value chains.
(Of course, the biggest bottleneck is businesses having to build skills in machine learning algorithms in the first place. As ProgrammableWeb writer Janet Wagner, author Louis Dorard and services like SwiftIQ demonstrate, this problem has largely been solved: Machine learning algorithms are the latest infrastructure-as-a-service offering made available via API.)
So for these algorithms to work successfully, they need to work in conjunction with a conceptual map that documents the knowledge area or context in which the text should be read. For example, to predict what data a call center operator may need to reference based on text mining a support call request, the machine learning algorithm needs to compare the text of the support call request with a knowledge base about the business’ products and common repair faults. Without this context, the prediction is useless.
But creating these knowledge maps is a huge piece of work in itself. Newer machine learning services are trying to help solve this pain point in the way they design their services. MindMeld API’s speech-recognition machine learning tool, for example, uses a data-scraping-like feature to let developers create a knowledge graph based on any data set the business already has or from an online source. Lingo24 (to be covered on ProgrammableWeb later this week) has mapped ontologies in a range of verticals in order to increase its machine learning translation accuracy.
MonkeyLearn’s approach is to provide “pretrained public modules” that map out the knowledge base for particular tasks that may require text-mining analysis.
Says Alcalá Rubí:
Since each public module is ready to use, developers can integrate these modules with an app or platform within minutes. Developers do not have to spend time designing the category tree, collecting training samples and training their algorithm. All MonkeyLearn users have to do is integrate with their app or platform.
Some of our public modules include language detection, topic classification, professional classifier, affinity profiler, sentiment analysis and spam detection. These modules can be applied in many verticals, from social media monitoring, advertising, news and media, and e-commerce to educational platforms. MonkeyLearn’s core developers are continuously updating the existing public modules and adding new ones to meet user demand and improve the user experience.
While many developers may be interested in the pretrained public modules due to their ease of use, custom modules are great for those developers who have the sample data to train their algorithm and have a very particular need for text mining or text classification. For example, some of our users have made custom modules for event classification, spoiler detection, news classifiers, risk detection, bully detection and more.
New Market Opportunities to Monetize Data Modeling
All of this may open up a new market for businesses that have to date specialized in data mining and data scraping, and for developers to monetize the time they spend building up their own ontologies. With machine learning technologies comes the opportunity to monetize creating the knowledge bases that can be used in conjunction with a predictive analytics algorithm. MonkeyLearn is already thinking this through, says Alcalá Rubí:
In the current version of MonkeyLearn, users cannot share a category tree or module with others on the platform. Having said this, it is on our near-term road map to allow users to share their own custom modules with others on the platform and monetize them.
In the future, we imagine a marketplace where developers can create and train awesome text-mining modules that do specific things, and sell them to other developers and entrepreneurs who are looking for a solution to a particular text-mining need.
Members of the MonkeyLearn team are confident that by offering public modules and allowing businesses to create custom knowledge bases, they have created a machine learning tool that can work “in almost any Internet vertical: news and media, e-commerce, advertising, social media, sales and customer support, and education,” says Alcalá Rubí.
A RESTful API
According to the team, customers are using MonkeyLearn for sentiment analysis, news categorization, language detection, event classification, bad quality lead detection, spoiler detection, risk detection and bully detection.
The MonkeyLearn API is completely RESTful, confirms Alcalá Rubí:
MonkeyLearn currently has two public endpoints on our public API, for classifying text in single and batch mode, so it is currently a very simple RESTful API.
We're developing some new features that will allow our users to work with samples, or to train a classification module programmatically through the API. We’re also developing clustering and text extraction APIs. We want to continue to add new resources, as the API is growing quickly. Using the REST architecture allows us to grow without making things complex and hard to use. Customers will be able to create, train and integrate modules on the fly, enabling them to do cool things like creating one classifier per user.
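Batch mode matters when classifying many texts against a per-request size limit. A generic sketch of splitting a workload into batch-sized JSON payloads follows; the batch size of 200 and the `text_list` field name are assumptions for illustration, not documented limits:

```python
import json

def batch_payloads(texts, batch_size=200):
    """Split a list of texts into JSON payloads of at most batch_size
    items each, ready to POST to a batch-classification endpoint.
    The payload shape and batch size are illustrative assumptions."""
    payloads = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        payloads.append(json.dumps({"text_list": chunk}))
    return payloads

texts = ["sample text %d" % i for i in range(450)]
payloads = batch_payloads(texts)
# 450 texts at 200 per batch produce 3 payloads of sizes 200, 200 and 50.
```

Batching like this keeps the client within request limits while cutting the number of round trips compared with classifying one text at a time.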
Developers can sign up and use a freemium account for up to 1,000 queries per month, with a sliding subscription fee scale depending on the number of monthly queries made.