Machine learning company H2O has released a new Python API for its Sparkling Water application. Sparkling Water extends H2O's core product, a predictive analytics platform, to Apache Spark, an open-source in-memory big data processing platform that is growing in popularity.
According to H2O, its solution can deliver real-time data scoring and predictions ten times faster than competing solutions. Use cases include ad targeting, fraud detection and customer intelligence. For example, H2O's platform can be used by a sales organization to predict which customers will purchase certain products and which ones will renew services. Or it could be used by an insurance company to detect patterns of fraud.
Turning Developers into Data Scientists
Data science is one of the hottest segments of the technology market today and for good reason: companies are looking to turn their big data into big profits. That requires humans who have the knowledge and skills necessary to translate large sets of information into actionable insights. But data scientists aren't always easy to come by, and running models efficiently across massive sets of data requires increasingly sophisticated tools. H2O is seeking to address both issues by building accessible but still powerful machine learning solutions that developers can more easily integrate into their applications.
For H2O, offering a Python API for its Sparkling Water solution was a no-brainer. Python is one of several programming languages commonly used in machine learning applications, so to spur adoption of its offering, the company wanted a Python API that plays nicely with other tools in the machine learning toolbox. As H2O's co-founder and CEO SriSatish Ambati noted, "Python is like the jazz movement in machine learning to R is like classical music. With Sparkling Water developers can use packages in Scikit, H2O, Spark SQL, ML, packages in R and Pandas in a unified polyglot experience – all without having to rebuild the pipelines and data workflows in each environment."
Not surprisingly, H2O is just one of a number of companies looking to make a splash in this space. For instance, Databricks, a company whose founders created Apache Spark, today announced a new API that aims to help developers familiar with single-machine data processing more easily adopt Spark's parallel data processing.
The growing number of tools available to developers and data analysts is good news for the big data movement. After all, having invested significantly in amassing troves of data, businesses are going to start looking for a return on investment sooner rather than later. If companies like H2O and Databricks can deliver the solutions that help them find it, big data's most exciting days may be ahead.