Pivotal HAWQ Advanced Analytics Engine Released as Open Source Contribution

Pivotal, an enterprise application development solutions provider, has announced the contribution of HAWQ and MADlib to the Apache Software Foundation. HAWQ and MADlib are now 100% open source and available to the open source community as ASF incubator projects. Pivotal has also partnered with Hortonworks and Altiscale to provide commercial versions of the HAWQ analytics engine.


Apache HAWQ (incubating) is a Hadoop-native advanced SQL analytics database solution that allows enterprises to perform in-database analytics that utilize algorithms provided by the Apache MADlib (incubating) machine learning library. Apache HAWQ includes a variety of advanced capabilities such as SQL-based data analysis, TPC-DS specification compliance, data federation, linear scalability, and Hortonworks Data Platform support. Apache HAWQ also features an API that allows new services and formats to be added without changing the platform.

Apache MADlib (incubating) is an open source library that provides supervised and unsupervised machine learning methods for structured and unstructured data. Apache MADlib is designed for scalable in-database analytics and provides parallel machine learning capabilities such as classification, regression modeling, clustering, topic modeling, association rule mining, and descriptive statistics. MADlib has been an open source project for several years and was developed by Pivotal in conjunction with customers and with researchers from the University of California, Berkeley, Stanford University, and the University of Florida.

Gavin Sherry, vice president and CTO, data, Pivotal, told ProgrammableWeb that making HAWQ and MADlib available as open source will help enterprises innovate at a much faster pace. Most enterprises currently use proprietary legacy database configurations that leave little room for innovation. Michael Cucchi, senior director of outbound product, Pivotal, told ProgrammableWeb that "it's Pivotal's case that open source technologies are driving innovation faster and with more vigor than legacy systems, the backbone of the Facebooks, Googles, and other internet giants who are able to scale to meet changing business dynamics."

Sherry explained that HAWQ differs from the Oracle Data Mining platform in that Oracle is a proprietary platform that does not run natively on Hadoop. Providing HAWQ as a 100% open source platform allows organizations to use HAWQ as they see fit, easily extending the platform as needed. Sherry also said that organizations can inspect the HAWQ source code and that 2 million lines of code have been contributed to the Apache Software Foundation. The HAWQ platform includes parallel machine learning capabilities provided by Apache MADlib and also allows users to build and implement their own machine learning algorithms.

"Hadoop native technologies, like Apache HAWQ, will enable more companies and people to derive analytics and learning/meaning from big data, something essential for businesses that are moving towards being software-driven," said Cucchi. "This type of functionality was primarily seen only in a small number of organizations with teams of data scientists. This expansion of the Hadoop ecosystem vision is to bring this to all and the world is starting to take note."

Pivotal will continue to provide commercial versions of Apache HAWQ and Apache MADlib, which are available as part of the Pivotal Big Data Suite. Pivotal partners Hortonworks and Altiscale will help the company bring commercial versions of the platforms to market.

For more information about Apache HAWQ and Apache MADlib, visit the Apache Software Foundation website. To learn more about the commercial product, Pivotal HAWQ powered by Apache HAWQ, visit the Pivotal company website.

Janet Wagner is a freelance technical writer and contributor to ProgrammableWeb covering breaking news, in-depth analysis, and product reviews. She specializes in creating well-researched, in-depth content about APIs, machine learning, deep learning, computer vision, analytics, GIS/maps, and other advanced technologies.