Demand for Big Data applications is clearly skyrocketing. However, the challenge facing many developers is that the data needed to create those applications exists in structured, unstructured and semi-structured formats. Looking to make it easier to build these kinds of applications, Hewlett-Packard has added support for both a Java SDK and semi-structured data to the HP Vertica columnar database.
According to Luis Maldonado, director of product management for HP Vertica, version 7.0 of the HP Vertica Analytic Platform makes it possible to store structured and semi-structured data while at the same time taking advantage of enhanced integration with Hadoop that gives developers access to unstructured data.
Maldonado says that instead of requiring constant batch-mode calls out to Hadoop, this version of Vertica makes it easier to store semi-structured data pulled from Hadoop inside Vertica itself, using a new HP Vertica Flex Zone capability. That means semi-structured data that is likely to be repeatedly invoked by multiple applications is always readily available.
For all the hype surrounding Hadoop, Maldonado notes that organizations have invested billions of dollars housing structured data in data warehouses that run on top of columnar or relational databases. While Hadoop makes it feasible to work with massive amounts of raw data, Maldonado says IT organizations want to use Hadoop alongside their existing data warehouses, which contain customer information collected over multiple decades.
The HP Vertica Analytic Platform facilitates that process by providing support for auto-schematization, which HP says eliminates the need for schemas that normally would have to be defined or applied before the data is loaded. This “one-click” schema capability allows for schemas to be created and applied as needed.
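To make the idea of applying a schema after the data is loaded more concrete, here is a minimal Python sketch of the general technique. It is not HP Vertica code and does not reflect how Flex Zone is implemented; the function names `infer_schema` and `materialize` and the sample records are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: hypothetical helpers, not the HP Vertica API.

def infer_schema(records):
    """Derive column names and types from the loaded records themselves."""
    schema = {}
    for record in records:
        for key, value in record.items():
            type_name = type(value).__name__
            if key in schema and schema[key] != type_name:
                # Conflicting types for one key fall back to a generic string column.
                schema[key] = "str"
            else:
                schema.setdefault(key, type_name)
    return schema


def materialize(records, schema):
    """Project each record onto the inferred columns, filling gaps with None."""
    return [tuple(record.get(col) for col in schema) for record in records]


# Semi-structured input: the records do not share the same set of keys.
records = [
    {"user": "alice", "clicks": 3},
    {"user": "bob", "country": "DE"},
]

schema = infer_schema(records)       # built after the data is already loaded
rows = materialize(records, schema)  # a tabular view over the raw records
```

The point of the sketch is the ordering: the raw records are accepted as they arrive, and the column layout is worked out only when a tabular view is needed.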
There’s no doubt that Hadoop is transforming the data warehouse as we know it today. But rather than seeing Hadoop as a replacement for other types of databases, Maldonado says the data warehouse of the future is going to be a more federated entity that spans relational, columnar and Hadoop data formats.
HP already provides SDKs that support the C/C++ and R programming languages. But with the addition of Java support, HP is opening up Vertica as a platform to a much broader range of developers, many of whom do not want to have to master arcane interfaces such as MapReduce to work with semi-structured and unstructured data.
There’s no doubt that Hadoop is transforming the way data is accessed and managed within the enterprise. The temptation to resist is throwing the proverbial baby out with the bath water. That is a real concern in a world where structured, semi-structured and unstructured data will increasingly need to be invoked by the same application.