Masking the Growing Complexity of Big Data in the Enterprise

Michael Vizard
Oct. 24 2012, 12:01AM EDT

Being able to store and manage massive amounts of Big Data is one thing; finding a way to make it accessible to developers is quite another. Major vendors such as SAP and IBM are in a race to create a framework that makes it simpler for developers to invoke a layer of services that lets them access data regardless of the format in which it is stored, without having to, for example, learn arcane interfaces such as MapReduce.
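
The "arcane interface" complaint is easiest to see with a concrete example: even a trivial word count forces developers to express the job as explicit map, shuffle and reduce phases. A minimal sketch of the programming model in plain Python (illustrative only, not tied to any particular Hadoop API):

```python
from collections import defaultdict

# Map phase: emit a (word, 1) pair for every word in every input line
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

# Shuffle phase: group the emitted pairs by key
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: sum the counts collected for each word
def reduce_phase(groups):
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big plans", "big vendors"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'big': 3, 'data': 1, 'plans': 1, 'vendors': 1}
```

A service layer of the kind SAP and IBM describe would hide all three phases behind a single query-style call.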

Taking a step in that direction, SAP today at the Strata + Hadoop World 2012 conference unveiled an SAP Big Data bundle offering that brings together the SAP High Performance Analytics Appliance, SAP Sybase IQ Server, SAP Data Integrator software and SAP BusinessObjects business intelligence software with connectors to Hadoop distributions from Cloudera, Hortonworks, Hewlett-Packard and IBM.

According to Marie Goodel, senior director of product marketing for enterprise information management, database and technology, SAP plans to eventually expose a set of data services that will mask the complexity of Hadoop, the columnar Sybase IQ database, relational Sybase databases and the SAP HANA in-memory database from the average developer. In essence, developers will be presented with a menu of choices from which they can access and store data based on the attributes of the data, including its size, criticality to the business and performance requirements, says Goodel.
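
That "menu of choices" amounts to routing data to a storage tier by its attributes. A hypothetical sketch of the idea; the tier names mirror the SAP products in the article, while the function, parameters and thresholds are all invented for illustration:

```python
# Hypothetical routing logic: pick a storage tier from the data's
# attributes. Thresholds are invented, not SAP's actual guidance.
def choose_store(size_gb, critical, low_latency):
    if low_latency and size_gb < 512:
        return "SAP HANA (in-memory)"     # hot, performance-sensitive data
    if size_gb > 10_000:
        return "Hadoop"                   # massive, mostly raw data
    if critical:
        return "Sybase (relational)"      # transactional system of record
    return "Sybase IQ (columnar)"         # analytic workloads

print(choose_store(size_gb=100, critical=False, low_latency=True))
# SAP HANA (in-memory)
```

The point of the planned data services is that a developer would pick from such a menu rather than learn each store's native interface.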

IBM has a very similar set of ambitions. According to Nancy Kopp, IBM’s director of Big Data strategy, the company is, for example, working on a set of APIs for the Vivisimo enterprise search technology it acquired earlier this year.

As Big Data begins to proliferate across the enterprise, developers are looking for easier ways to access large amounts of data residing in multiple systems. IBM’s long-term goal, says Kopp, is to make that possible by adding a well-defined set of APIs to the Vivisimo platform.

At the IBM Information OnDemand 2012 conference this week, IBM launched InfoSphere Streams to analyze data in motion while expanding its portfolio of Big Data offerings to include new offerings that embed analytics inside business processes running in the cloud. It also updated its InfoSphere BigInsights implementation of Hadoop to add the ability to analyze unstructured social media data, in addition to tightening the integration between BigInsights and Vivisimo. Finally, IBM launched Analytics Answers, a cloud-based service based on the company’s predictive analytics software.

IBM is not the only provider of enterprise search engine software looking to make its technology more accessible to developers. LucidWorks, for example, added REST APIs in release 2.1 of the company’s namesake platform earlier this year that automate the integration of search as a service with an application. LucidWorks, which provides an enterprise search platform based on the open source Lucene/Solr enterprise search engine, then took that platform one step further by creating LucidWorks Big Data, a development stack that combines code from the open source Hadoop, Mahout and R projects with the Lucene/Solr search engine.

According to LucidWorks CEO Paul Doscher, the whole point of using a REST API is to make any technology more palatable to developers by making it more accessible.
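
Accessibility here means a search query becomes an ordinary HTTP request rather than a proprietary client library. A minimal sketch of building such a request against a Solr-style endpoint, the kind of engine LucidWorks is built on; the host, port and collection name are illustrative assumptions, not LucidWorks defaults:

```python
from urllib.parse import urlencode

# Hypothetical Solr-style search endpoint; "products" is an
# illustrative collection name, not a real deployment.
BASE_URL = "http://localhost:8983/solr/products/select"

def build_search_url(query, rows=10):
    """Build a keyword search request against a Solr-style REST API."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{BASE_URL}?{params}"

print(build_search_url("big data"))
# http://localhost:8983/solr/products/select?q=big+data&rows=10&wt=json
```

Issuing that GET (for example with `urllib.request.urlopen`) returns results as JSON, which is what lets any application, in any language, consume search as a service.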

Meanwhile, Attivio, a provider of enterprise search engine software based on proprietary technology, has similarly moved to make its platform more accessible to developers. According to Attivio CEO Ali Riaz, the company’s Active Intelligence Engine platform has a single API that supports keyword, Boolean, fuzzy, fielded and relational search, or developers can use a SQL API that supports full-text search via user-defined functions.
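
The search styles named above can be illustrated with Lucene-style query strings (Attivio’s own query language may differ; these are generic examples of each style, not Active Intelligence Engine syntax):

```python
# Illustrative query strings, one per search style, in Lucene-style syntax.
queries = {
    "keyword": "analytics",                 # match documents containing a term
    "boolean": "hadoop AND (hive OR pig)",  # combine terms with operators
    "fuzzy":   "vizard~2",                  # tolerate up to 2 character edits
    "fielded": 'title:"big data"',          # restrict the match to one field
}
for style, q in queries.items():
    print(f"{style:8s} -> {q}")
```

Exposing all of these through one API, or through SQL user-defined functions, is what spares developers from learning a separate interface per query style.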

While enterprise search as a technology has seen limited adoption in most corporate environments, the rise of Big Data is creating a development and management challenge that requires a more comprehensive approach to automatically pulling huge datasets into any given application. Of course, every major vendor, including Oracle and Hewlett-Packard, is eyeing the same opportunity. The one most likely to win out in the end, however, will be the vendor that makes Big Data most easily accessible to the applications that ultimately need to invoke and consume that information.
