As a source of data, Hadoop has emerged as a “data lake” that almost every developer will eventually need to dive into. To make that easier to do, SnapLogic this week added support for Hadoop 2.0 to the fall release of the SnapLogic Elastic Integration Platform.
As an integration platform-as-a-service (iPaaS) environment, SnapLogic is taking a distributed approach to integrating Hadoop data, says Darren Cunningham, vice president of marketing for SnapLogic. IT organizations can specify where integration processing should occur. For example, some organizations may choose to run that processing in the cloud, while others may run it within a Hadoop cluster, he says.
That approach allows organizations to bring data integration to wherever the center of data gravity happens to be located, says Cunningham.
In fact, he says organizations can now Hadoop-enable a data flow via a single click. That new capability is specifically designed to allow “citizen integrators” to make use of a Designer tool that runs in the cloud to transform data flows, also known as pipelines, into MapReduce jobs that run on Hadoop.
In addition, the fall release adds support for parsing and formatting additional Hadoop file formats, such as SequenceFile and RCFile, as well as JSON-based document processing for MapReduce jobs.
SnapLogic is also expanding a Hadooplex capability through which developers can now set, schedule and trigger a multisource pipeline to run natively as a YARN application on Hadoop. The SnapLogic Hadooplex now also supports Kerberos authentication for reading and writing HDFS data, for launching Hadooplex nodes via YARN and for SnapReduce pipelines that target MapReduce.
Another new capability is a Data Mapper for shaping data. It adds progressive schema loading, direct schema-to-schema mapping and map-path highlighting, all intended to improve performance and productivity when working with complex schemas.
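For readers unfamiliar with the idea, direct schema-to-schema mapping can be pictured as a table of source paths and target paths applied to a nested document. The Python sketch below is purely illustrative and is not SnapLogic's implementation or API; the function names, the dotted-path notation and the sample fields are all assumptions made for this example.

```python
from typing import Any, Dict


def get_path(doc: Dict[str, Any], path: str) -> Any:
    """Read a value from a nested dict using a dotted path such as 'a.b.c'."""
    current: Any = doc
    for key in path.split("."):
        current = current[key]
    return current


def set_path(doc: Dict[str, Any], path: str, value: Any) -> None:
    """Write a value into a nested dict, creating intermediate dicts as needed."""
    keys = path.split(".")
    current = doc
    for key in keys[:-1]:
        current = current.setdefault(key, {})
    current[keys[-1]] = value


def map_document(source: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Build a target document by copying each source path to its target path."""
    target: Dict[str, Any] = {}
    for source_path, target_path in mapping.items():
        set_path(target, target_path, get_path(source, source_path))
    return target


# Hypothetical mapping between two made-up schemas (source path -> target path).
FIELD_MAP = {
    "customer.name.first": "contact.first_name",
    "customer.name.last": "contact.last_name",
    "customer.address.zip": "contact.postal_code",
}

# Hypothetical source record used only to exercise the mapping.
SAMPLE = {
    "customer": {
        "name": {"first": "Ada", "last": "Lovelace"},
        "address": {"zip": "94105"},
    }
}

if __name__ == "__main__":
    print(map_document(SAMPLE, FIELD_MAP))
```

In a sketch like this, each entry in the mapping table is one path pair, which is roughly the kind of relationship a visual mapper would draw as a line between two schema trees.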
SnapLogic also added a Hierarchical SmartLinking function to accelerate development and improve the self-learning features within the SnapLogic Elastic Integration Platform. SmartLinking now understands context in JSON and XML documents and adds support for nested hierarchies.
SnapLogic is adding a versioning capability for pipelines. Data flow designers can now see previous versions of a pipeline and replace an existing pipeline with a newer one or roll back to a previous version.
Finally, SnapLogic is adding new or updated connectors, known as snaps, for Expensify, JMS, Oracle, Salesforce.com, SAP, ServiceNow, SFTP, SOAP, Workday, Xactly, File Writer and XML.
Cunningham says the biggest challenge with integration these days isn’t so much the technology as legacy thinking. Service-oriented architecture implies that integration is processed centrally. But with the rise of more modern approaches to integration, Cunningham says that integration can now take place almost anywhere.