Cascading Open Source Development Framework Adds Support for Hadoop 2.0

As an alternative to the standard Java API for Hadoop, the Cascading open source project has been steadily gaining momentum among developers of Big Data applications, largely because Cascading makes it easier to isolate the data processing and data integration elements of an application.

This week Concurrent announced that Cascading, the application development framework it created, now supports version 2.0 of Hadoop, including YARN, the resource management layer introduced in Hadoop 2.0 on which the latest version of MapReduce runs.

Concurrent CEO Gary Nakamura says Cascading provides a layer of abstraction over Hadoop that makes it easier to develop Big Data applications. The goal, says Nakamura, is to give developers a simpler way to build applications on Hadoop without having to master all the intricacies of the MapReduce interface that was originally developed alongside Hadoop.

In the same vein, Concurrent this week also announced the general availability of Cascading Lingual, an open source project based on Cascading that provides an ANSI-compatible implementation of SQL on top of Hadoop. Since most IT organizations are already familiar with SQL, Nakamura says Cascading Lingual is intended to make existing SQL applications capable of invoking Hadoop without having to rely on slower alternatives such as Hive or Pig.

While many organizations tend to view Hadoop as an inexpensive way to store lots of data that can then be extracted into other application environments when needed, a rapidly growing number of developers are looking to build applications directly on top of Hadoop. The challenge that many of these developers face is that the programming tools for Hadoop are not yet mature. For that reason, Concurrent is presenting Cascading as a robust alternative for building applications using many of the same constructs that Java developers are already familiar with.

It remains to be seen whether Hadoop will emerge as a platform for application development in the enterprise. But distributors of Hadoop such as Cloudera are determined to develop a platform that can support both batch and real-time processing, which is one reason Cloudera is now so focused on creating a development community.

Cascading, of course, will be only one of several environments for building those applications. The one thing that is certain is that developers tend to go where the data is. If Hadoop can make large amounts of raw data available without requiring massive amounts of database technology to access it, then chances are Hadoop will become developers' platform of choice for creating applications, with nothing more than a basic Hadoop cluster needed to get started.

Michael Vizard
