Apache Spark is a fast cluster computing system. It lets developers quickly write programs in Python, Java, or Scala that run on a single, unified engine for processing large amounts of data. Interactive SQL queries, machine learning, and graph computation are all available through the Spark API. Spark is sponsored by the Apache Software Foundation, and its community maintains a user mailing list, schedules Spark meetups in the San Francisco Bay Area, and accepts contributions from developers.
The Apache Spark project recently released Spark 2.0, the first release in the 2.x line. Alongside many performance improvements, a number of API changes define the move to 2.0: a new Structured Streaming API adds new capabilities, and the consolidation of existing APIs simplifies use.