Apache Spark is a fast cluster computing system that supports interactive SQL queries, machine learning, and graph computation, all through the Spark API. The Apache Spark Java Library lets developers write Java programs that use this unified engine to process large amounts of data. The library is supported by the Apache Software Foundation and is well documented.
Apache recently released Spark 2.0, the first release in the 2.x line. Alongside many performance improvements, the move to 2.0 brings two notable API changes: a new Structured Streaming API adds new capabilities, and the consolidation of existing APIs simplifies their use (for example, DataFrame and Dataset are unified, and SparkSession serves as a single entry point).