Flare: Scale up Spark SQL with native compilation and set your data on fire!

Spark performance on SQL and DataFrame/Dataset workloads has made impressive progress thanks to Catalyst and Tungsten, but there is still a significant gap relative to what best-of-breed query engines or hand-written low-level C code can achieve on modern server-class hardware. We present Flare, a new experimental back-end for Spark SQL that yields significant speedups by compiling Catalyst query plans to native code. Flare’s low-level implementation takes full advantage of native execution, using techniques such as NUMA-aware scheduling and data layouts to leverage ‘mechanical sympathy’ and bring execution closer to the metal than current JVM-based techniques on big-memory machines. Thus, with available memory increasingly in the TB range, Flare makes scale-up on server-class hardware an interesting alternative to scaling out across a cluster, especially in terms of data center costs. This talk will discuss the design of Flare and will demonstrate experiments on standard SQL benchmarks that exhibit order-of-magnitude speedups over Spark 2.0.
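To give a rough feel for what "compiling a query plan to native code" can mean, here is a minimal sketch in Python that turns a toy plan (scan, filter, sum) into the text of a single fused C loop. It is an illustration only and not Flare's code generator: the node classes `Scan`, `Filter`, and `SumAgg`, the `row_t` struct, the `<table>_len` variable, and the column names are all hypothetical and do not correspond to Catalyst or Flare APIs.

```python
from dataclasses import dataclass
from textwrap import indent
from typing import Union


# Hypothetical toy plan nodes; these are NOT Catalyst's or Flare's classes.
@dataclass
class Scan:
    table: str        # name of a C array of row_t structs, assumed to exist


@dataclass
class Filter:
    child: "Plan"
    predicate: str    # C boolean expression over the row variable `r`


@dataclass
class SumAgg:
    child: "Plan"
    expr: str         # C numeric expression over the row variable `r`


Plan = Union[Scan, Filter, SumAgg]


def emit_rows(plan, body):
    """Return C source that runs `body` once per row produced by `plan`."""
    if isinstance(plan, Scan):
        # A scan becomes the loop itself; `<table>_len` is an assumed variable.
        return (
            f"for (long i = 0; i < {plan.table}_len; i++) {{\n"
            f"  row_t r = {plan.table}[i];\n"
            f"{indent(body, '  ')}\n"
            f"}}"
        )
    if isinstance(plan, Filter):
        # A filter wraps the consumer's code in an `if` and pushes it into
        # the child, so no intermediate result is ever materialized.
        guarded = f"if ({plan.predicate}) {{\n{indent(body, '  ')}\n}}"
        return emit_rows(plan.child, guarded)
    raise TypeError(f"unsupported plan node: {plan!r}")


def compile_query(plan):
    """Compile a SumAgg-rooted plan into the text of one fused C loop."""
    loop = emit_rows(plan.child, f"acc += {plan.expr};")
    return "double acc = 0;\n" + loop + '\nprintf("%f\\n", acc);'


# Usage: roughly SELECT sum(price * discount) FROM lineitem WHERE qty < 24
query = SumAgg(Filter(Scan("lineitem"), "r.qty < 24"), "r.price * r.discount")
print(compile_query(query))
```

The point of the sketch is that each operator contributes a fragment of one tight loop rather than acting as a separate iterator at run time; a real system would additionally handle types, joins, data layouts, and invoking a C compiler on the generated source.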
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! (Tiark Rompf) Posted on May 16, 2018