Mark Litwintschik took a large open-source data set (1.1 billion taxi rides, occupying hundreds of gigabytes of storage) and ran benchmark queries against it on a variety of systems. Perhaps the most humble of these systems is a cluster of three Raspberry Pi computers. This webpage describes how he set up the software on that cluster.

Mark Litwintschik. 1.1 Billion Taxi Rides with Spark 2.2 & 3 Raspberry Pi 3 Model Bs. September 17, 2017. Available in HTML format.

This Recommendation was added to the website on 2018-10-23 and was last modified on 2020-02-29. You can find similar pages at Cluster computing.

An earlier version of this page appears here.