Spark Redis Receiver

Many developers use Redis as a messaging queue. Redis is fast, easy to use, and very popular in production. Moreover, Redis Cluster has been available since Redis 3.0. In this post we present our new Spark Redis Receiver. The receiver was developed within one of our ongoing projects and is now publicly available as open-source software on GitHub.


Redis

Redis is an in-memory key-value store that offers several kinds of data structures as values, including plain strings, lists, sets, sorted sets, hashes and more [1]. Most Redis data structures can support the messaging-queue use case; for example, list or set items can be continuously pushed by stream producers and popped by stream receivers. Redis is fast, easy to use and deploy, and has recently gained clustering support. The downside, however, is that all Redis data structures are mutable; for example, the value of an item within a list can be changed after it has been produced, which is typically not desired in a messaging-queue scenario. This also means that the optimizations and fault-tolerance guarantees that immutability would otherwise enable are not available.
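As a minimal sketch of the queue pattern, assuming a local Redis instance and the Jedis Java client, producers push items onto one end of a Redis list and consumers pop them from the other end:

```scala
import redis.clients.jedis.Jedis

object RedisQueueSketch {
  def main(args: Array[String]): Unit = {
    // Assumes Redis is running locally on the default port.
    val jedis = new Jedis("localhost", 6379)

    // Producer side: append events to the head of the list "events".
    jedis.lpush("events", "event-1", "event-2", "event-3")

    // Consumer side: pop from the tail, so items come out in FIFO order.
    var item = jedis.rpop("events")
    while (item != null) {
      println(s"received: $item")
      item = jedis.rpop("events")
    }

    jedis.close()
  }
}
```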

Spark Streaming

Fortunately, Spark Streaming can receive streaming data from arbitrary data sources beyond the ones it has built-in support for. We only had to implement a reliable receiver class that is customized to receive data from Redis; a sketch of such a receiver is shown below.
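The following is an illustrative sketch, not the actual implementation from our repository: a custom receiver that extends Spark's Receiver class and blocks on a Redis list (here via Jedis and BRPOP), handing each popped item to Spark with store(). The class and key names are hypothetical.

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver
import redis.clients.jedis.Jedis

// Hypothetical list-based Redis receiver, for illustration only.
class RedisListReceiver(host: String, port: Int, listKey: String)
  extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart(): Unit = {
    // Receive data on a separate thread so onStart() returns immediately.
    new Thread("Redis List Receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  def onStop(): Unit = {
    // The polling loop below checks isStopped(), so no extra cleanup is needed here.
  }

  private def receive(): Unit = {
    val jedis = new Jedis(host, port)
    try {
      while (!isStopped()) {
        // BRPOP blocks for up to 1 second and returns [key, value], or null on timeout.
        val popped = jedis.brpop(1, listKey)
        if (popped != null && popped.size() == 2) {
          store(popped.get(1)) // hand the item over to Spark Streaming
        }
      }
    } catch {
      case e: Exception => restart("Error receiving from Redis", e)
    } finally {
      jedis.close()
    }
  }
}
```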

API

All the details about how to obtain and use the API are documented on our GitHub page. We highly encourage feedback and contributions. In the near future we plan to add write functionality from Spark Streaming to Redis.
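For illustration only, a receiver like the sketch above is typically wired into a StreamingContext with receiverStream; the class name RedisListReceiver refers to that sketch, not to the published API, which is documented on the GitHub page.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object RedisReceiverExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("RedisReceiverExample")
    val ssc  = new StreamingContext(conf, Seconds(5))

    // RedisListReceiver is the hypothetical receiver sketched earlier.
    val lines = ssc.receiverStream(new RedisListReceiver("localhost", 6379, "events"))
    lines.count().print() // print the number of items received per batch

    ssc.start()
    ssc.awaitTermination()
  }
}
```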

More about implementing Custom Receivers

  1. http://spark.apache.org/docs/latest/streaming-custom-receivers.html
  2. https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/CustomReceiver.scala