Archive for January, 2013

Benchmarking Producer/Consumer in Akka, Part 2

Recently, I published some benchmarks of Producer/Consumer in Akka. This post is a followup with more detail.

I have modified my test program (which is available on GitHub) to implement three different basic algorithms:

  • Producer pushes to a bounded queue (i.e. producer blocks when the queue is full)
  • Producer pushes to an unbounded queue, plus flow control to ensure that the queue doesn’t grow too large
  • Consumer pulls

For each of these, I then benchmarked different variants as follows:

push_bal: Producer pushes to a bounded queue, with a BalancingDispatcher

push_rr: Producer pushes to a bounded queue, with a RoundRobinRouter

push_sm: Producer pushes to a bounded queue, with a SmallestMailboxRouter

flow_bal: Producer pushes to an unbounded queue, with a BalancingDispatcher

flow_rr: Producer pushes to an unbounded queue, with a RoundRobinRouter

flow_sm: Producer pushes to an unbounded queue, with a SmallestMailboxRouter

pull_1: Consumer pulls, with a batch size of 1

pull_10: Consumer pulls, with a batch size of 10

pull_20: Consumer pulls, with a batch size of 20

pull_50: Consumer pulls, with a batch size of 50

pull_cached: Consumer pulls, with an intermediate cache

Here is a spreadsheet of the results, including graphs, I get on my MacBook Pro (Core i7, 4 cores, 2 hyperthreads per core). Each test is run multiple times and the results averaged.

Updated results, incorporating suggestions from Viktor can be downloaded here.

Conclusions

Note: These conclusions are different from the initial version of this post, following suggestions from Viktor.

The headline is that the difference between the approaches is so small as to be almost irrelevant:

  • For the push to a bounded queue version, BalancingDispatcher clearly outperforms both RoundRobinRouter and SmallestMailboxRouter, especially as the number of workers increases. Strangely the same difference is not present for the unbounded queue version.
  • The pull approach can be made to perform virtually identically to the push model by batching queries.

Benchmarking Producer/Consumer in Akka

Further to this conversation on the Akka mailing list, I decided to benchmark various different approaches to implementing the producer/consumer pattern.

I wanted to choose a “real” problem, so I decided to count the words on the first 100,000 pages of Wikipedia. The producer parses the Wiki XML dump and the words are counted page-by-page by a pool of consumers.

I implemented three different approaches – producer pushes to a bounded queue, producer pushes to an unbounded queue together with a flow control protocol, and consumer pulls.

The source code for the different implementations is here, and the results are at the bottom of this message.

Some observations:

  1. I only timed to the nearest second as I see an approx. 3 second variation from run to run with identical parameters. I’m not sure why I see such a large variation—suggestions welcome.
  2. There’s basically no difference between the two “producer pushes” implementations. The “consumer pulls” implementation is much slower, however
  3. I tried both Dispatcher and BalancingDispatcher and RoundRobinRouter and SmallestMailboxRouter in the producer pushes implementations—the differences were too small to measure

I’m surprised that there is so much difference between producer pushes and consumer pulls. It’s quite possible that I’ve done something stupid in the implementation—I’d be very grateful for a pointer to what it is.

Here are the results (all on my i7 MacBook Pro—4 cores, 2 hyperthreads per core).

Bounded Unbounded Pull
Consumers Seconds Speedup Seconds Speedup Seconds Speedup
1 100 1.00 103 0.97 156 0.64
2 55 1.82 56 1.79 88 1.14
3 43 2.33 40 2.50 65 1.54
4 39 2.56 38 2.63 55 1.82
5 37 2.70 37 2.70 51 1.96
6 34 2.94 33 3.03 44 2.27
7 33 3.03 33 3.03 47 2.13
8 33 3.03 35 2.86 46 2.17
9 33 3.03 32 3.13 46 2.17