Folding over a lazy sequence with Clojure reducers

Clojure 1.5 includes reducers. They’re currently experimental, but showing great promise.

One of reducers’ features is that they allow parallel fold operations. Sadly, although fold works when given a lazy sequence, it falls back to a sequential reduce.

There’s no theoretical reason why fold couldn’t work in parallel with a lazy sequence, so I decided to see if I could implement it. It turns out to be very easy. I’ve implemented a function called foldable-seq that takes a lazy sequence and turns it into something that can be folded in parallel. I’ve checked an example program that uses this to count words in a Wikipedia XML dump into GitHub. The code for foldable-seq is here.

On my 4-core MacBook Pro, the example program runs in approx, 40 seconds without foldable-seq and 13 seconds with.

Benchmarking Producer/Consumer in Akka, Part 2

Recently, I published some benchmarks of Producer/Consumer in Akka. This post is a followup with more detail.

I have modified my test program (which is available on GitHub) to implement three different basic algorithms:

  • Producer pushes to a bounded queue (i.e. producer blocks when the queue is full)
  • Producer pushes to an unbounded queue, plus flow control to ensure that the queue doesn’t grow too large
  • Consumer pulls

For each of these, I then benchmarked different variants as follows:

push_bal: Producer pushes to a bounded queue, with a BalancingDispatcher

push_rr: Producer pushes to a bounded queue, with a RoundRobinRouter

push_sm: Producer pushes to a bounded queue, with a SmallestMailboxRouter

flow_bal: Producer pushes to an unbounded queue, with a BalancingDispatcher

flow_rr: Producer pushes to an unbounded queue, with a RoundRobinRouter

flow_sm: Producer pushes to an unbounded queue, with a SmallestMailboxRouter

pull_1: Consumer pulls, with a batch size of 1

pull_10: Consumer pulls, with a batch size of 10

pull_20: Consumer pulls, with a batch size of 20

pull_50: Consumer pulls, with a batch size of 50

pull_cached: Consumer pulls, with an intermediate cache

Here is a spreadsheet of the results, including graphs, I get on my MacBook Pro (Core i7, 4 cores, 2 hyperthreads per core). Each test is run multiple times and the results averaged.

Updated results, incorporating suggestions from Viktor can be downloaded here.

Conclusions

Note: These conclusions are different from the initial version of this post, following suggestions from Viktor.

The headline is that the difference between the approaches is so small as to be almost irrelevant:

  • For the push to a bounded queue version, BalancingDispatcher clearly outperforms both RoundRobinRouter and SmallestMailboxRouter, especially as the number of workers increases. Strangely the same difference is not present for the unbounded queue version.
  • The pull approach can be made to perform virtually identically to the push model by batching queries.

Benchmarking Producer/Consumer in Akka

Further to this conversation on the Akka mailing list, I decided to benchmark various different approaches to implementing the producer/consumer pattern.

I wanted to choose a “real” problem, so I decided to count the words on the first 100,000 pages of Wikipedia. The producer parses the Wiki XML dump and the words are counted page-by-page by a pool of consumers.

I implemented three different approaches – producer pushes to a bounded queue, producer pushes to an unbounded queue together with a flow control protocol, and consumer pulls.

The source code for the different implementations is here, and the results are at the bottom of this message.

Some observations:

  1. I only timed to the nearest second as I see an approx. 3 second variation from run to run with identical parameters. I’m not sure why I see such a large variation—suggestions welcome.
  2. There’s basically no difference between the two “producer pushes” implementations. The “consumer pulls” implementation is much slower, however
  3. I tried both Dispatcher and BalancingDispatcher and RoundRobinRouter and SmallestMailboxRouter in the producer pushes implementations—the differences were too small to measure

I’m surprised that there is so much difference between producer pushes and consumer pulls. It’s quite possible that I’ve done something stupid in the implementation—I’d be very grateful for a pointer to what it is.

Here are the results (all on my i7 MacBook Pro—4 cores, 2 hyperthreads per core).

Bounded Unbounded Pull
Consumers Seconds Speedup Seconds Speedup Seconds Speedup
1 100 1.00 103 0.97 156 0.64
2 55 1.82 56 1.79 88 1.14
3 43 2.33 40 2.50 65 1.54
4 39 2.56 38 2.63 55 1.82
5 37 2.70 37 2.70 51 1.96
6 34 2.94 33 3.03 44 2.27
7 33 3.03 33 3.03 47 2.13
8 33 3.03 35 2.86 46 2.17
9 33 3.03 32 3.13 46 2.17

ScalaMock3 step-by-step

This post describes ScalaMock 3, which supports Scala 2.10 only. For ScalaMock 2, which supports earlier Scala versions, go here.

This post describes how to setup a project that uses ScalaMock in conjunction with ScalaTest and sbt. The sample code described in this article is available on GitHub.

The example assumes that we’re writing code to control a mechanical turtle, similar to that used by Logo programs. Mocking is useful in this kind of situation because we might want to create tests that function even if we don’t have the hardware to hand, which run more quickly than would be the case if we ran on real hardware, and where we can use mocks to simulate errors or other situations difficult to reproduce on demand.

  1. Create a root directory for your project:
    $ mkdir myproject[/soucecode]
    </li>
    	<li>Create <code>build.sbt</code> containing:
    1organization := "com.example"
    
    version := "1.0"
    
    scalaVersion := "2.10.0"
    
    scalacOptions ++= Seq("-deprecation", "-unchecked")
    
    libraryDependencies +=
      "org.scalamock" %% "scalamock-scalatest-support" % "3.0" % "test"
  2. Now we’ve got a project, we need some code to test. Let’s start with a simple trait representing a turtle. Create src/main/scala/Turtle.scala containing:
    package com.example
    
    trait Turtle {
      def penDown()
      def penUp()
      def forward(distance: Double)
      def turn(angle: Double)
      def getPosition: (Double, Double)
      def getAngle: Double
    }
  3. The turtle API is not very convenient, we have no way to move to a specific position, instead we need to work out how to get from where we are now to where we want to get by calculating angles and distances. Here’s some code that draws a line from a specific point to another by doing exactly that.Create src/main/scala/Controller.scala containing:
    package com.example
    
    import scala.math.{atan2, sqrt}
    
    class Controller(turtle: Turtle) {
    
      def drawLine(start: (Double, Double), end: (Double, Double)) {
        moveTo(start)
    
        val initialAngle = turtle.getAngle
        val deltaPos = delta(start, end)
    
        turtle.turn(angle(deltaPos) - initialAngle)
        turtle.penDown
        turtle.forward(distance(deltaPos))
      }
    
      def delta(pos1: (Double, Double), pos2: (Double, Double)) =
        (pos2._1 - pos1._1, pos2._2 - pos1._2)
    
      def distance(delta: (Double, Double)) =
        sqrt(delta._1 * delta._1 + delta._2 * delta._2)
    
      def angle(delta: (Double, Double)) =
        atan2(delta._2, delta._1)
    
      def moveTo(pos: (Double, Double)) {
        val initialPos = turtle.getPosition
        val initialAngle = turtle.getAngle
    
        val deltaPos = delta(initialPos, pos)
    
        turtle.penUp
        turtle.turn(angle(deltaPos) - initialAngle)
        turtle.forward(distance(deltaPos))
      }
    }
  4. We can now write a test. We’ll create a mock turtle that pretends to start at the origin (0, 0) and verifies that if we draw a line from (1, 1) to (2, 1) it performs the correct sequence of turns and movements.

    Turtle diagram

    Create src/test/scala/ControllerTest.scala containing:

    package com.example
    
    import org.scalatest.FunSuite
    import org.scalamock.scalatest.MockFactory
    import scala.math.{Pi, sqrt}
    
    class ControllerTest extends FunSuite with MockFactory {
    
      test("draw line") {
        val mockTurtle = mock[Turtle]
        val controller = new Controller(mockTurtle)
    
        inSequence {
          inAnyOrder {
            (mockTurtle.penUp _).expects()
            (mockTurtle.getPosition _).expects().returning(0.0, 0.0)
            (mockTurtle.getAngle _).expects().returning(0.0)
          }
          (mockTurtle.turn _).expects(~(Pi / 4))
          (mockTurtle.forward _).expects(~sqrt(2.0))
          (mockTurtle.getAngle _).expects().returning(Pi / 4)
          (mockTurtle.turn _).expects(~(-Pi / 4))
          (mockTurtle.penDown _).expects()
          (mockTurtle.forward _).expects(1.0)
        }
    
        controller.drawLine((1.0, 1.0), (2.0, 1.0))
      }
    }

    This should (hopefully!) be self-explanatory, with one possible exception. The tilde (~) operator represents an epsilon match, useful for taking account of rounding errors when dealing with floating-point values.

  5. Run the tests with sbt test:
    $ sbt
    > test
    [info] ControllerTest:
    [info] - draw line
    [info] Passed: : Total 1, Failed 0, Errors 0, Passed 1, Skipped 0

ScalaMock 3.0-M4 for Scala 2.10.0-RC1

ScalaMock 3.0-M4 (scaladoc) for Scala 2.10.0-RC1 is now released. It supports:

  • Mock functions, traits and classes
  • Both expectation-first and record-then-verify (Mockito-style) mocking
  • ScalaTest and Specs2

To use with sbt and ScalaTest:

libraryDependencies +=
  "org.scalamock" % "scalamock-scalatest-support_2.10.0-RC1" % "3.0-M4"

or for Specs2:

libraryDependencies +=
  "org.scalamock" % "scalamock-specs2-support_2.10.0-RC1" % "3.0-M4"

For background information, see ScalaMock 3.0 Preview Release.

Known limitations (these should all be fixed when Scala adds support for mock types):

  • No support for mocking object creation (constructors)
  • No support for mocking singleton/companion objects
  • No support for mocking final classes or classes with private constructors
  • No support for mocking concrete vars
  • Limited support for overloaded methods
  • No support for mocking Java methods with repeated parameters

ScalaMock 3.0-M1

ScalaMock 3.0-M1 (scaladoc) for Scala 2.10.0-M4 is now released. It supports:

  • Mock functions, traits and classes
  • Both expectation-first and record-then-verify (Mockito-style) mocking
  • ScalaTest and Specs2

To use with sbt:

libraryDependencies +=
  "org.scalamock" %% "scalamock-scalatest-support" % "3.0-M1"

For background information, see ScalaMock 3.0 Preview Release.

It’s been tested against overloaded, curried and polymorphic methods, by-name parameters, path-dependent types and type projections. There may still be corner cases that aren’t handled correctly, please report a bug if you find one.

ScalaMock 3.0 Preview Release

Update: ScalaMock 3.0-M4 for Scala 2.10.0-RC1 is now released.

The current version of ScalaMock makes use of a compiler plugin to generate typesafe mock objects. This works, but has a number of problems, not least of which is the complicated nature of the build system required (meaning that it currently only works with sbt). The recent preview release of Scala 2.10 has opened up a new approach via its support for macros.

I’ve just released a preview of ScalaMock 3.0, a “from the ground up” rewrite using macros. It’s not yet as capable as ScalaMock 2.x (it can only mock traits – there’s no support for mocking classes, singleton/companion objects or object creation yet) but it’s certainly complete enough to be useful, I hope.

One significant (and frequently requested!) enhancement over ScalaMock 2.x is support for Mockito-style “record then verify” mocking as well as the more traditional “setup expectations” style.

The source is available on GitHub, and a snapshot release (Scala 2.10.0-M3 only) is available on Sonatype:

libraryDependencies +=
  "org.scalamock" %% "scalamock-scalatest-support" % "3.0-SNAPSHOT"

You can see examples of it in use in GitHub.

Here’s an example of a test written using the “setup expectations” style:

class ControllerTest extends FunSuite with MockFactory {

  test("draw line") {
    val mockTurtle = mock[Turtle]
    val controller = new Controller(mockTurtle)

    inSequence {
      inAnyOrder {
        (mockTurtle.penUp _) expects ()
        (mockTurtle.getPosition _) expects () returning (0.0, 0.0)
        (mockTurtle.getAngle _) expects () returning 0.0
      }
      (mockTurtle.turn _) expects ~(Pi / 4)
      (mockTurtle.forward _) expects ~sqrt(2.0)
      (mockTurtle.getAngle _) expects () returning Pi / 4
      (mockTurtle.turn _) expects ~(-Pi / 4)
      (mockTurtle.penDown _) expects ()
      (mockTurtle.forward _) expects 1.0
    }

    controller.drawLine((1.0, 1.0), (2.0, 1.0))
  }
}

and here’s the same example rewritten to use the Mockito-style record and then verify approach:

class ControllerTest extends FunSuite with MockFactory {
  import scala.language.postfixOps

  test("draw line") {
    val mockTurtle = stub[Turtle]
    val controller = new Controller(mockTurtle)

    inSequence {
      inAnyOrder {
        (mockTurtle.getPosition _) when () returns (0.0, 0.0)
        (mockTurtle.getAngle _) when () returns 0.0 once
      }
      (mockTurtle.getAngle _) when () returns Pi / 4
    }

    controller.drawLine((1.0, 1.0), (2.0, 1.0))

    inSequence {
      (mockTurtle.turn _) verify ~(Pi / 4)
      (mockTurtle.forward _) verify ~sqrt(2.0)
      (mockTurtle.turn _) verify ~(-Pi / 4)
      (mockTurtle.penDown _) verify ()
      (mockTurtle.forward _) verify 1.0
    }
  }
}

Right now, it supports:

  • Mock functions (including higher-order functions)
  • Polymorphic traits
  • Polymorphic methods
  • Curried methods
  • Overloaded methods

Here’s my todo list:

  • Sort out the documentation
  • Take advantage of macro types (type providers) when available in a subsequent Scala release. This will enable a slightly nicer syntax for method mocking:

    mockObject.expects.method(arguments)

    instead of:

    (mockObject.method _) expects (arguments)

  • Mocking classes (as well as traits)
  • Mocking singleton/companion objects
  • Mocking object creation

I’d be very grateful for any feedback, in particular bug reports.



Follow

Get every new post delivered to your Inbox.