Archive for the 'Software Engineering' Category

Travis, Docker and Elastic Beanstalk

Elastic Beanstalk now supports Docker, which is a wonderful thing. I’ve just spent an unreasonably long time trying to work out how to get this working nicely with a CI system running on Travis, though, and thought I’d save others the same experience.

My first thought was that I should get Travis to build a Docker image, but this is a non-starter because Travis runs on Ubuntu 12.04 LTS, which requires a kernel upgrade to be able to run Docker. Even if this did work, it wouldn’t be a great solution—downloading a base Ubuntu image after every build would be dreadfully wasteful.

The “aha” moment came when I realised that you don’t need to create a Docker image to deploy to Elastic Beanstalk. Instead, you can have Elastic Beanstalk build the image for you. So all Travis needs to do is create a Zip archive containing a Dockerfile and any build artefacts it references. This archive can be deployed to, say, S3 and Elastic Beanstalk takes things from there.

Very slick.

Pronking for programmers

If you watch wildlife documentaries, you’ve probably seen pronking, or stotting—the strange four-legged jump that gazelles do:

Pronking is Afrikaans for “showing off,” and that’s certainly part of what’s going on, but it’s not quite as simple as that. Gazelles don’t only pronk to show off—they also pronk when they’re being chased by a predator. I always assumed that this was about evading the predator by being unpredictable, but it turns out that there’s a rather surprising alternative explanation.

Pronking slows the animal down and uses up energy that (you might think) would be better spent running away. But it works—animals that pronk are less likely to be caught than those that don’t. So what’s going on?

The theory is that pronking is a way for a gazelle to demonstrate that it’s fit. “Don’t bother chasing me—I’ll be too difficult to catch. Chase my cousin over there, the one with the limp—he’ll be much easier to catch.”

Pronking works because it’s expensive. If every gazelle could pronk, predators would pay no attention. But only fit gazelles can, so it’s an honest signal.

So what does this have to do with programmers? Too many pizzas and donuts mean that quite a few of us won’t be able to literally pronk, but that doesn’t stop us from doing it in our own way…

I think that we pronk when we contribute to open source.

There are lots of reasons to contribute to open source. It’s a great way to improve the tools we all rely on, give back to the community, and learn new skills. But, let’s be honest, it’s also a great way to show off and demonstrate our fitness.

And it works—when I find myself sifting through a pile of CVs, evidence of significant open source contributions is virtually guaranteed to make sure a CV doesn’t find itself on the “rejects” pile. And like pronking, it works because it’s expensive—the time and effort required is concrete evidence that the candidate cares about their craft.

So if you can pronk, perhaps you should?

ScalaMock Status Report

Apart from a little work to make it compatible with the most recent versions of Scala and ScalaTest (thanks to Chua Chee Seng, Duncan Crawford and Chris Birchall) ScalaMock has been moribund over the last 12 months. This is partly because my focus has been on writing Seven Concurrency Models in Seven Weeks, but mostly because of insurmountable issues with Scala’s current macro support.

There is good news on both fronts, however. First, I recently finished the book, so I should now have more time to devote to ScalaMock. And second, Eugene has announced Palladium, which looks like it should provide everything that ScalaMock needs.

So where is ScalaMock now? And where is it (hopefully) going?

Where is ScalaMock today?

ScalaMock 3 supports two types of mock, proxy mocks and macro mocks.

Proxy mocks are not type checked and have a slightly less convenient syntax than macro mocks, but they work with any trait and are fully supported.

Macro mocks are type checked and have a nicer syntax than proxy mocks. They work well with most traits, but fail for traits with “complex” type signatures.

The good news is that macro mocks either work or give a compile error (there should be no situations where a macro mock compiles but gives “odd” results). And if you have a trait for which macro mocks don’t work, you can always use a proxy mock instead.

Where is ScalaMock going?

As soon as Palladium is available, I plan to start working on ScalaMock 4. If Palladium delivers on its promise, ScalaMock 4 should be able to mock any trait, no matter how complex its type. In addition, I expect that it will also support:

  • Improved syntax:

    mockObject.expects.method(arguments)

    instead of:

    (mockObject.method _) expects (arguments)
  • Mocking object creation (constructors)
  • Mocking singleton and companion objects (static methods)
  • Mocking final classes and classes with final methods or private constructors

Folding over a lazy sequence with Clojure reducers

Clojure 1.5 includes reducers. They’re currently experimental, but showing great promise.

One of reducers’ features is that they allow parallel fold operations. Sadly, although fold works when given a lazy sequence, it falls back to a sequential reduce.

There’s no theoretical reason why fold couldn’t work in parallel with a lazy sequence, so I decided to see if I could implement it. It turns out to be very easy. I’ve implemented a function called foldable-seq that takes a lazy sequence and turns it into something that can be folded in parallel. I’ve checked an example program that uses this to count words in a Wikipedia XML dump into GitHub. The code for foldable-seq is here.

On my 4-core MacBook Pro, the example program runs in approx, 40 seconds without foldable-seq and 13 seconds with.

Benchmarking Producer/Consumer in Akka, Part 2

Recently, I published some benchmarks of Producer/Consumer in Akka. This post is a followup with more detail.

I have modified my test program (which is available on GitHub) to implement three different basic algorithms:

  • Producer pushes to a bounded queue (i.e. producer blocks when the queue is full)
  • Producer pushes to an unbounded queue, plus flow control to ensure that the queue doesn’t grow too large
  • Consumer pulls

For each of these, I then benchmarked different variants as follows:

push_bal: Producer pushes to a bounded queue, with a BalancingDispatcher

push_rr: Producer pushes to a bounded queue, with a RoundRobinRouter

push_sm: Producer pushes to a bounded queue, with a SmallestMailboxRouter

flow_bal: Producer pushes to an unbounded queue, with a BalancingDispatcher

flow_rr: Producer pushes to an unbounded queue, with a RoundRobinRouter

flow_sm: Producer pushes to an unbounded queue, with a SmallestMailboxRouter

pull_1: Consumer pulls, with a batch size of 1

pull_10: Consumer pulls, with a batch size of 10

pull_20: Consumer pulls, with a batch size of 20

pull_50: Consumer pulls, with a batch size of 50

pull_cached: Consumer pulls, with an intermediate cache

Here is a spreadsheet of the results, including graphs, I get on my MacBook Pro (Core i7, 4 cores, 2 hyperthreads per core). Each test is run multiple times and the results averaged.

Updated results, incorporating suggestions from Viktor can be downloaded here.

Conclusions

Note: These conclusions are different from the initial version of this post, following suggestions from Viktor.

The headline is that the difference between the approaches is so small as to be almost irrelevant:

  • For the push to a bounded queue version, BalancingDispatcher clearly outperforms both RoundRobinRouter and SmallestMailboxRouter, especially as the number of workers increases. Strangely the same difference is not present for the unbounded queue version.
  • The pull approach can be made to perform virtually identically to the push model by batching queries.

Benchmarking Producer/Consumer in Akka

Further to this conversation on the Akka mailing list, I decided to benchmark various different approaches to implementing the producer/consumer pattern.

I wanted to choose a “real” problem, so I decided to count the words on the first 100,000 pages of Wikipedia. The producer parses the Wiki XML dump and the words are counted page-by-page by a pool of consumers.

I implemented three different approaches – producer pushes to a bounded queue, producer pushes to an unbounded queue together with a flow control protocol, and consumer pulls.

The source code for the different implementations is here, and the results are at the bottom of this message.

Some observations:

  1. I only timed to the nearest second as I see an approx. 3 second variation from run to run with identical parameters. I’m not sure why I see such a large variation—suggestions welcome.
  2. There’s basically no difference between the two “producer pushes” implementations. The “consumer pulls” implementation is much slower, however
  3. I tried both Dispatcher and BalancingDispatcher and RoundRobinRouter and SmallestMailboxRouter in the producer pushes implementations—the differences were too small to measure

I’m surprised that there is so much difference between producer pushes and consumer pulls. It’s quite possible that I’ve done something stupid in the implementation—I’d be very grateful for a pointer to what it is.

Here are the results (all on my i7 MacBook Pro—4 cores, 2 hyperthreads per core).

Bounded Unbounded Pull
Consumers Seconds Speedup Seconds Speedup Seconds Speedup
1 100 1.00 103 0.97 156 0.64
2 55 1.82 56 1.79 88 1.14
3 43 2.33 40 2.50 65 1.54
4 39 2.56 38 2.63 55 1.82
5 37 2.70 37 2.70 51 1.96
6 34 2.94 33 3.03 44 2.27
7 33 3.03 33 3.03 47 2.13
8 33 3.03 35 2.86 46 2.17
9 33 3.03 32 3.13 46 2.17

ScalaMock3 step-by-step

This post describes ScalaMock 3, which supports Scala 2.10 only. For ScalaMock 2, which supports earlier Scala versions, go here.

This post describes how to setup a project that uses ScalaMock in conjunction with ScalaTest and sbt. The sample code described in this article is available on GitHub.

The example assumes that we’re writing code to control a mechanical turtle, similar to that used by Logo programs. Mocking is useful in this kind of situation because we might want to create tests that function even if we don’t have the hardware to hand, which run more quickly than would be the case if we ran on real hardware, and where we can use mocks to simulate errors or other situations difficult to reproduce on demand.

  1. Create a root directory for your project:
    $ mkdir myproject[/soucecode]
    </li>
    	<li>Create <code>build.sbt</code> containing:
    1organization := "com.example"
    
    version := "1.0"
    
    scalaVersion := "2.10.0"
    
    scalacOptions ++= Seq("-deprecation", "-unchecked")
    
    libraryDependencies +=
      "org.scalamock" %% "scalamock-scalatest-support" % "3.0" % "test"
  2. Now we’ve got a project, we need some code to test. Let’s start with a simple trait representing a turtle. Create src/main/scala/Turtle.scala containing:
    package com.example
    
    trait Turtle {
      def penDown()
      def penUp()
      def forward(distance: Double)
      def turn(angle: Double)
      def getPosition: (Double, Double)
      def getAngle: Double
    }
  3. The turtle API is not very convenient, we have no way to move to a specific position, instead we need to work out how to get from where we are now to where we want to get by calculating angles and distances. Here’s some code that draws a line from a specific point to another by doing exactly that.Create src/main/scala/Controller.scala containing:
    package com.example
    
    import scala.math.{atan2, sqrt}
    
    class Controller(turtle: Turtle) {
    
      def drawLine(start: (Double, Double), end: (Double, Double)) {
        moveTo(start)
    
        val initialAngle = turtle.getAngle
        val deltaPos = delta(start, end)
    
        turtle.turn(angle(deltaPos) - initialAngle)
        turtle.penDown
        turtle.forward(distance(deltaPos))
      }
    
      def delta(pos1: (Double, Double), pos2: (Double, Double)) =
        (pos2._1 - pos1._1, pos2._2 - pos1._2)
    
      def distance(delta: (Double, Double)) =
        sqrt(delta._1 * delta._1 + delta._2 * delta._2)
    
      def angle(delta: (Double, Double)) =
        atan2(delta._2, delta._1)
    
      def moveTo(pos: (Double, Double)) {
        val initialPos = turtle.getPosition
        val initialAngle = turtle.getAngle
    
        val deltaPos = delta(initialPos, pos)
    
        turtle.penUp
        turtle.turn(angle(deltaPos) - initialAngle)
        turtle.forward(distance(deltaPos))
      }
    }
  4. We can now write a test. We’ll create a mock turtle that pretends to start at the origin (0, 0) and verifies that if we draw a line from (1, 1) to (2, 1) it performs the correct sequence of turns and movements.

    Turtle diagram

    Create src/test/scala/ControllerTest.scala containing:

    package com.example
    
    import org.scalatest.FunSuite
    import org.scalamock.scalatest.MockFactory
    import scala.math.{Pi, sqrt}
    
    class ControllerTest extends FunSuite with MockFactory {
    
      test("draw line") {
        val mockTurtle = mock[Turtle]
        val controller = new Controller(mockTurtle)
    
        inSequence {
          inAnyOrder {
            (mockTurtle.penUp _).expects()
            (mockTurtle.getPosition _).expects().returning(0.0, 0.0)
            (mockTurtle.getAngle _).expects().returning(0.0)
          }
          (mockTurtle.turn _).expects(~(Pi / 4))
          (mockTurtle.forward _).expects(~sqrt(2.0))
          (mockTurtle.getAngle _).expects().returning(Pi / 4)
          (mockTurtle.turn _).expects(~(-Pi / 4))
          (mockTurtle.penDown _).expects()
          (mockTurtle.forward _).expects(1.0)
        }
    
        controller.drawLine((1.0, 1.0), (2.0, 1.0))
      }
    }

    This should (hopefully!) be self-explanatory, with one possible exception. The tilde (~) operator represents an epsilon match, useful for taking account of rounding errors when dealing with floating-point values.

  5. Run the tests with sbt test:
    $ sbt
    > test
    [info] ControllerTest:
    [info] - draw line
    [info] Passed: : Total 1, Failed 0, Errors 0, Passed 1, Skipped 0


Follow

Get every new post delivered to your Inbox.

Join 241 other followers