Archive for October, 2007

Navigating the equality maze

This article originally appeared on texperts.com

Ruby supports five (yes, five!) different ways to test for object equality. Why does Ruby need five different ways to test equality? What are they? How do they differ? How should they be used? When (and how) should they be implemented or overridden?

The actors take the stage

So, here are our choices:

  1. == (natural equality)
  2. equal? (object identity)
  3. eql? (hash equality)
  4. === (case equality)
  5. <=> (spaceship operator)

Object identity versus natural equality

Let’s start by looking at == and equal?. Object provides default implementations of both. Each tests for object identity (i.e. it returns true if and only if an object is compared with itself), and each can be overridden.

So, what is the difference? The difference is entirely one of convention; by convention, equal? is not overridden whereas == is.

The intention is that == should be overridden to provide “natural” equality semantics (i.e. whatever you would naturally expect equality to mean in context). Normally this means value semantics in which == returns true if the two objects in question represent the same value, false otherwise. And this is exactly what most of the standard classes do; String for example:

s1 = 'test string'
s2 = 'test string'

s1.equal? s1
=> true
s1.equal? s2
=> false
s1 == s2
=> true

Just about every standard class overrides ==. Array, for example, overrides == to return true if and only if two arrays contain the same number of elements and each element is equal according to its own definition of ==. But it isn’t always as simple as this; the various numeric classes, for example, define == to return mathematically sensible results when comparing values of different types:

1 == 1
=> true
1.equal? 1
=> true
1.equal? 1.0
=> false
1 == 1.0
=> true
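
To make the convention concrete, here is a minimal sketch of a class that overrides == to give value semantics while leaving equal? alone. Point and its attributes are hypothetical names chosen for illustration, and the is_a? guard is just one reasonable way to handle comparison with unrelated objects:

```ruby
# Hypothetical class for illustration; it is not part of the article's examples.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Natural equality: two points are equal when their coordinates are.
  # The is_a? guard makes comparison with an unrelated object return
  # false instead of raising NoMethodError.
  def ==(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end
end

a = Point.new(1, 2)
b = Point.new(1, 2)

a.equal?(b)              # false: two distinct objects
a == b                   # true: same coordinates
a == Point.new(1.0, 2)   # true: the numeric classes treat 1 and 1.0 as ==
a == 'not a point'       # false
```

Because the coordinates are compared with ==, the class inherits the mixed-type behaviour of the numeric classes shown above.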

Hash Equality

Whereas == normally tests that two objects represent the same value, eql? should always do so. For example:

1 == 1.0
=> true
1.eql? 1.0
=> false

Hash uses eql? to compare values used as hash keys. Why not use ==? Because natural equality isn’t necessarily appropriate for a hash. There are two reasons for this, one philosophical and one practical.

To understand the philosophical issue, consider the following:

h = {}
h[1] = 'an integer'
h[1.0] = 'a float'

If Hash used ==, the second assignment would overwrite the first, which almost certainly isn’t what we meant.

The practical issue relates to how hashes work. For the implementation to work correctly a.eql? b must imply that a.hash == b.hash. How could we possibly guarantee this if Hash used ==, when potentially several unrelated classes are involved?
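
Both points can be checked with a short sketch. It relies only on core Hash and String behaviour; the variable names are arbitrary:

```ruby
h = {}
h[1] = 'an integer'
h[1.0] = 'a float'

h.size    # 2: the keys stay distinct because 1.eql?(1.0) is false
h[1]      # 'an integer'
h[1.0]    # 'a float'

# Two equal strings are eql?, so they must also share a hash code.
s1 = 'key'
s2 = 'key'

s1.eql?(s2)           # true
s1.hash == s2.hash    # true
```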

Object provides a default implementation of eql? which compares object identity. This is almost certainly not what you want.

Although the primary use of eql? is within hashes, because hashes are used extensively throughout Ruby code you can easily end up using it indirectly without necessarily realising that you are doing so. Set, for example, is implemented using a hash internally so comparison for set membership is performed with eql? instead of ==. This can lead to surprising behaviour if you rely on the default implementation provided by Object:

class Foo
  attr_accessor :x
  def initialize(x)
    @x = x
  end
  def ==(other)
    @x == other.x
  end
end

f = Foo.new(1)
g = Foo.new(2)
h = Foo.new(1)

f == g
=> false
f == h
=> true

require 'set'

s = Set.new [f, g]
=> #<Set: {#<Foo ...>, #<Foo ...>}>
s.add h
=> #<Set: {#<Foo ...>, #<Foo ...>, #<Foo ...>}>
s.size
=> 3
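
One way to get the behaviour we probably wanted is to give Foo its own eql? and hash. The following is a sketch rather than the only possible fix; the is_a? guards and the choice to delegate hash to @x are assumptions made for illustration:

```ruby
require 'set'

class Foo
  attr_accessor :x

  def initialize(x)
    @x = x
  end

  def ==(other)
    other.is_a?(Foo) && @x == other.x
  end

  # Hash and Set call eql? (together with hash), not ==.
  def eql?(other)
    other.is_a?(Foo) && @x.eql?(other.x)
  end

  # Objects that are eql? must return the same hash code.
  def hash
    @x.hash
  end
end

s = Set.new [Foo.new(1), Foo.new(2)]
s.add Foo.new(1)
s.size   # 2: the second Foo.new(1) is recognised as a duplicate
```

Note that because x is writable, changing it while the object sits in a Set or is used as a Hash key would change its hash code, so such objects are best treated as immutable once stored.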

Case equality

Our next equality operator is ===, the case equality or “threequals” operator. This is the power behind the nice syntactic sugar supported by Ruby’s case statement.

For most objects, === works just like ==, but certain classes modify it to return true for a wider range of comparisons. One such class is Range:

(1..10) == 4
=> false
(1..10) === 4
=> true

Which means that you can write this kind of code:

kind = case lines
  when 1..10 then "Short"
  when 11..25 then "Medium"
  when 26..50 then "Long"
  else "Too long!"
end

Regular expressions can play the same kind of trick:

/foo/ == 'foo'
=> false
/foo/ === 'foo'
=> true

kind = case moment
  when /\d\d:\d\d:\d\d/ then 'time'
  when /\d\d\/\d\d\/\d\d/ then 'date'
  else 'other'
end

As can classes:

String == 'foo'
=> false
String === 'foo'
=> true

case thing
  when String
    # Handle strings here
  when Numeric
    # Handle numbers here
  # etc...
end

Note that this means that, unlike the other methods we’re considering here, === won’t in general be commutative:

String === 'foo'
=> true
'foo' === String
=> false

Object provides a default implementation of === which returns true if its arguments are identical and otherwise calls ==:

class Foo
  def ==(other)
    puts '== called'
    super
  end
end

f = Foo.new
g = Foo.new

f === f
=> true

f === g
== called
=> false
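
Your own classes can join in by defining ===. The sketch below uses a hypothetical MultipleOf matcher and a hypothetical classify helper, neither of which comes from any library, to show how a case statement calls === on each when clause in turn:

```ruby
# Hypothetical matcher class for illustration.
class MultipleOf
  def initialize(n)
    @n = n
  end

  # Case equality: true when value is an integer multiple of n.
  def ===(value)
    value.is_a?(Integer) && (value % @n).zero?
  end
end

# Hypothetical helper; each when clause calls MultipleOf#=== with number.
def classify(number)
  case number
  when MultipleOf.new(15) then 'FizzBuzz'
  when MultipleOf.new(3)  then 'Fizz'
  when MultipleOf.new(5)  then 'Buzz'
  else number.to_s
  end
end

classify(9)    # 'Fizz'
classify(30)   # 'FizzBuzz'
classify(7)    # '7'
```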

The Spaceship operator

The spaceship operator does a lot more than simply check for equality. It defines an ordering on your objects, returning -1, 0 or 1 depending on whether the first argument is less than, equal to or greater than the second.

It’s relevant to our discussion here not only because it can be used to test for object equality as follows:

(f <=> g) == 0

But also because via the Comparable mixin, it provides us with an alternative method of implementing the == operator (although note that, as with the default implementation of ===, it short-circuits if the two objects are identical):

class Foo
  include Comparable
  def <=>(other)
    puts '<=> called'
    0
  end
end

f = Foo.new
g = Foo.new

f == g
<=> called
=> true

f == f
=> true

Object does not provide a default implementation of <=>, but many standard library classes do provide one of their own.
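
As a rough sketch of what Comparable buys you, here is a hypothetical Version class (the name and attributes are invented for illustration) that defines only <=> and gets ==, <, between? and friends from the mixin:

```ruby
# Hypothetical class for illustration.
class Version
  include Comparable
  attr_reader :major, :minor

  def initialize(major, minor)
    @major = major
    @minor = minor
  end

  # Order by major, then minor. Array#<=> compares element by element.
  # Returning nil signals that the two objects cannot be compared.
  def <=>(other)
    return nil unless other.is_a?(Version)
    [major, minor] <=> [other.major, other.minor]
  end
end

v1 = Version.new(1, 8)
v2 = Version.new(1, 9)

v1 < v2                    # true
v1 == Version.new(1, 8)    # true, via Comparable#==
v1 == 'not a version'      # false: <=> returned nil
[v2, v1].min.minor         # 8
v1.between?(v1, v2)        # true
```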

Which method to use?

So, given this smorgasbord of equality methods, which should we use, and when?

In the vast majority of cases, you will either want to test for “natural” equality (==) or object identity (equal?). Only very rarely (possibly never) should you ever need to call === or eql? directly.

Of course, you will use both indirectly whenever you use a case statement or a hash. But there should be very few occasions where you need to use them directly.

You will notice that quite a bit of code “out there” doesn’t necessarily follow the above recommendation. In particular, it has become idiomatic to test the class of an object with ===; the standard libraries in particular use this idiom heavily. Personally speaking, however, this strikes me as excessively “cute” and I would prefer to use is_a? instead:

Integer === 1
=> true
1.is_a? Integer
=> true

Implementing and overriding equality methods

A number of obvious recommendations arise naturally from the above.

  1. Do not override equal?.
  2. eql? should return true if and only if the two objects represent the same value. This means that if you derive directly from Object, you almost certainly should provide your own implementation of eql?.
  3. If you implement eql? you will normally also have to implement hash and must ensure that x.eql? y implies that x.hash == y.hash.
  4. Under most circumstances, == should be an alias for eql?. If, however, a broader definition of equality makes sense, feel free to alter it to provide sensible natural semantics.
  5. If you do decide to broaden the definition of ==, you should ensure that it still behaves mathematically “sensibly”. As a minimum this means ensuring that it remains commutative.
  6. Under most circumstances, you will not need to implement ===. If, however, your class can benefit from the flexibility of Ruby’s case statement, feel free to create your own version of ===.
  7. If you implement <=>, you normally should not need to implement == as it comes “for free” with Comparable. If you do decide to implement both, however (which can be a reasonable choice for reasons of efficiency) you should ensure that (x <=> y) == 0 implies that x == y and vice-versa.

Oddities

Most of the standard Ruby classes follow these rules. There is one exception we’ve come across though: in Ruby 1.8, Hash’s implementation of eql? appears to test for object identity, not equal values (Ruby 1.9 and later compare the contents instead). Contrast its behaviour with Array, which does behave as we expect:

x = {:a => 'foo'}
y = {:a => 'foo'}

x == y
=> true
x.eql? y
=> false

x = ['foo', 'bar']
y = ['foo', 'bar']

x == y
=> true
x.eql? y
=> true

If anyone can cast any light on the reason for this discrepancy, we’re all ears!