Monthly Archives: February 2014

Simple Tutorial on Java equals and hashCode

I have wanted to write this up for a long time. I looked around for an easy way to understand the Java equals and hashCode methods, but found that while the facts are stated in many places, a simple example is missing. So here is a humble attempt.

You probably know this already, but for the sake of completeness, here is the question:
How do you check if two variables in Java are equal?
Before we can answer that question, we should ask what the different types of variables in Java are. For the purposes of testing equality, there are two: primitives and objects. Primitives can be checked for equality with ==, so your code will look like:
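
A minimal sketch of such a comparison; the class name PrimitiveEquality and the variable names are chosen purely for illustration:

```java
class PrimitiveEquality {
    public static void main(String[] args) {
        int a = 42;
        int b = 42;
        char c1 = 'x';
        char c2 = 'y';

        // For primitives, == compares the values themselves.
        System.out.println("a == b ? " + (a == b));     // a == b ? true
        System.out.println("c1 == c2 ? " + (c1 == c2)); // c1 == c2 ? false
    }
}
```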

(Strictly speaking, floats and doubles should not be compared this way even though they are primitives: many decimal values have no exact binary representation, so rounding errors can make two mathematically equal results differ.)
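
To illustrate the floating-point caveat, here is a small sketch. The helper nearlyEqual and the tolerance 1e-9 are assumptions made for this example; a suitable tolerance depends on the application.

```java
class FloatingPointEquality {
    // Hypothetical helper: treats two doubles as equal when they differ by less than epsilon.
    static boolean nearlyEqual(double x, double y, double epsilon) {
        return Math.abs(x - y) < epsilon;
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;

        // The sum carries a tiny rounding error, so the exact comparison fails.
        System.out.println("sum == 0.3 ? " + (sum == 0.3));                         // false
        System.out.println("nearlyEqual ? " + nearlyEqual(sum, 0.3, 1e-9));         // true
    }
}
```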

How do we compare objects? Let’s say we have this Book class:

And we have a test drive class like this:
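
The two classes could look roughly like the sketch below. The field names author and title follow the discussion later in the article; the constructor, the sample data, and the exact print format are assumptions.

```java
class Book {
    private final String author;
    private final String title;

    Book(String author, String title) {
        this.author = author;
        this.title = title;
    }
    // No equals or hashCode yet: both are inherited from Object.
}

class BooksDemo {
    public static void main(String[] args) {
        // Two separate objects describing the same book (sample data).
        Book b1 = new Book("Joshua Bloch", "Effective Java");
        Book b2 = new Book("Joshua Bloch", "Effective Java");

        System.out.println("b1 == b2 ? " + (b1 == b2));
        System.out.println("b1.equals(b2) ? " + b1.equals(b2));
    }
}
```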

When we want to compare object references, we use ==, and when we want to check for logical equality, we use equals. So we would expect (b1 == b2) to be false (since b1 and b2 are different objects) and b1.equals(b2) to be true (since they are in fact the same book).

But what does BooksDemo print when run?
b1 == b2 ? false
b1.equals(b2) ? false

Why does b1.equals(b2) return false? Are we not doing value comparison? As it turns out, we are still doing reference comparison! Since we did not override the equals method in the Book class, Java simply called the superclass's equals method. What is the superclass of Book? Book implicitly extends Object, which is the parent of all Java classes. The Object class equals method is simply this:
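
The sketch below mirrors that behaviour in a hypothetical class named PlainObject rather than quoting the JDK source: the method returns true only when both references point to the same object.

```java
class PlainObject {
    // Same logic as the default equals inherited from Object: a reference comparison.
    @Override
    public boolean equals(Object obj) {
        return this == obj;
    }
    // hashCode is left untouched, which stays consistent with reference equality.
}

class ObjectEqualsDemo {
    public static void main(String[] args) {
        PlainObject p = new PlainObject();
        PlainObject q = new PlainObject();

        System.out.println("p.equals(p) ? " + p.equals(p)); // true
        System.out.println("p.equals(q) ? " + p.equals(q)); // false
    }
}
```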

Ok, let us add equals method to Book class. If the authors and the titles are the same for two Books, we will say they are logically equal.
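
A first attempt along those lines might look like this sketch. The parameter name other, the null check, and the sample data are assumptions.

```java
class Book {
    private final String author;
    private final String title;

    Book(String author, String title) {
        this.author = author;
        this.title = title;
    }

    // First attempt: two books are logically equal when author and title match.
    public boolean equals(Book other) {
        return other != null
                && author.equals(other.author)
                && title.equals(other.title);
    }
}

class BooksDemo {
    public static void main(String[] args) {
        Book b1 = new Book("Joshua Bloch", "Effective Java");
        Book b2 = new Book("Joshua Bloch", "Effective Java");

        System.out.println("b1 == b2 ? " + (b1 == b2));
        System.out.println("b1.equals(b2) ? " + b1.equals(b2));
    }
}
```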

Now if we run BooksDemo it prints:
b1 == b2 ? false
b1.equals(b2) ? true

Great!

While this works, there are some issues. The subtle mistake in our equals method above is that it does not really override the Object class equals method: the signature of the Object class equals method is public boolean equals(Object obj), but Book declares an equals method that takes a Book parameter instead.

What we did was an overload and not an override. This is one of the reasons to always use the @Override annotation on overridden methods: it makes sure we are indeed overriding, and not overloading. With that annotation in place, the compiler would have reported an error.

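In practice, our incorrect overloaded method may never cause a visible problem as long as equals is only ever invoked with an argument whose compile-time type is Book. But if the argument is declared as any other type, such as Object or String, the compiler selects the equals(Object) method inherited from the Object class and not our overloaded equals in the Book class. Collections such as List and Set always call equals(Object), so they would silently fall back to reference comparison.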
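
The pitfall can be reproduced with a short sketch. The class name OverloadPitfallDemo, the variable names, and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class Book {
    private final String author;
    private final String title;

    Book(String author, String title) {
        this.author = author;
        this.title = title;
    }

    // Overload, not override: the parameter type is Book rather than Object.
    public boolean equals(Book other) {
        return other != null
                && author.equals(other.author)
                && title.equals(other.title);
    }
}

class OverloadPitfallDemo {
    public static void main(String[] args) {
        Book b1 = new Book("Joshua Bloch", "Effective Java");
        Book b2 = new Book("Joshua Bloch", "Effective Java");
        Object sameBookAsObject = b2;

        // Argument typed as Book: the overload is selected at compile time.
        System.out.println(b1.equals(b2));               // true
        // Argument typed as Object: Object.equals (reference comparison) is selected.
        System.out.println(b1.equals(sameBookAsObject)); // false

        // Collections call equals(Object), so the overload is never used.
        List<Book> shelf = new ArrayList<>();
        shelf.add(b1);
        System.out.println(shelf.contains(b2));          // false
    }
}
```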

So let us fix this:
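
One way to write the corrected method is sketched below; the instanceof check and the use of Objects.equals for null-safe field comparison are choices made for this example, not necessarily the original listing. hashCode is deliberately not overridden yet.

```java
import java.util.Objects;

class Book {
    private final String author;
    private final String title;

    Book(String author, String title) {
        this.author = author;
        this.title = title;
    }

    // @Override makes the compiler verify that this really overrides Object.equals.
    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;                 // same reference
        }
        if (!(obj instanceof Book)) {
            return false;                // null or a different type
        }
        Book other = (Book) obj;
        return Objects.equals(author, other.author)
                && Objects.equals(title, other.title);
    }
}

class BooksDemo {
    public static void main(String[] args) {
        Book b1 = new Book("Joshua Bloch", "Effective Java");
        Object b2 = new Book("Joshua Bloch", "Effective Java");

        System.out.println("b1.equals(b2) ? " + b1.equals(b2));             // true
        System.out.println("b1.equals(\"text\") ? " + b1.equals("text"));   // false
    }
}
```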

We have a correct equals method now. But we also need to fix another issue:
Two equal objects must return the same hashCode.

So whenever we override equals, we must also override hashCode and make sure that equal objects return the same hashCode.

If we do not override the hashCode method in our class, we inherit the Object class hashCode method. As its documentation notes, Object hashCode is typically implemented by converting the internal address of the object into an integer. The internal addresses of two different objects differ even if the objects are equal (i.e. b1.equals(b2) is true), so we will usually get different hashCodes, which violates this requirement.

Not overriding hashCode creates problems for HashSet, Hashtable and HashMap. Let’s see an example. Recall that Sets are useful for keeping unique elements, so we would expect logically equal elements to not be duplicated. Also recall that when we invoke the add method on a set, it returns false if the item is a duplicate.
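
A sketch of such an example follows, assuming a Book that overrides equals but not hashCode. The print format follows the output quoted in the article; the sample data is made up.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class Book {
    private final String author;
    private final String title;

    Book(String author, String title) {
        this.author = author;
        this.title = title;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof Book)) return false;
        Book other = (Book) obj;
        return Objects.equals(author, other.author)
                && Objects.equals(title, other.title);
    }
    // hashCode is intentionally NOT overridden here, to show the problem.
}

class BooksDemo {
    public static void main(String[] args) {
        Book b1 = new Book("Joshua Bloch", "Effective Java");
        Book b2 = new Book("Joshua Bloch", "Effective Java");

        Set<Book> books = new HashSet<>();
        System.out.println("Item added ? " + books.add(b1));
        System.out.println("Item added ? " + books.add(b2));
        System.out.println("# of unique books = " + books.size());
    }
}
```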

This code prints:

Item added ? true
Item added ? true
# of unique books = 2

So why did the second item get added, though it is equal to the first? Isn’t Set supposed to prevent duplicates?

Let us put this hashCode method in Book class now and see what is happening:
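
A hashCode method of this kind might look like the sketch below: it delegates to Object's implementation and prints the value so that we can watch when the set calls it. The message text is taken from the output shown in the article; the rest of the class is assumed.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class Book {
    private final String author;
    private final String title;

    Book(String author, String title) {
        this.author = author;
        this.title = title;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof Book)) return false;
        Book other = (Book) obj;
        return Objects.equals(author, other.author)
                && Objects.equals(title, other.title);
    }

    // Still broken on purpose: reuses Object's hashCode and only adds tracing.
    @Override
    public int hashCode() {
        int hash = super.hashCode();
        System.out.println("in hashcode. I am simply returning Object hashCode=" + hash);
        return hash;
    }
}

class BooksDemo {
    public static void main(String[] args) {
        Set<Book> books = new HashSet<>();
        System.out.println("Item added ? " + books.add(new Book("Joshua Bloch", "Effective Java")));
        System.out.println("Item added ? " + books.add(new Book("Joshua Bloch", "Effective Java")));
        System.out.println("# of unique books = " + books.size());
    }
}
```

The printed hash values vary from run to run, so they will not match the numbers quoted in the article.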

Now run BooksDemo again and we see:

in hashcode. I am simply returning Object hashCode=18426253
Item added ? true
in hashcode. I am simply returning Object hashCode=16197143
Item added ? true
# of unique books = 2

So we see that add first gets the hashCode of the object we are trying to add. If there is no item in the set with the same hashCode, it will decide that the new item is not a duplicate. (Recall again, ‘equal implies same hashCode’ which means ‘unequal hashCode implies unequal objects’.) If there is another object in the set with the same hashCode, then it will invoke equals method to check if there is an item in the set equal to the new item we are trying to add. In our case above, we reused Object’s hashCode, so our new Book’s hashCode is different from the one in the set.
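
The lookup logic described above can be sketched with a toy set. TinyHashSet is a hypothetical, heavily simplified class written for this illustration; it is not how java.util.HashSet is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

class TinyHashSet {
    private final List<List<Object>> buckets = new ArrayList<>();
    private int size = 0;

    TinyHashSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    boolean add(Object item) {
        // Step 1: the hashCode selects the bucket to search.
        int hash = item.hashCode();
        List<Object> bucket = buckets.get(Math.floorMod(hash, buckets.size()));

        // Step 2: equals is consulted only for candidates with the same hashCode.
        for (Object existing : bucket) {
            if (existing.hashCode() == hash && existing.equals(item)) {
                return false; // duplicate, not added
            }
        }
        bucket.add(item);
        size++;
        return true;
    }

    int size() {
        return size;
    }
}

class TinyHashSetDemo {
    public static void main(String[] args) {
        TinyHashSet set = new TinyHashSet(16);
        System.out.println(set.add("alpha"));             // true
        System.out.println(set.add(new String("alpha"))); // false: same hashCode and equal
        System.out.println(set.add("beta"));              // true
        System.out.println(set.size());                   // 2
    }
}
```

With a class that inherits Object's hashCode, step 1 usually sends two logically equal objects to different buckets, so step 2 never gets the chance to compare them.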

Let us make sure we write a correct hashCode method now:
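
One possible implementation combines the hash codes of the two fields. The sketch below uses java.util.Objects.hash for that, which is an assumption about style rather than the article's original listing.

```java
import java.util.Objects;

class Book {
    private final String author;
    private final String title;

    Book(String author, String title) {
        this.author = author;
        this.title = title;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof Book)) return false;
        Book other = (Book) obj;
        return Objects.equals(author, other.author)
                && Objects.equals(title, other.title);
    }

    // Built from exactly the fields that equals compares,
    // so equal books always produce the same hashCode.
    @Override
    public int hashCode() {
        return Objects.hash(author, title);
    }
}

class HashCodeDemo {
    public static void main(String[] args) {
        Book b1 = new Book("Joshua Bloch", "Effective Java");
        Book b2 = new Book("Joshua Bloch", "Effective Java");
        System.out.println("same hashCode ? " + (b1.hashCode() == b2.hashCode())); // true
    }
}
```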

This method returns the same hashCode for equal objects. For unequal objects, it will return different hashCodes almost all the time (unless there are hash collisions between the author and title strings, which will be very rare).

Let us add a print statement to our equals method also for verification:
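
Putting the pieces together, a traced version could look like this sketch. The messages follow the output quoted in the article; the hashCode implementation via Objects.hash and the sample data are assumptions, so the printed number will differ from the one shown there.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class Book {
    private final String author;
    private final String title;

    Book(String author, String title) {
        this.author = author;
        this.title = title;
    }

    @Override
    public boolean equals(Object obj) {
        System.out.println("in equals");
        if (this == obj) return true;
        if (!(obj instanceof Book)) return false;
        Book other = (Book) obj;
        return Objects.equals(author, other.author)
                && Objects.equals(title, other.title);
    }

    @Override
    public int hashCode() {
        System.out.println("in hashcode");
        int hash = Objects.hash(author, title);
        System.out.println("hashCode=" + hash);
        return hash;
    }
}

class BooksDemo {
    public static void main(String[] args) {
        Set<Book> books = new HashSet<>();
        System.out.println("Item added ? " + books.add(new Book("Joshua Bloch", "Effective Java")));
        System.out.println("Item added ? " + books.add(new Book("Joshua Bloch", "Effective Java")));
        System.out.println("# of unique books = " + books.size());
    }
}
```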

Now let’s run BooksDemo and we get the following output:

in hashcode
hashCode=1869578864
Item added ? true
in hashcode
hashCode=1869578864
in equals
Item added ? false
# of unique books = 1

The first item gets added, but when we add the second item, the hashCode of the new object is the same as the hashCode of the item already in the set, so equals is invoked. Since the two objects are equal, add decides this is a duplicate object and does not add it to the set, so we get the expected behavior.

Another important note about hashCode is this: unequal objects need not have different hashCodes. However, as the Object hashCode documentation notes, the programmer should be aware that producing distinct hashCodes for unequal objects may improve the performance of hash tables. Hash-based collections group objects into buckets by hashCode, and every object in a bucket may have to be compared with equals during a lookup, so as far as possible we should avoid putting more than one object in a bucket. In general, hash collisions should be rare so that lookups stay fast.
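
As an extreme illustration, the sketch below uses a hypothetical class CollidingKey whose hashCode always returns the same value. This is legal and the set still behaves correctly, but every element lands in the same bucket, so the set has to do far more work per lookup than it would with well-distributed hash codes.

```java
import java.util.HashSet;
import java.util.Set;

class CollidingKey {
    private final int id;

    CollidingKey(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof CollidingKey && ((CollidingKey) obj).id == id;
    }

    // Satisfies the contract (equal objects share a hashCode) but collides for every key.
    @Override
    public int hashCode() {
        return 1;
    }
}

class CollisionDemo {
    public static void main(String[] args) {
        Set<CollidingKey> keys = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            keys.add(new CollidingKey(i));
        }
        System.out.println(keys.size());                         // 1000
        System.out.println(keys.contains(new CollidingKey(500))); // true
        System.out.println(keys.add(new CollidingKey(500)));      // false: duplicate
    }
}
```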