A dissent from the Java "contract" . . .

Some Objects Shouldn't Be Hashed!

© Conrad Weisert, Information Disciplines, Inc., June 1 2009

NOTE: This document may be circulated or quoted from freely, as long as the copyright credit is included.

Rules and contracts

This article expands a brief point in a book review that has elicited strong dissent.

The Java community keeps reminding class designers of things they have to do to honor various conventions. One of them is:

"Always override hashCode when you override equals."

- various sources including Joshua Bloch: Effective Java
(item #9 in the second edition).

They warn us that if we fail to do so we will have violated or broken a "contract". That sounds dire. We certainly don't want to be accused of breaking solemn contracts.

But after a moment's reflection, we wonder what the purpose of that contract is. What kinds of things might we want to compare for equality? What kinds of things might we want to retrieve by value from a table? Are those two categories of things the same?

Key fields

Hash tables are used to retrieve items by the value of some key field. Key fields are typically identifiers or names.

We rarely if ever use the values of numeric data items as key fields. We'd be surprised to find a program trying to retrieve from a hash table:
  • the product that costs exactly $2.89.
  • the hospital patient whose temperature is exactly 101.4°F.
  • the flight that took off at exactly 11:33 this morning.

Clearly, it would be bizarre (and probably unreliable) to use Money objects, Temperature objects, or TimeOfDay objects as keys for retrieval. Therefore, there is very little reason to implement the hashCode() method for those and similar classes, much less to insist upon doing so as a standard contract.

It's not just numeric classes that shouldn't implement hashCode(). Many composite item classes fall in the same category. We don't retrieve a Person object by its value, but by either its identifier or its name. We don't retrieve a Book item by its value, but by its ISBN or its title. It's the identifiers and the names that need a hash code, not the whole objects.
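To make the distinction concrete, here is a minimal sketch of retrieval by identifier. The Person and PersonDirectory classes, their fields, and the sample identifier are hypothetical names chosen for illustration; the point is only that the String key supplies the hash code, while Person itself defines none.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical composite class: retrieved by identifier, never used as a hash key.
final class Person {
    final String id;    // the identifier is the natural lookup key
    final String name;

    Person(String id, String name) {
        this.id = id;
        this.name = name;
    }
    // Deliberately no hashCode() override: Person objects are not meant to be keys.
}

class PersonDirectory {
    // Keyed by identifier; String already supplies a suitable hashCode().
    private final Map<String, Person> byId = new HashMap<>();

    void add(Person p) {
        byId.put(p.id, p);
    }

    // Returns null when no person has the given identifier.
    Person find(String id) {
        return byId.get(id);
    }
}

class PersonDirectoryDemo {
    public static void main(String[] args) {
        PersonDirectory dir = new PersonDirectory();
        dir.add(new Person("P-1001", "Ada Lovelace"));
        System.out.println(dir.find("P-1001").name);  // prints "Ada Lovelace"
    }
}
```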

Even if we can't conceive of using some object as a key in a hash table, we may still want to compare it with another object. We may therefore want to implement a Boolean equals method, either by overriding Object.equals(Object x), by overloading it for a particular class (corresponding to C++'s overloaded == operator), or both. Common sense then requires violating this misguided "contract".
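As a rough sketch of what such a class might look like, the following hypothetical Money class supplies both the overload and the override of equals and deliberately omits hashCode(). The class name, the representation in cents, and the demo values are assumptions made for illustration only.

```java
// Hypothetical value class for illustration; amounts are held in cents.
final class Money {
    private final long cents;

    Money(long cents) {
        this.cents = cents;
    }

    // Overload: type-specific comparison, analogous to C++'s overloaded ==.
    public boolean equals(Money other) {
        return other != null && cents == other.cents;
    }

    // Override of Object.equals(Object), delegating to the overload.
    @Override
    public boolean equals(Object x) {
        return x instanceof Money && equals((Money) x);
    }
    // No hashCode() override: the deliberate departure from the "contract".
}

class MoneyDemo {
    public static void main(String[] args) {
        Money a = new Money(289);
        Money b = new Money(289);
        System.out.println(a.equals(b));               // true (overload)
        System.out.println(a.equals((Object) b));      // true (override)
        System.out.println(a.equals(new Money(290)));  // false
        System.out.println(a.equals("2.89"));          // false
    }
}
```

Objects of such a class can be compared freely, but they would behave unpredictably as keys in a HashMap or elements of a HashSet, which is exactly the use we argue should not arise.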

Solving the Problem

Another approach that may tempt us is to declare hashCode() to be private. That would still violate the contract, but it would alert, at compile time, anyone who tries to hash an unhashable object.

Alas, that "solution" won't work! The obstacle lies with the Liskov substitution principle combined with a poor choice by Java's designers. The universal root class Object makes hashCode() a public method, which is inherited by all other Java classes, and Java does not let a subclass reduce the visibility of a method it overrides, so the private declaration is rejected by the compiler. It would have made more sense, and given programmers more control, to provide a Hashable interface or just to leave hashing to the user-programmer.
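To show what an opt-in design could look like, here is a small sketch built around a Hashable interface. Nothing here exists in the Java library: Hashable, its hash() method, the Buckets helper, and the Isbn and Temperature classes are all hypothetical, and this is not the "more disciplined solution" promised below, just one way the idea might be expressed.

```java
// Hypothetical interface, not part of the Java library:
// only classes that opt in can be hashed.
interface Hashable {
    int hash();
}

// An identifier class that opts in to hashing.
final class Isbn implements Hashable {
    private final String digits;

    Isbn(String digits) {
        this.digits = digits;
    }

    @Override
    public int hash() {
        return digits.hashCode();
    }
}

// A numeric class that does not opt in.
final class Temperature {
    final double degreesF;

    Temperature(double degreesF) {
        this.degreesF = degreesF;
    }
}

class Buckets {
    // Accepts only Hashable keys, so misuse is caught at compile time.
    static <K extends Hashable> int bucketFor(K key, int bucketCount) {
        return Math.floorMod(key.hash(), bucketCount);
    }
}

class HashableDemo {
    public static void main(String[] args) {
        int b = Buckets.bucketFor(new Isbn("0321658701"), 16);
        System.out.println(b >= 0 && b < 16);  // true
        // Buckets.bucketFor(new Temperature(101.4), 16);
        // The line above would not compile: Temperature is not Hashable.
    }
}
```

In real Java, Temperature still inherits the public Object.hashCode(), so a sketch like this only protects code written against the hypothetical interface.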

We shall propose a more disciplined solution in a later article.

An agreement—2013

We had to wait a while, but we finally found agreement in a 2010 book by another writer:

"If you're defining a type that won't ever be used as the key in a container, this won't matter. Types that represent window controls, Web page controls, or database connections are unlikely to be used as keys in a collection. In those cases do nothing."

Bill Wagner, Effective C#, ISBN 0-321-65870-1, p. 45.



Last modified July 1, 2013