A typical hash function first converts a search key to an integer value called a hash code, then compresses the hash code into an index to the hash table. Java’s root class **Object** has the **hashCode** method, which returns an integer hash code. By default, the method returns the memory address for the object. The general contract for the **hashCode** method is as follows:

- You should override the
**hashCode**method whenever the**equals**method is overridden to ensure that two equal objects return the same hash code. - During the execution of a program, invoking the
**hashCode**method multiple times returns the same integer, provided that the object’s data are not changed. - Two unequal objects may have the same hash code, but you should implement the
**hashCode**method to avoid too many such cases.

## Hash Codes for Primitive Types

For search keys of the type **byte**, **short**, **int**, and **char**, simply cast them to **int**. Therefore, two different search keys of any one of these types will have different hash codes.

For a search key of the type **float**, use **Float.floatToIntBits(key)** as the hash code. Note that **floatToIntBits(float f)** returns an **int** value whose bit representation is the same as the bit representation for the floating number **f**. Thus, two different search keys of the **float** type will have different hash codes.

For a search key of the type **long**, simply casting it to **int** would not be a good choice, because all keys that differ in only the first 32 bits will have the same hash code. To take the first 32 bits into consideration, divide the 64 bits into two halves and perform the exclusive-or operation to combine the two halves. This process is called *folding*. The hash code for a **long** key is

`int hashCode = (int)(key ^ (key >> 32));`

Note that **>>** is the right-shift operator that shifts the bits 32 positions to the right. For example, **1010110 >> 2** yields **0010101**. The ^ is the bitwise exclusive-or operator. It operates on two corresponding bits of the binary operands. For example, **1010110 ^ 0110111** yields **1100001**.

For a search key of the type **double**, first convert it to a **long** value using the **Double.doubleToLongBits** method, and then perform a folding as follows:

`long bits = Double.doubleToLongBits(key);`

int hashCode = (int)(bits ^ (bits >> 32));

## Hash Codes for Strings

Search keys are often strings, so it is important to design a good hash function for strings. An intuitive approach is to sum the Unicode of all characters as the hash code for the string. This approach may work if two search keys in an application don’t contain the same letters, but it will produce a lot of collisions if the search keys contain the same letters, such as **tod** and **dot**.

A better approach is to generate a hash code that takes the position of characters into consideration. Specifically, let the hash code be

s0*b(n - 1) + s1*b(n - 2) + c + sn-1

where si is **s.charAt(i)**. This expression is a polynomial for some positive b, so this is called a *polynomial hash code*. Using Horner’s rule for polynomial evaluation (see Case Study: Converting Hexadecimals to Decimals), the hash code can be calculated efficiently as follows:

(...((s0*b + s1)b + s2)b + ... + sn-2)b + sn-1

This computation can cause an overflow for long strings, but arithmetic overflow is ignored in Java. You should choose an appropriate value b to minimize collisions. Experiments show that good choices for b are 31, 33, 37, 39, and 41. In the **String** class, the **hashCode** is overridden using the polynomial hash code with b being **31**.

## Compressing Hash Codes

The hash code for a key can be a large integer that is out of the range for the hash-table index, so you need to scale it down to fit in the index’s range. Assume the index for a hash table is between **0** and **N-1**. The most common way to scale an integer to between **0** and **N-1** is to use

`h(hashCode) = hashCode % N`

To ensure that the indices are spread evenly, choose **N** to be a prime number greater than **2**.

Ideally, you should choose a prime number for **N**. However, it is time consuming to find a large prime number. In the Java API implementation for **java.util.HashMap**, **N** is set to a value of the power of **2**. There is a good reason for this choice. When **N** is a value of the power of **2**,

`h(hashCode) = hashCode % N`

is the same as

**h(hashCode) = hashCode & (N – 1)**

The ampersand, **&**, is a bitwise AND operator. The AND of two corresponding bits yields a **1** if both bits are **1**. For example, assume **N = 4** and **hashCode = 11**, **11 % 4 = 3**, which is the same as **01011 & 00011 = 11**. The **&** operator can be performed much faster than the **%** operator.

To ensure that the hashing is evenly distributed, a supplemental hash function is also used along with the primary hash function in the implementation of **java.util.HashMap**. This function is defined as:

`private static int supplementalHash(int h) {`

h ^= (h >>> 20) ^ (h >>> 12);

return h ^ (h >>> 7) ^ (h >>> 4);

}

**^** and **>>>** are bitwise exclusive-or and unsigned right-shift operations. The bitwise operations are much faster than the multiplication, division, and remainder operations. You should replace these operations with the bitwise operations whenever possible.

The complete hash function is defined as:

`h(hashCode) = supplementalHash(hashCode) % N`

This is the same as

`h(hashCode) = supplementalHash(hashCode) & (N – 1)`

since **N** is a value of the power of **2**.

## Top comments (0)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.