BEGIN TYPING YOUR SEARCH ABOVE AND PRESS RETURN TO SEARCH. PRESS ESC TO CANCEL

Implement Your Own HashMap

I‘ve come across this problem during a few interviews in the past. If you’re familiar with HashMaps, then it shouldn’t really be a problem. However, some people use HashMaps without knowing how they work. I guess this question is a test to see how well you understand how common data structures operate in computer science.

HashMaps, WTF Are They?

We can start out thinking of a HashMap as a famous book called the Dictionary. A dictionary has:

  • A collection of:
    • Words – These words are what we use to search the dictionary with. Words always have a definition.
    • Words can not be duplicated.
    • Definitions – Each word has a definition of what that word means.
    • Definitions can be duplicated.
  • A way to search for words (using the dictionary’s index)

OK, now if we convert this depiction of a dictionary to a HashMap, we’ll find it is very similar, except the HashMap uses different terminology. Now a HashMap contains:

  • A collection of (in our case, it’s an array)
    • Keys – Similar to words in a dictionary
    • Keys can not be duplicated.
    • Values – Similar to the definitions in a dictionary
    • Values can be duplicated.
  • A way to search for keys (using a hash function that will generate an index in our array).

Sounds too easy, right? Let’s go back over what we have…

  • A HashMap class that contains
    • An array
    • A key
    • A value (the key and value pairs will be added to an object for storage)
    • A hashing function that generates an index that between 0 and the size of our array.

The problem we now face is that the array will be a fixed size, and eventually the hashing function will spue out an index that is already in use. This is called a collision. What happens then?

There are a number of ways to deal with this, but in my case, I create a linked list at every index of the array, which allows me to fit as many elements as I need into the array (within reason of course, if the array is too small then the get and put operations could take a long time).

For every put operation we create a Node object, which contains the attributes:

  • String key;
  • String value;
  • Node next; //reference to the next node in the linked list

What we end up with is a HashMap which stores Key-Value pairs in a Node/Linked List, which is stored in an array. The running time of this is O(1) if the hashing function is good, whereas if the hashing function is bad (which will lead to many collisions) the running time could be O(N).

The resulting array can be visualised like this:

maptable

Here is the code I’ve written, along with some tests. You can also view it as a Gist on Github.

I’ve written a couple of small unit tests to examine it….

There you have it, a simple implementation of a HashMap. To add to this, we could create a function to double the size of the array if the running time exceeds a certain limit.

Discuss on HackerNewsReddit

2 Comments

  1. LinkedList is not a good solution for collision. But it is easy to write for coding interview.
    Also your HashMaps size do not change dynamically. Which is something can be improved.
    Re-hash is IMHO a better solution but hard to write especially when there is delete. And must have need dynamically resize.

    • Cheers Gordon. Yes, in Java 1.8 if the threshold of a HashMap is breached it will use a binary tree instead of a linked list (making the worst case O(log(n))). I was too lazy to implement it

Leave a comment

Please be polite. We appreciate that. Your email address will not be published and required fields are marked