Hash tables.


Introduction to hash tables:-

Hash tables store the data in the key-value pair format. Above all, every programming language has a hash table such as Dictionaries in python, Map interface in java, Object class in javascript, etc. A key-value pair is represented as key : value.

If you have worked with JSON or Objects in any programming languages you are very much familiar with the representation. In the key-value pair, a key is used to store and retrieve the value. A key should always be a string. A value can be anything such as an object, string, numbers, etc.

How do hash tables work?

Above all, we generally use arrays/linked lists to implement the hash tables. But they work based on the index of the element. If we want to get a value from array, we will get it based on the index of the element. whereas if using a linked list, we will get the position of the element from the user and traverse the list till that position and return the value.

Hash tables use the keys to store, retrieve the data. This key is passed through a special function called the hash function. Hash function returns the position at which value needs to be stored. While retrieving the value from the hash table, The same key is passed through the same hash function which will give us the position at which the value is stored. As a result, the value stored at that position is fetched.

Since we are passing the keys through a hash function to return the position to store and fetch, it has to be uniformly consistent. What I meant by uniformly consistent is, the hash function should give the same position always by taking the same key irrespective of the number of times of usage. This is an important property that every hash function should follow. For instance, observe the following image.

hash table example.
 Hash table example
 
If the has function gives the value 1 while storing a key-value pair banana, the banana will be stored at position 1. But, if the hash function is not uniformly consistent and returns 0 while retrieving, apple will be returned which is wrong. In conclusion, the hash function has to be uniform.
However,every hash function should follow some properties as shown below.
 

 properties of good hash function.

Implementing hash function:-

Firstly, let’s implement a hash function following our above stated guidelines.
Since we are using string type keys lets follow the following algorithm.
  1. Iterate over the string and fetch the character at the index.
  2. get the ASCII code. Add it to the global total value. The result percentile by the length of the array used in the implementation.
  3. In the end, return the total value.
public int hash(String key, int arrayLength) {
    //arrayLength varibale holds the length of the array  used for hash table implementation.
    int total = 0;
    for (int i = 0; i < key.length(); i++) {
        int charCode = key.charAt(i) - 96;
        total = (total + charCode) % arrayLength;
    }
    return total;
}
function hash(key, arrayLength) { 
//arrayLength varibale holds the value of length 
  of the array used in the implementation..
    let total = 0;
    for (let char of key) {
        let value = char.charCodeAt(0) - 96;
        total = (total + value) % arrayLength;
        console.log()
    }
    return total;
}

What’s wrong with the above hash function?

As we discussed in the above sections hash function has to be 

  • Constant in time.
  • Uniformly distributed.
  • Deterministic.
But if you observe the code sample above, The code is 
  • Linearly growing in time because of the for loop.
  • Only hashes the strings. What about numbers and any other types.
  • Might give the same index for multiple keys so not uniformly consistent.

What is the solution?

The solution for a better hash function is by following the below rules.

  • Repeating the for loop for the minimum value of string length or 100.
  • Adding a prime number in the calculation of position for the element.
Hense the updated hash function looks like
public int hashWithPrimeNumber(String key, int arrayLength) {
    //arrayLength varibale holds the length of the array  used for hash table implementation.
    int total = 0;
    int PRIME_NUMBER = 3;
    for (int i = 0; i < Math.min(key.length(), 100); i++) {
        int charCode = key.charAt(i) - 96;
        total = (total * PRIME_NUMBER + charCode) % arrayLength;
    }
    return total;
}
function hashWithPrimeNumber(key, arrayLength) { 
//arrayLength varibale holds the value of length 
of the array.
    let total = 0;
    let PRIME_NUMBER = 3;
    for (var i = 0; i < Math.min(key.length, 100); i++) {
        let char = key[i];
        let value = char.charCodeAt(0) - 96;
        total = (total * PRIME_NUMBER + value) % arrayLength;
    }
    return total;
}

Even though we improved the performance of the hash function, there is a bigger problem. The problem is how are we going to handle the collisions?

What are collisions?

So far we have seen that the hash table is a key-value pair. We use an array or a list to implement it. While inserting the new data into the hash table we pass that data through a hash function which gives the position at which it needs to be stored. Assume that we have stored this data here. Subsequently, we are trying to insert some more data. What if the hash function returns the same position as previous data? 
This is known as collision because we hash table is trying to store the multiple data at an index.

How to handle collisions?

There are many algorithms to handle collisions. But 2 of them are very famous. They are

  1. Separate chaining.
  2. Linear probing.
Let’s discuss both of them.
 

1. Separate chaining:-

Using the Separate chaining method, at each index of the array we use another data structure such as another array or linked list, etc to store the collision data.
separate chaining.
 

2. Linear probing:-

In linear probing approach, if we get the position of an array that already has the data init then we will look for another empty spot in the array to store the data. The same procedure is repeated irrespective of how many numbers of times the position collides. 
 
Now that the hash function is ready and we know how to handle the collisions, let’s implement some functions such as set, get, etc.
 
Base classes for hash table implementation are as shown below.
public class HashTableWithList {
    Object[] keyMap;
    public HashTableWithList(int listSize) {
        this.keyMap = new Object[listSize];
    }
    public int hashWithPrimeNumber(String key) {
        int total = 0;
        int PRIME_NUMBER = 3;
        for (int i = 0; i < Math.min(key.length(), 100); i++) {
            int charCode = key.charAt(i) - 96;
            total = (total * PRIME_NUMBER + charCode) % this.keyMap.length;
        }
        return total;
    }
}
class HashTable {
    constructor(size = 101) {
        this.keyMap = new Array(size);
    }
    hashWithPrimeNumber(key) {
        let total = 0;
        let PRIME_NUMBER = 3;
        for (var i = 0; i < Math.min(key.length, 100); i++) {
            let char = key[i];
            let value = char.charCodeAt(0) - 96;
            total = (total * PRIME_NUMBER + value) % this.keyMap.length;
        }
        return total;
    }
}

1. Set/Get:-

As the name suggests, this operation is used for setting and getting the values from the hash table. The pseudocode for the same is as shown below.
Hash table set get pseudocode.
Hash table set get pseudocode.
 
The implementations are as follows.

    public void set(String key, Object value) {
        int index = this.hashWithPrimeNumber(key);
        if (this.keyMap[index] == null) {
            this.keyMap[index] = new ArrayList<Object[]>();
        }
        ArrayList < Object[] > indexData =(ArrayList<Object[]>) this.keyMap[index];
        Object[] o = { key, value };
        indexData.add(o);
        this.keyMap[index] = indexData;
    }
    
    public Object get(String key) {
        int index = this.hashWithPrimeNumber(key);
        if (this.keyMap[index] != null) {
            int indexedDataSize = ((ArrayList<Object[]>)this.keyMap[index]).size();
            for (int i = 0; i < indexedDataSize; i++) {
                Object[] data = ((ArrayList<Object[]>)this.keyMap[index]).get(i);
                if (key.equals(data[0])) {
                    return data[1];
                }
            }
        }
        return null;
    }
    set(key, value) {
        let index = this.hashWithPrimeNumber(key);
        if (!this.keyMap[index]) {
            this.keyMap[index] = [];
        }
        this.keyMap[index].push([key, value]);
        console.log("index is:-", index);
    }
    get(key) {
        let index = this.hashWithPrimeNumber(key);
        if (this.keyMap[index]) {
            for (var i = 0; i < this.keyMap[index].length; i++) {
                if (this.keyMap[index][i][0] === key) {
                    return this.keyMap[index][i][1];
                }
            }
        }
        return null;
    }

2. Keys/Values:-

  • Keys function will return an array of all the keys in the hash table.
  • Values function will return an array of all the values available in the hash table.
The pseudocode is very simple as shown below.
Hash table keys and values pseudocode..
Hash table keys and values pseudocode.
   public ArrayList<Object> keys() {
        ArrayList < Object > keysArr = new ArrayList<>();
        for (int i = 0; i < this.keyMap.length; i++) {
            if (this.keyMap[i] != null) {
                int indexedDataSize = ((ArrayList<Object[]>)this.keyMap[i]).size();
                for (int j = 0; j < indexedDataSize; j++) {
                    Object[] data = ((ArrayList<Object[]>)this.keyMap[i]).get(j);
                    keysArr.add(data[0]);
                }
            }
        }
        return keysArr;
    }
    public ArrayList<Object> values() {
        ArrayList < Object > valuesArr = new ArrayList<>();
        for (int i = 0; i < this.keyMap.length; i++) {
            if (this.keyMap[i] != null) {
                int indexedDataSize = ((ArrayList<Object[]>)this.keyMap[i]).size();
                for (int j = 0; j < indexedDataSize; j++) {
                    Object[] data = ((ArrayList<Object[]>)this.keyMap[i]).get(j);
                    valuesArr.add(data[1]);
                }
            }
        }
        return valuesArr;
    }
    keys() {
        let keysArr = []
        for (let i = 0; i < this.keyMap.length; i++) {
            if (this.keyMap[i]) {
                for (var j = 0; j < this.keyMap[i].length; j++) {
                    keysArr.push(this.keyMap[i][j][0])
                }
            }
        }
        return keysArr;
    }
    values() {
        let valuesArr = []
        for (let i = 0; i < this.keyMap.length; i++) {
            if (this.keyMap[i]) {
                for (var j = 0; j < this.keyMap[i].length; j++) {
                    valuesArr.push(this.keyMap[i][j][1])
                }
            }
        }
        return valuesArr;
    }