Tuesday, September 1, 2015

Hash Table Simplified



A very basic explanation of a Hash Table Data Structure

Let us consider the problem of storing a large number of words so that insertion and search operations can be performed in O (1) time complexity

If the words are stored in a simple array, then the operation will have time complexity of O (n) where n is the number of words.

We can solve the above problem by using a Hash Table data structure
What is a hash table?
A hash table is a set of key-value pairs where key is unique and a particular key cab hold a certain value.
A key is generated by a hash function.

For our example, let us consider that the hashing function works in the following manner:
1. The letters of the word are assigned a particular integer value. For example: a -> 1, b -> 2, c -> 3 and so on.
2. The integers corresponding to the letters present in a particular word are added.
3. This sum is taken as a key
4. Let us assume the number of keys formed in this way is k (which is constant). There is a catch* here which is discussed later. For now assume that the maximum number of possible keys is a constant k.
5. Let us also assume that the number of words corresponding to a single key is constant which equals to w on an average.

Now let us consider the following words and the corresponding sums which will be the keys.
Word
Sum which will be used as key
abc
6
ae
6
abcd
10
ade
10
  
Sample Hash Table storage:


Now to search for a word ade:
1. First the sum is determined which is O (1). Please note that the complexity of finding the sum for a given word having m letters is negligible compared to the complexity of finding a word in an array containing n such words.
2. Now this key can be searched in the key linked list in O(k) which in Big O notation boils down to O(1).
3. Once the key is determined, the word can be searched in O(w) which in Big O notation again boils down to O(1).
So the Search operation can be performed on such a data structure with time complexity of O (n).

Similarly for inserting a word acf:
1. First the sum is determined which is O (1). Please note that the complexity of finding the sum for a given word having m letters is negligible compared to the complexity of finding a word in an array containing n such words.
2. Now this key can be searched in the key linked list in O(k) which in Big O notation boils down to O(1).
3. Once the key is determined, the word can be inserted in O(w) which in Big O notation again boils down to O(1).
So the Insertion operation can be performed on such a data structure with time complexity of O (n).

*Catch: The choice of the hashing function needs to be made very carefully so as to maintain the following two criteria:
1. The number of unique keys generated by the hashing function should not become too high.
2. The number of words or entities needed to be stored in one linked list should not become too large.

Note: This is an attempt to explain the hash table data structure in a very simple terms. Actual implementations are much more complex and use details of the available memory storage.

References:
https://en.wikipedia.org/wiki/Hash_table

2 comments: