|
| 1 | +### Hash Functions |
| 2 | + |
| 3 | +Using a hash function, we can perform lookups in constant time on average. This contrasts with searching a plain list (or a set implemented without hashing), which takes linear time. |
| 4 | + |
| 5 | +We pass a value through a hash function to get a hash value, then use that hash value as the index in the array where the value is stored. To find the value later, we recompute its hash and go straight to that index. |
| 6 | + |
| 7 | +For example, if we had numbers as values, we could take the remainder of that number when divided by another number, and use the remainder as the location in an array. |
| 8 | + |
| 9 | +Ex: |
| 10 | + |
| 11 | +Take 4979 as our number and 109 as the divisor in the hash function. 4979 % 109 is 74, so we store the number at array[74]. |
| 12 | + |
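
The remainder example above can be sketched in a few lines of Python. The names `TABLE_SIZE`, `hash_value`, and `table` are hypothetical, chosen only for illustration; this is a minimal sketch that ignores collisions.

```python
# Hypothetical names for illustration; collisions are not handled here.
TABLE_SIZE = 109  # the divisor used by the hash function


def hash_value(number):
    """Map a number to an array index by taking the remainder."""
    return number % TABLE_SIZE


# One slot per possible remainder (0 through 108).
table = [None] * TABLE_SIZE

# Store 4979 at the index given by its hash value.
table[hash_value(4979)] = 4979

print(hash_value(4979))  # 74
print(table[74])         # 4979
```

Looking the number up again only requires recomputing `hash_value(4979)` and reading that one slot, which is why the lookup does not depend on how many values are stored.
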
| 13 | +### Collisions |
| 14 | + |
| 15 | +What if we have two numbers that end up at the same position when we run them through our hash function? |
| 16 | + |
| 17 | +1. We can change the hash function to use a larger array, so that each colliding value gets its own location. |
| 18 | + |
| 19 | +The downside of this approach is that it increases the space used. If it is done reactively, changing the hash function, rehashing every value, and copying the old values into a new array also costs extra time. |
| 20 | + |
| 21 | +2. Instead of storing one value in each array location, we can make a "bucket" of values by creating a list in each array location. |
| 22 | + |
| 23 | +The downside of this approach is that searching a bucket takes time linear in the size of that bucket. In the worst case, if all values end up in the same bucket, lookups take O(n) time, just as they did with a list. |
| 24 | + |
| 25 | +Another option is to use a second hash function inside each bucket to divide its elements further. |
| 26 | + |
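
The bucket approach (option 2) might look like the following sketch. The class name `ChainedHashTable` and its methods are hypothetical, and the divisor 109 is carried over from the earlier example; a real implementation would also resize as the table fills up.

```python
class ChainedHashTable:
    """Hypothetical sketch: a hash table that keeps a list (bucket) per slot."""

    def __init__(self, num_buckets=109):
        # Each array location holds its own list of values.
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, number):
        # The remainder picks the bucket.
        return number % len(self.buckets)

    def add(self, number):
        bucket = self.buckets[self._index(number)]
        if number not in bucket:  # linear scan, but only within one bucket
            bucket.append(number)

    def contains(self, number):
        return number in self.buckets[self._index(number)]


table = ChainedHashTable()
table.add(4979)
table.add(5088)  # 5088 % 109 is also 74, so this collides with 4979

print(table.buckets[74])     # [4979, 5088]
print(table.contains(5088))  # True
```

Both values live in bucket 74, so a lookup there scans two items instead of one. If every value landed in the same bucket, that scan would cover all n values.
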
| 27 | +### Load Factor |
| 28 | + |
| 29 | +Load factor is the number of entries divided by the number of buckets. |
| 30 | + |
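
As a small illustration of the definition, the ratio can be written as a one-line function. The name `load_factor` is hypothetical.

```python
def load_factor(num_entries, num_buckets):
    """Hypothetical helper: entries divided by buckets."""
    return num_entries / num_buckets


print(load_factor(100, 100))  # 1.0
print(load_factor(100, 50))   # 2.0, more entries than buckets
```

A load factor above 1 means there are more entries than buckets, so at least one bucket must hold more than one entry.
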
| 31 | +### Quiz Answer: |
| 32 | + |
| 33 | +A coworker has a hash function that divides each value by 100 and uses the remainder as the key. The values are 100 numbers, all divisible by 5. |
| 34 | + |
| 35 | +What is the load factor? |
| 36 | + |
| 37 | +It is 1, because there are 100 numbers and the hash function has 100 possible buckets (100 / 100 = 1). |
| 38 | + |
| 39 | +What number would you recommend his function to divide by to speed it up? |
| 40 | + |
| 41 | +87 |
| 42 | +107 <---- |
| 43 | +125 |
| 44 | +1001 |
| 45 | + |
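| 46 | +87 gives fewer buckets than values (load factor above 1), so collisions are unavoidable. |
| 47 | +125 shares the factor 5 with every value, so only 25 of its remainders can occur, which also creates collisions. |
| 48 | +1001 wastes space: with only 100 values, at least 901 of its buckets would stay empty. |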
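
The reasoning about the candidate divisors can be checked with a short script. The quiz does not list the actual values, so this sketch assumes they are 5, 10, ..., 500; the function name `bucket_stats` is hypothetical.

```python
from collections import Counter

# Assumption: the quiz only says "100 numbers, all divisible by 5",
# so 5, 10, ..., 500 is used here as a stand-in.
values = [5 * k for k in range(1, 101)]


def bucket_stats(divisor):
    """Return (buckets used, size of the fullest bucket) for value % divisor."""
    counts = Counter(value % divisor for value in values)
    return len(counts), max(counts.values())


for divisor in (100, 87, 107, 125, 1001):
    used, fullest = bucket_stats(divisor)
    print(f"divisor {divisor}: {used} buckets used, fullest holds {fullest}")
```

Under this assumption, dividing by 100 uses only 20 buckets with 5 values each, 87 uses 87 buckets with up to 2 values, and 125 uses 25 buckets with 4 values each. Both 107 and 1001 give every value its own bucket, but 107 does so with far fewer empty slots.
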