Commit c09b83a

FEAT: add linear probing notes and demo notebook
1 parent bc2ba1f commit c09b83a

5 files changed, +176 -0 lines changed

README.md

+1
@@ -23,3 +23,4 @@ Repo for general concepts for working with python (e.g. regex, files operations,
 15. [gRPC](https://github.com/MKaczkow/python_concepts/tree/master/grpc_example)
 16. [PageRank](https://github.com/MKaczkow/python_concepts/tree/master/page_rank)
 17. [Graphs](https://github.com/MKaczkow/python_concepts/tree/master/graphs)
+18. [Linear probing (hash collision resolution)](./linear_probing)

linear_probing/DEMO.ipynb

+109
@@ -0,0 +1,109 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from hash_table import HashTable"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ht = HashTable(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(ht)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ht.insert(\"apple\", 1)\n",
    "ht.insert(\"banana\", 2)\n",
    "ht.insert(\"cherry\", 3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(ht)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ht.insert(\"kiwi\", 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(ht)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(ht.get(\"apple\"))\n",
    "print(ht.get(\"banana\"))\n",
    "print(ht.get(\"cherry\"))\n",
    "print(ht.get(\"kiwi\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}

linear_probing/README.md

+32
@@ -0,0 +1,32 @@
# Linear probing
*... is a way of resolving hash collisions*

## General
* there are several strategies for resolving hash collisions; linear probing is one of them
* still holds up well, despite dating back to 1954

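### Hashing strategies
* `closed addressing` -> all elements whose hashes collide are stored in a secondary data structure (e.g. a list per bucket)
```python
# hash_map maps a key to the list of values stored under it
if key in hash_map:
    hash_map[key].append(value)
else:
    hash_map[key] = [value]
```
* `perfect hashing` -> hash functions are chosen so that collisions don't happen, and elements are rehashed or moved when they do (e.g. `cuckoo hashing`)
* `open addressing` -> elements may *leak out* of their preferred slot into other slots of the same table
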
### Linear probing (open addressing)
* `on insert`
  * compute h(x)
  * try to insert x at slot h(x)
  * if slot h(x) is occupied, try slots h(x)+1, h(x)+2, ... (wrapping around at the end of the table)
* `on delete`
  * use `tombstones` to mark a cell as *empty, but previously occupied*
  * on lookup, we don't stop at tombstones, only at cells that were never occupied

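The insert, lookup and delete rules above can be sketched as follows. This is a minimal illustration, not the `HashTable` class from this commit (which has no delete): the class name `ProbingTable`, the sentinels `_EMPTY` and `_TOMBSTONE`, and the choice to reuse the first tombstone on insert are all assumptions made for the example.

```python
_EMPTY = object()      # slot has never held an entry
_TOMBSTONE = object()  # slot held an entry that was later deleted


class ProbingTable:
    """Hypothetical fixed-size table using linear probing with tombstones."""

    def __init__(self, size):
        self.size = size
        self.slots = [_EMPTY] * size

    def _probe(self, key):
        # Yield slot indices h(x), h(x)+1, h(x)+2, ... (mod size), each once.
        start = hash(key) % self.size
        for step in range(self.size):
            yield (start + step) % self.size

    def insert(self, key, value):
        first_free = None
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is _EMPTY:
                if first_free is None:
                    first_free = i
                break  # the key cannot be stored past a never-used slot
            if slot is _TOMBSTONE:
                if first_free is None:
                    first_free = i  # remember it, but keep looking for the key
            elif slot[0] == key:
                self.slots[i] = (key, value)  # overwrite the existing entry
                return
        if first_free is None:
            raise RuntimeError("table is full")
        self.slots[first_free] = (key, value)

    def get(self, key, default=None):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is _EMPTY:
                return default  # only a never-used slot ends the search
            if slot is not _TOMBSTONE and slot[0] == key:
                return slot[1]
        return default

    def delete(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is _EMPTY:
                return False
            if slot is not _TOMBSTONE and slot[0] == key:
                self.slots[i] = _TOMBSTONE  # mark as empty, but previously occupied
                return True
        return False


# Integer keys 0, 5 and 10 all map to slot 0 in a table of size 5.
table = ProbingTable(5)
for k, v in [(0, "a"), (5, "b"), (10, "c")]:
    table.insert(k, v)
table.delete(5)        # slot 1 becomes a tombstone
print(table.get(10))   # the lookup walks past the tombstone and prints c
```

Without the tombstone, deleting key 5 would leave a never-used-looking gap at slot 1 and the lookup for key 10 would stop there too early.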
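### Primary clustering
* main problem with linear probing: occupied slots form contiguous runs, and every key that hashes into a run makes it longer, so probe sequences grow and performance degrades as the table fills up
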
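A small, deterministic sketch of the effect, assuming integer keys and the hash function h(x) = x % table_size. The helper name `insert_with_probe_counts` is made up for this example; it counts how many slots each insert has to inspect.

```python
def insert_with_probe_counts(table_size, keys):
    """Insert integer keys with linear probing; return slots inspected per insert.

    Hypothetical helper for illustration only: uses h(x) = x % table_size.
    """
    if len(keys) > table_size:
        raise ValueError("more keys than slots")
    table = [None] * table_size
    counts = []
    for key in keys:
        index = key % table_size
        probes = 1
        # Walk forward (with wrap-around) until a free slot is found.
        while table[index] is not None:
            index = (index + 1) % table_size
            probes += 1
        table[index] = key
        counts.append(probes)
    return counts


# Keys 0, 10 and 20 share home slot 0 and extend the run 0..2 to 0..4.
# Key 3 collides with none of them by hash, yet it still has to walk
# through the part of the run that has grown over its home slot.
print(insert_with_probe_counts(10, [0, 1, 2, 10, 20, 3]))  # → [1, 1, 1, 4, 5, 3]
```

The last insert is the interesting one: the cost of key 3 comes from the cluster, not from a collision on its own hash value.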
## References
* [Stanford CS166 lecture slides](https://web.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/12/Small12.pdf)

linear_probing/__init__.py

Whitespace-only changes.

linear_probing/hash_table.py

+34
@@ -0,0 +1,34 @@
class HashTable:
    """Fixed-size hash table that resolves collisions with linear probing."""

    def __init__(self, size):
        self.size = size
        self.table = [None] * size

    def hash_function(self, key):
        # A very simple hash function (modulo)
        return hash(key) % self.size

    def insert(self, key, value):
        index = self.hash_function(key)

        # Linear probing for collision resolution; visit each slot at most once
        # so that a full table cannot cause an infinite loop
        for _ in range(self.size):
            slot = self.table[index]
            # Use the first empty slot, or overwrite the value of an existing key
            if slot is None or slot[0] == key:
                self.table[index] = (key, value)
                return
            index = (index + 1) % self.size

        # Every slot holds a different key, so there is no room for a new entry
        raise RuntimeError("hash table is full")

    def get(self, key):
        index = self.hash_function(key)

        # Probe until an empty slot is found or every slot has been checked
        for _ in range(self.size):
            slot = self.table[index]
            if slot is None:
                return None
            if slot[0] == key:
                return slot[1]
            index = (index + 1) % self.size

        return None  # Looped through the whole table without finding the key

    def __str__(self):
        return str(self.table)
