maps and hashing begun

linkel · linkel · commit 692536218809 · 2018-12-26T13:18:48.000-08:00
diff --git a/3-search-sort/QuickSort-mine.py b/3-search-sort/QuickSort-mine.py
@@ -5,11 +5,15 @@ def quicksort(array):
     if len(array) < 2:
         return array
     stack_of_index = []
+    # index stores the start and end boundaries of the range to partition
     index = (0, len(array) - 1)
     stack_of_index.append(index)
     for index in stack_of_index:
+        # first value is the element to compare
         e_index = index[0]
+        # second value is the pivot
         pivot_index = index[1]
+        # until they cross paths, keep executing the following:
         while pivot_index > e_index:
             pivot = array[pivot_index]
             e = array[e_index]
@@ -18,7 +22,7 @@ def quicksort(array):
                 array[e_index] = array[pivot_index - 1]
                 array[pivot_index - 1] = pivot
                 pivot_index -= 1
-            else:
+            else:  # it's on the correct side of the pivot, so move on
                 e_index += 1
         low = index[0]
         high = index[1]
diff --git a/4-maps-hashing/DictQuiz-mine.py b/4-maps-hashing/DictQuiz-mine.py
@@ -0,0 +1,47 @@
+"""Time to play with Python dictionaries!
+You're going to work on a dictionary that
+stores cities by country and continent.
+One is done for you - the city of Mountain 
+View is in the USA, which is in North America.
+
+You need to add the cities listed below by
+modifying the structure.
+Then, you should print out the values specified
+by looking them up in the structure.
+
+Cities to add:
+Bangalore (India, Asia)
+Atlanta (USA, North America)
+Cairo (Egypt, Africa)
+Shanghai (China, Asia)"""
+
+locations = {
+    'North America': {'USA': ['Mountain View', 'Atlanta']},
+    'Asia': {'India': ['Bangalore'], 'China': ['Shanghai']},
+    'Africa': {'Egypt': ['Cairo']},
+    
+}
+
+"""Print the following (using "print").
+1. A list of all cities in the USA in
+alphabetic order.
+2. All cities in Asia, in alphabetic
+order, next to the name of the country.
+In your output, label each answer with a number
+so it looks like this:
+1
+American City
+American City
+2
+Asian City - Country
+Asian City - Country"""
+
+print('1')
+array = sorted(locations['North America']['USA'])
+for i in array:
+    print(i)
+print('2')
+asia = locations['Asia'].items()
+pairs = sorted((city, country) for country, cities in asia for city in cities)
+for city, country in pairs:
+    print(city + " - " + country)
diff --git a/4-maps-hashing/DictQuiz.py b/4-maps-hashing/DictQuiz.py
@@ -0,0 +1,19 @@
+locations = {'North America': {'USA': ['Mountain View']}}
+locations['North America']['USA'].append('Atlanta')
+locations['Asia'] = {'India': ['Bangalore']}
+locations['Asia']['China'] = ['Shanghai']
+locations['Africa'] = {'Egypt': ['Cairo']}
+
+print(1)
+usa_sorted = sorted(locations['North America']['USA'])
+for city in usa_sorted:
+    print(city)
+
+print(2)
+asia_cities = []
+for country, cities in locations['Asia'].items():
+    for city in cities:
+        asia_cities.append(city + " - " + country)
+asia_sorted = sorted(asia_cities)
+for city in asia_sorted:
+    print(city)
diff --git a/4-maps-hashing/Hashing-notes.md b/4-maps-hashing/Hashing-notes.md
@@ -0,0 +1,48 @@
+### Hash Functions
+
+Using a hash function, we can perform lookups in constant time on average. This contrasts with an unsorted list, where a lookup takes linear time. (Python's built-in `set` and `dict` are themselves hash-based.)
+
+We pass a value through a hash function to get a hash value, then use that hash value as the index in the array where the value is stored. To find the value later, we recompute its hash and go straight to that index.
+
+For example, if we had numbers as values, we could take the remainder of that number when divided by another number, and use the remainder as the location in an array. 
+
+Ex:
+
+Take 4979 as our number and 109 as the divisor in the hash function. 4979 % 109 is 74, so we store 4979 at array[74].
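
A minimal sketch of the remainder-based scheme from the example above, assuming a fixed-size Python list as the array. The names `DIVISOR`, `hash_value`, `store`, and `lookup` are illustrative and do not come from the notes, and collisions are deliberately ignored here.

```python
DIVISOR = 109  # divisor from the example above


def hash_value(number):
    """Return the array index for a number: its remainder modulo DIVISOR."""
    return number % DIVISOR


# One slot per possible remainder (0 .. DIVISOR - 1).
array = [None] * DIVISOR


def store(number):
    """Place the number at the index given by its hash value (no collision handling)."""
    array[hash_value(number)] = number


def lookup(number):
    """Check membership by recomputing the hash; no scan of the array is needed."""
    return array[hash_value(number)] == number


store(4979)
print(hash_value(4979))  # 74
print(lookup(4979))      # True
print(lookup(74))        # False: slot 74 holds 4979, not 74
```

Both `store` and `lookup` touch exactly one slot, which is where the constant-time behavior comes from.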
+
+### Collisions
+
+What if we have two numbers that end up at the same position when we run them through our hash function? 
+
+1. We can change the hash function to use a larger divisor, with a correspondingly larger array, so that each colliding value gets its own location in the array.
+
+The downsides of this approach are that it increases space complexity and, if it is done reactively, changing the hash function, recomputing the hash values, and copying the old values into a new array also costs extra time.
+
+2. Instead of storing one value in each array location, we can make a "bucket" of values by creating a list in each array location. 
+
+The downside of this approach is that searching a bucket takes time linear in the size of that bucket. In the worst case, if all values end up in the same bucket, a lookup takes O(n) time, just as it did with a list.
+
+Another option is to use a second hash function within each bucket to further divide up its elements.
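
The bucket approach (option 2 above) could be sketched roughly as follows. This is an illustration under assumptions, not a reference implementation: the class name `BucketHashTable`, its methods, and the choice of 10 buckets are all hypothetical, and it only handles integers.

```python
class BucketHashTable:
    """Hash table of integers where each array slot holds a list (a "bucket")."""

    def __init__(self, num_buckets=10):
        # num_buckets is an arbitrary illustrative default.
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, number):
        # Hash function: remainder modulo the number of buckets.
        return number % len(self.buckets)

    def store(self, number):
        bucket = self.buckets[self._index(number)]
        if number not in bucket:  # linear scan, but only within one bucket
            bucket.append(number)

    def lookup(self, number):
        return number in self.buckets[self._index(number)]


table = BucketHashTable(num_buckets=10)
for n in (15, 25, 7):
    table.store(n)

print(table.buckets[5])   # [15, 25]: both values collide at index 5
print(table.lookup(25))   # True
print(table.lookup(35))   # False: hashes to bucket 5 but was never stored
```

If every stored value landed in the same bucket, `lookup` would degrade to scanning one long list, which is the O(n) worst case described above.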
+
+### Load Factor
+
+Load factor is the number of entries divided by the number of buckets.
+
+### Quiz Answer:
+
+A coworker has a hash function that divides each value by 100 and uses the remainder as the key. The values are 100 numbers, all divisible by 5.
+
+What is the load factor? 
+
+It is 1, because there are 100 entries and the hash function maps to 100 buckets (remainders 0 through 99): 100 / 100 = 1.
+
+What number would you recommend his function divide by to speed it up?
+
+87
+107 <---- 
+125
+1001
+
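+87 gives fewer buckets than there are values (100), so collisions are guaranteed.
+125 is divisible by 5, so multiples of 5 can only land on remainders that are also multiples of 5, which creates collisions too.
+1001 gives far more buckets than entries, which wastes space.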
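
One way to check the reasoning above is to count how the values spread across buckets for each candidate divisor. The quiz only says "100 numbers, all divisible by 5", so the concrete values 5, 10, ..., 500 below are an assumption, and `bucket_stats` is a hypothetical helper name.

```python
from collections import Counter

# Assumed input: the quiz does not list the actual numbers.
values = list(range(5, 505, 5))  # 5, 10, ..., 500 (100 values)


def bucket_stats(divisor):
    """Return (load factor, buckets actually used, largest bucket size)."""
    counts = Counter(v % divisor for v in values)
    return len(values) / divisor, len(counts), max(counts.values())


for divisor in (100, 87, 107, 125, 1001):
    load, used, largest = bucket_stats(divisor)
    print(f"{divisor:>4}: load factor {load:.2f}, "
          f"{used} buckets used, largest bucket {largest}")
```

With these assumed values, dividing by 100 or 125 uses only 20 or 25 buckets, and 87 forces some buckets to hold two values. Both 107 and 1001 give every value its own bucket, but 107 does so with a load factor near 1 while 1001 leaves about 90% of the array empty.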