diff --git a/demo-cluster.qmd b/demo-cluster.qmd
Lines changed: 4 additions & 4 deletions
diff --git a/docs/demo-cluster.html b/docs/demo-cluster.html
Lines changed: 5 additions & 5 deletions
diff --git a/docs/img/ppp-cluster-tour.png b/docs/img/ppp-cluster-tour.png
71.4 KB
diff --git a/docs/img/ppp-cluster.gif b/docs/img/ppp-cluster.gif
875 KB
diff --git a/docs/search.json b/docs/search.json
Lines changed: 1 addition & 1 deletion
diff --git a/img/ppp-cluster-start.png b/img/ppp-cluster-start.png
69 KB
diff --git a/img/ppp-cluster-tour.png b/img/ppp-cluster-tour.png
71.4 KB
diff --git a/img/ppp-cluster.gif b/img/ppp-cluster.gif
875 KB
@@ -66,21 +66,22 @@ This helps us some, combining 300 different variations to only 162 choices, but
 OpenRefine has a concept called **Cluster** that will use algorithms to find similarly-constructed or even similar sounding words. We'll use a series of these help us clean these city names.
 
 1. In the text facet box for *City_clean*, click on the **Cluster** button at the top-right. This brings up the **Cluster and edit column** tool.
+1. Click on the **Cluster** button in the middle so we can take a little tour of the options.
 
     ![Cluster tour](img/ppp-cluster-tour.png)
 
 The idea here is to work through all the results methodically:
 
 - Look through all the values for a particular **Keying function**.
 - If you want to merge **all** the values in the cluster, check the **Merge** box and set the **New Cell Value** to the desired result.
-  - If even one of the values in the cluster does not belong together, then DON'T MERGE IT. You'll have to deal with them independently later. Take notes and edit from the text facet, perhaps.
+  - If one of the values in the cluster does not belong with the new value, uncheck the box next to that value so it won't be included.
 - Once you've reviewed all the clusters, choose **Merge Selected & Re-Cluster**.
 - After a quick double-check, change the **Keying Function** to the next algorithm.
 - Rinse and repeat for all the keying functions.
 - Then change the **Method** from "key collision" to "nearest neighbor" and follow all the above steps again.
   - With **nearest neighbor** and **levenshtein** it might be worth reducing the value in **Block Chars** to see if there are more matches that help you.
 
-Following is a gif of me going through a couple of keying functions, merges and new algorithms. I'm not fixing all the values, just showing enough of the process to give you an idea of how it works.
+Below is a gif of me going through a couple of keying functions, merges and new algorithms. I'm not fixing all the values, just showing enough of the process to give you an idea of how it works.
 
 ![Clustering](img/ppp-cluster.gif)
 
@@ -89,7 +90,6 @@ Following is a gif of me going through a couple of keying functions, merges and
 As you cluster and clean data like this, you'll likely have to do some research and make style decisions (N PROVIDENCE vs NORTH PROVIDENCE? Is it PEACE DALE or PEACEDALE?)
 
 1. Go through all the algorithms and clean up the city names.
-1. Remember: Don't merge unless all values in a cluster should be the same.
 1. Once through all the algorithms, double-check through the facet list to see if there are values the algorithms missed. It is quite possible.
 
 You would typically use text facets on all the text-based columns to check for other inconsistencies.
@@ -124,4 +124,4 @@ Once you've done all your cleaning, use the Export dropdown button at the top-ri
 
 ---
 
-We're done with this lesson. Perhaps head back to the [Overivew](index.qmd) to read about some case studies.
+We're done with this lesson. Perhaps head back to the [Overview](index.qmd#case-studies) to read about some case studies.
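
For readers of this diff who want a feel for what the "key collision" method in the lesson is doing, here is a minimal Python sketch. It is a simplified approximation of a fingerprint-style keying function, not OpenRefine's actual implementation; the function names `fingerprint` and `cluster_by_key` and the sample city values are hypothetical and chosen only for illustration.

```python
import string
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Simplified fingerprint key (an assumption, not OpenRefine's exact code):
    trim, lowercase, drop ASCII punctuation, then sort the unique tokens."""
    cleaned = value.strip().lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster_by_key(values):
    """Group values whose keys collide; keep only groups with 2+ variants."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Hypothetical sample values in the spirit of the lesson's city names.
cities = ["N. PROVIDENCE", "N PROVIDENCE", "PROVIDENCE N", "PEACE DALE", "PEACEDALE"]
clusters = cluster_by_key(cities)
print(clusters)
```

In this sketch the three PROVIDENCE variants share one key and form a cluster, while PEACE DALE and PEACEDALE do not collide because their token lists differ. That kind of near-miss is where the lesson's "nearest neighbor" method, which compares values by edit distance, may find matches that key collision misses.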
@@ -290,7 +290,8 @@ <h2 class="anchored" data-anchor-id="change-to-uppercase">Change to uppercase</h
 <h2 class="anchored" data-anchor-id="cluster">Cluster</h2>
 <p>OpenRefine has a concept called <strong>Cluster</strong> that will use algorithms to find similarly-constructed or even similar sounding words. We’ll use a series of these help us clean these city names.</p>
 <ol type="1">
-<li><p>In the text facet box for <em>City_clean</em>, click on the <strong>Cluster</strong> button at the top-right. This brings up the <strong>Cluster and edit column</strong> tool.</p>
+<li><p>In the text facet box for <em>City_clean</em>, click on the <strong>Cluster</strong> button at the top-right. This brings up the <strong>Cluster and edit column</strong> tool.</p></li>
+<li><p>Click on the <strong>Cluster</strong> button in the middle so we can take a little tour of the options.</p>
 <div class="quarto-figure quarto-figure-center">
 <figure class="figure">
 <p><img src="img/ppp-cluster-tour.png" class="img-fluid figure-img"></p>
@@ -303,7 +304,7 @@ <h2 class="anchored" data-anchor-id="cluster">Cluster</h2>
 <li>Look through all the values for a particular <strong>Keying function</strong>.</li>
 <li>If you want to merge <strong>all</strong> the values in the cluster, check the <strong>Merge</strong> box and set the <strong>New Cell Value</strong> to the desired result.
 <ul>
-<li>If even one of the values in the cluster does not belong together, then DON’T MERGE IT. You’ll have to deal with them independently later. Take notes and edit from the text facet, perhaps.</li>
+<li>If one of the values in the cluster does not belong with the new value, uncheck the box next to that value so it won’t be included.</li>
 </ul></li>
 <li>Once you’ve reviewed all the clusters, choose <strong>Merge Selected &amp; Re-Cluster</strong>.</li>
 <li>After a quick double-check, change the <strong>Keying Function</strong> to the next algorithm.</li>
@@ -313,7 +314,7 @@ <h2 class="anchored" data-anchor-id="cluster">Cluster</h2>
 <li>With <strong>nearest neighbor</strong> and <strong>levenshtein</strong> it might be worth reducing the value in <strong>Block Chars</strong> to see if there are more matches that help you.</li>
 </ul></li>
 </ul>
-<p>Following is a gif of me going through a couple of keying functions, merges and new algorithms. I’m not fixing all the values, just showing enough of the process to give you an idea of how it works.</p>
+<p>Below is a gif of me going through a couple of keying functions, merges and new algorithms. I’m not fixing all the values, just showing enough of the process to give you an idea of how it works.</p>
 <div class="quarto-figure quarto-figure-center">
 <figure class="figure">
 <p><img src="img/ppp-cluster.gif" class="img-fluid figure-img"></p>
@@ -325,7 +326,6 @@ <h3 class="anchored" data-anchor-id="practice-cleaning-up-city_clean">Practice c
 <p>As you cluster and clean data like this, you’ll likely have to do some research and make style decisions (N PROVIDENCE vs NORTH PROVIDENCE? Is it PEACE DALE or PEACEDALE?)</p>
 <ol type="1">
 <li>Go through all the algorithms and clean up the city names.</li>
-<li>Remember: Don’t merge unless all values in a cluster should be the same.</li>
 <li>Once through all the algorithms, double-check through the facet list to see if there are values the algorithms missed. It is quite possible.</li>
 </ol>
 <p>You would typically use text facets on all the text-based columns to check for other inconsistencies.</p>
@@ -362,7 +362,7 @@ <h2 class="anchored" data-anchor-id="timeline-facets">Timeline facets</h2>
 <h2 class="anchored" data-anchor-id="export">Export</h2>
 <p>Once you’ve done all your cleaning, use the Export dropdown button at the top-right of the app to export the data in your filetype of choice.</p>
 <hr>
-<p>We’re done with this lesson. Perhaps head back to the <a href="./index.html">Overivew</a> to read about some case studies.</p>
+<p>We’re done with this lesson. Perhaps head back to the <a href="./index.html#case-studies">Overview</a> to read about some case studies.</p>
 
 
 </section>
 
@@ -120,7 +120,7 @@
     "href": "demo-cluster.html#cluster",
     "title": "Clustering",
     "section": "Cluster",
-    "text": "Cluster\nOpenRefine has a concept called Cluster that will use algorithms to find similarly-constructed or even similar sounding words. We’ll use a series of these help us clean these city names.\n\nIn the text facet box for City_clean, click on the Cluster button at the top-right. This brings up the Cluster and edit column tool.\n\n\n\nCluster tour\n\n\n\nThe idea here is to work through all the results methodically:\n\nLook through all the values for a particular Keying function.\nIf you want to merge all the values in the cluster, check the Merge box and set the New Cell Value to the desired result.\n\nIf even one of the values in the cluster does not belong together, then DON’T MERGE IT. You’ll have to deal with them independently later. Take notes and edit from the text facet, perhaps.\n\nOnce you’ve reviewed all the clusters, choose Merge Selected & Re-Cluster.\nAfter a quick double-check, change the Keying Function to the next algorithm.\nRinse and repeat for all the keying functions.\nThen change the Method from “key collision” to “nearest neighbor” and follow all the above steps again.\n\nWith nearest neighbor and levenshtein it might be worth reducing the value in Block Chars to see if there are more matches that help you.\n\n\nFollowing is a gif of me going through a couple of keying functions, merges and new algorithms. I’m not fixing all the values, just showing enough of the process to give you an idea of how it works.\n\n\n\nClustering\n\n\n\nPractice cleaning up City_clean\nAs you cluster and clean data like this, you’ll likely have to do some research and make style decisions (N PROVIDENCE vs NORTH PROVIDENCE? Is it PEACE DALE or PEACEDALE?)\n\nGo through all the algorithms and clean up the city names.\nRemember: Don’t merge unless all values in a cluster should be the same.\nOnce through all the algorithms, double-check through the facet list to see if there are values the algorithms missed. 
It is quite possible.\n\nYou would typically use text facets on all the text-based columns to check for other inconsistencies.",
+    "text": "Cluster\nOpenRefine has a concept called Cluster that will use algorithms to find similarly-constructed or even similar sounding words. We’ll use a series of these help us clean these city names.\n\nIn the text facet box for City_clean, click on the Cluster button at the top-right. This brings up the Cluster and edit column tool.\nClick on the Cluster button in the middle so we can take a little tour of the options.\n\n\n\nCluster tour\n\n\n\nThe idea here is to work through all the results methodically:\n\nLook through all the values for a particular Keying function.\nIf you want to merge all the values in the cluster, check the Merge box and set the New Cell Value to the desired result.\n\nIf one of the values in the cluster does not belong with the new value, uncheck the box next to that value so it won’t be included.\n\nOnce you’ve reviewed all the clusters, choose Merge Selected & Re-Cluster.\nAfter a quick double-check, change the Keying Function to the next algorithm.\nRinse and repeat for all the keying functions.\nThen change the Method from “key collision” to “nearest neighbor” and follow all the above steps again.\n\nWith nearest neighbor and levenshtein it might be worth reducing the value in Block Chars to see if there are more matches that help you.\n\n\nBelow is a gif of me going through a couple of keying functions, merges and new algorithms. I’m not fixing all the values, just showing enough of the process to give you an idea of how it works.\n\n\n\nClustering\n\n\n\nPractice cleaning up City_clean\nAs you cluster and clean data like this, you’ll likely have to do some research and make style decisions (N PROVIDENCE vs NORTH PROVIDENCE? Is it PEACE DALE or PEACEDALE?)\n\nGo through all the algorithms and clean up the city names.\nOnce through all the algorithms, double-check through the facet list to see if there are values the algorithms missed. 
It is quite possible.\n\nYou would typically use text facets on all the text-based columns to check for other inconsistencies.",
     "crumbs": [
       "Demos",
       "Clustering"