From 6aa045547deb6693adb6899570006d5c4f4c569c Mon Sep 17 00:00:00 2001
From: Kanza <kanza.tariq@mail.utoronto.ca>
Date: Sun, 4 Jan 2026 21:55:42 -0500
Subject: [PATCH] updated assignment

---
 .../a1_sampling_and_reproducibility.ipynb     | 34 +++++++++++++++----
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/02_activities/assignments/a1_sampling_and_reproducibility.ipynb b/02_activities/assignments/a1_sampling_and_reproducibility.ipynb
index 11852458..9b098673 100644
--- a/02_activities/assignments/a1_sampling_and_reproducibility.ipynb
+++ b/02_activities/assignments/a1_sampling_and_reproducibility.ipynb
@@ -16,7 +16,11 @@
    "cell_type": "markdown",
    "id": "4ea73db3",
    "metadata": {},
-   "source": []
+   "source": [
+    "Stage 1. Sampling who becomes infected: This stage models the infection process by randomly selecting 100 people to become infected, assuming each person has an equal chance of infection. The function used is np.random.choice(). The sample size is 100 individuals and the sampling frame is 1000 individuals. The sampling scheme is a simple random sample without replacement.\n",
+    "Stage 2. Sampling who is traced through primary contact tracing: This stage models the chance that an infected person is successfully traced. The function used is np.random.rand(n), which draws one uniform random number per infected person. The sample size and the sampling frame are both the set of infected individuals.\n",
+    "Stage 3. Sampling through secondary contact tracing: If an event has at least 2 traced cases, then all infected people at that event are marked as traced. The sampling frame is the events, and the sample size is the number of events with 2 or more traced individuals. This step is deterministic; no random draw is made."
+   ]
   },
   {
    "cell_type": "markdown",
@@ -30,7 +34,9 @@
    "cell_type": "markdown",
    "id": "4cf5d993",
    "metadata": {},
-   "source": []
+   "source": [
+    "When you lower the number of simulation repetitions from 1000 to 100 or even 10, the results become much less stable and much harder to reproduce. With only 10 repetitions, the histograms look rough and change a lot from run to run because a few random values can completely shift the shape. At 100 repetitions, the histograms look more consistent, but they still vary between runs and outliers can still have a noticeable effect. At 1000 repetitions, the histograms become smooth and almost identical each time the simulation is rerun because the random noise averages out. Overall, reproducibility improves as the number of repetitions increases, because more repetitions reduce the impact of random variation."
+   ]
   },
   {
    "cell_type": "markdown",
@@ -44,7 +50,9 @@
    "cell_type": "markdown",
    "id": "77613cc3",
    "metadata": {},
-   "source": []
+   "source": [
+    "I added the line np.random.seed(42). Setting a random seed makes NumPy's random number generator start from the same state every time the script runs, so the simulation behaves the same way on each run. The same people get infected, the same group gets traced, and the same proportions show up in the results, so the histograms look identical every time. Once the seed is set, rerunning the script gives exactly the same output, and the distributions for “Infections” and “Traces” stop shifting around."
+   ]
   },
   {
    "cell_type": "markdown",
@@ -59,7 +67,18 @@
    "execution_count": null,
    "id": "ab8587a0",
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "ename": "",
+     "evalue": "",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[1;31mRunning cells with 'sampling-env (3.11.13) (Python 3.11.13)' requires the ipykernel package.\n",
+      "\u001b[1;31mInstall 'ipykernel' into the Python environment. \n",
+      "\u001b[1;31mCommand: 'c:/Users/hhnyn/OneDrive/DSI/sampling/sampling-env/Scripts/python.exe -m pip install ipykernel -U --force-reinstall'"
+     ]
+    }
+   ],
    "source": [
     "# Import necessary libraries\n",
     "import pandas as pd\n",
@@ -75,6 +94,9 @@
     "import warnings\n",
     "warnings.simplefilter(action='ignore', category=FutureWarning)\n",
     "\n",
+    "# Set seed for reproducibility\n",
+    "np.random.seed(42)\n",
+    "\n",
     "# Constants representing the parameters of the model\n",
     "ATTACK_RATE = 0.10\n",
     "TRACE_SUCCESS = 0.20\n",
@@ -193,7 +215,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": "sampling-env (3.11.13)",
    "language": "python",
    "name": "python3"
   },
@@ -207,7 +229,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.13.0"
+   "version": "3.11.13"
   }
  },
  "nbformat": 4,