diff --git a/02_activities/assignments/a1_sampling_and_reproducibility.ipynb b/02_activities/assignments/a1_sampling_and_reproducibility.ipynb index 11852458..8601d878 100644 --- a/02_activities/assignments/a1_sampling_and_reproducibility.ipynb +++ b/02_activities/assignments/a1_sampling_and_reproducibility.ipynb @@ -16,7 +16,25 @@ "cell_type": "markdown", "id": "4ea73db3", "metadata": {}, - "source": [] + "source": [ + "There are multiple stages of sampling occurring within the model.\n", + "\n", + "First, we sample who becomes infected (the 'attack rate'). This happens at the step commented # Infect a random subset of people.\n", + "- The sampling procedure: np.random.choice(..., replace=False) selects 100 unique people uniformly at random from the 1000 attendees.\n", + "- Function used: np.random.choice(..., replace=False)\n", + "- Sample size: 100 (the people selected out of the 1000 attendees)\n", + "- Sample frame: all 1000 attendees (ppl.index), i.e., everyone at weddings + brunches.\n", + "- Underlying distribution: uniform over attendees (a simple random sample without replacement)\n", + "\n", + "Second, we sample which infected people are successfully traced (primary contact tracing).\n", + "- The sampling procedure: np.random.rand(n) < 0.20 marks each infected person as traced with probability 0.20.\n", + "- Function used: np.random.rand(n)\n", + "- Sample size: one random draw per infected person (so 100 draws)\n", + "- Sample frame: only the infected individuals (100 in each run).\n", + "- Underlying distribution: independent Bernoulli trials with success probability 0.20, obtained by thresholding Uniform[0, 1) draws\n", + "\n", + "Secondary tracing and the Monte Carlo repetition of the simulation also take place, but they are not broken down into the components above because neither introduces a new random draw of its own: secondary tracing is a fixed rule applied to the cases already traced, and the repetitions rerun the two sampling stages above so that we can estimate the distribution of outcomes under the defined model. 
" + ] }, { "cell_type": "markdown", @@ -30,7 +48,9 @@ "cell_type": "markdown", "id": "4cf5d993", "metadata": {}, - "source": [] + "source": [ + "When I reduced the number of repetitions in the simulation to 10, the histograms varied substantially from run to run. With 100 repetitions the distributions were more stable, but they still varied noticeably from run to run compared with the original 1000. This shows that, without a fixed seed, the results are not exactly reproducible: each run draws new random samples, and fewer repetitions leave more sampling noise in the plotted distributions." + ] }, { "cell_type": "markdown", @@ -44,7 +64,9 @@ "cell_type": "markdown", "id": "77613cc3", "metadata": {}, - "source": [] + "source": [ + "I made the simulation reproducible by fixing the random number generator seed with np.random.seed(123) before any random sampling occurs. This makes the random functions (np.random.choice for selecting infected individuals and np.random.rand for tracing success) produce the same sequence of random values each time the script is run. The repetitions within a single run still differ from one another, but any given repetition infects the same people and has the same tracing outcomes in every run, so the output DataFrame and histogram plots are identical across runs." + ] }, { "cell_type": "markdown", @@ -56,10 +78,21 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 17, "id": "ab8587a0", "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], "source": [ "# Import necessary libraries\n", "import pandas as pd\n", @@ -80,6 +113,9 @@ "TRACE_SUCCESS = 0.20\n", "SECONDARY_TRACE_THRESHOLD = 2\n", "\n", + "# Fix the random seed so every run draws the same random sequence\n", + "np.random.seed(123)\n", + "\n", "def simulate_event(m):\n", " \"\"\"\n", " Simulates the infection and tracing process for a series of events.\n", @@ -95,6 +131,7 @@ " - A tuple containing the proportion of infections and the proportion of traced cases\n", " that are attributed to weddings.\n", " \"\"\"\n", + " \n", " # Create DataFrame for people at events with initial infection and traced status\n", " events = ['wedding'] * 200 + ['brunch'] * 800\n", " ppl = pd.DataFrame({\n", @@ -131,7 +168,7 @@ " return p_wedding_infections, p_wedding_traces\n", "\n", - "# Run the simulation 1000 times\n", - "results = [simulate_event(m) for m in range(1000)]\n", + "# Run the simulation 100 times\n", + "results = [simulate_event(m) for m in range(100)]\n", "props_df = pd.DataFrame(results, columns=[\"Infections\", \"Traces\"])\n", "\n", "# Plotting the results\n", @@ -193,7 +230,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": "sampling-env (3.11.13)", "language": "python", "name": "python3" }, @@ -207,7 +244,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.13.0" + "version": "3.11.13" } }, "nbformat": 4,