diff --git a/02_activities/assignments/a1_sampling_and_reproducibility.ipynb b/02_activities/assignments/a1_sampling_and_reproducibility.ipynb index 11852458..19e3531c 100644 --- a/02_activities/assignments/a1_sampling_and_reproducibility.ipynb +++ b/02_activities/assignments/a1_sampling_and_reproducibility.ipynb @@ -16,7 +16,28 @@ "cell_type": "markdown", "id": "4ea73db3", "metadata": {}, - "source": [] + "source": [ + "Stage 1: Random infection selection\n", + "- Function used: np.random.choice()\n", + "- Sampling procedure: Simple random sampling without replacement\n", + "- Sample size: 10% \n", + "- Sampling frame: 1000 attendees \n", + "- Distribution: Uniform random distribution\n", + "\n", + "Stage 2: Primary contact tracing\n", + "- Function used: np.random.rand(), compared against the TRACE_SUCCESS threshold\n", + "- Sampling procedure: Bernoulli sampling\n", + "- Sample size: 20% \n", + "- Sampling frame: Only infected people\n", + "- Distribution: Bernoulli distribution\n", + "\n", + "Stage 3: Secondary contact tracing\n", + "- Function used: Conditional selection based on `event_trace_counts`: \"event_trace_counts = ppl[ppl['traced'] == True]['event'].value_counts()\"\n", + "- Sampling procedure: Threshold-based\n", + "- Sample size: All infected individuals \n", + "- Sampling frame: Both events, wedding and brunch, that reach the threshold\n", + "- Distribution: Based on primary contact tracing" + ] }, { "cell_type": "markdown", @@ -30,7 +51,11 @@ "cell_type": "markdown", "id": "4cf5d993", "metadata": {}, - "source": [] + "source": [ + "When the number of repetitions is small (for example 10), the reproducibility of the results is low because the estimates are highly variable and can deviate substantially from the true value. 
The distribution is also noisier and less stable.\n", + "\n", + "When the number of repetitions is larger (for example 100), the reproducibility of the results is high: the estimates concentrate closer to the true value, which reduces variability and gives a smoother sampling distribution and more reliable results. " + ] }, { "cell_type": "markdown", @@ -44,7 +69,9 @@ "cell_type": "markdown", "id": "77613cc3", "metadata": {}, - "source": [] + "source": [ + "I set a random seed to an arbitrary number (e.g. 50) so that each time the script runs, it uses the same sequence of random numbers and produces the same output, regardless of how many times it runs. " + ] }, { "cell_type": "markdown", @@ -56,10 +83,21 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 2, "id": "ab8587a0", "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], "source": [ "# Import necessary libraries\n", "import pandas as pd\n", @@ -80,6 +118,9 @@ "TRACE_SUCCESS = 0.20\n", "SECONDARY_TRACE_THRESHOLD = 2\n", "\n", + "# Set random seed for reproducibility\n", + "np.random.seed(50)\n", + "\n", "def simulate_event(m):\n", " \"\"\"\n", " Simulates the infection and tracing process for a series of events.\n", @@ -193,7 +234,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": "lcr-env", "language": "python", "name": "python3" }, @@ -207,7 +248,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.13.0" + "version": "3.11.7" } }, "nbformat": 4,