diff --git a/02_activities/assignments/a1_sampling_and_reproducibility.ipynb b/02_activities/assignments/a1_sampling_and_reproducibility.ipynb index 873f5985..2dfd6eb7 100644 --- a/02_activities/assignments/a1_sampling_and_reproducibility.ipynb +++ b/02_activities/assignments/a1_sampling_and_reproducibility.ipynb @@ -1,215 +1,340 @@ { - "cells": [ - { - "cell_type": "markdown", - "id": "ed39f379", - "metadata": {}, - "source": [ - "# Assignment 1: Sampling and Reproducibility\n", - "\n", - "The code at the end of this file explores contact tracing data about an outbreak of the flu, and demonstrates the dangers of incomplete and non-random samples. This assignment is modified from [Contact tracing can give a biased sample of COVID-19 cases](https://andrewwhitby.com/2020/11/24/contact-tracing-biased/) by Andrew Whitby.\n", - "\n", - "Examine the code below. Identify all stages at which sampling is occurring in the model. Describe in words the sampling procedure, referencing the functions used, sample size, sampling frame, any underlying distributions involved. \n" - ] + "cells": [ + { + "cell_type": "markdown", + "id": "ed39f379", + "metadata": { + "id": "ed39f379" + }, + "source": [ + "# Assignment 1: Sampling and Reproducibility\n", + "\n", + "The code at the end of this file explores contact tracing data about an outbreak of the flu, and demonstrates the dangers of incomplete and non-random samples. This assignment is modified from [Contact tracing can give a biased sample of COVID-19 cases](https://andrewwhitby.com/2020/11/24/contact-tracing-biased/) by Andrew Whitby.\n", + "\n", + "Examine the code below. Identify all stages at which sampling is occurring in the model. 
Describe in words the sampling procedure, referencing the functions used, sample size, sampling frame, any underlying distributions involved.\n" + ] + }, + { + "cell_type": "markdown", + "id": "4ea73db3", + "metadata": { + "id": "4ea73db3" + }, + "source": [] + }, + { + "cell_type": "markdown", + "id": "3d9b2ccc", + "metadata": { + "id": "3d9b2ccc" + }, + "source": [ + "Modify the number of repetitions in the simulation to 10 and 100 (from the original 1000). Run the script multiple times and observe the outputted graphs. Comment on the reproducibility of the results." + ] + }, + { + "cell_type": "markdown", + "id": "4cf5d993", + "metadata": { + "id": "4cf5d993" + }, + "source": [] + }, + { + "cell_type": "markdown", + "id": "32603ce7", + "metadata": { + "id": "32603ce7" + }, + "source": [ + "Alter the code so that it is reproducible. Describe the changes you made to the code and how they affected the reproducibility of the script. The script needs to produce the same output when run multiple times." 
+ ] + }, + { + "cell_type": "markdown", + "id": "77613cc3", + "metadata": { + "id": "77613cc3" + }, + "source": [] + }, + { + "cell_type": "markdown", + "id": "30b4a74f", + "metadata": { + "id": "30b4a74f" + }, + "source": [ + "## Code" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "ab8587a0", + "metadata": { + "id": "ab8587a0" + }, + "outputs": [], + "source": [ + "# Imports\n", + "import pandas as pd\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "import seaborn as sns\n", + "import warnings\n", + "warnings.simplefilter(action='ignore', category=FutureWarning)\n", + "\n", + "# Model parameters (constants)\n", + "ATTACK_RATE = 0.10\n", + "TRACE_SUCCESS = 0.20\n", + "SECONDARY_TRACE_THRESHOLD = 2\n", + "\n", + "def simulate_event(rng):\n", + " \"\"\"\n", + " Simulate infection + tracing over two event types: 200 'wedding', 800 'brunch'.\n", + " Returns:\n", + " (p_wedding_infections, p_wedding_traces)\n", + " \"\"\"\n", + " # Build population\n", + " events = ['wedding'] * 200 + ['brunch'] * 800\n", + " ppl = pd.DataFrame({\n", + " 'event': events,\n", + " 'infected': False,\n", + " 'traced': pd.Series([pd.NA]*len(events), dtype=pd.BooleanDtype())\n", + " })\n", + "\n", + " # Infection sampling: exactly 10% infected (SRS without replacement)\n", + " n_infected = int(len(ppl) * ATTACK_RATE) # 100\n", + " infected_indices = rng.choice(ppl.index, size=n_infected, replace=False)\n", + " ppl.loc[infected_indices, 'infected'] = True\n", + "\n", + " # Primary tracing: Bernoulli(TRACE_SUCCESS) among infected\n", + " n_inf = int(ppl['infected'].sum())\n", + " ppl.loc[ppl['infected'], 'traced'] = rng.random(n_inf) < TRACE_SUCCESS\n", + "\n", + " # Secondary tracing: if an event has >= threshold traced infected, trace all infected at that event\n", + " event_trace_counts = ppl[ppl['traced'] == True]['event'].value_counts()\n", + " events_traced = event_trace_counts[event_trace_counts >= SECONDARY_TRACE_THRESHOLD].index\n", + " 
ppl.loc[ppl['event'].isin(events_traced) & ppl['infected'], 'traced'] = True\n", + "\n", + " # Aggregate proportions\n", + " ppl['event_type'] = ppl['event'].str[0] # 'w' or 'b'\n", + " wedding_infections = ((ppl['infected']) & (ppl['event_type'] == 'w')).sum()\n", + " brunch_infections = ((ppl['infected']) & (ppl['event_type'] == 'b')).sum()\n", + " p_wedding_infections = wedding_infections / (wedding_infections + brunch_infections)\n", + "\n", + " wedding_traces = ((ppl['infected']) & (ppl['traced'] == True) & (ppl['event_type'] == 'w')).sum()\n", + " brunch_traces = ((ppl['infected']) & (ppl['traced'] == True) & (ppl['event_type'] == 'b')).sum()\n", + " p_wedding_traces = wedding_traces / (wedding_traces + brunch_traces) if (wedding_traces + brunch_traces) > 0 else np.nan\n", + "\n", + " return p_wedding_infections, p_wedding_traces\n", + "\n", + "def run_simulation(REPS, rng):\n", + " results = [simulate_event(rng) for _ in range(REPS)]\n", + " props_df = pd.DataFrame(results, columns=[\"Infections\", \"Traces\"])\n", + " plt.figure(figsize=(10, 6))\n", + " sns.histplot(props_df['Infections'], color=\"blue\", alpha=0.75, binwidth=0.05, kde=False, label='Infections from Weddings')\n", + " sns.histplot(props_df['Traces'], color=\"red\", alpha=0.75, binwidth=0.05, kde=False, label='Traced to Weddings')\n", + " plt.xlabel(\"Proportion of cases\")\n", + " plt.ylabel(\"Frequency\")\n", + " plt.title(f\"Impact of Contact Tracing on Perceived Flu Infection Sources (REPS={REPS})\")\n", + " plt.legend()\n", + " plt.tight_layout()\n", + " plt.show()\n", + " return props_df" + ] + }, + { + "cell_type": "code", + "source": [ + "# Non-reproducible runs (RNG seeded from entropy each time)\n", + "rng1 = np.random.default_rng() # no fixed seed\n", + "props_df_10 = run_simulation(10, rng1)\n", + "\n", + "rng2 = np.random.default_rng() # fresh RNG again\n", + "props_df_100 = run_simulation(100, rng2)" + ], + "metadata": { + "id": "bikJJ5yCsJ6P", + "outputId": 
"e494636b-922f-4bd2-89d9-4c85269e49eb", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + } + }, + "id": "bikJJ5yCsJ6P", + "execution_count": 4, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "Repetitions: I ran the simulation with REPS = 10 and REPS = 100 using a fresh RNG each time.\n", + "Observation: Each time I re-ran the notebook, the histograms changed because the random infection assignment and primary tracing step produce different samples on each run. With REPS = 10, the results varied widely. With REPS = 100, the distributions looked more stable, but still not identical run-to-run due to randomness.\n", + "Conclusion: Without controlling the random number generator, the results are not reproducible." + ], + "metadata": { + "id": "QrvhhG7ZsOar" + }, + "id": "QrvhhG7ZsOar" + }, + { + "cell_type": "markdown", + "source": [ + "Change for reproducibility: I introduced a fixed random seed and used the modern numpy.random.Generator API throughout. This ensures the infection sampling and tracing decisions are generated from the same seeded sequence each time, producing identical results when the notebook/script is re-run from top to bottom.\n", + "Concretely:\n", + "\n", + "Replaced np.random.choice / np.random.rand with rng.choice / rng.random.\n", + "Created rng = np.random.default_rng(42) before running simulations.\n", + "Ensured we do not re-seed within the loop.\n", + "Effect: Re-running the notebook yields the exact same figures and numbers.\n", + "\n", + "\n" + ], + "metadata": { + "id": "qv6bDlX2sYIA" + }, + "id": "qv6bDlX2sYIA" + }, + { + "cell_type": "code", + "source": [ + "# Reproducible runs (fixed seed)\n", + "RNG_SEED = 42\n", + "rng_fixed_10 = np.random.default_rng(RNG_SEED)\n", + "props_df_10_r = run_simulation(10, rng_fixed_10)\n", + "\n", + "rng_fixed_100 = np.random.default_rng(RNG_SEED)\n", + "props_df_100_r = run_simulation(100, rng_fixed_100)" + ], + "metadata": { + "id": "sF0ABvSIse_r", + "outputId": "b027f4dd-e760-418f-a567-1fd8a8ccac5d", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + } + }, 
+ "id": "sF0ABvSIse_r", + "execution_count": 6, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ] + }, + { + "cell_type": "markdown", + "id": "f418c720", + "metadata": { + "id": "f418c720" + }, + "source": [ + "## Criteria" + ] + }, + { + "cell_type": "markdown", + "id": "c0b3f93f", + "metadata": { + "id": "c0b3f93f" + }, + "source": [ + "|Criteria|Complete|Incomplete|\n", + "|--------|----|----|\n", + "|Alteration of the code|The code changes made, made it reproducible.|The code is still not reproducible.|\n", + "|Description of changes|The author answered questions and explained the reasonings for the changes made well.|The author did not answer questions or explain the reasonings for the changes made well.|" + ] + }, + { + "cell_type": "markdown", + "id": "83cec589", + "metadata": { + "id": "83cec589" + }, + "source": [ + "## Submission Information\n", + "🚨 **Please review our [Assignment Submission Guide](https://github.com/UofT-DSI/onboarding/blob/main/onboarding_documents/submissions.md)** 🚨 for detailed instructions on how to format, branch, and submit your work. Following these guidelines is crucial for your submissions to be evaluated correctly.\n", + "\n", + "### Submission Parameters:\n", + "* Submission Due Date: `23:59 - 02 February 2026`\n", + "* The branch name for your repo should be: `assignment-1`\n", + "* What to submit for this assignment:\n", + " * This markdown file (`a1_sampling_and_reproducibility.ipynb`) should be populated with the code changed.\n", + "* What the pull request link should look like for this assignment: `https://github.com//sampling/pull/`\n", + " * Open a private window in your browser. Copy and paste the link to your pull request into the address bar. Make sure you can see your pull request properly. 
This helps the technical facilitator and learning support staff review your submission easily.\n", + "\n", + "#### Checklist:\n", + "- [ ] Create a branch called `assignment-1`.\n", + "- [ ] Ensure that the repository is public.\n", + "- [ ] Review [the PR description guidelines](https://github.com/UofT-DSI/onboarding/blob/main/onboarding_documents/submissions.md#guidelines-for-pull-request-descriptions) and adhere to them.\n", + "- [ ] Verify that the link is accessible in a private browser window.\n", + "\n", + "If you encounter any difficulties or have questions, please don't hesitate to reach out to our team via the help channel in Slack. Our Technical Facilitators and Learning Support staff are here to help you navigate any challenges.\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.0" + }, + "colab": { + "provenance": [] + } }, - { - "cell_type": "markdown", - "id": "4ea73db3", - "metadata": {}, - "source": [] - }, - { - "cell_type": "markdown", - "id": "3d9b2ccc", - "metadata": {}, - "source": [ - "Modify the number of repetitions in the simulation to 10 and 100 (from the original 1000). Run the script multiple times and observe the outputted graphs. Comment on the reproducibility of the results." - ] - }, - { - "cell_type": "markdown", - "id": "4cf5d993", - "metadata": {}, - "source": [] - }, - { - "cell_type": "markdown", - "id": "32603ce7", - "metadata": {}, - "source": [ - "Alter the code so that it is reproducible. Describe the changes you made to the code and how they affected the reproducibility of the script. The script needs to produce the same output when run multiple times." 
- ] - }, - { - "cell_type": "markdown", - "id": "77613cc3", - "metadata": {}, - "source": [] - }, - { - "cell_type": "markdown", - "id": "30b4a74f", - "metadata": {}, - "source": [ - "## Code" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "ab8587a0", - "metadata": {}, - "outputs": [], - "source": [ - "# Import necessary libraries\n", - "import pandas as pd\n", - "import numpy as np\n", - "import matplotlib.pyplot as plt\n", - "import seaborn as sns\n", - "\n", - "# Note: Suppressing FutureWarnings to maintain a clean output. This is specifically to ignore warnings about\n", - "# deprecated features in the libraries we're using (e.g., 'use_inf_as_na' option in Pandas, used by Seaborn),\n", - "# which we currently have no direct control over. This action is taken to ensure that our output remains\n", - "# focused on relevant information, acknowledging that we rely on external library updates to fully resolve\n", - "# these deprecations. Always consider reviewing and removing this suppression after significant library updates.\n", - "import warnings\n", - "warnings.simplefilter(action='ignore', category=FutureWarning)\n", - "\n", - "# Constants representing the parameters of the model\n", - "ATTACK_RATE = 0.10\n", - "TRACE_SUCCESS = 0.20\n", - "SECONDARY_TRACE_THRESHOLD = 2\n", - "\n", - "def simulate_event(m):\n", - " \"\"\"\n", - " Simulates the infection and tracing process for a series of events.\n", - " \n", - " This function creates a DataFrame representing individuals attending weddings and brunches,\n", - " infects a subset of them based on the ATTACK_RATE, performs primary and secondary contact tracing,\n", - " and calculates the proportions of infections and traced cases that are attributed to weddings.\n", - " \n", - " Parameters:\n", - " - m: Dummy parameter for iteration purposes.\n", - " \n", - " Returns:\n", - " - A tuple containing the proportion of infections and the proportion of traced cases\n", - " that are attributed to 
weddings.\n", - " \"\"\"\n", - " # Create DataFrame for people at events with initial infection and traced status\n", - " events = ['wedding'] * 200 + ['brunch'] * 800\n", - " ppl = pd.DataFrame({\n", - " 'event': events,\n", - " 'infected': False,\n", - " 'traced': np.nan # Initially setting traced status as NaN\n", - " })\n", - "\n", - " # Explicitly set 'traced' column to nullable boolean type\n", - " ppl['traced'] = ppl['traced'].astype(pd.BooleanDtype())\n", - "\n", - " # Infect a random subset of people\n", - " infected_indices = np.random.choice(ppl.index, size=int(len(ppl) * ATTACK_RATE), replace=False)\n", - " ppl.loc[infected_indices, 'infected'] = True\n", - "\n", - " # Primary contact tracing: randomly decide which infected people get traced\n", - " ppl.loc[ppl['infected'], 'traced'] = np.random.rand(sum(ppl['infected'])) < TRACE_SUCCESS\n", - "\n", - " # Secondary contact tracing based on event attendance\n", - " event_trace_counts = ppl[ppl['traced'] == True]['event'].value_counts()\n", - " events_traced = event_trace_counts[event_trace_counts >= SECONDARY_TRACE_THRESHOLD].index\n", - " ppl.loc[ppl['event'].isin(events_traced) & ppl['infected'], 'traced'] = True\n", - "\n", - " # Calculate proportions of infections and traces attributed to each event type\n", - " ppl['event_type'] = ppl['event'].str[0] # 'w' for wedding, 'b' for brunch\n", - " wedding_infections = sum(ppl['infected'] & (ppl['event_type'] == 'w'))\n", - " brunch_infections = sum(ppl['infected'] & (ppl['event_type'] == 'b'))\n", - " p_wedding_infections = wedding_infections / (wedding_infections + brunch_infections)\n", - "\n", - " wedding_traces = sum(ppl['infected'] & ppl['traced'] & (ppl['event_type'] == 'w'))\n", - " brunch_traces = sum(ppl['infected'] & ppl['traced'] & (ppl['event_type'] == 'b'))\n", - " p_wedding_traces = wedding_traces / (wedding_traces + brunch_traces)\n", - "\n", - " return p_wedding_infections, p_wedding_traces\n", - "\n", - "# Run the simulation 1000 
times\n", - "results = [simulate_event(m) for m in range(1000)]\n", - "props_df = pd.DataFrame(results, columns=[\"Infections\", \"Traces\"])\n", - "\n", - "# Plotting the results\n", - "plt.figure(figsize=(10, 6))\n", - "sns.histplot(props_df['Infections'], color=\"blue\", alpha=0.75, binwidth=0.05, kde=False, label='Infections from Weddings')\n", - "sns.histplot(props_df['Traces'], color=\"red\", alpha=0.75, binwidth=0.05, kde=False, label='Traced to Weddings')\n", - "plt.xlabel(\"Proportion of cases\")\n", - "plt.ylabel(\"Frequency\")\n", - "plt.title(\"Impact of Contact Tracing on Perceived Flu Infection Sources\")\n", - "plt.legend()\n", - "plt.tight_layout()\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "id": "f418c720", - "metadata": {}, - "source": [ - "## Criteria" - ] - }, - { - "cell_type": "markdown", - "id": "c0b3f93f", - "metadata": {}, - "source": [ - "|Criteria|Complete|Incomplete|\n", - "|--------|----|----|\n", - "|Alteration of the code|The code changes made, made it reproducible.|The code is still not reproducible.|\n", - "|Description of changes|The author answered questions and explained the reasonings for the changes made well.|The author did not answer questions or explain the reasonings for the changes made well.|" - ] - }, - { - "cell_type": "markdown", - "id": "83cec589", - "metadata": {}, - "source": [ - "## Submission Information\n", - "🚨 **Please review our [Assignment Submission Guide](https://github.com/UofT-DSI/onboarding/blob/main/onboarding_documents/submissions.md)** 🚨 for detailed instructions on how to format, branch, and submit your work. 
Following these guidelines is crucial for your submissions to be evaluated correctly.\n", - "\n", - "### Submission Parameters:\n", - "* Submission Due Date: `23:59 - 02 February 2026`\n", - "* The branch name for your repo should be: `assignment-1`\n", - "* What to submit for this assignment:\n", - " * This markdown file (`a1_sampling_and_reproducibility.ipynb`) should be populated with the code changed.\n", - "* What the pull request link should look like for this assignment: `https://github.com//sampling/pull/`\n", - " * Open a private window in your browser. Copy and paste the link to your pull request into the address bar. Make sure you can see your pull request properly. This helps the technical facilitator and learning support staff review your submission easily.\n", - "\n", - "#### Checklist:\n", - "- [ ] Create a branch called `assignment-1`.\n", - "- [ ] Ensure that the repository is public.\n", - "- [ ] Review [the PR description guidelines](https://github.com/UofT-DSI/onboarding/blob/main/onboarding_documents/submissions.md#guidelines-for-pull-request-descriptions) and adhere to them.\n", - "- [ ] Verify that the link is accessible in a private browser window.\n", - "\n", - "If you encounter any difficulties or have questions, please don't hesitate to reach out to our team via the help channel in Slack. 
Our Technical Facilitators and Learning Support staff are here to help you navigate any challenges.\n" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.13.0" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/02_activities/assignments/a2_survey_design_and_evaluation.md b/02_activities/assignments/a2_survey_design_and_evaluation.md index b4f036f2..226bbc09 100644 --- a/02_activities/assignments/a2_survey_design_and_evaluation.md +++ b/02_activities/assignments/a2_survey_design_and_evaluation.md @@ -21,19 +21,34 @@ Select one of the scenarios below and design a survey to meet the need(s) outlin For the **Canadian General Social Survey on Giving, Volunteering, and Participating, 2018 (cycle 33)**, conducted by Statistics Canada find any and all available documentation for the data gathered and identify and describe the survey features indicated below. -1. Sample type -2. Sample size -3. Target population -4. Sampling frame -5. Survey mode(s) -6. Timeline -7. Response rate -8. Weights -9. Data processing -10. Cleaning, imputation, etc -11. Sources of error -12. Limitations, known biases, etc +1. Sample type : The General Social Survey uses a probability sample selected across the ten provinces. +2. Sample size : The 2018 GSS GVP contains 16,149 observations. +3. Target population : Individuals aged 15 and over living in private households in Canada’s ten provinces, excluding residents of territories and institutions. +4. 
Sampling frame : A sampling frame that includes Canadian residents in private households in the 10 provinces, derived from Statistics Canada's household sampling system. (Frame implied from target population description.) +5. Survey mode(s) : The survey incorporated online reporting as part of updated questionnaire delivery in 2018. +(Statistics Canada GSS surveys commonly combine online and interviewer‑administered modes.) +6. Timeline : The 2018 GSS GVP was conducted from September to December 2018. +7. Response rate : The publicly accessible documentation reviewed does not state a numeric response rate. (The user guide available in the PUMF typically contains this exact figure.) +8. Weights : The PUMF includes survey weights and estimation procedures designed to ensure population‑representative estimates, described in the microdata user guide. +9. Data processing : Statistics Canada’s GSS materials describe processing steps including coding, validation, microdata anonymization, and estimation preparation as part of the PUMF package documentation. +10. Cleaning, imputation, etc : The documentation notes updates, revisions to questions, anonymization, and quality-control processes prior to release; however, specific imputation details are only accessible in the full user guide referenced. +11. Sources of error : Potential sources include: +Sampling error due to probabilistic household sampling. +Non‑sampling errors from questionnaire revisions and the transition to online collection. +12. Limitations, known biases, etc : Limitations include: + +Exclusion of the territories. +Exclusion of institutionalized populations. +Possible measurement variation due to revised and updated question wording in 2018. 13. 
Link to documentation and any additional sources used +Statistics Canada PUMF Documentation (Cycle 33): +https://www150.statcan.gc.ca/n1/en/catalogue/45250011 [www150.statcan.gc.ca] +Borealis Data Repository (microdata): +https://doi.org/10.5683/SP3/U1AYY0 [borealisdata.ca] +Abacus Data Network (user guide & questionnaire): +https://hdl.handle.net/11272.1/AB2/GBFDYG [abacus.lib...ary.ubc.ca] +Daily release notice: +https://www150.statcan.gc.ca/n1/daily-quotidien/210126/dq210126h-eng.htm [www150.statcan.gc.ca] # Your Changes @@ -41,38 +56,75 @@ For the **Canadian General Social Survey on Giving, Volunteering, and Participat ## Part A - Survey Design: The number of your chosen topic: `#` - +3 Describe the purpose of your survey: -``` +``` -> I chose topic 3. The purpose of this survey is to understand how age influences music taste, with a particular focus on how individuals perceive popular music at different stages of their lives. The results will help identify whether music taste evolves with age and what factors contribute to these changes. + write your answer here... ``` Describe your target population, sampling frame, sampling units, and observational units: ``` -write your answer here... -``` +-> Target Population: +All individuals aged 15 and older currently residing in Canada. + +Sampling Frame: +A list of currently enrolled University of Toronto students along with a purchased panel list of Canadian adults from a reputable survey research firm. + +Sampling Units: +Individual persons selected from the sampling frame. + +Observational Units: +The same individuals who complete the survey and provide their perceptions of music taste. + Your 5-10 question survey: ``` -1. write your question here... -2. write your question here... -3. write your question here... -4. write your question here... -5. write your question here... -6. write your question here... (optional) -7. write your question here... (optional) -8. write your question here... (optional) -9. 
write your question here... (optional) -10. write your question here... (optional) -``` +survey questions : +1. How old are you? (Open response or age brackets) +2. How often do you listen to popular music? (Daily, Weekly, Monthly, Rarely, Never) +3. Which genres of music do you currently enjoy the most? (Select all that apply) +4. Thinking back 5–10 years, how would you describe your music taste at that time compared to now? (Very similar / Somewhat similar / Very different) +5. Do you believe your age has influenced the type of music you enjoy? (Yes / No / Unsure) +6. How important is staying updated with new music releases to you? (1–5 Likert scale) +7. Do you find that your perception of “popular music” changes as you grow older? (Yes / No / Unsure) +8. How strongly do you associate music with specific life stages or memories? (1–5 Likert scale) +9. Which factors most influence your music preferences today? (Friends, Family, Social Media, Streaming Algorithms, Nostalgia, Other) +10. Would you be willing to participate in a follow‑up interview? (Yes / No) ## Part B - Survey Evaluation: Identify and describe survey features: -``` -write your answer here -``` +1. Sample type : The General Social Survey uses a probability sample selected across the ten provinces. +2. Sample size : The 2018 GSS GVP contains 16,149 observations. +3. Target population : Individuals aged 15 and over living in private households in Canada’s ten provinces, excluding residents of territories and institutions. +4. Sampling frame : A sampling frame that includes Canadian residents in private households in the 10 provinces, derived from Statistics Canada's household sampling system. (Frame implied from target population description.) +5. Survey mode(s) : The survey incorporated online reporting as part of updated questionnaire delivery in 2018. +(Statistics Canada GSS surveys commonly combine online and interviewer‑administered modes.) +6. 
Timeline : The 2018 GSS GVP was conducted from September to December 2018. +7. Response rate : The publicly accessible documentation reviewed does not state a numeric response rate. (The user guide available in the PUMF typically contains this exact figure.) +8. Weights : The PUMF includes survey weights and estimation procedures designed to ensure population‑representative estimates, described in the microdata user guide. +9. Data processing : Statistics Canada’s GSS materials describe processing steps including coding, validation, microdata anonymization, and estimation preparation as part of the PUMF package documentation. +10. Cleaning, imputation, etc : The documentation notes updates, revisions to questions, anonymization, and quality-control processes prior to release; however, specific imputation details are only accessible in the full user guide referenced. +11. Sources of error : Potential sources include: +Sampling error due to probabilistic household sampling. +Non‑sampling errors from questionnaire revisions and the transition to online collection. +12. Limitations, known biases, etc : Limitations include: + +Exclusion of the territories. +Exclusion of institutionalized populations. +Possible measurement variation due to revised and updated question wording in 2018. +13. Link to documentation and any additional sources used +Statistics Canada PUMF Documentation (Cycle 33): +https://www150.statcan.gc.ca/n1/en/catalogue/45250011 [www150.statcan.gc.ca] +Borealis Data Repository (microdata): +https://doi.org/10.5683/SP3/U1AYY0 [borealisdata.ca] +Abacus Data Network (user guide & questionnaire): +https://hdl.handle.net/11272.1/AB2/GBFDYG [abacus.lib...ary.ubc.ca] +Daily release notice: +https://www150.statcan.gc.ca/n1/daily-quotidien/210126/dq210126h-eng.htm [www150.statcan.gc.ca] + ## Rubric