Complete assignment 2 by tanveerrouf · Pull Request #2 · tanveerrouf/python

tanveerrouf · 2025-10-28T00:18:17Z

TITLE: UofT-DSI | Python - Assignment 2

What changes are you trying to make?

Built a complete data analysis pipeline to evaluate arthritis drug efficacy across a 12-session clinical trial. Created three main components:

Data Reading: Loaded and displayed inflammation data from 12 CSV files (60 patients × 40 days each)
patient_summary() Function: Implemented a NumPy-based function that computes each patient's mean, max, or min inflammation score (selected by an operation argument) across the 40 days, for all 60 patients
detect_problems() Function: Built an error-detection function that flags data anomalies, namely patients with a mean inflammation of zero, which indicates a potential data-entry error or an ineligible participant

All code uses vectorized NumPy array operations rather than Python loops, with axis=1 specified so that each statistic is computed across the 40 days in a row, giving one value per patient.
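The submitted code isn't included in this description, so the following is only a minimal sketch of how the three pieces could fit together, assuming each file is comma-delimited with one row per patient and one column per day. The if/elif dispatch, the ValueError raised for an unknown operation, and the body of the check_zeros() helper are assumptions, not the submitted implementation.

```python
import io

import numpy as np


def patient_summary(file_path, operation):
    """Return one summary value per patient (row) of an inflammation CSV."""
    # ndmin=2 keeps even a single-row file two-dimensional so axis=1 is valid
    data = np.loadtxt(fname=file_path, delimiter=",", ndmin=2)
    ax = 1  # axis=1 aggregates across the day columns of each patient row

    if operation == "mean":
        summary_values = np.mean(data, axis=ax)
    elif operation == "max":
        summary_values = np.max(data, axis=ax)
    elif operation == "min":
        summary_values = np.min(data, axis=ax)
    else:
        # Assumption: an unsupported operation raises ValueError
        raise ValueError("operation must be 'mean', 'max', or 'min'")
    return summary_values


def check_zeros(x):
    """Hypothetical helper: return True if any value in the 1-D array x is 0."""
    flag = np.where(x == 0)[0]  # indices of the zero entries
    return len(flag) > 0        # a non-empty flag means at least one zero


def detect_problems(file_path):
    """Return True if at least one patient's mean inflammation is 0."""
    return check_zeros(patient_summary(file_path, "mean"))


# Tiny in-memory stand-in for a CSV file: 3 patients x 4 days
sample = io.StringIO("0,1,2,3\n1,1,1,1\n0,0,0,0\n")
print(detect_problems(sample))  # True: the third patient's mean is 0
```

Because detect_problems() only calls patient_summary() and a helper, the per-patient reduction lives in one place, which is the modularity point made below.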

What did you learn from the changes you have made?

NumPy axis mechanics: Understanding that axis=1 aggregates along each row (across the 40 days), yielding one value per patient, whereas axis=0 aggregates down each column, yielding one value per day, is critical for correct computation
Data validation importance: Checking for zero-mean values helps catch data-quality issues that could skew the efficacy analysis
Function modularity: Building patient_summary() as a reusable function made the detect_problems() implementation cleaner and more maintainable
CSV handling: Working with file paths and numpy.loadtxt() with an explicit delimiter gave hands-on practice with file I/O in a data analysis workflow
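A small illustration of the loadtxt point, assuming each file contains plain comma-separated numbers with no header row. io.StringIO stands in for a real file path so the snippet runs on its own.

```python
import io

import numpy as np

# Stand-in for one inflammation CSV: 2 patients (rows) x 3 days (columns)
csv_text = "0,0,1\n0,1,2\n"

# delimiter="," tells loadtxt to split each line on commas
data = np.loadtxt(io.StringIO(csv_text), delimiter=",")
print(data.shape)  # (2, 3): rows are patients, columns are days
print(data.dtype)  # float64, loadtxt's default dtype
```

Without delimiter=",", loadtxt splits on whitespace and raises ValueError when it tries to parse a whole comma-separated line as a single number.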

Was there another approach you were thinking about making?

I could have used pandas DataFrames instead of NumPy arrays for more intuitive row/column selection, but NumPy has less overhead and is aligned with the course focus. I also considered computing the statistics with plain Python loops, but NumPy's vectorized operations are faster and more concise, even for 60 × 40 arrays.
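To make the loop-versus-vectorization comparison concrete, here is a sketch that computes per-patient means both ways on a randomly generated 60 × 40 array (synthetic data, not trial data) and checks that the two results agree.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for one 60 x 40 inflammation array
data = rng.integers(0, 20, size=(60, 40)).astype(float)

# Plain-Python loop: one mean per patient (row)
loop_means = []
for row in data:
    loop_means.append(sum(row) / len(row))

# Vectorized equivalent: NumPy performs the per-row reduction in compiled code
vec_means = data.mean(axis=1)

print(np.allclose(loop_means, vec_means))  # True
```

Wrapping each version in timeit.timeit would measure the speed difference on a given machine; the exact ratio depends on hardware and NumPy version.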

Were there any challenges?

Main challenge: Understanding the axis parameter in NumPy functions. At first it wasn't clear whether axis=0 or axis=1 would produce per-patient or per-day results. I resolved this by testing on the first file and verifying that the output had 60 entries, one per patient.
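The shape check described above can be reproduced on any 60 × 40 array. This sketch uses np.arange as placeholder data so that every row and column holds distinct values.

```python
import numpy as np

# Placeholder for one inflammation file: 60 patients (rows) x 40 days (columns)
data = np.arange(60 * 40).reshape(60, 40)

per_patient = data.mean(axis=1)  # collapses the 40 day columns: one value per patient
per_day = data.mean(axis=0)      # collapses the 60 patient rows: one value per day

print(per_patient.shape)  # (60,)
print(per_day.shape)      # (40,)
```

A length of 60 confirms the reduction ran over days (axis=1); a length of 40 would mean it ran over patients (axis=0).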

Secondary challenge: Interpreting the logic of the check_zeros() helper function, which uses np.where() to locate zero values and then checks whether the resulting flag array is empty. I clarified it by working through the function step by step.
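Working through that helper step by step might look like the following, using a hypothetical array of per-patient means. It mirrors the np.where() pattern described above; the helper in the assignment may differ in detail.

```python
import numpy as np

# Hypothetical per-patient means; two patients have a mean of 0
means = np.array([5.2, 0.0, 3.1, 0.0])

# Step 1: an elementwise comparison produces a boolean mask
mask = means == 0          # [False, True, False, True]

# Step 2: with only a condition, np.where returns a tuple of index arrays,
# one per dimension; [0] selects the indices for this 1-D array
flag = np.where(mask)[0]   # [1, 3]

# Step 3: a non-empty flag means at least one zero was found
has_zeros = len(flag) > 0  # True
```

np.any(means == 0) gives the same True/False answer in one call; the np.where() version additionally exposes which patients (indices) triggered the flag.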

How were these changes tested?

Tested patient_summary() on the first inflammation file and verified that the output length equals 60 (one value per patient)
Ran all three operations ('mean', 'max', 'min') to confirm correct behavior
Tested detect_problems() on the first file and confirmed it returns False (no patients with zero mean inflammation)
Verified error handling in patient_summary() by passing an invalid operation parameter
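The four checks listed above could be automated roughly as follows. This is a hypothetical sketch: it writes a synthetic 60 × 40 CSV to a temporary file instead of using the real first inflammation file, and it uses a compact dispatch-table variant of patient_summary() that raises ValueError on an unknown operation, which is an assumption rather than the submitted implementation.

```python
import os
import tempfile

import numpy as np

# Hypothetical dispatch-table variant, used only to illustrate the checks
OPERATIONS = {"mean": np.mean, "max": np.max, "min": np.min}


def patient_summary(file_path, operation):
    if operation not in OPERATIONS:
        raise ValueError(f"unsupported operation: {operation!r}")
    data = np.loadtxt(file_path, delimiter=",", ndmin=2)
    return OPERATIONS[operation](data, axis=1)  # one value per patient (row)


def detect_problems(file_path):
    # True if any patient's mean inflammation is exactly 0
    return bool(np.any(patient_summary(file_path, "mean") == 0))


# Synthetic stand-in for the first file: 60 patients x 40 days, all values >= 1
rng = np.random.default_rng(42)
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    np.savetxt(tmp, rng.integers(1, 20, size=(60, 40)), delimiter=",", fmt="%d")
    path = tmp.name

try:
    for op in ("mean", "max", "min"):
        assert patient_summary(path, op).shape == (60,)  # one value per patient
    assert detect_problems(path) is False               # no zero-mean patients
    try:
        patient_summary(path, "median")
    except ValueError:
        pass  # invalid operation rejected as expected
    else:
        raise AssertionError("invalid operation was accepted")
finally:
    os.remove(path)
```

The same loop can be pointed at each of the 12 real files once their paths are known.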

Checklist

I can confirm that my changes are working as intended
All code cells execute without errors
Functions return expected output shapes and values
Code is organized with clear comments explaining NumPy operations

Dmytro-Bonislavskyi

Well done!

Complete assignment 2

7b22ebf

Dmytro-Bonislavskyi approved these changes Oct 29, 2025

tanveerrouf wants to merge 1 commit into main from assignment-2
