generated from hackforla/.github-hackforla-base-repo-template
    
        
311 Data Notebook
Bonnie Wolfe edited this page Oct 14, 2025 · 1 revision
The 311 service request dataset is very large, which makes it challenging to host or query in browser-based environments. To make the data more accessible, this project processes, cleans, and splits it into manageable files that users can work with efficiently.
- Python & Jupyter Notebook – for data cleaning and transformation
- Pandas – for processing large datasets efficiently
- Google Colab – for running the data pipeline in-browser, processing datasets, and providing temporary access to cleaned files
This project builds a reproducible pipeline that:
- Downloads raw 311 Service Request data
- Cleans the dataset according to standardized rules for consistency and quality
- Splits the data by year, then by month, with each file around 100 MB in size
- Provides cleaned and split datasets via Colab for direct download by users (instead of publishing large datasets directly to GitHub)
- Downloaded 311 Service Request data from the city’s open data portal
- Users can dynamically select the year they want to process and download in the notebook. (Refer to the notebook for the code snippet that maps each year to the corresponding CSV URL.)
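The year selection described above could look roughly like the sketch below. It is illustrative only: the name `YEAR_TO_URL`, the helper `url_for_year`, and the `example.org` URLs are placeholders invented for this example, not the real endpoints. The notebook holds the actual mapping.

```python
# Hypothetical mapping from year to CSV URL.
# These URLs are placeholders; the real ones live in the notebook.
YEAR_TO_URL = {
    2023: "https://example.org/311/requests_2023.csv",
    2024: "https://example.org/311/requests_2024.csv",
}


def url_for_year(year: int) -> str:
    """Return the CSV URL for a year, or raise a clear error if it is unknown."""
    try:
        return YEAR_TO_URL[year]
    except KeyError:
        available = ", ".join(str(y) for y in sorted(YEAR_TO_URL))
        raise ValueError(
            f"No dataset configured for {year}; available years: {available}"
        ) from None


selected_year = 2024  # change this value to process a different year
csv_url = url_for_year(selected_year)

# The selected file could then be loaded with pandas, for example:
# df = pd.read_csv(csv_url)
```

Keeping the mapping in one dictionary means that supporting a new year only requires adding one entry, and an unsupported year fails early with a readable message rather than a download error.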
High-level steps performed:
- Removed duplicates
- Handled missing values
- Standardized date fields
- Reviewed and simplified categorical variables
- Dropped unnecessary columns
- Converted text columns to lowercase
- Cleaned and validated geographical data
- Partitioned and saved cleaned dataset into monthly files
- Partitioned cleaned datasets by year, then by month
- Organized files in a clear folder hierarchy for easy access
- A documented notebook automates:
- Data download (with dynamic year selection)
- Cleaning rules
- Splitting logic
- Saving outputs
 
- Includes annotations explaining each step
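As a rough illustration of the cleaning and splitting steps listed above, here is a minimal pandas sketch. The column names (`created_date`, `request_type`, `status`, `latitude`, `longitude`), the function names, and the `<year>/<year>-<month>.csv` layout are assumptions made for this example. The notebook and the Cleaning Rules Documentation define the actual rules, and this sketch does not cover every step (for instance, it does not target a particular file size).

```python
from pathlib import Path

import pandas as pd


def clean_requests(df: pd.DataFrame) -> pd.DataFrame:
    """Apply a few of the cleaning steps to a raw frame (assumed column names)."""
    df = df.copy()
    # Standardize the date field; unparseable values become NaT and are dropped.
    df["created_date"] = pd.to_datetime(df["created_date"], errors="coerce")
    df = df.dropna(subset=["created_date"])
    # Normalize text columns: trim whitespace and convert to lowercase.
    for col in ("request_type", "status"):
        df[col] = df[col].str.strip().str.lower()
    # Remove exact duplicates after normalization.
    df = df.drop_duplicates()
    # Keep only rows with plausible coordinates; missing values fail the check.
    valid = df["latitude"].between(-90, 90) & df["longitude"].between(-180, 180)
    return df[valid].reset_index(drop=True)


def save_partitions(df: pd.DataFrame, out_dir: str) -> list:
    """Write one CSV per month under <out_dir>/<year>/ and return the paths."""
    keyed = df.assign(
        year=df["created_date"].dt.year,
        month=df["created_date"].dt.month,
    )
    written = []
    for (year, month), part in keyed.groupby(["year", "month"]):
        folder = Path(out_dir) / str(int(year))
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / f"{int(year)}-{int(month):02d}.csv"
        # Drop the helper columns so the output matches the cleaned schema.
        part.drop(columns=["year", "month"]).to_csv(path, index=False)
        written.append(path)
    return written
```

A call such as `save_partitions(clean_requests(raw_df), "cleaned_311")` would then produce a folder per year with one file per month inside it.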
Google Colab lets you run this project in your browser without installing anything on your computer.
- Open the link above and sign in with your Google account.
- (Optional) Click the "Connect" button in the top right corner to start a runtime. Alternatively, running any cell or using "Run all" will automatically connect.
- Run the notebook cells:
- To run all cells automatically, use "Runtime" → "Run all" from the top menu.
- To run cells one by one, press Shift + Enter or click the play button next to each cell.
 
- The notebook will:
- Download raw 311 data (you can select the year to process)
- Apply cleaning rules
- Split files by year and month
 
- Download the resulting files by opening the "Files" tab in the left sidebar, right-clicking the CSV files, and choosing "Download".
Note: Files exist only during your Colab session. Be sure to download anything you need before closing the session.
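Because the files disappear with the session, it can be convenient to bundle the whole output folder into a single archive before downloading. The sketch below uses only the Python standard library. The function name `zip_outputs` and the folder name `cleaned_311` are assumptions for illustration and may not match what the notebook uses.

```python
import shutil
from pathlib import Path


def zip_outputs(out_dir: str, archive_stem: str) -> Path:
    """Pack everything under out_dir into <archive_stem>.zip and return its path."""
    archive = shutil.make_archive(archive_stem, "zip", root_dir=out_dir)
    return Path(archive)


# In Colab, the archive could then be sent to the browser with:
# from google.colab import files
# files.download(str(zip_outputs("cleaned_311", "cleaned_311")))
```

Downloading one archive avoids right-clicking each monthly CSV individually, and the year folders are preserved inside the zip.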
For instructions on cloning the repo, installing dependencies, and running the notebook on your machine, please see the project’s README.
- Annotated Jupyter Notebook with the full data pipeline
- Cleaned and partitioned datasets available via Colab runtime
- Cleaning Rules Documentation
- Project README
- Related GitHub Issue
- Older 311 data can be found at: LA City Open Data Portal