diff --git a/File Upload Service/app/CONFIG_DOCUMENTATION.md b/File Upload Service/app/CONFIG_DOCUMENTATION.md new file mode 100644 index 0000000..e9340d7 --- /dev/null +++ b/File Upload Service/app/CONFIG_DOCUMENTATION.md @@ -0,0 +1,252 @@ + +# Configuration File (`config.yaml`) Documentation + +This document explains each configuration option in your YAML file, its purpose, accepted values, and how it affects preprocessing. + +--- + +## 1. Tabular Section + +```yaml +tabular: + path: data/sample.csv + output_folder: output/tabular + type: csv + preprocessing: + add_row_id: true + categorical_encoding: + columns: [gender, city] + method: onehot + cleaning: + lowercase: true + remove_special_chars: false + trim_strings: true + column_filtering: + drop: [] + keep: [age, income, gender, city] + drop_duplicates: true + dtype_conversion: + age: int + gender: category + income: float + missing_values: + columns: + age: 30 + name: Unknown + global_fill: null + normalization: + columns: [age, income] + method: minmax + outlier_removal: + columns: [age, income] + method: iqr + threshold: 1.5 + remove_empty_columns: true + rename_columns: + oldName: new_name + productID: product_id +``` + +### Explanation of Fields + +- **`path`** + - *Type:* string + - *Description:* File path to your raw tabular dataset (CSV or JSON). + - *Example:* `"data/sample.csv"` + - *Notes:* Must exist and be readable. + +- **`output_folder`** + - *Type:* string + - *Description:* Directory where the processed tabular output will be saved as CSV. + - *Example:* `"output/tabular"` + - *Notes:* Directory will be created if missing. + +- **`type`** + - *Type:* string (`csv` or `json`) + - *Description:* Specifies the format of the input tabular file. + - *Example:* `"csv"` + +--- + +### `preprocessing` options + +Each key under `preprocessing` is an optional step. If omitted, that step is skipped. 
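The skip rule can be sketched as follows. `enabled_tabular_steps` is a hypothetical helper, not part of the pipeline. It mirrors how `preprocess_tabular` in `tabular.py` decides whether to run a step. The three boolean flags run only when set to `true`. Every other key runs whenever it is present, even if its value is `null`. To disable a dict-valued step, remove or comment out the key rather than setting it to `null`. The helper lists steps in this document's order, which is not the pipeline's execution order.

```python
# Hypothetical helper, not part of the pipeline: reports which tabular steps a
# parsed `preprocessing` mapping would enable, mirroring preprocess_tabular().
BOOLEAN_STEPS = {"add_row_id", "drop_duplicates", "remove_empty_columns"}

# Step keys in the order this document lists them (not execution order).
DOCUMENTED_STEPS = [
    "add_row_id",
    "categorical_encoding",
    "cleaning",
    "column_filtering",
    "drop_duplicates",
    "dtype_conversion",
    "missing_values",
    "normalization",
    "outlier_removal",
    "remove_empty_columns",
    "rename_columns",
]


def enabled_tabular_steps(preprocessing: dict) -> list:
    enabled = []
    for step in DOCUMENTED_STEPS:
        if step in BOOLEAN_STEPS:
            # Boolean flags are read with cfg.get(step, False): only true runs.
            if preprocessing.get(step, False):
                enabled.append(step)
        elif step in preprocessing:
            # Dict-valued steps run whenever the key is present.
            enabled.append(step)
    return enabled


# The dict could come from yaml.safe_load(...)["tabular"]["preprocessing"].
example = {
    "add_row_id": False,
    "drop_duplicates": True,
    "cleaning": {"lowercase": True, "trim_strings": True},
}
print(enabled_tabular_steps(example))  # ['cleaning', 'drop_duplicates']
```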
+ +#### `add_row_id` +- *Type:* boolean +- *Description:* Adds a unique row ID column named `row_id` at the beginning of the DataFrame. + +#### `categorical_encoding` +- *Type:* dict +- *Fields:* + - `columns` (list of strings): Columns to encode. + - `method` (string): Encoding method. Supported: + - `"onehot"` — creates one-hot encoded dummy variables. + - `"label"` — converts categories to integer labels. +- *Notes:* Apply after filtering and cleaning. + +#### `cleaning` +- *Type:* dict +- *Fields:* + - `lowercase` (bool): Convert string columns to lowercase. + - `remove_special_chars` (bool): (Not implemented yet — reserved for future) Remove special characters in strings. + - `trim_strings` (bool): Remove leading/trailing whitespace from string columns. + +#### `column_filtering` +- *Type:* dict +- *Fields:* + - `keep` (list of strings): Keep only these columns (drop others). If specified, overrides `drop`. + - `drop` (list of strings): Drop these columns. Ignored if `keep` is present. +- *Note:* Filtering should be done early for efficiency. + +#### `drop_duplicates` +- *Type:* boolean +- *Description:* Remove exact duplicate rows. + +#### `dtype_conversion` +- *Type:* dict +- *Fields:* key = column name, value = target data type (e.g., `int`, `float`, `category`, `str`). +- *Example:* + ```yaml + dtype_conversion: + age: int + income: float + ``` + +#### `missing_values` +- *Type:* dict +- *Fields:* + - `columns`: dict mapping column names to fill values for missing entries. + - `global_fill`: single value to fill all missing values if specified (overrides column-specific fills). Use `null` to disable. +- *Example:* + ```yaml + missing_values: + columns: + age: 30 + name: Unknown + global_fill: null + ``` + +#### `normalization` +- *Type:* dict +- *Fields:* + - `columns` (list): Columns to normalize. + - `method` (string): Normalization method: + - `"minmax"` — scale values to [0,1] range. + - `"standard"` — zero mean, unit variance scaling. 
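The two normalization methods reduce to simple per-column formulas. This NumPy sketch only illustrates the arithmetic, and the function names are hypothetical. The pipeline itself calls scikit-learn's `MinMaxScaler` and `StandardScaler` in `normalize_columns` (`tabular.py`). Like those scalers, the sketch uses the population standard deviation (`ddof=0`) and returns zeros for a constant column.

```python
import numpy as np


def minmax_scale(values):
    """Scale to [0, 1]: (x - min) / (max - min)."""
    x = np.asarray(values, dtype=float)
    value_range = x.max() - x.min()
    if value_range == 0:
        return np.zeros_like(x)  # constant column: nothing to scale
    return (x - x.min()) / value_range


def standard_scale(values):
    """Zero mean, unit variance: (x - mean) / std, using population std (ddof=0)."""
    x = np.asarray(values, dtype=float)
    std = x.std()
    if std == 0:
        return np.zeros_like(x)  # constant column: every value equals the mean
    return (x - x.mean()) / std


ages = [20, 30, 40]
print(minmax_scale(ages).tolist())              # [0.0, 0.5, 1.0]
print(standard_scale(ages).round(4).tolist())   # [-1.2247, 0.0, 1.2247]
```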
+ +#### `outlier_removal` +- *Type:* dict +- *Fields:* + - `columns` (list): Columns to check for outliers. + - `method` (string): Method to detect outliers: + - `"iqr"` — Interquartile Range method. + - `"zscore"` — Z-score thresholding. + - `threshold` (float, default `1.5`): For `iqr`, rows with values outside `[Q1 - threshold * IQR, Q3 + threshold * IQR]` are dropped. For `zscore`, rows whose absolute z-score is greater than or equal to `threshold` are dropped; a limit around 3 is a common choice. +- *Notes:* Columns are checked one after another, so each column's bounds are computed on the rows kept by the previous one. Fill missing values in these columns first (see `missing_values`), because missing entries can cause rows to be dropped. + +#### `remove_empty_columns` +- *Type:* boolean +- *Description:* Remove columns that contain only missing values. + +#### `rename_columns` +- *Type:* dict +- *Description:* Mapping from original column names to new names. +- *Example:* + ```yaml + rename_columns: + oldName: new_name + productID: product_id + ``` + +--- + +## 2. Images Section + +```yaml +images: + path: temp_input/images + output_folder: temp_output/images + preprocessing: + grayscale: true + normalize: true + resize: [128, 128] +``` + +- **`path`** + Folder containing input images. + +- **`output_folder`** + Folder to save processed images. + +- **`preprocessing`** + - `grayscale` (bool): Convert images to grayscale. + - `normalize` (bool): Normalize pixel values to [0,1]. + - `resize` (list of two ints): Resize images to `[width, height]`. + +--- + +## 3. Videos Section + +```yaml +videos: + path: data/videos/sample.mp4 + output_folder: output/video_frames + preprocessing: + extract_frames: 30 + resize_frames: [64, 64] +``` + +- **`path`** + Path to input video file. + +- **`output_folder`** + Folder to save extracted and processed frames. + +- **`preprocessing`** + - `extract_frames` (int): Number of frames to extract or interval. + - `resize_frames` (list of two ints): Resize extracted frames to `[width, height]`. + +--- + +## How to use the config file + +1. **Prepare your data and directory structure.** + Make sure the files and folders referenced by your paths exist. + +2. **Edit the YAML file** with the preprocessing steps you want for each data type. + - To skip a step, just remove or comment it out. + - Use boolean flags (`true`/`false`) to toggle steps on or off.
+ - Provide lists for columns and mappings for renaming. + +3. **Run the pipeline script.** + It reads your config and runs each enabled step. Tabular steps run in a fixed order defined in `tabular.py`, not in the order the keys appear in the YAML. + +4. **Check the output folders** for processed files, and `pipeline.log` (in the working directory) for logs. + +5. **Review the metadata JSON** (`metadata_<run_id>.json`, written to the working directory) for details and statistics about each preprocessing step. + +--- +**FOLDER STRUCTURE** + +project_root/ +├── config.yaml # Config file +├── preprocess.py # Main pipeline script to run preprocessing +├── utils/ # Source code (tabular.py, images.py, videos.py) +├── app.py # Streamlit app +│ +├── data/ # Raw input data (as referenced in config) +│ ├── images/ # Raw images folder (for image preprocessing) +│ │ ├── img1.png +│ │ └── img2.jpg +│ ├── videos/ # Raw videos folder +│ │ └── sample.mp4 +│ └── sample.csv # Raw tabular CSV file +│ +├── output/ # Processed output data (created by the pipeline) +│ ├── images/ # Processed images saved here +│ ├── tabular/ # Processed CSVs saved here +│ └── video_frames/ # Extracted and resized video frames +│ +├── requirements.txt # Python dependencies for the project +├── CONFIG_DOCUMENTATION.md # Documentation for config file usage +└── README.md # Project Documentation diff --git a/File Upload Service/app/README.md b/File Upload Service/app/README.md new file mode 100644 index 0000000..747fb54 --- /dev/null +++ b/File Upload Service/app/README.md @@ -0,0 +1,82 @@ +# Data Preprocessing Pipeline + +## Overview + +This project is a configurable data preprocessing pipeline designed to handle multiple data types including tabular data (CSV/JSON), images, and videos. The pipeline reads raw input data, applies various preprocessing steps as specified in a YAML configuration file (`config.yaml`), and outputs the cleaned and transformed data into organized output folders.
For more information on how `config.yaml` works, see `CONFIG_DOCUMENTATION.md`. + +## Features + +- **Tabular Data Preprocessing:** + - Handling missing values with global or column-specific fills + - Encoding categorical variables (one-hot or label encoding) + - Normalization (Min-Max or Standard scaling) + - Outlier removal (IQR or z-score methods) + - Cleaning string columns (trimming, lowercasing) + - Column filtering (keep/drop specified columns) + - Data type conversions + - Duplicate removal + - Column renaming + - Adding unique row IDs + +- **Image Preprocessing:** + - Grayscale conversion + - Normalization + - Resizing + +- **Video Preprocessing:** + - Frame extraction at specified intervals + - Frame resizing + +## How it Works + +- Define preprocessing steps and file paths in `config.yaml`. +- Run the pipeline script (`preprocess.py`), which: + - Loads the config file + - Processes tabular, image, and video data as specified + - Saves processed outputs to designated folders + - Logs processing details and saves metadata JSON for each run + + **Folder Structure** + + project_root/ +├── config.yaml # Config file +├── preprocess.py # Main pipeline script to run preprocessing +├── utils/ # Source code (tabular.py, images.py, videos.py) +├── app.py # Streamlit app for testing +│ +├── data/ # Raw input data (as referenced in config) +│ ├── images/ # Raw images folder (for image preprocessing) +│ │ ├── img1.png +│ │ └── img2.jpg +│ ├── videos/ # Raw videos folder +│ │ └── sample.mp4 +│ └── sample.csv # Raw tabular CSV file +│ +├── output/ # Processed output data (created by the pipeline) +│ ├── images/ # Processed images saved here +│ ├── tabular/ # Processed CSVs saved here +│ └── video_frames/ # Extracted and resized video frames +│ +├── requirements.txt # Python dependencies for the project +├── CONFIG_DOCUMENTATION.md # Documentation for config file usage +└── README.md # Project Documentation + + + +## Getting Started + +1.
Install dependencies: + + pip install -r requirements.txt + +2. Prepare your raw data inside the `data/` folder. + +3. Customize your preprocessing pipeline in `config.yaml`. + +4. Run the pipeline: + + python preprocess.py + +5. Launch the Streamlit test app: + + streamlit run app.py \ No newline at end of file diff --git a/File Upload Service/app/__init__.py b/File Upload Service/app/__init__.py new file mode 100644 index 0000000..db3e327 --- /dev/null +++ b/File Upload Service/app/__init__.py @@ -0,0 +1 @@ +# utils package diff --git a/File Upload Service/app/app.py b/File Upload Service/app/app.py new file mode 100644 index 0000000..50e6e26 --- /dev/null +++ b/File Upload Service/app/app.py @@ -0,0 +1,154 @@ +import streamlit as st +import yaml +import os +import subprocess +import pandas as pd +import json +import shutil +from datetime import datetime +from PIL import Image + +st.set_page_config(page_title="Redback Preprocessing Tester", layout="centered") +st.title("🧹 Redback Data Preprocessing Test App") + +# Unified uploader +uploaded_files = st.file_uploader( + "📁 Upload Tabular (CSV/JSON), Images (PNG/JPG), or Video (MP4)", + type=["csv", "json", "png", "jpg", "jpeg", "mp4"], + accept_multiple_files=True +) + +# Separate config uploader +config_file = st.file_uploader("📄 Upload YAML Config", type=["yaml", "yml"]) + +# Initialize containers +tabular_file = None +image_files = [] +video_file = None + +# Sort uploaded files +if uploaded_files: + for file in uploaded_files: + ext = file.name.lower().split('.')[-1] + if ext in ["csv", "json"] and tabular_file is None: + tabular_file = file + elif ext in ["png", "jpg", "jpeg"]: + image_files.append(file) + elif ext == "mp4" and video_file is None: + video_file = file + +# Proceed if config is uploaded +if config_file: + with st.spinner("Preparing files..."): + # Save config + config_path = "config.yaml" + with open(config_path, "wb") as f: + f.write(config_file.getvalue()) + + # Load config + with open(config_path, "r") as f:
+ config = yaml.safe_load(f) + + # Check required input folders exist, else error out + # Tabular file handling + if tabular_file and "tabular" in config: + tabular_path = os.path.join("temp_input", tabular_file.name) + if not os.path.isdir("temp_input"): + st.error("Input folder 'temp_input' does not exist. Please create it manually.") + st.stop() + with open(tabular_path, "wb") as f: + f.write(tabular_file.getvalue()) + config["tabular"]["path"] = tabular_path + if "output_folder" not in config["tabular"]: + st.error("Config 'tabular' section missing 'output_folder'.") + st.stop() + + # Image files handling + if image_files and "images" in config: + img_dir = "temp_input/images" + if not os.path.isdir(img_dir): + st.error(f"Input images folder '{img_dir}' does not exist. Please create it manually.") + st.stop() + for img in image_files: + with open(os.path.join(img_dir, img.name), "wb") as f: + f.write(img.getvalue()) + config["images"]["path"] = img_dir + if "output_folder" not in config["images"]: + st.error("Config 'images' section missing 'output_folder'.") + st.stop() + + # Video file handling + if video_file and "videos" in config: + video_path = os.path.join("temp_input", video_file.name) + if not os.path.isdir("temp_input"): + st.error("Input folder 'temp_input' does not exist. 
Please create it manually.") + st.stop() + with open(video_path, "wb") as f: + f.write(video_file.getvalue()) + config["videos"]["path"] = video_path + if "output_folder" not in config["videos"]: + st.error("Config 'videos' section missing 'output_folder'.") + st.stop() + + # Save updated config + with open(config_path, "w") as f: + yaml.dump(config, f) + + st.subheader("✅ Config Loaded") + st.json(config) + + # Run preprocessing + st.info("Running preprocessing pipeline...") + with st.spinner("Processing..."): + try: + result = subprocess.run(["python", "preprocess.py"], capture_output=True, text=True) + st.text(result.stdout) + # The pipeline's console logger writes to stderr, so only treat stderr as an error on a non-zero exit code + if result.returncode != 0: + st.error(result.stderr) + elif result.stderr: + st.text(result.stderr) + except Exception as e: + st.error(f"❌ Pipeline execution failed: {e}") + st.stop() + + # Output folders also contain metadata.csv, so only preview image files + image_exts = (".png", ".jpg", ".jpeg", ".bmp", ".tiff") + + # Tabular preview (skipped when the config has no tabular output folder) + tab_out_dir = config.get("tabular", {}).get("output_folder") + if tab_out_dir: + tabular_out = os.path.join(tab_out_dir, "processed_tabular.csv") + if os.path.exists(tabular_out): + st.subheader("📈 Tabular Output Preview") + df_out = pd.read_csv(tabular_out) + st.dataframe(df_out.head()) + else: + st.warning(f"Tabular output file not found: {tabular_out}") + + # Image preview + img_out_dir = config.get("images", {}).get("output_folder") + if img_out_dir and os.path.exists(img_out_dir): + img_files = sorted(f for f in os.listdir(img_out_dir) if f.lower().endswith(image_exts))[:5] + if img_files: + st.subheader("🖼️ Processed Images Preview") + for img_name in img_files: + img_path = os.path.join(img_out_dir, img_name) + st.image(Image.open(img_path), caption=img_name) + elif img_out_dir: + st.warning(f"Image output folder not found: {img_out_dir}") + + # Video frame preview + vid_out_dir = config.get("videos", {}).get("output_folder") + if vid_out_dir and os.path.exists(vid_out_dir): + frame_files = sorted(f for f in os.listdir(vid_out_dir) if f.lower().endswith(image_exts))[:5] + if frame_files: + st.subheader("🎥 Processed Video Frames Preview") + for frame_name in frame_files: + frame_path = os.path.join(vid_out_dir, frame_name) + st.image(Image.open(frame_path), caption=frame_name) + elif vid_out_dir: + st.warning(f"Video frames output folder not found: {vid_out_dir}") + + # Metadata display + metadata_files = 
[f for f in os.listdir() if f.startswith("metadata_") and f.endswith(".json")] + if metadata_files: + latest_meta = sorted(metadata_files)[-1] + with open(latest_meta, "r") as f: + metadata = json.load(f) + st.subheader("🧾 Metadata Summary") + st.json(metadata) + + st.success("✅ Processing complete and preview displayed.") diff --git a/File Upload Service/app/config.yaml b/File Upload Service/app/config.yaml new file mode 100644 index 0000000..6cbc9f7 --- /dev/null +++ b/File Upload Service/app/config.yaml @@ -0,0 +1,66 @@ +tabular: + output_folder: output/tabular + path: data/sample.csv + preprocessing: + add_row_id: true + categorical_encoding: + columns: + - gender + - city + method: onehot + cleaning: + lowercase: true + remove_special_chars: false + trim_strings: true + column_filtering: + drop: [] + keep: + - age + - income + - gender + - city + drop_duplicates: true + dtype_conversion: + age: int + gender: category + income: float + missing_values: + columns: + age: 30 + name: Unknown + global_fill: null + normalization: + columns: + - age + - income + method: minmax + outlier_removal: + columns: + - age + - income + method: iqr + threshold: 1.5 + remove_empty_columns: true + rename_columns: + oldName: new_name + productID: product_id + type: csv + +images: + path: data/images + output_folder: output/images + preprocessing: + grayscale: true + normalize: true + resize: + - 128 + - 128 + +videos: + output_folder: output/video_frames + path: data/videos/sample.mp4 + preprocessing: + extract_frames: 30 + resize_frames: + - 64 + - 64 diff --git a/File Upload Service/app/images.py b/File Upload Service/app/images.py new file mode 100644 index 0000000..0113da4 --- /dev/null +++ b/File Upload Service/app/images.py @@ -0,0 +1,54 @@ +import os +import pandas as pd +from PIL import Image +import numpy as np +from datetime import datetime + +def save_processed_images(images, output_folder, save_as="png", logger=None): + valid_formats = ["png", "jpg", "jpeg", "bmp", 
"tiff"] + save_as = save_as.lower() + + if save_as not in valid_formats: + if logger: + logger.warning(f"Unsupported image format '{save_as}'. Defaulting to 'png'.") + save_as = "png" + + os.makedirs(output_folder, exist_ok=True) + + metadata = [] + + for i, img_array in enumerate(images): + img_uint8 = (img_array * 255).astype(np.uint8) + + # Convert to PIL image + if img_uint8.ndim == 2: + img_pil = Image.fromarray(img_uint8, mode='L') + else: + img_pil = Image.fromarray(img_uint8) + + # Save image + filename = f"processed_img_{i}.{save_as}" + save_path = os.path.join(output_folder, filename) + img_pil.save(save_path) + + # Collect metadata + metadata.append({ + "filename": filename, + "width": img_pil.width, + "height": img_pil.height, + "timestamp": datetime.now().isoformat(), + "processed_path": save_path + }) + + if logger: + logger.debug(f"Saved processed image {save_path}") + + # Save metadata CSV + metadata_df = pd.DataFrame(metadata) + metadata_csv_path = os.path.join(output_folder, "metadata.csv") + metadata_df.to_csv(metadata_csv_path, index=False) + + if logger: + logger.info(f"Saved image metadata to {metadata_csv_path}") + + return metadata \ No newline at end of file diff --git a/File Upload Service/app/preprocess.py b/File Upload Service/app/preprocess.py new file mode 100644 index 0000000..bfc651a --- /dev/null +++ b/File Upload Service/app/preprocess.py @@ -0,0 +1,136 @@ +import yaml +import pandas as pd +from utils import tabular, images, videos +import logging +from datetime import datetime +import json +import os + +def setup_logger(log_path='pipeline.log'): + logger = logging.getLogger('DataPreprocessingPipeline') + logger.setLevel(logging.DEBUG) + + if not logger.handlers: # Avoid duplicate handlers in repeated runs + ch = logging.StreamHandler() + ch.setLevel(logging.INFO) + formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s') + ch.setFormatter(formatter) + logger.addHandler(ch) + + fh = logging.FileHandler(log_path) + 
fh.setLevel(logging.DEBUG) + fh.setFormatter(formatter) + logger.addHandler(fh) + + return logger + +def load_tabular_data(path, data_type='csv'): + if data_type == 'csv': + return pd.read_csv(path) + elif data_type == 'json': + return pd.read_json(path) + else: + raise ValueError(f"Unsupported tabular type: {data_type}") + +def main(): + logger = setup_logger() + logger.info("Starting preprocessing pipeline") + + metadata = { + 'run_id': datetime.now().strftime('%Y%m%d_%H%M%S'), + 'start_time': datetime.now().isoformat(), + 'steps': [] + } + + def log_step(step_name, info): + metadata['steps'].append({ + 'step': step_name, + 'timestamp': datetime.now().isoformat(), + 'info': info + }) + + # Load config + with open("config.yaml") as f: + config = yaml.safe_load(f) + logger.info("Loaded config.yaml") + + # TABULAR + if 'tabular' in config: + tab_cfg = config['tabular'] + logger.info(f"Loading tabular data from {tab_cfg['path']}") + df = load_tabular_data(tab_cfg['path'], tab_cfg.get('type', 'csv')) + logger.info(f"Original tabular shape: {df.shape}") + + processed_df, tab_metadata = tabular.preprocess_tabular(df, tab_cfg['preprocessing'], logger=logger) + logger.info(f"Processed tabular shape: {processed_df.shape}") + + if 'output_folder' in tab_cfg: + os.makedirs(tab_cfg['output_folder'], exist_ok=True) + save_path = os.path.join(tab_cfg['output_folder'], "processed_tabular.csv") + processed_df.to_csv(save_path, index=False) + logger.info(f"Saved processed tabular data to {save_path}") + + log_step('tabular_preprocessing', { + 'input_shape': df.shape, + 'output_shape': processed_df.shape, + 'output_folder': tab_cfg.get('output_folder'), + 'metadata': tab_metadata + }) + + # IMAGES + if 'images' in config: + img_cfg = config['images']['preprocessing'] + img_path = config['images']['path'] + output_folder = config['images'].get('output_folder') + + logger.info(f"Processing images from {img_path}") + imgs = images.preprocess_images(img_path, img_cfg, logger=logger) + 
logger.info(f"Processed {len(imgs)} images.") + + if output_folder: + img_metadata = images.save_processed_images(imgs, output_folder, save_as="png", logger=logger) + logger.info(f"Saved processed images and metadata to {output_folder}") + else: + img_metadata = [] + + log_step('image_preprocessing', { + 'num_images_processed': len(imgs), + 'output_folder': output_folder, + 'metadata': img_metadata + }) + + # VIDEOS + if 'videos' in config: + vid_cfg = config['videos']['preprocessing'] + vid_path = config['videos']['path'] + output_folder = config['videos'].get('output_folder') + + logger.info(f"Processing video from {vid_path}") + video_metadata_dict = {} + frames = videos.preprocess_video(vid_path, vid_cfg, logger=logger, metadata=video_metadata_dict) + logger.info(f"Processed {len(frames)} video frames.") + + if output_folder: + vid_metadata = videos.save_processed_video_frames(frames, output_folder, logger=logger) + logger.info(f"Saved video frames and metadata to {output_folder}") + else: + vid_metadata = [] + + log_step('video_preprocessing', { + 'num_frames_processed': len(frames), + 'output_folder': output_folder, + 'metadata': vid_metadata, + **video_metadata_dict + }) + + # Save metadata + metadata['end_time'] = datetime.now().isoformat() + metadata_path = f"metadata_{metadata['run_id']}.json" + with open(metadata_path, 'w') as f: + json.dump(metadata, f, indent=4) + logger.info(f"Saved pipeline metadata to {metadata_path}") + + logger.info("Pipeline finished successfully") + +if __name__ == "__main__": + main() diff --git a/File Upload Service/app/requirements.txt b/File Upload Service/app/requirements.txt index 157470f..1fddcfa 100644 --- a/File Upload Service/app/requirements.txt +++ b/File Upload Service/app/requirements.txt @@ -1,4 +1,7 @@ -streamlit==1.25.0 -minio==7.1.11 -python-dotenv==1.0.0 -pyspark==3.5.0 \ No newline at end of file +PyYAML>=6.0 +pandas>=1.3 +numpy>=1.21 +scikit-learn>=1.0 +scipy>=1.7 +Pillow>=9.0 +streamlit>=1.48 diff --git 
a/File Upload Service/app/tabular.py b/File Upload Service/app/tabular.py new file mode 100644 index 0000000..80b33bf --- /dev/null +++ b/File Upload Service/app/tabular.py @@ -0,0 +1,303 @@ +import pandas as pd +import numpy as np +from sklearn.preprocessing import MinMaxScaler, StandardScaler, LabelEncoder +from scipy.stats import zscore +import os + +# ------------------------------ +# Missing Values +# ------------------------------ +def handle_missing_values(df, missing_cfg, logger=None): + meta = {'missing_values_filled': 0} + if logger: + logger.info("Handling missing values") + + before_na = df.isna().sum().sum() + + if 'global_fill' in missing_cfg and missing_cfg['global_fill'] is not None: + df = df.fillna(missing_cfg['global_fill']) + if logger: + logger.debug(f"Filled all missing values with: {missing_cfg['global_fill']}") + + if 'columns' in missing_cfg: + for col, val in missing_cfg['columns'].items(): + if col in df.columns: + df[col] = df[col].fillna(val) + if logger: + logger.debug(f"Filled missing values in column '{col}' with: {val}") + + after_na = df.isna().sum().sum() + # Cast numpy.int64 to int so json.dump can serialize the run metadata + meta['missing_values_filled'] = int(before_na - after_na) + return df, meta + +# ------------------------------ +# Normalization +# ------------------------------ +def normalize_columns(df, norm_cfg, logger=None): + meta = {} + method = norm_cfg.get('method', 'minmax') + cols = norm_cfg.get('columns', []) + + if not cols: + if logger: + logger.info("No columns specified for normalization; skipping") + return df, meta + + if logger: + logger.info(f"Normalizing columns {cols} using method '{method}'") + + if method == 'minmax': + scaler = MinMaxScaler() + elif method == 'standard': + scaler = StandardScaler() + else: + if logger: + logger.warning(f"Unknown normalization method '{method}'; skipping normalization") + return df, meta + + df[cols] = scaler.fit_transform(df[cols]) + meta['normalized_columns'] = cols + meta['normalization_method'] = method + return df, meta + +# 
------------------------------ +# Encoding +# ------------------------------ +def encode_categorical(df, encode_cfg, logger=None): + meta = {} + method = encode_cfg.get('method', 'onehot') + cols = encode_cfg.get('columns', []) + + if logger: + logger.info(f"Encoding categorical columns {cols} using method '{method}'") + + if method == 'label': + for col in cols: + if col in df.columns: + le = LabelEncoder() + df[col] = le.fit_transform(df[col].astype(str)) + if logger: + logger.debug(f"Label encoded column '{col}'") + meta['encoded_columns'] = cols + meta['encoding_method'] = 'label' + + elif method == 'onehot': + existing_cols = [c for c in cols if c in df.columns] + df = pd.get_dummies(df, columns=existing_cols) + meta['encoded_columns'] = existing_cols + meta['encoding_method'] = 'onehot' + if logger: + logger.debug(f"One-hot encoded columns {existing_cols}") + + else: + if logger: + logger.warning(f"Unknown encoding method '{method}'; skipping encoding") + + return df, meta + +# ------------------------------ +# Outlier Removal +# ------------------------------ +def remove_outliers(df, outlier_cfg, logger=None): + meta = {'outliers_removed': 0} + method = outlier_cfg.get('method', 'iqr') + cols = outlier_cfg.get('columns', []) + threshold = outlier_cfg.get('threshold', 1.5) + + if logger: + logger.info(f"Removing outliers using method '{method}' on columns {cols} with threshold {threshold}") + + before_rows = df.shape[0] + + if method == 'iqr': + for col in cols: + if col in df.columns: + Q1 = df[col].quantile(0.25) + Q3 = df[col].quantile(0.75) + IQR = Q3 - Q1 + lower_bound = Q1 - threshold * IQR + upper_bound = Q3 + threshold * IQR + df = df[(df[col] >= lower_bound) & (df[col] <= upper_bound)] + + elif method == 'zscore': + for col in cols: + if col in df.columns: + z_scores = np.abs(zscore(df[col])) + df = df[z_scores < threshold] + + else: + if logger: + logger.warning(f"Unknown outlier removal method '{method}'; skipping") + + after_rows = df.shape[0] + 
meta['outliers_removed'] = before_rows - after_rows + return df, meta + +# ------------------------------ +# Cleaning +# ------------------------------ +def clean_data(df, cleaning_cfg, logger=None): + meta = {} + if logger: + logger.info("Cleaning data") + + if cleaning_cfg.get('trim_strings', False): + df = df.applymap(lambda x: x.strip() if isinstance(x, str) else x) + meta['trim_strings'] = True + if logger: + logger.debug("Trimmed strings in dataframe") + + if cleaning_cfg.get('lowercase', False): + df = df.applymap(lambda x: x.lower() if isinstance(x, str) else x) + meta['lowercase'] = True + if logger: + logger.debug("Lowercased strings in dataframe") + + return df, meta + +# ------------------------------ +# Column Filtering +# ------------------------------ +def filter_columns(df, filter_cfg, logger=None): + meta = {} + if 'keep' in filter_cfg and filter_cfg['keep']: + df = df.loc[:, filter_cfg['keep']] + meta['columns_kept'] = filter_cfg['keep'] + if logger: + logger.info(f"Filtered columns, keeping only: {filter_cfg['keep']}") + elif 'drop' in filter_cfg and filter_cfg['drop']: + df = df.drop(columns=filter_cfg['drop'], errors='ignore') + meta['columns_dropped'] = filter_cfg['drop'] + if logger: + logger.info(f"Dropped columns: {filter_cfg['drop']}") + return df, meta + +# ------------------------------ +# Dtype Conversion +# ------------------------------ +def convert_dtypes(df, dtype_cfg, logger=None): + meta = {'dtype_conversions': {}} + for col, dtype in dtype_cfg.items(): + if col in df.columns: + try: + df[col] = df[col].astype(dtype) + meta['dtype_conversions'][col] = dtype + if logger: + logger.debug(f"Converted column '{col}' to dtype '{dtype}'") + except Exception as e: + warn_msg = f"Warning: could not convert column {col} to {dtype}: {e}" + if logger: + logger.warning(warn_msg) + else: + print(warn_msg) + return df, meta + +# ------------------------------ +# Remove Empty Columns +# ------------------------------ +def remove_empty_columns(df, 
logger=None): + empty_cols = [col for col in df.columns if df[col].isna().all()] + if empty_cols: + df = df.drop(columns=empty_cols) + if logger: + logger.info(f"Removed empty columns: {empty_cols}") + return df, {'empty_columns_removed': empty_cols} + +# ------------------------------ +# Add Row IDs +# ------------------------------ +def add_row_ids(df, logger=None): + df.insert(0, 'row_id', range(1, len(df) + 1)) + if logger: + logger.info("Added unique row_id column") + return df, {'row_id_added': True} + +# ------------------------------ +# Drop Duplicates +# ------------------------------ +def drop_duplicates(df, drop_cfg=True, logger=None): + meta = {'duplicates_removed': 0} + if drop_cfg: + before_rows = df.shape[0] + df = df.drop_duplicates() + after_rows = df.shape[0] + meta['duplicates_removed'] = before_rows - after_rows + if logger: + logger.info(f"Removed {meta['duplicates_removed']} duplicate rows") + return df, meta + +# ------------------------------ +# Rename Columns +# ------------------------------ +def rename_columns(df, rename_cfg, logger=None): + meta = {} + if rename_cfg: + df = df.rename(columns=rename_cfg) + meta['columns_renamed'] = rename_cfg + if logger: + logger.info(f"Renamed columns: {rename_cfg}") + return df, meta + +# ------------------------------ +# Main Preprocess +# ------------------------------ +def preprocess_tabular(df, cfg, logger=None): + metadata = [] + + def record_step(name, meta): + metadata.append({"step": name, **meta}) + + # Steps run in a fixed order, independent of the key order in config.yaml: + # - rename, then filter, so later steps see the final column names + # - clean strings before de-duplicating and encoding + # - fill missing values before dtype conversion (astype(int) fails on NaN) + # - convert dtypes and remove outliers before fitting the scaler + # - encode categoricals only after cleaning and filtering (keep: lists the original names) + # - add row_id last so it cannot make duplicate rows look unique + if cfg.get('remove_empty_columns', False): + df, meta = remove_empty_columns(df, logger=logger) + record_step('remove_empty_columns', meta) + + if 'rename_columns' in cfg: + df, meta = rename_columns(df, cfg['rename_columns'], logger=logger) + record_step('rename_columns', meta) + + if 'column_filtering' in cfg: + df, meta = filter_columns(df, cfg['column_filtering'], logger=logger) + record_step('filter_columns', meta) + + if 'cleaning' in cfg: + df, meta = clean_data(df, cfg['cleaning'], logger=logger) + record_step('clean_data', meta) + + if cfg.get('drop_duplicates', False): + df, meta = drop_duplicates(df, logger=logger) + record_step('drop_duplicates', meta) + + if 'missing_values' in cfg: + df, meta = handle_missing_values(df, cfg['missing_values'], logger=logger) + record_step('handle_missing_values', meta) + + if 'dtype_conversion' in cfg: + df, meta = convert_dtypes(df, cfg['dtype_conversion'], logger=logger) + record_step('convert_dtypes', meta) + + if 'outlier_removal' in cfg: + df, meta = remove_outliers(df, cfg['outlier_removal'], logger=logger) + record_step('remove_outliers', meta) + + if 'normalization' in cfg: + df, meta = normalize_columns(df, cfg['normalization'], logger=logger) + record_step('normalize_columns', meta) + + if 'categorical_encoding' in cfg: + df, meta = encode_categorical(df, cfg['categorical_encoding'], logger=logger) + record_step('encode_categorical', meta) + + if cfg.get('add_row_id', False): + df, meta = add_row_ids(df, logger=logger) + record_step('add_row_id', meta) + + return df, metadata + +# ------------------------------ +# Save Tabular Output +# ------------------------------ +def save_processed_tabular(df, output_path, logger=None): + # output_path is a CSV file path; only create its parent folder if it has one + parent_dir = os.path.dirname(output_path) + if parent_dir: + os.makedirs(parent_dir, exist_ok=True) + df.to_csv(output_path, index=False) + if logger: + logger.info(f"Saved processed tabular data to {output_path}") diff --git a/File Upload Service/app/videos.py b/File Upload Service/app/videos.py new file mode 100644 index 0000000..cde5fc3 --- /dev/null +++ b/File Upload Service/app/videos.py @@ -0,0 +1,47 @@ +import os +import pandas as pd +from PIL import Image +import numpy as np +from datetime import datetime + +def save_processed_video_frames(frames, output_folder, logger=None): + os.makedirs(output_folder, exist_ok=True) + + metadata = [] + saved_count = 0 + for idx, frame in enumerate(frames): + # Convert float32 RGB array [0,1] back to uint8 [0,255] + frame_uint8 = (frame * 
255).astype(np.uint8) + img = Image.fromarray(frame_uint8) + + filename = f'frame_{idx:04d}.png' + out_path = os.path.join(output_folder, filename) + try: + img.save(out_path) + saved_count += 1 + if logger: + logger.debug(f"Saved frame {idx} to {out_path}") + # Collect metadata + metadata.append({ + "filename": filename, + "width": img.width, + "height": img.height, + "timestamp": datetime.now().isoformat(), + "processed_path": out_path + }) + except Exception as e: + if logger: + logger.error(f"Error saving frame {idx}: {e}") + + if logger: + logger.info(f"Saved {saved_count} frames to folder: {output_folder}") + + # Save metadata CSV + metadata_df = pd.DataFrame(metadata) + metadata_csv_path = os.path.join(output_folder, "metadata.csv") + metadata_df.to_csv(metadata_csv_path, index=False) + + if logger: + logger.info(f"Saved video frames metadata to {metadata_csv_path}") + + return metadata diff --git a/README.md b/README.md index bf8ee23..582c0c9 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,23 @@ # redback-data-warehouse Data Warehouse storage of code and configurations + +## Garmin Run Data – ETL Pipeline Update + +This ETL pipeline processes `Garmin_run_data.csv` and includes: + +### Data cleaning: +- Removes duplicate rows +- Standardizes column names (lowercase, underscores) +- Converts timestamps to datetime +- Fills missing numeric values with column means +- Removes outliers in `heart_rate` (keeps values between 30–220 bpm) +- Converts distance from meters to kilometers +- Converts speed from m/s to km/h + +### Data aggregation: +- Groups data by year and week +- Calculates total runs, total distance (km), average speed (km/h), and average pace (min/km) per week + +### Outputs: +- `cleaned_garmin_run_data.csv` → cleaned dataset + diff --git a/Requirement Gathering (4).pdf b/Requirement Gathering (4).pdf new file mode 100644 index 0000000..aaa2831 Binary files /dev/null and b/Requirement Gathering (4).pdf differ diff --git a/etl_scripts/ETL 
pipeline.ipynb b/etl_scripts/ETL pipeline.ipynb new file mode 100644 index 0000000..cae4104 --- /dev/null +++ b/etl_scripts/ETL pipeline.ipynb @@ -0,0 +1,239 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "27156c5d-4dfb-4f95-a4e6-d88976d9c7c5", + "metadata": {}, + "source": [ + "# Importing required libraries\r\n", + "# We use pandas for data manipulation and matplotlib for visualization" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "ced94ba9-4fef-457e-9c36-855cc56c90bb", + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n" + ] + }, + { + "cell_type": "markdown", + "id": "0501979d-4574-4efc-8409-43de021a8bb6", + "metadata": {}, + "source": [ + "# Extract – Load the Garmin running data\r\n", + "# Reading the original raw CSV file" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "5445f384-8e1a-4d1a-9936-49f6ae3cbacb", + "metadata": {}, + "outputs": [], + "source": [ + "df_raw = pd.read_csv(\"Garmin_run_data.csv\")" + ] + }, + { + "cell_type": "markdown", + "id": "6221e8e8-af73-4fbe-be56-d992d7f63132", + "metadata": {}, + "source": [ + "# Transform – Cleaning and enhancing the data" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "368a5cac-e8e5-444b-8dbe-9d96a7581f3d", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "✅ Cleaned data and weekly stats saved.\n" + ] + } + ], + "source": [ + "import pandas as pd\n", + "\n", + "# ============================\n", + "# 📥 Load raw data\n", + "# ============================\n", + "df_raw = pd.read_csv(\"Garmin_run_data.csv\")\n", + "\n", + "# ============================\n", + "# 🧹 Data Cleaning\n", + "# ============================\n", + "\n", + "# 1. Remove duplicate rows\n", + "df_cleaned = df_raw.drop_duplicates()\n", + "\n", + "# 2. 
Standardize column names (lowercase, underscores)\n", + "df_cleaned.columns = [col.strip().lower().replace(\" \", \"_\") for col in df_cleaned.columns]\n", + "\n", + "# 3. Convert timestamps to datetime\n", + "if 'timestamp' in df_cleaned.columns:\n", + " df_cleaned['timestamp'] = pd.to_datetime(df_cleaned['timestamp'], errors='coerce')\n", + "\n", + "# 4. Fill missing numeric values with column means\n", + "numeric_cols = df_cleaned.select_dtypes(include='number').columns\n", + "df_cleaned[numeric_cols] = df_cleaned[numeric_cols].fillna(df_cleaned[numeric_cols].mean())\n", + "\n", + "# 5. Remove outliers in heart_rate (keep values between 30 and 220 bpm)\n", + "if 'heart_rate' in df_cleaned.columns:\n", + " df_cleaned = df_cleaned[(df_cleaned['heart_rate'] >= 30) & (df_cleaned['heart_rate'] <= 220)]\n", + "\n", + "# 6. Unit conversion: meters to kilometers\n", + "if 'distance' in df_cleaned.columns:\n", + " df_cleaned['distance_km'] = df_cleaned['distance'] / 1000\n", + "\n", + "# 7. Unit conversion: speed from m/s to km/h\n", + "if 'speed' in df_cleaned.columns:\n", + " df_cleaned['speed_kmh'] = df_cleaned['speed'] * 3.6\n", + "\n", + "# ============================\n", + "# 📊 Data Aggregation (Weekly Stats)\n", + "# ============================\n", + "\n", + "if 'timestamp' in df_cleaned.columns:\n", + " # Extract week, month, year for grouping\n", + " df_cleaned['week'] = df_cleaned['timestamp'].dt.isocalendar().week\n", + " df_cleaned['month'] = df_cleaned['timestamp'].dt.month\n", + " df_cleaned['year'] = df_cleaned['timestamp'].dt.year\n", + "\n", + " # Group by year + week to compute stats\n", + " weekly_stats = df_cleaned.groupby(['year', 'week']).agg(\n", + " total_runs=('timestamp', 'count'),\n", + " total_distance_km=('distance_km', 'sum'),\n", + " average_speed_kmh=('speed_kmh', 'mean')\n", + " ).reset_index()\n", + "\n", + " # Calculate average pace (min/km) if speed exists\n", + " if 'average_speed_kmh' in weekly_stats.columns:\n", + " 
weekly_stats['average_pace_min_per_km'] = 60 / weekly_stats['average_speed_kmh']\n", + "\n", + "# ============================\n", + "# 💾 Save outputs\n", + "# ============================\n", + "\n", + "# Save cleaned data\n", + "df_cleaned.to_csv(\"cleaned_garmin_run_data.csv\", index=False)\n", + "\n", + "# Save weekly statistics (if generated)\n", + "if 'weekly_stats' in locals():\n", + " weekly_stats.to_csv(\"weekly_stats_garmin_run_data.csv\", index=False)\n", + "\n", + "print(\"✅ Cleaned data and weekly stats saved.\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "1eb98d8e-3551-4b80-962a-810059c40992", + "metadata": {}, + "source": [ + "# Visualize – Ploting distributions for insight" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "6425c501-52f4-44eb-b1a3-46c5ecec29dc", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAArcAAAGHCAYAAACqD3pHAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy81sbWrAAAACXBIWXMAAA9hAAAPYQGoP6dpAABG+UlEQVR4nO3dd3wVdb7/8fdJTzCgBNMwCZEiVaSoCyjVBCkqYMFlI6AIrAiC4KLoVcKuhnZFVlCBlQW8CFYse5ESBAENKCCEFhGUJhBiAAmQEFK+vz/45VwOKSSHc1KG1/PxOI/dmfnOfD7zyTB+MpkzYzPGGAEAAAAW4FHRCQAAAACuQnMLAAAAy6C5BQAAgGXQ3AIAAMAyaG4BAABgGTS3AAAAsAyaWwAAAFgGzS0AAAAsg+YWAAAAlkFzCwAAAMuguQVQKvPnz5fNZrN//Pz8FBoaqk6dOmnixIlKS0srtE58fLxsNluZ4mRmZio+Pl7ffPONizKveJfWzdPTUzfccIOaN2+uoUOHauPGjYXGHzhwQDabTfPnzy9TnEWLFmn69OllWqeoWAU/t/T09DJtqyS7d+9WfHy8Dhw4UGjZwIEDVadOHZfFKqtffvlFvr6+2rBhg31ex44d1bRpU7fHfuyxx9SrVy+3xwGuJTS3AMpk3rx52rBhgxITE/XWW2/ptttu0+TJk9WoUSOtWrXKYeyTTz7p0DCURmZmpiZMmGCp5laSHnroIW3YsEHffvutPvjgA/Xv318bN25UmzZtNHLkSIexYWFh2rBhg3r06FGmGM40t87GKqvdu3drwoQJRTa3L7/8sj777DO3xi/Jc889p5iYGLVp06bcY8fHx2vp0qVavXp1uccGrMqrohMAULU0bdpUrVu3tk8/+OCDevbZZ3XXXXepT58+2rt3r0JCQiRJN910k2666aaKSrVSCQkJ0Z/+9Cf7dNeuXTVq1CgNGTJEb775pho2bKinnnpKkuTr6+sw1h3y8vKUm5tbLrGupG7duhUWOyUlRZ9//rmWL19eIfHr1q2re++9V5MmTVLnzp0rJAfAarhyC+CqRUZG6vXXX9eZM2c0e/Zs+/yibktYvXq1OnbsqKCgIPn7+ysyMlI
PPvigMjMzdeDAAd14442SpAkTJtj/lD9w4EBJ0r59+/T444+rfv36CggIUO3atXXfffdpx44dDjG++eYb2Ww2LV68WC+99JLCw8NVvXp13XPPPdqzZ0+h/JcvX64uXbqoRo0aCggIUKNGjTRx4kSHMZs3b9b999+vmjVrys/PTy1atNBHH310VXXz9PTUzJkzVatWLU2dOtU+v6hbBX7//XcNGTJEERER8vX11Y033qh27drZr5Z37NhRS5cu1cGDBx1ug7h0e1OmTNGrr76q6Oho+fr6as2aNSXeAnH48GH16dNH1atXV40aNRQXF6fff//dYYzNZlN8fHyhdevUqWP/uc2fP18PP/ywJKlTp0723ApiFnVbwvnz5zVu3DhFR0fLx8dHtWvX1tNPP60//vijUJyePXtq+fLlatmypfz9/dWwYUP9+9//vkL1L3rnnXcUGhqqmJiYK4797LPPFBAQoCeffFK5ubn2/R8+fLjmzZunW265Rf7+/mrdurU2btwoY4ymTp2q6OhoXXfddercubP27dtXaLuPPfaYVq1apV9++aVUOQMoGc0tAJfo3r27PD09tW7dumLHHDhwQD169JCPj4/+/e9/a/ny5Zo0aZKqVaumCxcuKCwszH4FbdCgQdqwYYM2bNigl19+WZJ09OhRBQUFadKkSVq+fLneeusteXl56c477yyyaX3xxRd18OBBvfvuu5ozZ4727t2r++67T3l5efYxc+fOVffu3ZWfn69Zs2bpP//5j5555hn99ttv9jFr1qxRu3bt9Mcff2jWrFn64osvdNttt6lv375lvi/2cv7+/rrnnnu0f/9+h5iXe+yxx/T555/rlVde0cqVK/Xuu+/qnnvu0YkTJyRJb7/9ttq1a6fQ0FB73S6/JeTNN9/U6tWr9d///d9atmyZGjZsWGJuvXv3Vr169fTJJ58oPj5en3/+ubp27aqcnJwy7WOPHj2UkJAgSXrrrbfsuRV3K4QxRr169dJ///d/67HHHtPSpUs1evRoLViwQJ07d1Z2drbD+OTkZI0ZM0bPPvusvvjiC916660aNGhQicdigaVLl6p9+/by8Cj5P4dvvPGGHn74Yb344ot699135eX1f3/4/N///V+9++67mjRpkhYvXqwzZ86oR48eGjNmjL777jvNnDlTc+bM0e7du/Xggw/KGOOw7Y4dO8oYo6+++uqK+QIoBQMApTBv3jwjyWzatKnYMSEhIaZRo0b26fHjx5tLTzOffPKJkWS2bdtW7DZ+//13I8mMHz/+ijnl5uaaCxcumPr165tnn33WPn/NmjVGkunevbvD+I8++shIMhs2bDDGGHPmzBlTvXp1c9ddd5n8/Pxi4zRs2NC0aNHC5OTkOMzv2bOnCQsLM3l5eSXmKck8/fTTxS5//vnnjSTz/fffG2OM2b9/v5Fk5s2bZx9z3XXXmVGjRpUYp0ePHiYqKqrQ/ILt1a1b11y4cKHIZZfGKvi5XVpTY4x5//33jSSzcOFCh30r6mcVFRVlBgwYYJ/++OOPjSSzZs2aQmMHDBjgkPfy5cuNJDNlyhSHcR9++KGRZObMmeMQx8/Pzxw8eNA+Lysry9SsWdMMHTq0UKxLHT9+3EgykyZNKrSsQ4cOpkmTJiYvL88MHz7c+Pj4OOx3AUkmNDTUnD171j7v888/N5LMbbfd5nBcTZ8+3Ugy27dvL7Sd2rVrm759+5aYL4DS4cotAJcxl12Rutxtt90mHx8fDRkyRAsWLNCvv/5apu3n5uYqISFBjRs3lo+Pj7y8vOTj46O9e/cqJSWl0Pj777/fYfrWW2+VJB08eFCSlJSUpIyMDA0bNqzYpzrs27dPP/30k/7yl7/Ycyj4dO/eXceOHSvyqnFZXKluknTHHXdo/vz5evXVV7Vx48YyXz2VLtbD29u71OML9rnAI488Ii8vL61Zs6bMscui4MtVBbc1FHj44YdVrVo1ff311w7zb7vtNkVGRtqn/fz81KB
BA/vPuThHjx6VJAUHBxe5/Pz58+rVq5fef/99rVy5slA9CnTq1EnVqlWzTzdq1EiS1K1bN4fjqmB+UXkFBwfryJEjJeYLoHRobgG4xLlz53TixAmFh4cXO6Zu3bpatWqVgoOD9fTTT6tu3bqqW7eu/vnPf5YqxujRo/Xyyy+rV69e+s9//qPvv/9emzZtUvPmzZWVlVVofFBQkMO0r6+vJNnHFtw/WtKX3o4fPy7p4jfqvb29HT7Dhg2TpKt+ZFZBs1NS7T788EMNGDBA7777rtq0aaOaNWuqf//+Sk1NLXWcsLCwMuUVGhrqMO3l5aWgoCD7rRDucuLECXl5ednvvy5gs9kUGhpaKP7lP2fp4s+6qGPiUgXL/fz8ilyelpamFStWqE2bNmrbtm2x26lZs6bDtI+PT4nzz58/X2gbfn5+V8wXQOnwtAQALrF06VLl5eWpY8eOJY67++67dffddysvL0+bN2/WjBkzNGrUKIWEhOjRRx8tcd2FCxeqf//+9vs3C6Snp+v6668vc84FzVNJ97rWqlVLkjRu3Dj16dOnyDG33HJLmWMXyMrK0qpVq1S3bt0Sm+xatWpp+vTpmj59ug4dOqQvv/xSL7zwgtLS0kr9Tf+yPnM4NTVVtWvXtk/n5ubqxIkTDs2kr69voXtgJV1VAxwUFKTc3Fz9/vvvDg2uMUapqam6/fbbnd72pQp+tidPnixyeWRkpKZNm6bevXurT58++vjjj4tthK/WyZMnK/RZv4CVcOUWwFU7dOiQnnvuOdWoUUNDhw4t1Tqenp6688479dZbb0mSfvzxR0mFr65eymaz2ZcXWLp0qdN/zm3btq1q1KihWbNmFXtrwC233KL69esrOTlZrVu3LvITGBjoVPy8vDwNHz5cJ06c0PPPP1/q9SIjIzV8+HDFxMTY6yaV7mplWbz//vsO0x999JFyc3MdfoGpU6eOtm/f7jBu9erVOnv2rMO8kn6ul+vSpYuki7/MXOrTTz/VuXPn7MuvVlRUlPz9/Ut8SkFsbKxWrFihdevWqWfPnjp37pxLYl8qNzdXhw8fVuPGjV2+beBaxJVbAGWyc+dO+z2naWlpWr9+vebNmydPT0999tlnhf6UfKlZs2Zp9erV6tGjhyIjI3X+/Hn7I5vuueceSVJgYKCioqL0xRdfqEuXLqpZs6Zq1aplf+TT/Pnz1bBhQ916663asmWLpk6d6vSzdK+77jq9/vrrevLJJ3XPPfdo8ODBCgkJ0b59+5ScnKyZM2dKkmbPnq1u3bqpa9euGjhwoGrXrq2TJ08qJSVFP/74oz7++OMrxjp+/Lj98VBnzpzRzp079d577yk5OVnPPvusBg8eXOy6p0+fVqdOndSvXz81bNhQgYGB2rRpk5YvX+5wNblZs2ZasmSJ3nnnHbVq1UoeHh4OzyQuqyVLlsjLy0sxMTHatWuXXn75ZTVv3lyPPPKIfcxjjz2ml19+Wa+88oo6dOig3bt3a+bMmapRo4bDtgre9jVnzhwFBgbKz89P0dHRRd5SEBMTo65du+r5559XRkaG2rVrp+3bt2v8+PFq0aKFHnvsMaf36VI+Pj5q06ZNkW+Ju9Rdd92lr7/+Wvfee69iY2P11VdfFdq/q7F9+3ZlZmaqU6dOLtsmcE2r0K+zAagyCp6WUPDx8fExwcHBpkOHDiYhIcGkpaUVWufypyVs2LDB9O7d20RFRRlfX18TFBRkOnToYL788kuH9VatWmVatGhhfH19jST7t+5PnTplBg0aZIKDg01AQIC56667zPr1602HDh1Mhw4d7OsXPC3h448/dthuUU8GMMaYr776ynTo0MFUq1bNBAQEmMaNG5vJkyc7jElOTjaPPPKICQ4ONt7e3iY0NNR07tzZzJo164q1u7RuHh4epnr16qZZs2ZmyJAh9ic3lJTn+fPnzV//+ldz6623murVqxt/f39zyy23mPHjx5tz587Z1zt58qR56KGHzPXXX29sNpu99gX
bmzp16hVjGfN/P7ctW7aY++67z1x33XUmMDDQ/PnPfzbHjx93WD87O9uMHTvWREREGH9/f9OhQwezbdu2Qk9LMObi0wKio6ONp6enQ8zLn5ZgzMUnHjz//PMmKirKeHt7m7CwMPPUU0+ZU6dOOYyLiooyPXr0KLRflx8TxZk7d67x9PQ0R48eLbR+kyZNHObt3LnThIaGmpYtW5rff//dGFP0kzCKq3dxx+XLL79satWqZc6fP3/FfAFcmc2YUnxNFwAACzp//rwiIyM1ZsyYMt0a4ip5eXmqV6+e+vXrp9dee63c4wNWxD23AIBrlp+fnyZMmKBp06a55X7aK1m4cKHOnj2rv/3tb+UeG7Aq7rkFAFzThgwZoj/++EO//vqrmjVrVq6x8/Pz9f777zv1tA8AReO2BAAAAFgGtyUAAADAMmhuAQAAYBncc6uL9zwdPXpUgYGBZX6DDwAAANzP/P/nhIeHh8vDo/jrszS3ko4ePaqIiIiKTgMAAABXcPjw4RJf3lOhze26des0depUbdmyRceOHdNnn32mXr162ZcbYzRhwgTNmTNHp06dsr+qs0mTJvYx2dnZeu6557R48WJlZWWpS5cuevvtt8v0xqKCV2cePnxY1atXd9n+FScnJ0crV65UbGysvL293R7PKqib86idc6ibc6ibc6ib86idc6pa3TIyMhQREXHFV55XaHN77tw5NW/eXI8//rgefPDBQsunTJmiadOmaf78+WrQoIFeffVVxcTEaM+ePfYdGzVqlP7zn//ogw8+UFBQkMaMGaOePXtqy5Yt8vT0LFUeBbciVK9evdya24CAAFWvXr1KHEyVBXVzHrVzDnVzDnVzDnVzHrVzTlWt25VuIa3Q5rZbt27q1q1bkcuMMZo+fbpeeukl+7vTFyxYoJCQEC1atEhDhw7V6dOnNXfuXP3P//yP/b30CxcuVEREhFatWqWuXbuW274AAACg4lXae27379+v1NRUxcbG2uf5+vqqQ4cOSkpK0tChQ7Vlyxbl5OQ4jAkPD1fTpk2VlJRUbHObnZ2t7Oxs+3RGRoaki7/B5OTkuGmP/k9BjPKIZSXUzXnUzjnUzTnUzTnUzXnUzjlVrW6lzbPSNrepqamSpJCQEIf5ISEhOnjwoH2Mj4+PbrjhhkJjCtYvysSJEzVhwoRC81euXKmAgICrTb3UEhMTyy2WlVA351E751A351A351A351E751SVumVmZpZqXKVtbgtcfl+FMeaK91pcacy4ceM0evRo+3TBDcqxsbHlds9tYmKiYmJiqtQ9LhWNujmP2jmHujmHujmHujmP2jmnqtWt4C/tV1Jpm9vQ0FBJF6/OhoWF2eenpaXZr+aGhobqwoULOnXqlMPV27S0NLVt27bYbfv6+srX17fQfG9v73L94ZZ3PKugbs6jds6hbs6hbs6hbs6jds6pKnUrbY6V9g1l0dHRCg0NdbhUfuHCBa1du9beuLZq1Ure3t4OY44dO6adO3eW2NwCAADAmir0yu3Zs2e1b98++/T+/fu1bds21axZU5GRkRo1apQSEhJUv3591a9fXwkJCQoICFC/fv0kSTVq1NCgQYM0ZswYBQUFqWbNmnruuefUrFkz+9MTAAAAcO2o0OZ28+bN6tSpk3264D7YAQMGaP78+Ro7dqyysrI0bNgw+0scVq5c6fDw3jfeeENeXl565JFH7C9xmD9/fqmfcQsAAADrqNDmtmPHjjLGFLvcZrMpPj5e8fHxxY7x8/PTjBkzNGPGDDdkCAAAgKqk0t5zCwAAAJRVpX1aAqq2Q4cOKT093aXbzM/PlyQlJyfLw6Pw72W1atVSZGSkS2MCAICqheYWLnfo0CE1bNRQWZlZLt2uv7+/Fi9erPbt2ysrq/C2/QP89VPKTzS4AABcw2hu4XLp6enKysxS3Ow4hTQIufIKpeRpPKUMacTSEcqz5TksO/7zcS0culDp6ek0twAAXMNobuE2IQ1CFNE8wmXbs+X
apCSpdtPaMl7FfxERAABcu/hCGQAAACyD5hYAAACWQXMLAAAAy6C5BQAAgGXQ3AIAAMAyaG4BAABgGTS3AAAAsAyaWwAAAFgGzS0AAAAsg+YWAAAAlkFzCwAAAMuguQUAAIBl0NwCAADAMmhuAQAAYBk0twAAALAMmlsAAABYBs0tAAAALIPmFgAAAJZBcwsAAADLoLkFAACAZdDcAgAAwDJobgEAAGAZNLcAAACwDJpbAAAAWAbNLQAAACyD5hYAAACWQXMLAAAAy6C5BQAAgGXQ3AIAAMAyaG4BAABgGTS3AAAAsAyaWwAAAFgGzS0AAAAsg+YWAAAAlkFzCwAAAMuguQUAAIBl0NwCAADAMmhuAQAAYBk0twAAALAMmlsAAABYBs0tAAAALIPmFgAAAJZBcwsAAADLoLkFAACAZdDcAgAAwDIqdXObm5ur//qv/1J0dLT8/f1188036+9//7vy8/PtY4wxio+PV3h4uPz9/dWxY0ft2rWrArMGAABARanUze3kyZM1a9YszZw5UykpKZoyZYqmTp2qGTNm2MdMmTJF06ZN08yZM7Vp0yaFhoYqJiZGZ86cqcDMAQAAUBEqdXO7YcMGPfDAA+rRo4fq1Kmjhx56SLGxsdq8ebOki1dtp0+frpdeekl9+vRR06ZNtWDBAmVmZmrRokUVnD0AAADKm1dFJ1CSu+66S7NmzdLPP/+sBg0aKDk5Wd9++62mT58uSdq/f79SU1MVGxtrX8fX11cdOnRQUlKShg4dWuR2s7OzlZ2dbZ/OyMiQJOXk5CgnJ8d9O/T/FcQoj1gVIT8/X/7+/vI0nrLl2ly23YJtFbVNT+Mpf39/5efnW7auV8Pqx5y7UDfnUDfnUDfnUTvnVLW6lTZPmzHGuDkXpxlj9OKLL2ry5Mny9PRUXl6eXnvtNY0bN06SlJSUpHbt2unIkSMKDw+3rzdkyBAdPHhQK1asKHK78fHxmjBhQqH5ixYtUkBAgHt2BgAAAE7LzMxUv379dPr0aVWvXr3YcZX6yu2HH36ohQsXatGiRWrSpIm2bdumUaNGKTw8XAMGDLCPs9kcr+QZYwrNu9S4ceM0evRo+3RGRoYiIiIUGxtbYrFcJScnR4mJiYqJiZG3t7fb45W35ORktW/fXiOWjlDtprVdtl1brk2RP0Tq0B2HZLwcfyc7svOIZvSYoXXr1ql58+Yui2kVVj/m3IW6OYe6OYe6OY/aOaeq1a3gL+1XUqmb27/97W964YUX9Oijj0qSmjVrpoMHD2rixIkaMGCAQkNDJUmpqakKCwuzr5eWlqaQkJBit+vr6ytfX99C8729vcv1h1ve8cqLh4eHsrKylGfLK9SEuoLxMoW2m2fLU1ZWljw8PCxZU1ex6jHnbtTNOdTNOdTNedTOOVWlbqXNsVJ/oSwzM1MeHo4penp62h8FFh0drdDQUCUmJtqXX7hwQWvXrlXbtm3LNVcAAABUvEp95fa+++7Ta6+9psjISDVp0kRbt27VtGnT9MQTT0i6eDvCqFGjlJCQoPr166t+/fpKSEhQQECA+vXrV8HZAwAAoLxV6uZ2xowZevnllzVs2DClpaUpPDxcQ4cO1SuvvGIfM3bsWGVlZWnYsGE6deqU7rzzTq1cuVKBgYEVmDkAAAAqQqVubgMDAzV9+nT7o7+KYrPZFB8fr/j4+HLLCwAAAJVTpb7nFgAAACiLSn3lFqjMDh06pPT09HKLV6tWLUVGRpZbPAAAqiKaW8AJhw4dUsNGDZWVmVVuMf0D/PVTyk80uAAAlIDmFnBCenq6sjKzFDc7TiENin+msqsc//m4Fg5dqPT0dJpbAABKQHMLXIWQBiGKaB5R0WkAAID/jy+UAQAAwDJobgEAAGAZNLcAAACwDJpbAAAAWAbNLQAAACyDpyVUoOTkZHl4lM/vF7wAAAAAXAtobivAb7/9Jklq3769srLK5yUAvAAAAABcC2huK8CJEyckSX3/2VdB9YPcHo8XAAA
AgGsFzW0FCq4XrPDm4RWdBgAAgGXwhTIAAABYBs0tAAAALIPmFgAAAJZBcwsAAADLoLkFAACAZdDcAgAAwDJobgEAAGAZNLcAAACwDJpbAAAAWAbNLQAAACyD5hYAAACWQXMLAAAAy6C5BQAAgGXQ3AIAAMAyaG4BAABgGTS3AAAAsAyaWwAAAFgGzS0AAAAsg+YWAAAAlkFzCwAAAMuguQUAAIBl0NwCAADAMmhuAQAAYBk0twAAALAMmlsAAABYBs0tAAAALMOrohNA+UlJSbFUHAAAgMvR3F4DMo5nyOZhU1xcXEWnAgAA4FY0t9eArNNZMvlGcbPjFNIgxO3xdq/arWWvLXN7HAAAgMvR3F5DQhqEKKJ5hNvjHP/5uNtjAAAAFIUvlAEAAMAyaG4BAABgGTS3AAAAsAyaWwAAAFiGU83t/v37XZ0HAAAAcNWcam7r1aunTp06aeHChTp//ryrc3Jw5MgRxcXFKSgoSAEBAbrtttu0ZcsW+3JjjOLj4xUeHi5/f3917NhRu3btcmtOAAAAqJycam6Tk5PVokULjRkzRqGhoRo6dKh++OEHV+emU6dOqV27dvL29tayZcu0e/duvf7667r++uvtY6ZMmaJp06Zp5syZ2rRpk0JDQxUTE6MzZ864PB8AAABUbk41t02bNtW0adN05MgRzZs3T6mpqbrrrrvUpEkTTZs2Tb///rtLkps8ebIiIiI0b9483XHHHapTp466dOmiunXrSrp41Xb69Ol66aWX1KdPHzVt2lQLFixQZmamFi1a5JIcAAAAUHVc1UscvLy81Lt3b3Xv3l1vv/22xo0bp+eee07jxo1T3759NXnyZIWFhTm9/S+//FJdu3bVww8/rLVr16p27doaNmyYBg8eLOnivb+pqamKjY21r+Pr66sOHTooKSlJQ4cOLXK72dnZys7Otk9nZGRIknJycpSTk+N0vqWVn58vSfI0nrLl2twez8vmJX9//yofr2BbRW3T03jK399f+fn55fYzLM+aFuxfSkqK/fgpi4J1tm7dKg+P0v1OGxQUpJtuuqnMsayk4Fgqj2PKCn777TedOHHCqePNGVY7RjnenEftnFPV6lbaPG3GGONskM2bN+vf//63PvjgA1WrVk0DBgzQoEGDdPToUb3yyis6c+bMVd2u4OfnJ0kaPXq0Hn74Yf3www8aNWqUZs+erf79+yspKUnt2rXTkSNHFB4ebl9vyJAhOnjwoFasWFHkduPj4zVhwoRC8xctWqSAgACn8wUAAIB7ZGZmql+/fjp9+rSqV69e7Dinmttp06Zp3rx52rNnj7p3764nn3xS3bt3d/gNfd++fWrYsKFyc3Od2wNJPj4+at26tZKSkuzznnnmGW3atEkbNmywN7dHjx51uEI8ePBgHT58WMuXLy9yu0VduY2IiFB6enqJxXKVrVu36tixY/o+8HuFNgt1f7zPt+rDkR9qxNIRqt20dpWNZ8u1KfKHSB2645CMl+Nhe2TnEc3oMUPr1q1T8+bNXRazOMnJyWrfvn2517TvP/squF5wmdf3NJ6688yd+j7we+XZ8q44Pm1fmj4c+WG51bOyysnJUWJiomJiYuTt7V3R6VRqBf8m+v6zr8LqhpXpeHOGFY9RjjfnUTvnVLW6ZWRkqFatWldsbp26LeGdd97RE088occff1yhoUU3Z5GRkZo7d64zm7cLCwtT48aNHeY1atRIn376qSTZY6empjo0t2lpaQoJCSl2u76+vvL19S0039vbu1x+uAW/BOTZ8go1ae6Qa3KVlZVlmXjGyxTabp4tT1lZWfLw8Ci3n2FF1DSofpDCm4dfeYXL2HJtUpIU2iy0VPmWdz0ru/I6N1RlBf8mguoHKbRJaJmON2dY+RjleHMetXNOValbaXN0qrndu3fvFcf4+PhowIABzmzerl27dtqzZ4/DvJ9//llRUVGSpOjoaIWGhioxMVEtWrSQJF24cEFr167V5MmTryo2AAAAqh6
n7vSfN2+ePv7440LzP/74Yy1YsOCqkyrw7LPPauPGjUpISNC+ffu0aNEizZkzR08//bQkyWazadSoUUpISNBnn32mnTt3auDAgQoICFC/fv1clgcAAACqBqeu3E6aNEmzZs0qND84OFhDhgy56iu2BW6//XZ99tlnGjdunP7+978rOjpa06dP11/+8hf7mLFjxyorK0vDhg3TqVOndOedd2rlypUKDAx0SQ6oWlJSUiwVBwAAlI1Tze3BgwcVHR1daH5UVJQOHTp01UldqmfPnurZs2exy202m+Lj4xUfH+/SuKhaMo5nyOZhU1xcXEWnAgAAKpBTzW1wcLC2b9+uOnXqOMxPTk5WUFCQK/ICyiTrdJZMvlHc7DiFNCj+y4SusnvVbi17bZnb4wAAgLJxqrl99NFH9cwzzygwMFDt27eXJK1du1YjR47Uo48+6tIEgbIIaRCiiOYRbo9z/Ofjbo8BAADKzqnm9tVXX9XBgwfVpUsXeXld3ER+fr769++vhIQElyYIAAAAlJZTza2Pj48+/PBD/eMf/1BycrL8/f3VrFkz+yO6AAAAgIrgVHNboEGDBmrQoIGrcgEAAACuilPNbV5enubPn6+vv/5aaWlpys/Pd1i+evVqlyQHAAAAlIVTze3IkSM1f/589ejRQ02bNpXNZnN1XgAAAECZOdXcfvDBB/roo4/UvXt3V+cDAAAAOM2p1+/6+PioXr16rs4FAAAAuCpONbdjxozRP//5TxljXJ0PAAAA4DSnbkv49ttvtWbNGi1btkxNmjSRt7e3w/IlS5a4JDkAAACgLJxqbq+//nr17t3b1bkAAAAAV8Wp5nbevHmuzgMAAAC4ak7dcytJubm5WrVqlWbPnq0zZ85Iko4ePaqzZ8+6LDkAAACgLJy6cnvw4EHde++9OnTokLKzsxUTE6PAwEBNmTJF58+f16xZs1ydJwAAAHBFTl25HTlypFq3bq1Tp07J39/fPr937976+uuvXZYcAAAAUBZOPy3hu+++k4+Pj8P8qKgoHTlyxCWJAQAAAGXl1JXb/Px85eXlFZr/22+/KTAw8KqTAgAAAJzhVHMbExOj6dOn26dtNpvOnj2r8ePH80peAAAAVBinbkt444031KlTJzVu3Fjnz59Xv379tHfvXtWqVUuLFy92dY4AAABAqTjV3IaHh2vbtm1avHixfvzxR+Xn52vQoEH6y1/+4vAFMwAAAKA8OdXcSpK/v7+eeOIJPfHEE67MBwAAAHCaU83te++9V+Ly/v37O5UMAAAAcDWcam5HjhzpMJ2Tk6PMzEz5+PgoICCA5hYAAAAVwqmnJZw6dcrhc/bsWe3Zs0d33XUXXygDAABAhXGquS1K/fr1NWnSpEJXdQEAAIDy4rLmVpI8PT119OhRV24SAAAAKDWn7rn98ssvHaaNMTp27Jhmzpypdu3auSQxAAAAoKycam579erlMG2z2XTjjTeqc+fOev31112RFwAAAFBmTjW3+fn5rs4DAAAAuGouvecWAAAAqEhOXbkdPXp0qcdOmzbNmRAAAABAmTnV3G7dulU//vijcnNzdcstt0iSfv75Z3l6eqply5b2cTabzTVZAgAAAKXgVHN73333KTAwUAsWLNANN9wg6eKLHR5//HHdfffdGjNmjEuTBAAAAErDqXtuX3/9dU2cONHe2ErSDTfcoFdffZWnJQAAAKDCONXcZmRk6Pjx44Xmp6Wl6cyZM1edFAAAAOAMp5rb3r176/HHH9cnn3yi3377Tb/99ps++eQTDRo0SH369HF1jgAAAECpOHXP7axZs/Tcc88pLi5OOTk5Fzfk5aVBgwZp6tSpLk0QAAAAKC2nmtuAgAC9/fbbmjp1qn755RcZY1SvXj1Vq1bN1fkBAAAApXZVL3E4duyYjh07pgYNGqhatWoyxrgqLwAAAKDMnGpuT5w4oS5duqhBgwbq3r27jh07Jkl68skneQwYAAAAKoxTze2zzz4rb29vHTp0SAE
BAfb5ffv21fLly12WHAAAAFAWTt1zu3LlSq1YsUI33XSTw/z69evr4MGDLkkMAAAAKCunrtyeO3fO4YptgfT0dPn6+l51UgAAAIAznGpu27dvr/fee88+bbPZlJ+fr6lTp6pTp04uSw4AAAAoC6duS5g6dao6duyozZs368KFCxo7dqx27dqlkydP6rvvvnN1jgAAAECpOHXltnHjxtq+fbvuuOMOxcTE6Ny5c+rTp4+2bt2qunXrujpHAAAAoFTKfOU2JydHsbGxmj17tiZMmOCOnAAAAACnlPnKrbe3t3bu3CmbzeaOfAAAAACnOXVbQv/+/TV37lxX5wIAAABcFae+UHbhwgW9++67SkxMVOvWrVWtWjWH5dOmTXNJcpebOHGiXnzxRY0cOVLTp0+XJBljNGHCBM2ZM0enTp3SnXfeqbfeektNmjRxSw7AtSQlJaXcYtWqVUuRkZHlFg8AYE1lam5//fVX1alTRzt37lTLli0lST///LPDGHfdrrBp0ybNmTNHt956q8P8KVOmaNq0aZo/f74aNGigV199VTExMdqzZ48CAwPdkgtgdRnHM2TzsCkuLq7cYvoH+OunlJ9ocAEAV6VMzW39+vV17NgxrVmzRtLF1+2++eabCgkJcUtyBc6ePau//OUv+te//qVXX33VPt8Yo+nTp+ull15Snz59JEkLFixQSEiIFi1apKFDh7o1L8Cqsk5nyeQbxc2OU0gD9/77lqTjPx/XwqELlZ6eTnMLALgqZWpujTEO08uWLdO5c+dcmlBRnn76afXo0UP33HOPQ3O7f/9+paamKjY21j7P19dXHTp0UFJSUrHNbXZ2trKzs+3TGRkZki4+CSInJ8dNe/F/8vPzJUmexlO2XPd/Mc/L5iV/f/8qH69gW0Vt0yr76K54JdWupHjh9cNVu0ntMscrK0/jKX9/f+Xn55fLv8HSKsilMuVUWeXn5xc6Rt35b6OyHjNXg+PNedTOOVWtbqXN02Yu71hL4OHhodTUVAUHB0uSAgMDlZycrJtvvtm5LEvhgw8+0GuvvaZNmzbJz89PHTt21G233abp06crKSlJ7dq105EjRxQeHm5fZ8iQITp48KBWrFhR5Dbj4+OLfIzZokWLinytMAAAACpWZmam+vXrp9OnT6t69erFjivTlVubzVbonlp3PhLs8OHDGjlypFauXCk/P78S87qUMabEvMaNG6fRo0fbpzMyMhQREaHY2NgSi+UqW7du1bFjx/R94PcKbRbq/nifb9WHIz/UiKUjVLup+6/CuSueLdemyB8ideiOQzJejr+TWWUf3RWvpNq5I15ZHdl5RDN6zNC6devUvHlzt8crrZycHCUmJiomJkbe3t4VnU6llpycrPbt22vE0hG6qeFNZTrenFFZj5mrwfHmPGrnnKpWt4K/tF9JmW9LGDhwoHx9fSVJ58+f11//+tdCT0tYsmRJWTZbrC1btigtLU2tWrWyz8vLy9O6des0c+ZM7dmzR5KUmpqqsLAw+5i0tLQS7wP29fW178OlvL29y+WH6+Fx8QlsebY8t534L5VrcpWVlWWZeMbLFNqu1fbRXfGKqp0745VWni1PWVlZ8vDwqJQn2PI6N1RlHh4ehY6Z0h5vzqjsx8zV4HhzHrVzTlWpW2lzLFNzO2DAAIdpd3+TukuXLtqxY4fDvMcff1wNGzbU888/r5tvvlmhoaFKTExUixYtJF18TNnatWs1efJkt+YGAACAyqdMze28efPclUeRAgMD1bRpU4d51apVU1BQkH3+qFGjlJCQoPr166t+/fpKSEhQQECA+vXrV665AgAAoOI59RKHymTs2LHKysrSsGHD7C9xWLlyJc+4BQAAuAZVueb2m2++cZi22WyKj49XfHx8heQDAACAysOjohMAAAAAXIXmFgAAAJZBcwsAAADLoLkFAACAZdDcAgAAwDJobgEAAGAZNLcAAACwDJpbAAAAWAbNLQAAACyD5hYAAACWQXMLAAA
Ay6C5BQAAgGXQ3AIAAMAyvCo6AQAokJKSUq7xatWqpcjIyHKNCQBwL5pbABUu43iGbB42xcXFlWtc/wB//ZTyEw0uAFgIzS2ACpd1Oksm3yhudpxCGoSUS8zjPx/XwqELlZ6eTnMLABZCcwug0ghpEKKI5hEVnQYAoArjC2UAAACwDJpbAAAAWAbNLQAAACyD5hYAAACWQXMLAAAAy+BpCQCuaSW9OCI/P1+SlJycLA+Pq78WwEsjAMD9aG4BXJNK8+IIf39/LV68WO3bt1dWVtZVx+SlEQDgfjS3AK5JpXlxhKfxlDKkEUtHKM+Wd1XxeGkEAJQPmlsA17SSXhxhy7VJSVLtprVlvEw5ZwYAcAZfKAMAAIBl0NwCAADAMmhuAQAAYBk0twAAALAMmlsAAABYBs0tAAAALIPmFgAAAJZBcwsAAADLoLkFAACAZdDcAgAAwDJobgEAAGAZNLcAAACwDJpbAAAAWAbNLQAAACyD5hYAAACWQXMLAAAAy6C5BQAAgGV4VXQCAHAtSUlJKbdYtWrVUmRkZLnFA4DKgOYWAMpBxvEM2TxsiouLK7eY/gH++inlJxpcANcUmlsAKAdZp7Nk8o3iZscppEGI2+Md//m4Fg5dqPT0dJpbANcUmlsAKEchDUIU0TyiotMAAMviC2UAAACwDJpbAAAAWEalbm4nTpyo22+/XYGBgQoODlavXr20Z88ehzHGGMXHxys8PFz+/v7q2LGjdu3aVUEZAwAAoCJV6uZ27dq1evrpp7Vx40YlJiYqNzdXsbGxOnfunH3MlClTNG3aNM2cOVObNm1SaGioYmJidObMmQrMHAAAABWhUn+hbPny5Q7T8+bNU3BwsLZs2aL27dvLGKPp06frpZdeUp8+fSRJCxYsUEhIiBYtWqShQ4dWRNoAAACoIJW6ub3c6dOnJUk1a9aUJO3fv1+pqamKjY21j/H19VWHDh2UlJRUbHObnZ2t7Oxs+3RGRoYkKScnRzk5Oe5K3y4/P1+S5Gk8Zcu1uT2el81L/v7+VT5ewbaK2qZV9tFd8UqqnTvilVV5xyttzLLW7WrjuZKn8ZS/v7/y8/PL7bx2+f65cz8L9i8lJcV+TnW3oKAg3XTTTW7bfsHPqTx+XlZD7ZxT1epW2jxtxhjj5lxcwhijBx54QKdOndL69eslSUlJSWrXrp2OHDmi8PBw+9ghQ4bo4MGDWrFiRZHbio+P14QJEwrNX7RokQICAtyzAwAAAHBaZmam+vXrp9OnT6t69erFjqsyV26HDx+u7du369tvvy20zGZzvDpgjCk071Ljxo3T6NGj7dMZGRmKiIhQbGxsicVyla1bt+rYsWP6PvB7hTYLdX+8z7fqw5EfasTSEardtHaVjWfLtSnyh0gduuOQjJfj72RW2Ud3xSupdu6IV1blHa+0Mctat6uN50pHdh7RjB4ztG7dOjVv3tzt8ZKTk9W+fXuNWDpCNzW8yWV1K05BPfv+s6+C6wW7Jcal0val6cORH7q1njk5OUpMTFRMTIy8vb3dEsOqqJ1zqlrdCv7SfiVVorkdMWKEvvzyS61bt87hT0KhoRcbw9TUVIWFhdnnp6WlKSSk+DcA+fr6ytfXt9B8b2/vcvnhenhc/B5fni3PbSf+S+WaXGVlZVkmnvEyhbZrtX10V7yiaufOeKVV3vHKGrO0dXNVPFfIs+UpKytLHh4e5XZeu3z/XFG34hTUM6h+kMKbh195hatUnvUsr/8WWRG1c05VqVtpc6zUT0swxmj48OFasmSJVq9erejoaIfl0dHRCg0NVWJion3ehQsXtHbtWrVt27a80wUAAEAFq9RXbp9++mktWrRIX3zxhQIDA5WamipJqlGjhvz9/WWz2TRq1CglJCSofv36ql+/vhISEhQQEKB+/fpVcPYAAAAob5W6uX3nnXckSR07dnSYP2/ePA0cOFCSNHbsWGVlZWnYsGE6deqU7rzzTq1cuVKBgYHlnC0AAAAqWqVubkv
zIAebzab4+HjFx8e7PyEAAABUapX6nlsAAACgLGhuAQAAYBk0twAAALAMmlsAAABYBs0tAAAALIPmFgAAAJZBcwsAAADLoLkFAACAZdDcAgAAwDJobgEAAGAZlfr1uwCAq5OSkmKpOABwJTS3AGBBGcczZPOwKS4urqJTAYByRXMLABaUdTpLJt8obnacQhqEuD3e7lW7tey1ZW6PAwBXQnMLABYW0iBEEc0j3B7n+M/H3R4DAEqDL5QBAADAMmhuAQAAYBk0twAAALAMmlsAAABYBs0tAAAALIPmFgAAAJZBcwsAAADLoLkFAACAZdDcAgAAwDJobgEAAGAZNLcAAACwDJpbAAAAWAbNLQAAACyD5hYAAACWQXMLAAAAy6C5BQAAgGXQ3AIAAMAyaG4BAABgGV4VnQAAAFVFSkqK27adn58vSUpOTpaHx8VrT7Vq1VJkZKTbYsK9Dh06pPT09HKNyTFDcwsAwBVlHM+QzcOmuLg4t8Xw9/fX4sWL1b59e2VlZV2cF+Cvn1J+uuablaro0KFDatioobIys8o1LscMzS0AAFeUdTpLJt8obnacQhqEuCWGp/GUMqQRS0coz5an4z8f18KhC5Wenn5NNypVVXp6urIys9x6zFyOY+YimlsAAEoppEGIIppHuGXbtlyblCTVblpbxsu4Jca1rjxvEyi4hcWdxwyKRnMLAAAsr6JuE0D5o7kFAACWV963CexetVvLXlvm9jgojOYWAABUCFfdJlDUkyYuV963CRz/+bjbYxSntE/1KE3drqQyPp2B5hYAAJQ7V94mUNSTJq5FZX2qhyvqVhmfzkBzCwAAyp0rbxO4/EkTRbkWbhMo61M9SlO3klTWpzPQ3AIAUIm588URl8vOzpavr2+5xHLlbQKledJERd4mUN5KW1OrPqGD5hYAgEqoPF4ccTmbh00m3zpNDq5NNLcAAFRC5fHiiEsV/NmepwmgqqO5BQCgEivvb/dfC08TgLU599wHAAAAoBKiuQUAAIBl0NwCAADAMmhuAQAAYBk0twAAALAMyzS3b7/9tqKjo+Xn56dWrVpp/fr1FZ0SAAAAypklmtsPP/xQo0aN0ksvvaStW7fq7rvvVrdu3XTo0KGKTg0AAADlyBLN7bRp0zRo0CA9+eSTatSokaZPn66IiAi98847FZ0aAAAAylGVf4nDhQsXtGXLFr3wwgsO82NjY5WUlFTkOtnZ2crOzrZPnz59WpJ08uRJ5eTkuC/Z/y8jI0OZmZlKPZSq7HPZV17hKp345YT8/Px0bPsx5Z7NrbLxPI2nbsy8UQc3HlSeLa9cYhanqsUrqXbuiFdW5R2vtDHLWrerjedKFRnPnDEuq1tp4lmlnpcfb1bcR3fFK82/1aq8f+6KebXnuN9//V1+fn7KyMjQiRMnnEm5TM6cOSNJMuYKr4g2VdyRI0eMJPPdd985zH/ttddMgwYNilxn/PjxRhIfPnz48OHDhw+fKvY5fPhwib1hlb9yW8BmszlMG2MKzSswbtw4jR492j6dn5+vkydPKigoqNh1XCkjI0MRERE6fPiwqlev7vZ4VkHdnEftnEPdnEPdnEPdnEftnFPV6maM0ZkzZxQeHl7iuCrf3NaqVUuenp5KTU11mJ+WlqaQkJAi1/H19ZWvr6/DvOuvv95dKRarevXqVeJgqmyom/OonXOom3Oom3Oom/OonXOqUt1q1KhxxTFV/gtlPj4+atWqlRITEx3mJyYmqm3bthWUFQAAACpClb9yK0mjR4/WY489ptatW6tNmzaaM2eODh06pL/+9a8VnRoAAADKkSWa2759++rEiRP6+9//rmPHjqlp06b66quvFBUVVdGpFcnX11fjx48vdGsESkbdnEftnEPdnEPdnEPdnEftnGPVutmMudLzFAAAAICqocrfcwsAAAAUoLkFAACAZdDcAgAAwDJobgEAAGAZNLel8Pbbbys6Olp+fn5q1aqV1q9fX+L4tWvXqlWrVvLz89P
NN9+sWbNmFRrz6aefqnHjxvL19VXjxo312WeflTmuMUbx8fEKDw+Xv7+/OnbsqF27dl3dzrpQRdRt4sSJuv322xUYGKjg4GD16tVLe/bscRgzcOBA2Ww2h8+f/vSnq99hF6mIusXHxxeqSWhoqMMYjrfCdatTp06hutlsNj399NP2MZX9eJNcX7tdu3bpwQcftNdn+vTpTsW91o650tSNc5xzdeMc51zdquw5rsSX88J88MEHxtvb2/zrX/8yu3fvNiNHjjTVqlUzBw8eLHL8r7/+agICAszIkSPN7t27zb/+9S/j7e1tPvnkE/uYpKQk4+npaRISEkxKSopJSEgwXl5eZuPGjWWKO2nSJBMYGGg+/fRTs2PHDtO3b18TFhZmMjIy3FeQUqqounXt2tXMmzfP7Ny502zbts306NHDREZGmrNnz9rHDBgwwNx7773m2LFj9s+JEyfcV4wyqKi6jR8/3jRp0sShJmlpaQ6xON4K1y0tLc2hZomJiUaSWbNmjX1MZT7ejHFP7X744Qfz3HPPmcWLF5vQ0FDzxhtvOBX3WjvmSlM3znHO1Y1znHN1q6rnOJrbK7jjjjvMX//6V4d5DRs2NC+88EKR48eOHWsaNmzoMG/o0KHmT3/6k336kUceMffee6/DmK5du5pHH3201HHz8/NNaGiomTRpkn35+fPnTY0aNcysWbPKsIfuUVF1u1xaWpqRZNauXWufN2DAAPPAAw+UdlfKVUXVbfz48aZ58+bF5sXxdtGVjreRI0eaunXrmvz8fPu8yny8GeOe2l0qKiqqyP9oco5zrm6X4xznqLi6cY5zzfFWVc5x3JZQggsXLmjLli2KjY11mB8bG6ukpKQi19mwYUOh8V27dtXmzZuVk5NT4piCbZYm7v79+5WamuowxtfXVx06dCg2t/JSUXUryunTpyVJNWvWdJj/zTffKDg4WA0aNNDgwYOVlpZWup1zo4qu2969exUeHq7o6Gg9+uij+vXXX+3LON7+b0xx27xw4YIWLlyoJ554QjabzWFZZTzeJPfVzhVxr8Vjzhmc40qPc9zVHW9V6RxHc1uC9PR05eXlKSQkxGF+SEiIUlNTi1wnNTW1yPG5ublKT08vcUzBNksTt+B/y5Jbeamoul3OGKPRo0frrrvuUtOmTe3zu3Xrpvfff1+rV6/W66+/rk2bNqlz587Kzs4u8766UkXW7c4779R7772nFStW6F//+pdSU1PVtm1bnThxwr6NgvVKm1t5qSzH2+eff64//vhDAwcOdJhfWY83yX21c0Xca/GYKyvOcaWvG+e4qz/eqtI5zhKv33W3y39DMcYUmnel8ZfPL802XTWmolRU3QoMHz5c27dv17fffuswv2/fvvb/37RpU7Vu3VpRUVFaunSp+vTpU8IelY+KqFu3bt3s/79Zs2Zq06aN6tatqwULFmj06NFO51aeKvp4mzt3rrp166bw8HCH+ZX9eJPcUztXxb3Wjrmy4BxX+rpxjrv6460qneO4cluCWrVqydPTs9BvRmlpaYV+IyoQGhpa5HgvLy8FBQWVOKZgm6WJW/Atz7LkVl4qqm6XGjFihL788kutWbNGN910U4n5hoWFKSoqSnv37r3ivrlTZahbgWrVqqlZs2b2mnC8lbzNgwcPatWqVXryySevmG9lOd4k99XOFXGvxWOuLDjHOVe3ApzjyqaqneNobkvg4+OjVq1aKTEx0WF+YmKi2rZtW+Q6bdq0KTR+5cqVat26tby9vUscU7DN0sSNjo5WaGiow5gLFy5o7dq1xeZWXiqqbtLF30yHDx+uJUuWaPXq1YqOjr5ividOnNDhw4cVFhZWqv1zl4qs2+Wys7OVkpJirwnH2/+NKWqb8+bNU3BwsHr06HHFfCvL8Sa5r3auiHstHnOlwTnOubpdjnNc2VS5c1x5fGutKit4/MbcuXPN7t27zahRo0y1atXMgQMHjDHGvPDCC+axxx6zjy94/Mazzz5rdu/ebeb
OnVvo8Rvfffed8fT0NJMmTTIpKSlm0qRJxT4KrLi4xlx8bEmNGjXMkiVLzI4dO8yf//znSvfYkvKu21NPPWVq1KhhvvnmG4fHkmRmZhpjjDlz5owZM2aMSUpKMvv37zdr1qwxbdq0MbVr176m6zZmzBjzzTffmF9//dVs3LjR9OzZ0wQGBnK8XaFuxhiTl5dnIiMjzfPPP18or8p+vBnjntplZ2ebrVu3mq1bt5qwsDDz3HPPma1bt5q9e/eWOq4x194xV5q6cY5zrm6c45yrmzFV8xxHc1sKb731lomKijI+Pj6mZcuWhR650qFDB4fx33zzjWnRooXx8fExderUMe+8806hbX788cfmlltuMd7e3qZhw4bm008/LVNcYy4+umT8+PEmNDTU+Pr6mvbt25sdO3a4ZqddoCLqJqnIz7x584wxxmRmZprY2Fhz4403Gm9vbxMZGWkGDBhgDh065PL9d1ZF1K3geY7e3t4mPDzc9OnTx+zatcthDMdb0f9OV6xYYSSZPXv2FFpWFY43Y1xfu/379xf57/Dy7XCOK3vdOMc5VzfOcc7/O62K5zibMf//DmMAAACgiuOeWwAAAFgGzS0AAAAsg+YWAAAAlkFzCwAAAMuguQUAAIBl0NwCAADAMmhuAQAAYBk0twAAALAMmlsAcAGbzabPP/+8otNwiwsXLqhevXr67rvvJEkHDhyQzWbTtm3bXBpn5syZuv/++126TQDXHppbACjGwIEDZbPZZLPZ5O3trZCQEMXExOjf//638vPzHcYeO3ZM3bp1K9V2q1ojPGfOHEVFRaldu3ZujTN48GBt2rRJ3377rVvjALA2mlsAKMG9996rY8eO6cCBA1q2bJk6deqkkSNHqmfPnsrNzbWPCw0Nla+vbwVm6j4zZszQk08+6fY4vr6+6tevn2bMmOH2WACsi+YWAErg6+ur0NBQ1a5dWy1bttSLL76oL774QsuWLdP8+fPt4y69GnvhwgUNHz5cYWFh8vPzU506dTRx4kRJUp06dSRJvXv3ls1ms0//8ssveuCBBxQSEqLrrrtOt99+u1atWuWQS506dZSQkKAnnnhCgYGBioyM1Jw5cxzG/Pbbb3r00UdVs2ZNVatWTa1bt9b3339vX/6f//xHrVq1kp+fn26++WZNmDDBoUm/3I8//qh9+/apR48exY7Jz8/X4MGD1aBBAx08eNBej9mzZ6tnz54KCAhQo0aNtGHDBu3bt08dO3ZUtWrV1KZNG/3yyy8O27r//vv1+eefKysrq9h4AFASmlsAKKPOnTurefPmWrJkSZHL33zzTX355Zf66KOPtGfPHi1cuNDexG7atEmSNG/ePB07dsw+ffbsWXXv3l2rVq3S1q1b1bVrV9133306dOiQw7Zff/11tW7dWlu3btWwYcP01FNP6aeffrJvo0OHDjp69Ki+/PJLJScna+zYsfZbKFasWKG4uDg988wz2r17t2bPnq358+frtddeK3Zf161bpwYNGqh69epFLr9w4YIeeeQRbd68Wd9++62ioqLsy/7xj3+of//+2rZtmxo2bKh+/fpp6NChGjdunDZv3ixJGj58uMP2WrdurZycHP3www/F5gQAJTIAgCINGDDAPPDAA0Uu69u3r2nUqJF9WpL57LPPjDHGjBgxwnTu3Nnk5+cXue6lY0vSuHFjM2PGDPt0VFSUiYuLs0/n5+eb4OBg88477xhjjJk9e7YJDAw0J06cKHJ7d999t0lISHCY9z//8z8mLCys2BxGjhxpOnfu7DBv//79RpJZv369ueeee0y7du3MH3/8UWgf/+u//ss+vWHDBiPJzJ071z5v8eLFxs/Pr1DMG264wcyfP7/YnACgJFy5BQAnGGNks9mKXDZw4EBt27ZNt9xyi5555hmtXLnyits7d+6cxo4dq8aNG+v666/Xddddp59++qnQldtbb73V/v9tNptCQ0OVlpYmSdq2bZtatGihmjVrFhljy5Yt+vvf/67
rrrvO/hk8eLCOHTumzMzMItfJysqSn59fkcv+/Oc/6+zZs1q5cqVq1KhRaPmluYaEhEiSmjVr5jDv/PnzysjIcFjP39+/2HwA4EpobgHACSkpKYqOji5yWcuWLbV//3794x//UFZWlh555BE99NBDJW7vb3/7mz799FO99tprWr9+vbZt26ZmzZrpwoULDuO8vb0dpm02m/22A39//xJj5Ofna8KECdq2bZv9s2PHDu3du7fYBrZWrVo6depUkcu6d++u7du3a+PGjUUuvzTXgl8Eipp3+ZMnTp48qRtvvLHEfQGA4nhVdAIAUNWsXr1aO3bs0LPPPlvsmOrVq6tv377q27evHnroId177706efKkatasKW9vb+Xl5TmMX79+vQYOHKjevXtLunj/7IEDB8qU16233qp3333XHudyLVu21J49e1SvXr1Sb7NFixZ65513irxS/dRTT6lp06a6//77tXTpUnXo0KFM+Rbll19+0fnz59WiRYur3haAaxPNLQCUIDs7W6mpqcrLy9Px48e1fPlyTZw4UT179lT//v2LXOeNN95QWFiYbrvtNnl4eOjjjz9WaGiorr/+ekkXn3rw9ddfq127dvL19dUNN9ygevXqacmSJbrvvvtks9n08ssvF7qieSV//vOflZCQoF69emnixIkKCwvT1q1bFR4erjZt2uiVV15Rz549FRERoYcfflgeHh7avn27duzYoVdffbXIbXbq1Ennzp3Trl271LRp00LLR4wYoby8PPXs2VPLli3TXXfdVaacL7d+/XrdfPPNqlu37lVtB8C1i9sSAKAEy5cvV1hYmOrUqaN7771Xa9as0ZtvvqkvvvhCnp6eRa5z3XXXafLkyWrdurVuv/12HThwQF999ZU8PC6ecl9//XUlJiYqIiLCfoXyjTfe0A033KC2bdvqvvvuU9euXdWyZcsy5erj46OVK1cqODhY3bt3V7NmzTRp0iR7nl27dtX//u//KjExUbfffrv+9Kc/adq0aQ5POLhcUFCQ+vTpo/fff7/YMaNGjdKECRPUvXt3JSUllSnnyy1evFiDBw++qm0AuLbZjDGmopMAAFReO3bs0D333KN9+/YpMDDQbXF27typLl266Oeffy7yC2oAUBpcuQUAlKhZs2aaMmVKme8BLqujR4/qvffeo7EFcFW4cgsAAADL4MotAAAALIPmFgAAAJZBcwsAAADLoLkFAACAZdDcAgAAwDJobgEAAGAZNLcAAACwDJpbAAAAWAbNLQAAACzj/wH0xOAO+yOp2QAAAABJRU5ErkJggg==", + "text/plain": [ + "
"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "# Plot heart rate distribution\n",
+    "if 'heart_rate' in df_cleaned.columns:\n",
+    "    plt.figure(figsize=(8, 4))\n",
+    "    plt.hist(df_cleaned['heart_rate'], bins=30, color='skyblue', edgecolor='black')\n",
+    "    plt.title(\"Heart Rate Distribution\")\n",
+    "    plt.xlabel(\"Heart Rate (bpm)\")\n",
+    "    plt.ylabel(\"Frequency\")\n",
+    "    plt.grid(True)\n",
+    "    plt.show()\n",
+    "\n",
+    "# Plot distance (km) if available\n",
+    "if 'distance_km' in df_cleaned.columns:\n",
+    "    plt.figure(figsize=(8, 4))\n",
+    "    plt.hist(df_cleaned['distance_km'], bins=20, color='lightgreen', edgecolor='black')\n",
+    "    plt.title(\"Distance Distribution (km)\")\n",
+    "    plt.xlabel(\"Distance (km)\")\n",
+    "    plt.ylabel(\"Frequency\")\n",
+    "    plt.grid(True)\n",
+    "    plt.show()\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f458eeb5-f7a3-4ed7-a260-0dc2c3bfb0d7",
+   "metadata": {},
+   "source": [
+    "# Load – Save the cleaned dataset to a new CSV"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "4581ef6c-925f-4cc9-87f5-f31aaaaac393",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "✅ Cleaned data saved to 'cleaned_garmin_run_data.csv'\n"
+     ]
+    }
+   ],
+   "source": [
+    "df_cleaned.to_csv(\"cleaned_garmin_run_data.csv\", index=False)\n",
+    "print(\"✅ Cleaned data saved to 'cleaned_garmin_run_data.csv'\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "98056b45-85a5-4848-9014-5854c922c2af",
+   "metadata": {},
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.7"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
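
The weekly aggregation in the notebook can be exercised in isolation. The sketch below is a hypothetical helper (`weekly_run_stats` is not part of this repository) that assumes the same raw columns the notebook reads from `Garmin_run_data.csv`: `timestamp`, `distance` in metres and `speed` in m/s. It groups on the ISO calendar year and ISO week, so a run on a late-December day that belongs to ISO week 1 of the next year is counted in that week. It also reports pace as NaN instead of infinity when a week's mean speed is zero.

```python
import numpy as np
import pandas as pd


def weekly_run_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate runs per ISO week (hypothetical helper, not the notebook's code)."""
    data = df.copy()
    data["timestamp"] = pd.to_datetime(data["timestamp"], errors="coerce")
    data = data.dropna(subset=["timestamp"])

    # Unit conversions used by the notebook: metres -> km, m/s -> km/h
    data["distance_km"] = data["distance"] / 1000
    data["speed_kmh"] = data["speed"] * 3.6

    # ISO year and ISO week together identify a week unambiguously
    iso = data["timestamp"].dt.isocalendar()
    data["year"] = iso["year"].astype(int)
    data["week"] = iso["week"].astype(int)

    weekly = (
        data.groupby(["year", "week"])
        .agg(
            total_runs=("timestamp", "count"),
            total_distance_km=("distance_km", "sum"),
            average_speed_kmh=("speed_kmh", "mean"),
        )
        .reset_index()
    )

    # Pace (min/km) = 60 / speed (km/h); a zero speed yields NaN rather than inf
    speed = weekly["average_speed_kmh"].replace(0, np.nan)
    weekly["average_pace_min_per_km"] = 60 / speed
    return weekly


if __name__ == "__main__":
    runs = pd.DataFrame(
        {
            "timestamp": ["2024-12-30 07:00", "2025-01-02 07:00", "2025-01-08 07:00"],
            "distance": [5000, 10000, 8000],
            "speed": [2.5, 3.0, 0.0],
        }
    )
    print(weekly_run_stats(runs))
```

In this example, 30 December 2024 is a Monday and falls in ISO week 1 of 2025, so it is grouped with the 2 January run; grouping on `dt.year` would instead place it in a separate (2024, 1) bucket.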