diff --git a/notebooks/Imputation_best_practices/Imputation_best_practices.ipynb b/notebooks/Imputation_best_practices/Imputation_best_practices.ipynb
new file mode 100644
index 0000000..0ecc6f1
--- /dev/null
+++ b/notebooks/Imputation_best_practices/Imputation_best_practices.ipynb
@@ -0,0 +1,4475 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "fce74c70-b998-437d-bd77-43d723d57f13",
+   "metadata": {},
+   "source": [
+    "# Handling Missing Data\n",
+    "One of the first steps in any data science workflow is to understand the dataset and clean it. Real-world datasets are often messy and require significant preprocessing before they can be used for subsequent tasks such as feature engineering and model training. One data-cleaning task is handling missing data. Common approaches include dropping the incomplete records, filling with 0, filling with the mean, and KNN imputation. In this notebook, we explore two of these techniques, mean imputation and KNN imputation, and compare their effectiveness on two sample datasets.\n",
+    "\n",
+    "a. The first sample dataset is a set of random numbers: we generate 1000 random numbers and perform basic KNN and mean imputation.\n",
+    "\n",
+    "b. The second sample dataset is a housing dataset from UCI: we apply mean and KNN imputation to both scaled and non-scaled values."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e2ceaeb0-e282-4c63-97e2-f1dd03810aa2",
+   "metadata": {},
+   "source": [
+    "# What to try in this notebook?\n",
+    "\n",
+    "#### 1. Generate a dataset of random numbers, take one column and create missing values (1%, 5%, 10%), scale the values, then apply KNN and mean imputation. Compare the results by computing mean() and var() of the list of differences between the original and imputed values.\n",
+    "\n",
+    "\n",
+    "#### 2. Use a housing dataset from UCI, take one column and create missing values (1%, 5%, 10%), scale the values, then apply KNN and mean imputation. Compare the results by computing mean() and var() of the list of differences between the original and imputed values.\n",
+    "\n",
+    "Dataset - https://raw.githubusercontent.com/SheshNGupta/datasets/main/train.csv"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "d8fe4103-6e71-4b97-810c-b599a0482944",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "import numpy as np\n",
+    "import warnings\n",
+    "warnings.filterwarnings('ignore')\n",
+    "from sklearn.impute import KNNImputer\n",
+    "from sklearn.preprocessing import MinMaxScaler"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f95427ef-d6bc-47b8-a516-45a05b238180",
+   "metadata": {},
+   "source": [
+    "# 1.1 Random Numbers dataset"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "ae373dd4-26c0-46e8-bdba-dd1d31c77e4e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "random_dataset = pd.DataFrame({'number': np.random.rand(1000)})"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "5ea97930-03cd-48ff-97b9-97e9cd9dde55",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>number</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>823</th>\n",
+       "      <td>0.925249</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>266</th>\n",
+       "      <td>0.077479</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>959</th>\n",
+       "      <td>0.897447</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>493</th>\n",
+       "      <td>0.259423</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>768</th>\n",
+       "      <td>0.193178</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>105</th>\n",
+       "      <td>0.174632</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>610</th>\n",
+       "      <td>0.456349</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>824</th>\n",
+       "      <td>0.688290</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>968</th>\n",
+       "      <td>0.493667</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>849</th>\n",
+       "      <td>0.368834</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "       number\n",
+       "823  0.925249\n",
+       "266  0.077479\n",
+       "959  0.897447\n",
+       "493  0.259423\n",
+       "768  0.193178\n",
+       "105  0.174632\n",
+       "610  0.456349\n",
+       "824  0.688290\n",
+       "968  0.493667\n",
+       "849  0.368834"
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "random_dataset.sample(10)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "f19e199b-91aa-4e03-9e07-37f5a574d481",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "<class 'pandas.core.frame.DataFrame'>\n",
+      "RangeIndex: 1000 entries, 0 to 999\n",
+      "Data columns (total 1 columns):\n",
+      " #   Column  Non-Null Count  Dtype  \n",
+      "---  ------  --------------  -----  \n",
+      " 0   number  1000 non-null   float64\n",
+      "dtypes: float64(1)\n",
+      "memory usage: 7.9 KB\n"
+     ]
+    }
+   ],
+   "source": [
+    "random_dataset.info()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "382f0f03-b3f4-4244-a95c-e78476fae2ca",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "count    1000.000000\n",
+       "mean        0.494461\n",
+       "std         0.286876\n",
+       "min         0.001560\n",
+       "25%         0.252068\n",
+       "50%         0.489302\n",
+       "75%         0.733584\n",
+       "max         0.999815\n",
+       "Name: number, dtype: float64"
+      ]
+     },
+     "execution_count": 7,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "random_dataset['number'].describe()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "348a0b85-c450-4d5d-a9d2-c57c95964b42",
+   "metadata": {},
+   "source": [
+    "#### Create 3 copies of the number column for 1%, 5% and 10% missing data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "f5de26b3-17b7-463b-98e4-147a457ca37e",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>number</th>\n",
+       "      <th>number_copy_1_percent</th>\n",
+       "      <th>number_copy_5_percent</th>\n",
+       "      <th>number_copy_10_percent</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>0.438564</td>\n",
+       "      <td>0.438564</td>\n",
+       "      <td>0.438564</td>\n",
+       "      <td>0.438564</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>0.836801</td>\n",
+       "      <td>0.836801</td>\n",
+       "      <td>0.836801</td>\n",
+       "      <td>0.836801</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>0.798077</td>\n",
+       "      <td>0.798077</td>\n",
+       "      <td>0.798077</td>\n",
+       "      <td>0.798077</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>0.269161</td>\n",
+       "      <td>0.269161</td>\n",
+       "      <td>0.269161</td>\n",
+       "      <td>0.269161</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>0.830948</td>\n",
+       "      <td>0.830948</td>\n",
+       "      <td>0.830948</td>\n",
+       "      <td>0.830948</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>...</th>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>995</th>\n",
+       "      <td>0.920130</td>\n",
+       "      <td>0.920130</td>\n",
+       "      <td>0.920130</td>\n",
+       "      <td>0.920130</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>996</th>\n",
+       "      <td>0.007397</td>\n",
+       "      <td>0.007397</td>\n",
+       "      <td>0.007397</td>\n",
+       "      <td>0.007397</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>997</th>\n",
+       "      <td>0.163360</td>\n",
+       "      <td>0.163360</td>\n",
+       "      <td>0.163360</td>\n",
+       "      <td>0.163360</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>998</th>\n",
+       "      <td>0.553700</td>\n",
+       "      <td>0.553700</td>\n",
+       "      <td>0.553700</td>\n",
+       "      <td>0.553700</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>999</th>\n",
+       "      <td>0.771442</td>\n",
+       "      <td>0.771442</td>\n",
+       "      <td>0.771442</td>\n",
+       "      <td>0.771442</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "<p>1000 rows × 4 columns</p>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "       number  number_copy_1_percent  number_copy_5_percent  \\\n",
+       "0    0.438564               0.438564               0.438564   \n",
+       "1    0.836801               0.836801               0.836801   \n",
+       "2    0.798077               0.798077               0.798077   \n",
+       "3    0.269161               0.269161               0.269161   \n",
+       "4    0.830948               0.830948               0.830948   \n",
+       "..        ...                    ...                    ...   \n",
+       "995  0.920130               0.920130               0.920130   \n",
+       "996  0.007397               0.007397               0.007397   \n",
+       "997  0.163360               0.163360               0.163360   \n",
+       "998  0.553700               0.553700               0.553700   \n",
+       "999  0.771442               0.771442               0.771442   \n",
+       "\n",
+       "     number_copy_10_percent  \n",
+       "0                  0.438564  \n",
+       "1                  0.836801  \n",
+       "2                  0.798077  \n",
+       "3                  0.269161  \n",
+       "4                  0.830948  \n",
+       "..                      ...  \n",
+       "995                0.920130  \n",
+       "996                0.007397  \n",
+       "997                0.163360  \n",
+       "998                0.553700  \n",
+       "999                0.771442  \n",
+       "\n",
+       "[1000 rows x 4 columns]"
+      ]
+     },
+     "execution_count": 8,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "df_number = random_dataset[['number']].copy()\n",
+    "df_number['number_copy_1_percent'] = df_number['number']\n",
+    "df_number['number_copy_5_percent'] = df_number['number']\n",
+    "df_number['number_copy_10_percent'] = df_number['number']\n",
+    "df_number"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1ff95002-46a0-454b-97c1-6c189153d459",
+   "metadata": {},
+   "source": [
+    "#### Check % missing values in this dataframe"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "35c38775-26d9-4b1e-97a9-4c46c0d5d92b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def get_percent_missing(dataframe):\n",
+    "    \"\"\"Return a dataframe with the percentage of missing values per column.\"\"\"\n",
+    "    percent_missing = dataframe.isnull().sum() * 100 / len(dataframe)\n",
+    "    missing_value_df = pd.DataFrame({'column_name': dataframe.columns,\n",
+    "                                     'percent_missing': percent_missing})\n",
+    "    return missing_value_df"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "6837b7e5-4444-4914-9c0e-a9cefd2c7b6f",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "                                   column_name  percent_missing\n",
+      "number                                  number              0.0\n",
+      "number_copy_1_percent    number_copy_1_percent              0.0\n",
+      "number_copy_5_percent    number_copy_5_percent              0.0\n",
+      "number_copy_10_percent  number_copy_10_percent              0.0\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(get_percent_missing(df_number))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "25318ebf-b1bf-4f4b-ba1d-011b27a27f39",
+   "metadata": {},
+   "source": [
+    "#### Helper function to create missing values"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "76da9076-d9c8-417e-bcfc-8ce7066d1a53",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def create_missing(dataframe, percent, col):\n",
+    "    dataframe.loc[dataframe.sample(frac = percent).index, col] = np.nan"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9dc43e57-be39-4efe-8131-d6a3423b8d77",
+   "metadata": {},
+   "source": [
+    "#### Create missing data in each column"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "6e8ab693-6043-4ade-b62a-9b3fc9ebf735",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "create_missing(df_number, 0.01, 'number_copy_1_percent')\n",
+    "create_missing(df_number, 0.05, 'number_copy_5_percent')\n",
+    "create_missing(df_number, 0.1, 'number_copy_10_percent')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "655cb92a-6b63-4498-9c31-d63f11145569",
+   "metadata": {},
+   "source": [
+    "#### Check % missing after removing data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "412518b5-67ec-4a5a-9720-4a0ce7657d44",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "                                   column_name  percent_missing\n",
+      "number                                  number              0.0\n",
+      "number_copy_1_percent    number_copy_1_percent              1.0\n",
+      "number_copy_5_percent    number_copy_5_percent              5.0\n",
+      "number_copy_10_percent  number_copy_10_percent             10.0\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(get_percent_missing(df_number))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6876e3fc-b878-4560-a3a4-72c36f2a422e",
+   "metadata": {},
+   "source": [
+    "#### Store the indices of missing rows"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "id": "c1860270-add6-4963-9aef-27ef1e171fca",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Store the positional indices of the NaN values in each column\n",
+    "number_1_idx = list(np.where(df_number['number_copy_1_percent'].isna())[0])\n",
+    "number_5_idx = list(np.where(df_number['number_copy_5_percent'].isna())[0])\n",
+    "number_10_idx = list(np.where(df_number['number_copy_10_percent'].isna())[0])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "id": "57841da6-b453-40cc-8ecc-702fe4613a74",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Length of number_1_idx is 10 and it contains 1.0% of total data in column | Total rows: 1000\n",
+      "Length of number_5_idx is 50 and it contains 5.0% of total data in column | Total rows: 1000\n",
+      "Length of number_10_idx is 100 and it contains 10.0% of total data in column | Total rows: 1000\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(f\"Length of number_1_idx is {len(number_1_idx)} and it contains {(len(number_1_idx)/len(df_number['number_copy_1_percent']))*100}% of total data in column | Total rows: {len(df_number['number_copy_1_percent'])}\")\n",
+    "print(f\"Length of number_5_idx is {len(number_5_idx)} and it contains {(len(number_5_idx)/len(df_number['number_copy_1_percent']))*100}% of total data in column | Total rows: {len(df_number['number_copy_1_percent'])}\")\n",
+    "print(f\"Length of number_10_idx is {len(number_10_idx)} and it contains {(len(number_10_idx)/len(df_number['number_copy_1_percent']))*100}% of total data in column | Total rows: {len(df_number['number_copy_1_percent'])}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "93450753-9080-4b17-b785-76acd5f9e19f",
+   "metadata": {},
+   "source": [
+    "## What is KNN imputation?\n",
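+    "KNN imputation identifies the nearest neighbors of each incomplete observation using a distance measure, then estimates each missing value from the observed values of those neighbors. scikit-learn's `KNNImputer`, used below, fills a missing value with the mean of that feature across the `n_neighbors` nearest rows."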
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "47469d0b-a8f3-4469-b18c-3a457f7dc373",
+   "metadata": {},
+   "source": [
+    "### Apply KNN imputation to the df_number dataframe"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "id": "b09c6c85-4ce3-4aeb-bb81-6a698494a58e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_number1 = df_number.copy(deep=True)\n",
+    "imputer = KNNImputer(n_neighbors=5)\n",
+    "imputed_number_df = pd.DataFrame(imputer.fit_transform(df_number1), columns = df_number1.columns)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "id": "2f051a7d-3ebd-4839-aae0-ef125944d613",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>number</th>\n",
+       "      <th>number_copy_1_percent</th>\n",
+       "      <th>number_copy_5_percent</th>\n",
+       "      <th>number_copy_10_percent</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>701</th>\n",
+       "      <td>0.244629</td>\n",
+       "      <td>0.244629</td>\n",
+       "      <td>0.244629</td>\n",
+       "      <td>0.244629</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>39</th>\n",
+       "      <td>0.517202</td>\n",
+       "      <td>0.517202</td>\n",
+       "      <td>0.517202</td>\n",
+       "      <td>0.517202</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>335</th>\n",
+       "      <td>0.100813</td>\n",
+       "      <td>0.100813</td>\n",
+       "      <td>0.100813</td>\n",
+       "      <td>0.100813</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>204</th>\n",
+       "      <td>0.277534</td>\n",
+       "      <td>0.277534</td>\n",
+       "      <td>0.277534</td>\n",
+       "      <td>0.277534</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>391</th>\n",
+       "      <td>0.859032</td>\n",
+       "      <td>0.859032</td>\n",
+       "      <td>0.857231</td>\n",
+       "      <td>0.859032</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>203</th>\n",
+       "      <td>0.252622</td>\n",
+       "      <td>0.252622</td>\n",
+       "      <td>0.252622</td>\n",
+       "      <td>0.252622</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>144</th>\n",
+       "      <td>0.844587</td>\n",
+       "      <td>0.844587</td>\n",
+       "      <td>0.844587</td>\n",
+       "      <td>0.844587</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>201</th>\n",
+       "      <td>0.431603</td>\n",
+       "      <td>0.431603</td>\n",
+       "      <td>0.431603</td>\n",
+       "      <td>0.431603</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>749</th>\n",
+       "      <td>0.848537</td>\n",
+       "      <td>0.848537</td>\n",
+       "      <td>0.848537</td>\n",
+       "      <td>0.848240</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>497</th>\n",
+       "      <td>0.464531</td>\n",
+       "      <td>0.464531</td>\n",
+       "      <td>0.464531</td>\n",
+       "      <td>0.464531</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "       number  number_copy_1_percent  number_copy_5_percent  \\\n",
+       "701  0.244629               0.244629               0.244629   \n",
+       "39   0.517202               0.517202               0.517202   \n",
+       "335  0.100813               0.100813               0.100813   \n",
+       "204  0.277534               0.277534               0.277534   \n",
+       "391  0.859032               0.859032               0.857231   \n",
+       "203  0.252622               0.252622               0.252622   \n",
+       "144  0.844587               0.844587               0.844587   \n",
+       "201  0.431603               0.431603               0.431603   \n",
+       "749  0.848537               0.848537               0.848537   \n",
+       "497  0.464531               0.464531               0.464531   \n",
+       "\n",
+       "     number_copy_10_percent  \n",
+       "701                0.244629  \n",
+       "39                 0.517202  \n",
+       "335                0.100813  \n",
+       "204                0.277534  \n",
+       "391                0.859032  \n",
+       "203                0.252622  \n",
+       "144                0.844587  \n",
+       "201                0.431603  \n",
+       "749                0.848240  \n",
+       "497                0.464531  "
+      ]
+     },
+     "execution_count": 17,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "imputed_number_df.sample(10)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ddc79a45-bd2b-44f3-a3c4-aaefa73b43d9",
+   "metadata": {},
+   "source": [
+    "#### Check the % missing data in dataframe now"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "id": "5c98d450-bf5a-46e5-9091-c6a1202a2611",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "                                   column_name  percent_missing\n",
+      "number                                  number              0.0\n",
+      "number_copy_1_percent    number_copy_1_percent              0.0\n",
+      "number_copy_5_percent    number_copy_5_percent              0.0\n",
+      "number_copy_10_percent  number_copy_10_percent              0.0\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(get_percent_missing(imputed_number_df))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f14476bf-29e6-4d9a-9cd4-9dd56a53b466",
+   "metadata": {},
+   "source": [
+    "#### Store the list of differences between the original and imputed values"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 19,
+   "id": "3f096800-dc6e-4455-a9e6-2db18884e5ee",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Create lists of absolute differences between the imputed and original values\n",
+    "\n",
+    "number_diff_1 = []\n",
+    "number_diff_5 = []\n",
+    "number_diff_10 = []\n",
+    "count = 0\n",
+    "\n",
+    "for i in number_1_idx:\n",
+    "    count +=1\n",
+    "    diff1 = abs(imputed_number_df['number_copy_1_percent'][i] - df_number1['number'][i])\n",
+    "    number_diff_1.append(diff1)\n",
+    "    \n",
+    "\n",
+    "for i in number_5_idx:\n",
+    "    diff5 = abs(imputed_number_df['number_copy_5_percent'][i] - df_number1['number'][i])\n",
+    "    number_diff_5.append(diff5)\n",
+    "\n",
+    "for i in number_10_idx:\n",
+    "    diff10 = abs(imputed_number_df['number_copy_10_percent'][i] - df_number1['number'][i])\n",
+    "    number_diff_10.append(diff10)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 20,
+   "id": "4a2c29fc-99f3-4624-808e-437d3983cabb",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "10\n",
+      "50\n",
+      "100\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(len(number_diff_1))\n",
+    "print(len(number_diff_5))\n",
+    "print(len(number_diff_10))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4ec4adbe-5571-40e3-90ba-92cb431161ca",
+   "metadata": {},
+   "source": [
+    "### Calculate the mean and variance of the list of differences (KNN)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 21,
+   "id": "1163cb62-9dc4-427e-b5cf-20bf3e16d79b",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "The mean of 1% is 0.0005846547543839273 and varience 1% is 2.970798404420463e-07\n",
+      "The mean of 5% is 0.000757031064033434 and varience 5% is 4.329913201182178e-07\n",
+      "The mean of 10% is 0.000757031064033434 and varience 10% is 4.0351965946805086e-07\n"
+     ]
+    }
+   ],
+   "source": [
+    "m1 = sum(number_diff_1) / len(number_diff_1)\n",
+    "\n",
+    "# population variance of the differences, computed with a generator expression\n",
+    "var_res1 = sum((xi - m1) ** 2 for xi in number_diff_1) / len(number_diff_1)\n",
+    "\n",
+    "m5 = sum(number_diff_5) / len(number_diff_5)\n",
+    "\n",
+    "# population variance of the differences, computed with a generator expression\n",
+    "var_res5 = sum((xii - m5) ** 2 for xii in number_diff_5) / len(number_diff_5)\n",
+    "\n",
+    "\n",
+    "m10 = sum(number_diff_10) / len(number_diff_10)\n",
+    "\n",
+    "# population variance of the differences, computed with a generator expression\n",
+    "var_res10 = sum((xiii - m10) ** 2 for xiii in number_diff_10) / len(number_diff_10)\n",
+    "\n",
+    "print(f\"The mean of 1% is {m1} and variance 1% is {var_res1}\")\n",
+    "print(f\"The mean of 5% is {m5} and variance 5% is {var_res5}\")\n",
+    "print(f\"The mean of 10% is {m10} and variance 10% is {var_res10}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 22,
+   "id": "6987d059-7449-44a0-a3c2-8605362a18a0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_knn_number = pd.DataFrame.from_dict({'1%_number': [m1, var_res1],\n",
+    " '5%_number': [m5, var_res5],\n",
+    " '10%_number': [m10, var_res10]}, orient='index')\n",
+    "df_knn_number.columns=['diff. list Mean(KNN)', 'diff. list Var.(KNN)']"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8d1efbf1-61d6-43e1-9a4a-4af137e081c9",
+   "metadata": {},
+   "source": [
+    "## What is Mean imputation?\n",
+    "Mean imputation (MI) is a method in which the mean of the observed values for each variable is computed and the missing values for that variable are imputed by this mean."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "41740e20-5dae-403e-a83b-94c91469fcc3",
+   "metadata": {},
+   "source": [
+    "### Perform mean-based imputation"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "17b69478-e97c-41b9-828a-eefbb46eb161",
+   "metadata": {},
+   "source": [
+    "#### % missing before mean imputation"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 23,
+   "id": "5a828216-8f1a-4157-8141-77e6c929f57a",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "                                   column_name  percent_missing\n",
+      "number                                  number              0.0\n",
+      "number_copy_1_percent    number_copy_1_percent              1.0\n",
+      "number_copy_5_percent    number_copy_5_percent              5.0\n",
+      "number_copy_10_percent  number_copy_10_percent             10.0\n"
+     ]
+    }
+   ],
+   "source": [
+    "df_number2 = df_number.copy(deep=True)\n",
+    "print(get_percent_missing(df_number2))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 24,
+   "id": "1e137676-9f01-44b9-8a84-50d03a89436b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_number2['number_copy_1_percent'] = df_number2['number_copy_1_percent'].fillna(df_number2['number_copy_1_percent'].mean())\n",
+    "df_number2['number_copy_5_percent'] = df_number2['number_copy_5_percent'].fillna(df_number2['number_copy_5_percent'].mean())\n",
+    "df_number2['number_copy_10_percent'] = df_number2['number_copy_10_percent'].fillna(df_number2['number_copy_10_percent'].mean())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8da82021-d96a-46ac-81df-035977cb5497",
+   "metadata": {},
+   "source": [
+    "#### % missing after mean imputation"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 25,
+   "id": "669c14bd-f920-47db-8476-1cd1b4f4f5bb",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "                                   column_name  percent_missing\n",
+      "number                                  number              0.0\n",
+      "number_copy_1_percent    number_copy_1_percent              0.0\n",
+      "number_copy_5_percent    number_copy_5_percent              0.0\n",
+      "number_copy_10_percent  number_copy_10_percent              0.0\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(get_percent_missing(df_number2))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 26,
+   "id": "ccb60d18-b24e-4211-9947-46ee0bcc06fe",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>number</th>\n",
+       "      <th>number_copy_1_percent</th>\n",
+       "      <th>number_copy_5_percent</th>\n",
+       "      <th>number_copy_10_percent</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>293</th>\n",
+       "      <td>0.583231</td>\n",
+       "      <td>0.583231</td>\n",
+       "      <td>0.583231</td>\n",
+       "      <td>0.583231</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>461</th>\n",
+       "      <td>0.867035</td>\n",
+       "      <td>0.867035</td>\n",
+       "      <td>0.867035</td>\n",
+       "      <td>0.867035</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>875</th>\n",
+       "      <td>0.676228</td>\n",
+       "      <td>0.676228</td>\n",
+       "      <td>0.676228</td>\n",
+       "      <td>0.676228</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>999</th>\n",
+       "      <td>0.771442</td>\n",
+       "      <td>0.771442</td>\n",
+       "      <td>0.771442</td>\n",
+       "      <td>0.771442</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>75</th>\n",
+       "      <td>0.909050</td>\n",
+       "      <td>0.909050</td>\n",
+       "      <td>0.909050</td>\n",
+       "      <td>0.909050</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>98</th>\n",
+       "      <td>0.629583</td>\n",
+       "      <td>0.629583</td>\n",
+       "      <td>0.629583</td>\n",
+       "      <td>0.629583</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>381</th>\n",
+       "      <td>0.181614</td>\n",
+       "      <td>0.181614</td>\n",
+       "      <td>0.181614</td>\n",
+       "      <td>0.181614</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>592</th>\n",
+       "      <td>0.523109</td>\n",
+       "      <td>0.523109</td>\n",
+       "      <td>0.523109</td>\n",
+       "      <td>0.523109</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>155</th>\n",
+       "      <td>0.038074</td>\n",
+       "      <td>0.038074</td>\n",
+       "      <td>0.038074</td>\n",
+       "      <td>0.038074</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>630</th>\n",
+       "      <td>0.869200</td>\n",
+       "      <td>0.869200</td>\n",
+       "      <td>0.869200</td>\n",
+       "      <td>0.869200</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "       number  number_copy_1_percent  number_copy_5_percent  \\\n",
+       "293  0.583231               0.583231               0.583231   \n",
+       "461  0.867035               0.867035               0.867035   \n",
+       "875  0.676228               0.676228               0.676228   \n",
+       "999  0.771442               0.771442               0.771442   \n",
+       "75   0.909050               0.909050               0.909050   \n",
+       "98   0.629583               0.629583               0.629583   \n",
+       "381  0.181614               0.181614               0.181614   \n",
+       "592  0.523109               0.523109               0.523109   \n",
+       "155  0.038074               0.038074               0.038074   \n",
+       "630  0.869200               0.869200               0.869200   \n",
+       "\n",
+       "     number_copy_10_percent  \n",
+       "293                0.583231  \n",
+       "461                0.867035  \n",
+       "875                0.676228  \n",
+       "999                0.771442  \n",
+       "75                 0.909050  \n",
+       "98                 0.629583  \n",
+       "381                0.181614  \n",
+       "592                0.523109  \n",
+       "155                0.038074  \n",
+       "630                0.869200  "
+      ]
+     },
+     "execution_count": 26,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "df_number2.sample(10)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "88d89795-0ae9-4f37-89cd-b24d36658588",
+   "metadata": {},
+   "source": [
+    "#### Create lists of differences - MEAN"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 27,
+   "id": "530979d5-52c4-473d-95f3-754c460a7ab6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# create lists of absolute differences between imputed and original values\n",
+    "\n",
+    "number_diff_1_mean = []\n",
+    "number_diff_5_mean = []\n",
+    "number_diff_10_mean = []\n",
+    "count = 0\n",
+    "\n",
+    "for i in number_1_idx:\n",
+    "    count +=1\n",
+    "    diff1 = abs(df_number2['number_copy_1_percent'][i] - df_number2['number'][i])\n",
+    "    number_diff_1_mean.append(diff1)\n",
+    "    \n",
+    "\n",
+    "for i in number_5_idx:\n",
+    "    diff5 = abs(df_number2['number_copy_5_percent'][i] - df_number2['number'][i])\n",
+    "    number_diff_5_mean.append(diff5)\n",
+    "\n",
+    "for i in number_10_idx:\n",
+    "    diff10 = abs(df_number2['number_copy_10_percent'][i] - df_number2['number'][i])\n",
+    "    number_diff_10_mean.append(diff10)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 28,
+   "id": "28dd2494-0175-431e-b4b7-09ee4af1f6a0",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "10\n",
+      "50\n",
+      "100\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(len(number_diff_1_mean))\n",
+    "print(len(number_diff_5_mean))\n",
+    "print(len(number_diff_10_mean))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4e90251e-4c0a-4e2d-82b1-8764374aed1c",
+   "metadata": {},
+   "source": [
+    "### Calculate the mean and variance of the list of differences - MEAN Impute"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 29,
+   "id": "682bd76e-4875-4b4d-b90b-91d8a6e492ae",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "The mean of 1% is 0.29595595666774266 and varience 1% is 0.02234691636534702\n",
+      "The mean of 5% is 0.2606794287327926 and varience 5% is 0.017948559982927326\n",
+      "The mean of 10% is 0.2606794287327926 and varience 10% is 0.019225304317791198\n"
+     ]
+    }
+   ],
+   "source": [
+    "m1 = sum(number_diff_1_mean) / len(number_diff_1_mean)\n",
+    "\n",
+    "# population variance of the 1% differences (generator expression)\n",
+    "var_res1 = sum((xi - m1) ** 2 for xi in number_diff_1_mean) / len(number_diff_1_mean)\n",
+    "\n",
+    "m5 = sum(number_diff_5_mean) / len(number_diff_5_mean)\n",
+    "\n",
+    "# population variance of the 5% differences\n",
+    "var_res5 = sum((xii - m5) ** 2 for xii in number_diff_5_mean) / len(number_diff_5_mean)\n",
+    "\n",
+    "\n",
+    "m10 = sum(number_diff_10_mean) / len(number_diff_10_mean)\n",
+    "\n",
+    "# population variance of the 10% differences\n",
+    "var_res10 = sum((xiii - m10) ** 2 for xiii in number_diff_10_mean) / len(number_diff_10_mean)\n",
+    "\n",
+    "print(f\"The mean of 1% is {m1} and variance 1% is {var_res1}\")\n",
+    "print(f\"The mean of 5% is {m5} and variance 5% is {var_res5}\")\n",
+    "print(f\"The mean of 10% is {m10} and variance 10% is {var_res10}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 30,
+   "id": "1f41880d-3e7d-48c9-8744-7e47ccae3c17",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_MI_number = pd.DataFrame.from_dict({'1%_number': [m1, var_res1],\n",
+    " '5%_number': [m5, var_res5],\n",
+    " '10%_number': [m10, var_res10]}, orient='index')\n",
+    "df_MI_number.columns=['diff. list Mean(MI)', 'diff. list Var.(MI)']"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ec64b079-db97-429c-ae3a-519eec91db3f",
+   "metadata": {},
+   "source": [
+    "## KNN and MEAN columns side by side"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 31,
+   "id": "d74b0e73-e3f0-4107-806d-c5d5a50aab9a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from IPython.display import display_html\n",
+    "from itertools import chain,cycle\n",
+    "def display_side_by_side(*args,titles=cycle([''])):\n",
+    "    html_str=''\n",
+    "    for df,title in zip(args, chain(titles,cycle(['</br>'])) ):\n",
+    "        html_str+='<th style=\"text-align:center\"><td style=\"vertical-align:top\">'\n",
+    "        html_str+=f'<h2>{title}</h2>'\n",
+    "        html_str+=df.to_html().replace('table','table style=\"display:inline\"')\n",
+    "        html_str+='</td></th>'\n",
+    "    display_html(html_str,raw=True)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 32,
+   "id": "747a487f-cbc4-467a-9bc7-b0856dbb6576",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<style>\n",
+       ".output {\n",
+       "    flex-direction: row;\n",
+       "}\n",
+       "</style>"
+      ],
+      "text/plain": [
+       "<IPython.core.display.HTML object>"
+      ]
+     },
+     "execution_count": 32,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from IPython.display import display, HTML\n",
+    "\n",
+    "CSS = \"\"\"\n",
+    ".output {\n",
+    "    flex-direction: row;\n",
+    "}\n",
+    "\"\"\"\n",
+    "\n",
+    "HTML('<style>{}</style>'.format(CSS))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 33,
+   "id": "d24551d1-cd58-4a41-8262-873fe5034272",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# https://github.com/epmoyer/ipy_table/issues/24\n",
+    "\n",
+    "from IPython.display import HTML\n",
+    "\n",
+    "def multi_table(table_list):\n",
+    "    ''' Accepts a list of objects that implement _repr_html_() (e.g. IpyTable objects or DataFrames)\n",
+    "    and returns an HTML table that shows each one in its own cell\n",
+    "    '''\n",
+    "    return HTML(\n",
+    "        '<table><tr style=\"background-color:white;\">' + \n",
+    "        ''.join(['<td>' + table._repr_html_() + '</td>' for table in table_list]) +\n",
+    "        '</tr></table>'\n",
+    "    )"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 34,
+   "id": "8a8daa30-3abf-4315-ae58-f9171ff000d5",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[103, 272, 302, 441, 542]\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(number_1_idx[:5])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 35,
+   "id": "da6b1646-2417-42b7-bc8f-d3b0be85c61b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "compare_1 = imputed_number_df.loc[:, [\"number\", \"number_copy_1_percent\"]]\n",
+    "compare_5 = imputed_number_df.loc[:, [\"number\", \"number_copy_5_percent\"]]\n",
+    "compare_10 = imputed_number_df.loc[:, [\"number\", \"number_copy_10_percent\"]]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 36,
+   "id": "380b94cf-264f-4a41-bb1d-ac272354073f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "compare_1_df =  compare_1.iloc[number_1_idx]\n",
+    "compare_5_df =  compare_5.iloc[number_5_idx]\n",
+    "compare_10_df =  compare_10.iloc[number_10_idx]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 37,
+   "id": "e5b21e71-0ddd-4c60-b931-b384d65230dd",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "compare_1_mean = df_number2.loc[:, [\"number\", \"number_copy_1_percent\"]]\n",
+    "compare_5_mean = df_number2.loc[:, [\"number\", \"number_copy_5_percent\"]]\n",
+    "compare_10_mean = df_number2.loc[:, [\"number\", \"number_copy_10_percent\"]]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 38,
+   "id": "29be3554-8129-4f0c-bad6-1270b7c6c05b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "compare_1_mean_df =  compare_1_mean.iloc[number_1_idx]\n",
+    "compare_5_mean_df =  compare_5_mean.iloc[number_5_idx]\n",
+    "compare_10_mean_df =  compare_10_mean.iloc[number_10_idx]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "72a3bc3c-0f91-49ad-bf03-dc4b7ace265d",
+   "metadata": {},
+   "source": [
+    "#### **number 1% KNN Impute VS number 1% Mean Impute**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 39,
+   "id": "6fd11f89-9f4b-49b3-b114-1ab3b461f180",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<table><tr style=\"background-color:white;\"><td><div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>number</th>\n",
+       "      <th>number_copy_1_percent</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>103</th>\n",
+       "      <td>0.915554</td>\n",
+       "      <td>0.915539</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>272</th>\n",
+       "      <td>0.899497</td>\n",
+       "      <td>0.899795</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>302</th>\n",
+       "      <td>0.091276</td>\n",
+       "      <td>0.090500</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>441</th>\n",
+       "      <td>0.050874</td>\n",
+       "      <td>0.050914</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>542</th>\n",
+       "      <td>0.744744</td>\n",
+       "      <td>0.744208</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div></td><td><div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>number</th>\n",
+       "      <th>number_copy_1_percent</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>103</th>\n",
+       "      <td>0.915554</td>\n",
+       "      <td>0.493992</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>272</th>\n",
+       "      <td>0.899497</td>\n",
+       "      <td>0.493992</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>302</th>\n",
+       "      <td>0.091276</td>\n",
+       "      <td>0.493992</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>441</th>\n",
+       "      <td>0.050874</td>\n",
+       "      <td>0.493992</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>542</th>\n",
+       "      <td>0.744744</td>\n",
+       "      <td>0.493992</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div></td></tr></table>"
+      ],
+      "text/plain": [
+       "<IPython.core.display.HTML object>"
+      ]
+     },
+     "execution_count": 39,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "multi_table([compare_1_df.head(), compare_1_mean_df.head()])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e1fc9d1c-53ef-42d3-809b-d68051057e48",
+   "metadata": {},
+   "source": [
+    "#### **number 5% KNN Impute VS number 5% Mean Impute**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 40,
+   "id": "a97c1530-2e50-48d2-a7e0-89fc70f648e5",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<table><tr style=\"background-color:white;\"><td><div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>number</th>\n",
+       "      <th>number_copy_5_percent</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>6</th>\n",
+       "      <td>0.040451</td>\n",
+       "      <td>0.039472</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>14</th>\n",
+       "      <td>0.852026</td>\n",
+       "      <td>0.849692</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>16</th>\n",
+       "      <td>0.213343</td>\n",
+       "      <td>0.212438</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>49</th>\n",
+       "      <td>0.608203</td>\n",
+       "      <td>0.609078</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>64</th>\n",
+       "      <td>0.973574</td>\n",
+       "      <td>0.972234</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div></td><td><div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>number</th>\n",
+       "      <th>number_copy_5_percent</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>6</th>\n",
+       "      <td>0.040451</td>\n",
+       "      <td>0.49266</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>14</th>\n",
+       "      <td>0.852026</td>\n",
+       "      <td>0.49266</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>16</th>\n",
+       "      <td>0.213343</td>\n",
+       "      <td>0.49266</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>49</th>\n",
+       "      <td>0.608203</td>\n",
+       "      <td>0.49266</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>64</th>\n",
+       "      <td>0.973574</td>\n",
+       "      <td>0.49266</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div></td></tr></table>"
+      ],
+      "text/plain": [
+       "<IPython.core.display.HTML object>"
+      ]
+     },
+     "execution_count": 40,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "multi_table([compare_5_df.head(), compare_5_mean_df.head()])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1e732ac9-faf7-4457-baef-ac9c4976598c",
+   "metadata": {},
+   "source": [
+    "#### **number 10% KNN Impute VS number 10% Mean Impute**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 41,
+   "id": "f2d22e8f-5a0b-48c0-9150-a391d48e93b2",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<table><tr style=\"background-color:white;\"><td><div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>number</th>\n",
+       "      <th>number_copy_10_percent</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>10</th>\n",
+       "      <td>0.205755</td>\n",
+       "      <td>0.206019</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>16</th>\n",
+       "      <td>0.213343</td>\n",
+       "      <td>0.212724</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>27</th>\n",
+       "      <td>0.738704</td>\n",
+       "      <td>0.737446</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>29</th>\n",
+       "      <td>0.322577</td>\n",
+       "      <td>0.322495</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>43</th>\n",
+       "      <td>0.403866</td>\n",
+       "      <td>0.404988</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div></td><td><div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>number</th>\n",
+       "      <th>number_copy_10_percent</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>10</th>\n",
+       "      <td>0.205755</td>\n",
+       "      <td>0.50025</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>16</th>\n",
+       "      <td>0.213343</td>\n",
+       "      <td>0.50025</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>27</th>\n",
+       "      <td>0.738704</td>\n",
+       "      <td>0.50025</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>29</th>\n",
+       "      <td>0.322577</td>\n",
+       "      <td>0.50025</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>43</th>\n",
+       "      <td>0.403866</td>\n",
+       "      <td>0.50025</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div></td></tr></table>"
+      ],
+      "text/plain": [
+       "<IPython.core.display.HTML object>"
+      ]
+     },
+     "execution_count": 41,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "multi_table([compare_10_df.head(), compare_10_mean_df.head()])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cc817314-971f-4abf-a56e-9830a5cf0329",
+   "metadata": {},
+   "source": [
+    "# 1.2 Random Numbers dataset Results - KNN and MEAN"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 48,
+   "id": "c4ebb2fe-34e9-4bd2-bf53-9392e5d05e52",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<table><tr style=\"background-color:white;\"><td><div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>diff. list Mean(KNN)</th>\n",
+       "      <th>diff. list Var.(KNN)</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>1%_number</th>\n",
+       "      <td>0.000585</td>\n",
+       "      <td>2.970798e-07</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>5%_number</th>\n",
+       "      <td>0.000757</td>\n",
+       "      <td>4.329913e-07</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>10%_number</th>\n",
+       "      <td>0.000661</td>\n",
+       "      <td>4.035197e-07</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div></td><td><div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>diff. list Mean(MI)</th>\n",
+       "      <th>diff. list Var.(MI)</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>1%_number</th>\n",
+       "      <td>0.295956</td>\n",
+       "      <td>0.022347</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>5%_number</th>\n",
+       "      <td>0.260679</td>\n",
+       "      <td>0.017949</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>10%_number</th>\n",
+       "      <td>0.242477</td>\n",
+       "      <td>0.019225</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div></td></tr></table>"
+      ],
+      "text/plain": [
+       "<IPython.core.display.HTML object>"
+      ]
+     },
+     "execution_count": 48,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "multi_table([df_knn_number, df_MI_number])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "177bab9a-d501-479d-bbe8-d0c93926a24d",
+   "metadata": {},
+   "source": [
+    "Results: KNN imputation performed much better than mean imputation on this dataset. KNN fills each gap from its nearest neighbours, whereas mean imputation writes the same column mean into every gap. The mean difference between the actual and imputed values is below 0.001 for KNN at all three missing rates, compared with roughly 0.24 to 0.30 for mean imputation, so the KNN-imputed values are very close to the originals."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "08586561-e3a5-4d15-a1c0-b8d71731a84a",
+   "metadata": {},
+   "source": [
+    "# 2.1 Housing Dataset "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 51,
+   "id": "c05f4dd5-4cdc-4617-939a-2e22ec859af1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "housing_data = pd.read_csv('https://raw.githubusercontent.com/SheshNGupta/datasets/main/train.csv')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 52,
+   "id": "8564d163-97ce-44da-8d3c-6f8cd9c1d0a1",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>Id</th>\n",
+       "      <th>MSSubClass</th>\n",
+       "      <th>MSZoning</th>\n",
+       "      <th>LotFrontage</th>\n",
+       "      <th>LotArea</th>\n",
+       "      <th>Street</th>\n",
+       "      <th>Alley</th>\n",
+       "      <th>LotShape</th>\n",
+       "      <th>LandContour</th>\n",
+       "      <th>Utilities</th>\n",
+       "      <th>...</th>\n",
+       "      <th>PoolArea</th>\n",
+       "      <th>PoolQC</th>\n",
+       "      <th>Fence</th>\n",
+       "      <th>MiscFeature</th>\n",
+       "      <th>MiscVal</th>\n",
+       "      <th>MoSold</th>\n",
+       "      <th>YrSold</th>\n",
+       "      <th>SaleType</th>\n",
+       "      <th>SaleCondition</th>\n",
+       "      <th>SalePrice</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>740</th>\n",
+       "      <td>741</td>\n",
+       "      <td>70</td>\n",
+       "      <td>RM</td>\n",
+       "      <td>60.0</td>\n",
+       "      <td>9600</td>\n",
+       "      <td>Pave</td>\n",
+       "      <td>Grvl</td>\n",
+       "      <td>Reg</td>\n",
+       "      <td>Lvl</td>\n",
+       "      <td>AllPub</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>GdPrv</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0</td>\n",
+       "      <td>5</td>\n",
+       "      <td>2007</td>\n",
+       "      <td>WD</td>\n",
+       "      <td>Abnorml</td>\n",
+       "      <td>132000</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1209</th>\n",
+       "      <td>1210</td>\n",
+       "      <td>20</td>\n",
+       "      <td>RL</td>\n",
+       "      <td>85.0</td>\n",
+       "      <td>10182</td>\n",
+       "      <td>Pave</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>IR1</td>\n",
+       "      <td>Lvl</td>\n",
+       "      <td>AllPub</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0</td>\n",
+       "      <td>5</td>\n",
+       "      <td>2006</td>\n",
+       "      <td>New</td>\n",
+       "      <td>Partial</td>\n",
+       "      <td>290000</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>64</th>\n",
+       "      <td>65</td>\n",
+       "      <td>60</td>\n",
+       "      <td>RL</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>9375</td>\n",
+       "      <td>Pave</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>Reg</td>\n",
+       "      <td>Lvl</td>\n",
+       "      <td>AllPub</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>GdPrv</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2009</td>\n",
+       "      <td>WD</td>\n",
+       "      <td>Normal</td>\n",
+       "      <td>219500</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>208</th>\n",
+       "      <td>209</td>\n",
+       "      <td>60</td>\n",
+       "      <td>RL</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>14364</td>\n",
+       "      <td>Pave</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>IR1</td>\n",
+       "      <td>Low</td>\n",
+       "      <td>AllPub</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0</td>\n",
+       "      <td>4</td>\n",
+       "      <td>2007</td>\n",
+       "      <td>WD</td>\n",
+       "      <td>Normal</td>\n",
+       "      <td>277000</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>436</th>\n",
+       "      <td>437</td>\n",
+       "      <td>50</td>\n",
+       "      <td>RM</td>\n",
+       "      <td>40.0</td>\n",
+       "      <td>4400</td>\n",
+       "      <td>Pave</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>Reg</td>\n",
+       "      <td>Lvl</td>\n",
+       "      <td>AllPub</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0</td>\n",
+       "      <td>10</td>\n",
+       "      <td>2006</td>\n",
+       "      <td>WD</td>\n",
+       "      <td>Normal</td>\n",
+       "      <td>116000</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>19</th>\n",
+       "      <td>20</td>\n",
+       "      <td>20</td>\n",
+       "      <td>RL</td>\n",
+       "      <td>70.0</td>\n",
+       "      <td>7560</td>\n",
+       "      <td>Pave</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>Reg</td>\n",
+       "      <td>Lvl</td>\n",
+       "      <td>AllPub</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>MnPrv</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0</td>\n",
+       "      <td>5</td>\n",
+       "      <td>2009</td>\n",
+       "      <td>COD</td>\n",
+       "      <td>Abnorml</td>\n",
+       "      <td>139000</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1449</th>\n",
+       "      <td>1450</td>\n",
+       "      <td>180</td>\n",
+       "      <td>RM</td>\n",
+       "      <td>21.0</td>\n",
+       "      <td>1533</td>\n",
+       "      <td>Pave</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>Reg</td>\n",
+       "      <td>Lvl</td>\n",
+       "      <td>AllPub</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0</td>\n",
+       "      <td>8</td>\n",
+       "      <td>2006</td>\n",
+       "      <td>WD</td>\n",
+       "      <td>Abnorml</td>\n",
+       "      <td>92000</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>449</th>\n",
+       "      <td>450</td>\n",
+       "      <td>50</td>\n",
+       "      <td>RM</td>\n",
+       "      <td>50.0</td>\n",
+       "      <td>6000</td>\n",
+       "      <td>Pave</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>Reg</td>\n",
+       "      <td>Lvl</td>\n",
+       "      <td>AllPub</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0</td>\n",
+       "      <td>6</td>\n",
+       "      <td>2007</td>\n",
+       "      <td>WD</td>\n",
+       "      <td>Normal</td>\n",
+       "      <td>120000</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1185</th>\n",
+       "      <td>1186</td>\n",
+       "      <td>50</td>\n",
+       "      <td>RL</td>\n",
+       "      <td>60.0</td>\n",
+       "      <td>9738</td>\n",
+       "      <td>Pave</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>Reg</td>\n",
+       "      <td>Lvl</td>\n",
+       "      <td>AllPub</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0</td>\n",
+       "      <td>3</td>\n",
+       "      <td>2006</td>\n",
+       "      <td>WD</td>\n",
+       "      <td>Normal</td>\n",
+       "      <td>104900</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1023</th>\n",
+       "      <td>1024</td>\n",
+       "      <td>120</td>\n",
+       "      <td>RL</td>\n",
+       "      <td>43.0</td>\n",
+       "      <td>3182</td>\n",
+       "      <td>Pave</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>Reg</td>\n",
+       "      <td>Lvl</td>\n",
+       "      <td>AllPub</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0</td>\n",
+       "      <td>5</td>\n",
+       "      <td>2008</td>\n",
+       "      <td>WD</td>\n",
+       "      <td>Normal</td>\n",
+       "      <td>191000</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "<p>10 rows × 81 columns</p>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "        Id  MSSubClass MSZoning  LotFrontage  LotArea Street Alley LotShape  \\\n",
+       "740    741          70       RM         60.0     9600   Pave  Grvl      Reg   \n",
+       "1209  1210          20       RL         85.0    10182   Pave   NaN      IR1   \n",
+       "64      65          60       RL          NaN     9375   Pave   NaN      Reg   \n",
+       "208    209          60       RL          NaN    14364   Pave   NaN      IR1   \n",
+       "436    437          50       RM         40.0     4400   Pave   NaN      Reg   \n",
+       "19      20          20       RL         70.0     7560   Pave   NaN      Reg   \n",
+       "1449  1450         180       RM         21.0     1533   Pave   NaN      Reg   \n",
+       "449    450          50       RM         50.0     6000   Pave   NaN      Reg   \n",
+       "1185  1186          50       RL         60.0     9738   Pave   NaN      Reg   \n",
+       "1023  1024         120       RL         43.0     3182   Pave   NaN      Reg   \n",
+       "\n",
+       "     LandContour Utilities  ... PoolArea PoolQC  Fence MiscFeature MiscVal  \\\n",
+       "740          Lvl    AllPub  ...        0    NaN  GdPrv         NaN       0   \n",
+       "1209         Lvl    AllPub  ...        0    NaN    NaN         NaN       0   \n",
+       "64           Lvl    AllPub  ...        0    NaN  GdPrv         NaN       0   \n",
+       "208          Low    AllPub  ...        0    NaN    NaN         NaN       0   \n",
+       "436          Lvl    AllPub  ...        0    NaN    NaN         NaN       0   \n",
+       "19           Lvl    AllPub  ...        0    NaN  MnPrv         NaN       0   \n",
+       "1449         Lvl    AllPub  ...        0    NaN    NaN         NaN       0   \n",
+       "449          Lvl    AllPub  ...        0    NaN    NaN         NaN       0   \n",
+       "1185         Lvl    AllPub  ...        0    NaN    NaN         NaN       0   \n",
+       "1023         Lvl    AllPub  ...        0    NaN    NaN         NaN       0   \n",
+       "\n",
+       "     MoSold YrSold  SaleType  SaleCondition  SalePrice  \n",
+       "740       5   2007        WD        Abnorml     132000  \n",
+       "1209      5   2006       New        Partial     290000  \n",
+       "64        2   2009        WD         Normal     219500  \n",
+       "208       4   2007        WD         Normal     277000  \n",
+       "436      10   2006        WD         Normal     116000  \n",
+       "19        5   2009       COD        Abnorml     139000  \n",
+       "1449      8   2006        WD        Abnorml      92000  \n",
+       "449       6   2007        WD         Normal     120000  \n",
+       "1185      3   2006        WD         Normal     104900  \n",
+       "1023      5   2008        WD         Normal     191000  \n",
+       "\n",
+       "[10 rows x 81 columns]"
+      ]
+     },
+     "execution_count": 52,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "housing_data.sample(10)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 53,
+   "id": "bd81975c-0a21-414b-8e20-3564d35b9f9b",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "663"
+      ]
+     },
+     "execution_count": 53,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "housing_data['SalePrice'].nunique()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 54,
+   "id": "67d1046e-a1ad-412e-a7e8-a0d51729cec7",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "1073"
+      ]
+     },
+     "execution_count": 54,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "housing_data['LotArea'].nunique()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 55,
+   "id": "64b05e52-72dc-4f7d-aca3-d043036b4d2f",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "count      1460.000000\n",
+       "mean     180921.195890\n",
+       "std       79442.502883\n",
+       "min       34900.000000\n",
+       "25%      129975.000000\n",
+       "50%      163000.000000\n",
+       "75%      214000.000000\n",
+       "max      755000.000000\n",
+       "Name: SalePrice, dtype: float64"
+      ]
+     },
+     "execution_count": 55,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "housing_data['SalePrice'].describe()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 56,
+   "id": "b7e9928c-4785-4ee1-8150-cd0fa1ef3325",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "count      1460.000000\n",
+       "mean      10516.828082\n",
+       "std        9981.264932\n",
+       "min        1300.000000\n",
+       "25%        7553.500000\n",
+       "50%        9478.500000\n",
+       "75%       11601.500000\n",
+       "max      215245.000000\n",
+       "Name: LotArea, dtype: float64"
+      ]
+     },
+     "execution_count": 56,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "housing_data['LotArea'].describe()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 57,
+   "id": "20149f80-07dc-4eaa-8d0e-7de6612a7dce",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "                 column_name  percent_missing\n",
+      "Id                        Id         0.000000\n",
+      "MSSubClass        MSSubClass         0.000000\n",
+      "MSZoning            MSZoning         0.000000\n",
+      "LotFrontage      LotFrontage        17.739726\n",
+      "LotArea              LotArea         0.000000\n",
+      "Street                Street         0.000000\n",
+      "Alley                  Alley        93.767123\n",
+      "LotShape            LotShape         0.000000\n",
+      "LandContour      LandContour         0.000000\n",
+      "Utilities          Utilities         0.000000\n",
+      "LotConfig          LotConfig         0.000000\n",
+      "LandSlope          LandSlope         0.000000\n",
+      "Neighborhood    Neighborhood         0.000000\n",
+      "Condition1        Condition1         0.000000\n",
+      "Condition2        Condition2         0.000000\n",
+      "BldgType            BldgType         0.000000\n",
+      "HouseStyle        HouseStyle         0.000000\n",
+      "OverallQual      OverallQual         0.000000\n",
+      "OverallCond      OverallCond         0.000000\n",
+      "YearBuilt          YearBuilt         0.000000\n",
+      "YearRemodAdd    YearRemodAdd         0.000000\n",
+      "RoofStyle          RoofStyle         0.000000\n",
+      "RoofMatl            RoofMatl         0.000000\n",
+      "Exterior1st      Exterior1st         0.000000\n",
+      "Exterior2nd      Exterior2nd         0.000000\n",
+      "MasVnrType        MasVnrType         0.547945\n",
+      "MasVnrArea        MasVnrArea         0.547945\n",
+      "ExterQual          ExterQual         0.000000\n",
+      "ExterCond          ExterCond         0.000000\n",
+      "Foundation        Foundation         0.000000\n",
+      "BsmtQual            BsmtQual         2.534247\n",
+      "BsmtCond            BsmtCond         2.534247\n",
+      "BsmtExposure    BsmtExposure         2.602740\n",
+      "BsmtFinType1    BsmtFinType1         2.534247\n",
+      "BsmtFinSF1        BsmtFinSF1         0.000000\n",
+      "BsmtFinType2    BsmtFinType2         2.602740\n",
+      "BsmtFinSF2        BsmtFinSF2         0.000000\n",
+      "BsmtUnfSF          BsmtUnfSF         0.000000\n",
+      "TotalBsmtSF      TotalBsmtSF         0.000000\n",
+      "Heating              Heating         0.000000\n",
+      "HeatingQC          HeatingQC         0.000000\n",
+      "CentralAir        CentralAir         0.000000\n",
+      "Electrical        Electrical         0.068493\n",
+      "1stFlrSF            1stFlrSF         0.000000\n",
+      "2ndFlrSF            2ndFlrSF         0.000000\n",
+      "LowQualFinSF    LowQualFinSF         0.000000\n",
+      "GrLivArea          GrLivArea         0.000000\n",
+      "BsmtFullBath    BsmtFullBath         0.000000\n",
+      "BsmtHalfBath    BsmtHalfBath         0.000000\n",
+      "FullBath            FullBath         0.000000\n",
+      "HalfBath            HalfBath         0.000000\n",
+      "BedroomAbvGr    BedroomAbvGr         0.000000\n",
+      "KitchenAbvGr    KitchenAbvGr         0.000000\n",
+      "KitchenQual      KitchenQual         0.000000\n",
+      "TotRmsAbvGrd    TotRmsAbvGrd         0.000000\n",
+      "Functional        Functional         0.000000\n",
+      "Fireplaces        Fireplaces         0.000000\n",
+      "FireplaceQu      FireplaceQu        47.260274\n",
+      "GarageType        GarageType         5.547945\n",
+      "GarageYrBlt      GarageYrBlt         5.547945\n",
+      "GarageFinish    GarageFinish         5.547945\n",
+      "GarageCars        GarageCars         0.000000\n",
+      "GarageArea        GarageArea         0.000000\n",
+      "GarageQual        GarageQual         5.547945\n",
+      "GarageCond        GarageCond         5.547945\n",
+      "PavedDrive        PavedDrive         0.000000\n",
+      "WoodDeckSF        WoodDeckSF         0.000000\n",
+      "OpenPorchSF      OpenPorchSF         0.000000\n",
+      "EnclosedPorch  EnclosedPorch         0.000000\n",
+      "3SsnPorch          3SsnPorch         0.000000\n",
+      "ScreenPorch      ScreenPorch         0.000000\n",
+      "PoolArea            PoolArea         0.000000\n",
+      "PoolQC                PoolQC        99.520548\n",
+      "Fence                  Fence        80.753425\n",
+      "MiscFeature      MiscFeature        96.301370\n",
+      "MiscVal              MiscVal         0.000000\n",
+      "MoSold                MoSold         0.000000\n",
+      "YrSold                YrSold         0.000000\n",
+      "SaleType            SaleType         0.000000\n",
+      "SaleCondition  SaleCondition         0.000000\n",
+      "SalePrice          SalePrice         0.000000\n"
+     ]
+    }
+   ],
+   "source": [
+    "pd.set_option('display.max_rows', None)\n",
+    "print(get_percent_missing(housing_data))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c8eb3ee3-085d-4b41-9a5f-c83a3805f870",
+   "metadata": {},
+   "source": [
+    "#### Using the SalePrice column for the KNN and mean imputation task"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "451c79fb-17ba-40ac-8f0b-87a8b2ec4837",
+   "metadata": {},
+   "source": [
+    "#### Unscaled SalePrice dataframe (first 1000 rows)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 58,
+   "id": "9cc1f97f-1b24-4570-8f6a-30426bd79269",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>SalePrice</th>\n",
+       "      <th>sp_copy_1_percent</th>\n",
+       "      <th>sp_copy_5_percent</th>\n",
+       "      <th>sp_copy_10_percent</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>208500</td>\n",
+       "      <td>208500</td>\n",
+       "      <td>208500</td>\n",
+       "      <td>208500</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>181500</td>\n",
+       "      <td>181500</td>\n",
+       "      <td>181500</td>\n",
+       "      <td>181500</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>223500</td>\n",
+       "      <td>223500</td>\n",
+       "      <td>223500</td>\n",
+       "      <td>223500</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>140000</td>\n",
+       "      <td>140000</td>\n",
+       "      <td>140000</td>\n",
+       "      <td>140000</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>250000</td>\n",
+       "      <td>250000</td>\n",
+       "      <td>250000</td>\n",
+       "      <td>250000</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   SalePrice  sp_copy_1_percent  sp_copy_5_percent  sp_copy_10_percent\n",
+       "0     208500             208500             208500              208500\n",
+       "1     181500             181500             181500              181500\n",
+       "2     223500             223500             223500              223500\n",
+       "3     140000             140000             140000              140000\n",
+       "4     250000             250000             250000              250000"
+      ]
+     },
+     "execution_count": 58,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Copy the slice so the new columns are added to an independent dataframe, not a view of housing_data\n",
+    "df_saleprice = housing_data[['SalePrice']][:1000].copy()\n",
+    "df_saleprice['sp_copy_1_percent'] = df_saleprice['SalePrice']\n",
+    "df_saleprice['sp_copy_5_percent'] = df_saleprice['SalePrice']\n",
+    "df_saleprice['sp_copy_10_percent'] = df_saleprice['SalePrice']\n",
+    "df_saleprice.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 59,
+   "id": "f462f065-9f37-44f1-a22e-92e610dae2e9",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "1000"
+      ]
+     },
+     "execution_count": 59,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "len(df_saleprice)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "03407bbd-f8a7-4f6c-a7c3-64a865ed3f7e",
+   "metadata": {},
+   "source": [
+    "#### Scaled SalePrice dataframe (first 1000 rows)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 60,
+   "id": "e461b1ef-df2c-410f-aea8-abe954fa9afd",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>SalePrice</th>\n",
+       "      <th>sp_copy_1_percent</th>\n",
+       "      <th>sp_copy_5_percent</th>\n",
+       "      <th>sp_copy_10_percent</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>0.241078</td>\n",
+       "      <td>0.241078</td>\n",
+       "      <td>0.241078</td>\n",
+       "      <td>0.241078</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>0.203583</td>\n",
+       "      <td>0.203583</td>\n",
+       "      <td>0.203583</td>\n",
+       "      <td>0.203583</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>0.261908</td>\n",
+       "      <td>0.261908</td>\n",
+       "      <td>0.261908</td>\n",
+       "      <td>0.261908</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>0.145952</td>\n",
+       "      <td>0.145952</td>\n",
+       "      <td>0.145952</td>\n",
+       "      <td>0.145952</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>0.298709</td>\n",
+       "      <td>0.298709</td>\n",
+       "      <td>0.298709</td>\n",
+       "      <td>0.298709</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   SalePrice  sp_copy_1_percent  sp_copy_5_percent  sp_copy_10_percent\n",
+       "0   0.241078           0.241078           0.241078            0.241078\n",
+       "1   0.203583           0.203583           0.203583            0.203583\n",
+       "2   0.261908           0.261908           0.261908            0.261908\n",
+       "3   0.145952           0.145952           0.145952            0.145952\n",
+       "4   0.298709           0.298709           0.298709            0.298709"
+      ]
+     },
+     "execution_count": 60,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Min-max scale every column to [0, 1]; at this point all four columns are still identical copies\n",
+    "scaler = MinMaxScaler()\n",
+    "df_saleprice_scaled = df_saleprice.copy(deep=True)\n",
+    "df_saleprice_scaled = pd.DataFrame(scaler.fit_transform(df_saleprice_scaled), columns = df_saleprice_scaled.columns)\n",
+    "df_saleprice_scaled.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a66683c4-f66a-4aa1-ab8a-f28087b60b6c",
+   "metadata": {},
+   "source": [
+    "#### Check % missing values in this dataframe"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 61,
+   "id": "0075fa0f-4b82-4089-ab81-e5282497c4a3",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "                           column_name  percent_missing\n",
+      "SalePrice                    SalePrice              0.0\n",
+      "sp_copy_1_percent    sp_copy_1_percent              0.0\n",
+      "sp_copy_5_percent    sp_copy_5_percent              0.0\n",
+      "sp_copy_10_percent  sp_copy_10_percent              0.0\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(get_percent_missing(df_saleprice))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "619ef99f-55c0-422c-aaa8-73cd71fcf2fb",
+   "metadata": {},
+   "source": [
+    "#### Create 1%, 5% and 10% missing data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 62,
+   "id": "82df5098-4176-4fba-922f-ca84c0466f2a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "create_missing(df_saleprice, 0.01, 'sp_copy_1_percent')\n",
+    "create_missing(df_saleprice, 0.05, 'sp_copy_5_percent')\n",
+    "create_missing(df_saleprice, 0.1, 'sp_copy_10_percent')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 63,
+   "id": "0e90ae04-cd10-4507-a851-c187010f0be0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "create_missing(df_saleprice_scaled, 0.01, 'sp_copy_1_percent')\n",
+    "create_missing(df_saleprice_scaled, 0.05, 'sp_copy_5_percent')\n",
+    "create_missing(df_saleprice_scaled, 0.1, 'sp_copy_10_percent')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a8237a82-5a33-4ce9-b4c7-a48ede4f5fef",
+   "metadata": {},
+   "source": [
+    "#### Check missing values in the unscaled and scaled dataframes"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 64,
+   "id": "2794306d-89c7-4518-8979-9edb3d9441b1",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "                           column_name  percent_missing\n",
+      "SalePrice                    SalePrice              0.0\n",
+      "sp_copy_1_percent    sp_copy_1_percent              1.0\n",
+      "sp_copy_5_percent    sp_copy_5_percent              5.0\n",
+      "sp_copy_10_percent  sp_copy_10_percent             10.0\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(get_percent_missing(df_saleprice))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 65,
+   "id": "8351dbe2-b388-451d-9238-52c4ccabd425",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "                           column_name  percent_missing\n",
+      "SalePrice                    SalePrice              0.0\n",
+      "sp_copy_1_percent    sp_copy_1_percent              1.0\n",
+      "sp_copy_5_percent    sp_copy_5_percent              5.0\n",
+      "sp_copy_10_percent  sp_copy_10_percent             10.0\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(get_percent_missing(df_saleprice_scaled))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 66,
+   "id": "b11b093f-110b-4ef3-9d00-ac4fed45a956",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "10"
+      ]
+     },
+     "execution_count": 66,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "df_saleprice['sp_copy_1_percent'].isna().sum()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "360e0010-e085-435c-8902-80c6a7ea78be",
+   "metadata": {},
+   "source": [
+    "#### Store indices of missing values"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 67,
+   "id": "e546096c-ce35-448e-aa97-0943d3535a87",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Store the positional indices of the NaN values in each column\n",
+    "sp_1_idx = list(np.where(df_saleprice['sp_copy_1_percent'].isna())[0])\n",
+    "sp_5_idx = list(np.where(df_saleprice['sp_copy_5_percent'].isna())[0])\n",
+    "sp_10_idx = list(np.where(df_saleprice['sp_copy_10_percent'].isna())[0])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 68,
+   "id": "d409e2a5-b3a9-4ae1-9b17-88b7c642692d",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "10\n",
+      "50\n",
+      "100\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(len(sp_1_idx))\n",
+    "print(len(sp_5_idx))\n",
+    "print(len(sp_10_idx))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 69,
+   "id": "5839460a-e736-42e9-9a13-d5bab5683115",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Length of sp_1_idx is 10 and it contains 1.0% of total data in column | Total rows: 1000\n",
+      "Length of sp_5_idx is 50 and it contains 5.0% of total data in column | Total rows: 1000\n",
+      "Length of sp_10_idx is 100 and it contains 10.0% of total data in column | Total rows: 1000\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(f\"Length of sp_1_idx is {len(sp_1_idx)} and it contains {(len(sp_1_idx)/len(df_saleprice['sp_copy_1_percent']))*100}% of total data in column | Total rows: {len(df_saleprice['sp_copy_1_percent'])}\")\n",
+    "print(f\"Length of sp_5_idx is {len(sp_5_idx)} and it contains {(len(sp_5_idx)/len(df_saleprice['sp_copy_5_percent']))*100}% of total data in column | Total rows: {len(df_saleprice['sp_copy_5_percent'])}\")\n",
+    "print(f\"Length of sp_10_idx is {len(sp_10_idx)} and it contains {(len(sp_10_idx)/len(df_saleprice['sp_copy_10_percent']))*100}% of total data in column | Total rows: {len(df_saleprice['sp_copy_10_percent'])}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c1464c79-c0a9-4640-92dd-f0d5131634ab",
+   "metadata": {},
+   "source": [
+    "### Apply KNN imputation to the df_saleprice and df_saleprice_scaled dataframes"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 70,
+   "id": "08fa2436-ffb8-4b5d-a7a1-9e2d63b14562",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_saleprice1 = df_saleprice.copy(deep=True)\n",
+    "imputer = KNNImputer(n_neighbors=5)\n",
+    "imputed_saleprice_df = pd.DataFrame(imputer.fit_transform(df_saleprice1), columns = df_saleprice1.columns)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 71,
+   "id": "205c7a96-3f1c-42a4-91de-f22f15ce9cb2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_saleprice_scaled1 = df_saleprice_scaled.copy(deep=True)\n",
+    "imputer = KNNImputer(n_neighbors=5)\n",
+    "imputed_saleprice_scaled_df = pd.DataFrame(imputer.fit_transform(df_saleprice_scaled1), columns = df_saleprice_scaled1.columns)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 72,
+   "id": "a482f58d-73b6-423c-b97a-140884830a0f",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>SalePrice</th>\n",
+       "      <th>sp_copy_1_percent</th>\n",
+       "      <th>sp_copy_5_percent</th>\n",
+       "      <th>sp_copy_10_percent</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>208500.0</td>\n",
+       "      <td>208500.0</td>\n",
+       "      <td>208500.0</td>\n",
+       "      <td>208500.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>181500.0</td>\n",
+       "      <td>181500.0</td>\n",
+       "      <td>181500.0</td>\n",
+       "      <td>181500.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>223500.0</td>\n",
+       "      <td>223500.0</td>\n",
+       "      <td>223500.0</td>\n",
+       "      <td>223500.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>140000.0</td>\n",
+       "      <td>140000.0</td>\n",
+       "      <td>140000.0</td>\n",
+       "      <td>140000.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>250000.0</td>\n",
+       "      <td>250000.0</td>\n",
+       "      <td>250000.0</td>\n",
+       "      <td>250000.0</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   SalePrice  sp_copy_1_percent  sp_copy_5_percent  sp_copy_10_percent\n",
+       "0   208500.0           208500.0           208500.0            208500.0\n",
+       "1   181500.0           181500.0           181500.0            181500.0\n",
+       "2   223500.0           223500.0           223500.0            223500.0\n",
+       "3   140000.0           140000.0           140000.0            140000.0\n",
+       "4   250000.0           250000.0           250000.0            250000.0"
+      ]
+     },
+     "execution_count": 72,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "imputed_saleprice_df.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 73,
+   "id": "11f8f5ff-f06d-4ec2-a4e3-1324e807a537",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>SalePrice</th>\n",
+       "      <th>sp_copy_1_percent</th>\n",
+       "      <th>sp_copy_5_percent</th>\n",
+       "      <th>sp_copy_10_percent</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>0.241078</td>\n",
+       "      <td>0.241078</td>\n",
+       "      <td>0.241078</td>\n",
+       "      <td>0.241078</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>0.203583</td>\n",
+       "      <td>0.203583</td>\n",
+       "      <td>0.203583</td>\n",
+       "      <td>0.203583</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>0.261908</td>\n",
+       "      <td>0.261908</td>\n",
+       "      <td>0.261908</td>\n",
+       "      <td>0.261908</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>0.145952</td>\n",
+       "      <td>0.145952</td>\n",
+       "      <td>0.145952</td>\n",
+       "      <td>0.145952</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>0.298709</td>\n",
+       "      <td>0.298709</td>\n",
+       "      <td>0.298709</td>\n",
+       "      <td>0.298709</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   SalePrice  sp_copy_1_percent  sp_copy_5_percent  sp_copy_10_percent\n",
+       "0   0.241078           0.241078           0.241078            0.241078\n",
+       "1   0.203583           0.203583           0.203583            0.203583\n",
+       "2   0.261908           0.261908           0.261908            0.261908\n",
+       "3   0.145952           0.145952           0.145952            0.145952\n",
+       "4   0.298709           0.298709           0.298709            0.298709"
+      ]
+     },
+     "execution_count": 73,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "imputed_saleprice_scaled_df.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d9fd7fa1-4ce0-43be-9955-55ef759d930b",
+   "metadata": {},
+   "source": [
+    "#### Check % missing in saleprice and saleprice_scaled DF"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 74,
+   "id": "9ed0d36a-9584-4e3b-9201-2ac36827bce9",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "                           column_name  percent_missing\n",
+      "SalePrice                    SalePrice              0.0\n",
+      "sp_copy_1_percent    sp_copy_1_percent              0.0\n",
+      "sp_copy_5_percent    sp_copy_5_percent              0.0\n",
+      "sp_copy_10_percent  sp_copy_10_percent              0.0\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(get_percent_missing(imputed_saleprice_df))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 75,
+   "id": "7c842fce-bbd5-4c2c-bb1a-db5df92f6315",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "                           column_name  percent_missing\n",
+      "SalePrice                    SalePrice              0.0\n",
+      "sp_copy_1_percent    sp_copy_1_percent              0.0\n",
+      "sp_copy_5_percent    sp_copy_5_percent              0.0\n",
+      "sp_copy_10_percent  sp_copy_10_percent              0.0\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(get_percent_missing(imputed_saleprice_scaled_df))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ac47abb1-df5f-4686-bc67-6617140c008c",
+   "metadata": {},
+   "source": [
+    "#### Store the list of differences between original and imputed values"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 76,
+   "id": "99e04554-568d-4efa-a110-768b50dfaee6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# create lists of absolute differences between imputed and original values (KNN, unscaled)\n",
+    "\n",
+    "sp_diff_1 = []\n",
+    "sp_diff_5 = []\n",
+    "sp_diff_10 = []\n",
+    "count = 0\n",
+    "\n",
+    "for i in sp_1_idx:\n",
+    "    count +=1\n",
+    "    diff1 = abs(imputed_saleprice_df['sp_copy_1_percent'][i] - imputed_saleprice_df['SalePrice'][i])\n",
+    "    sp_diff_1.append(diff1)\n",
+    "    \n",
+    "\n",
+    "for i in sp_5_idx:\n",
+    "    diff5 = abs(imputed_saleprice_df['sp_copy_5_percent'][i] - imputed_saleprice_df['SalePrice'][i])\n",
+    "    sp_diff_5.append(diff5)\n",
+    "\n",
+    "for i in sp_10_idx:\n",
+    "    diff10 = abs(imputed_saleprice_df['sp_copy_10_percent'][i] - imputed_saleprice_df['SalePrice'][i])\n",
+    "    sp_diff_10.append(diff10)"
+   ]
+  },
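+  {
+   "cell_type": "markdown",
+   "id": "sketch-vectorized-abs-diffs",
+   "metadata": {},
+   "source": [
+    "The loops above can also be written as one vectorized expression per column. The following is a minimal sketch on made-up toy data, not the notebook's own code: `abs_diffs` is a hypothetical helper, and it assumes the masked row labels (such as `sp_1_idx`) are valid labels for `DataFrame.loc`.\n",
+    "\n",
+    "```python\n",
+    "import pandas as pd\n",
+    "\n",
+    "\n",
+    "def abs_diffs(df, imputed_col, idx, original_col='SalePrice'):\n",
+    "    # Absolute difference between imputed and original values at the masked row labels.\n",
+    "    return (df.loc[idx, imputed_col] - df.loc[idx, original_col]).abs().tolist()\n",
+    "\n",
+    "\n",
+    "# Toy data (hypothetical): rows 1 and 3 were masked and then imputed.\n",
+    "toy = pd.DataFrame({'SalePrice': [100.0, 200.0, 300.0, 400.0],\n",
+    "                    'sp_copy': [100.0, 210.0, 300.0, 385.0]})\n",
+    "\n",
+    "print(abs_diffs(toy, 'sp_copy', [1, 3]))  # [10.0, 15.0]\n",
+    "```\n",
+    "\n",
+    "Selecting with `.loc` looks up all masked rows at once instead of performing one chained `df[col][i]` lookup per row."
+   ]
+  },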
+  {
+   "cell_type": "code",
+   "execution_count": 77,
+   "id": "92204f8a-497c-470d-a770-59165d226cc9",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "10\n",
+      "50\n",
+      "100\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(len(sp_diff_1))\n",
+    "print(len(sp_diff_5))\n",
+    "print(len(sp_diff_10))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 78,
+   "id": "b8875fff-0289-4dd9-92c1-78dc9b730d22",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# create lists of absolute differences between imputed and original values (KNN, scaled)\n",
+    "\n",
+    "sp_scaled_diff_1 = []\n",
+    "sp_scaled_diff_5 = []\n",
+    "sp_scaled_diff_10 = []\n",
+    "count = 0\n",
+    "\n",
+    "for i in sp_1_idx:\n",
+    "    count +=1\n",
+    "    diff1 = abs(imputed_saleprice_scaled_df['sp_copy_1_percent'][i] - imputed_saleprice_scaled_df['SalePrice'][i])\n",
+    "    sp_scaled_diff_1.append(diff1)\n",
+    "    \n",
+    "\n",
+    "for i in sp_5_idx:\n",
+    "    diff5 = abs(imputed_saleprice_scaled_df['sp_copy_5_percent'][i] - imputed_saleprice_scaled_df['SalePrice'][i])\n",
+    "    sp_scaled_diff_5.append(diff5)\n",
+    "\n",
+    "for i in sp_10_idx:\n",
+    "    diff10 = abs(imputed_saleprice_scaled_df['sp_copy_10_percent'][i] - imputed_saleprice_scaled_df['SalePrice'][i])\n",
+    "    sp_scaled_diff_10.append(diff10)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 79,
+   "id": "40192344-79a4-444c-a12a-2201dc5aa0c1",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "10\n",
+      "50\n",
+      "100\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(len(sp_scaled_diff_1))\n",
+    "print(len(sp_scaled_diff_5))\n",
+    "print(len(sp_scaled_diff_10))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 80,
+   "id": "a95bd45c-8a2f-4159-8306-399ec18a4c0f",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[0.0, 0.0, 0.0, 0.0, 0.0]"
+      ]
+     },
+     "execution_count": 80,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "sp_scaled_diff_1[:5]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 81,
+   "id": "0f73d420-8842-4062-ae17-158a0a25e169",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[0.0, 100.0, 20.0, 0.0, 780.0]"
+      ]
+     },
+     "execution_count": 81,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "sp_diff_1[:5]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a40fd400-913b-4011-b0b9-dd3ca0d5827a",
+   "metadata": {},
+   "source": [
+    "#### Calculate the mean and variance of the list of differences: KNN imputation, SalePrice"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 82,
+   "id": "80267827-7f73-49ff-b200-27cdb2963756",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "The mean of 1% is 105.0 and varience 1% is 52105.0\n",
+      "The mean of 5% is 163.0120000000001 and varience 5% is 46018.96385599976\n",
+      "The mean of 10% is 163.0120000000001 and varience 10% is 3667553.3671999993\n"
+     ]
+    }
+   ],
+   "source": [
+    "m1 = sum(sp_diff_1) / len(sp_diff_1)\n",
+    "\n",
+    "# population variance (divide by n), computed with a generator expression\n",
+    "var_res1 = sum((xi - m1) ** 2 for xi in sp_diff_1) / len(sp_diff_1)\n",
+    "\n",
+    "m5 = sum(sp_diff_5) / len(sp_diff_5)\n",
+    "\n",
+    "# population variance (divide by n), computed with a generator expression\n",
+    "var_res5 = sum((xii - m5) ** 2 for xii in sp_diff_5) / len(sp_diff_5)\n",
+    "\n",
+    "\n",
+    "m10 = sum(sp_diff_10) / len(sp_diff_10)\n",
+    "\n",
+    "# population variance (divide by n), computed with a generator expression\n",
+    "var_res10 = sum((xiii - m10) ** 2 for xiii in sp_diff_10) / len(sp_diff_10)\n",
+    "\n",
+    "print(f\"The mean of 1% is {m1} and variance 1% is {var_res1}\")\n",
+    "print(f\"The mean of 5% is {m5} and variance 5% is {var_res5}\")\n",
+    "# report m10 here (the original line printed m5 a second time)\n",
+    "print(f\"The mean of 10% is {m10} and variance 10% is {var_res10}\")"
+   ]
+  },
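+  {
+   "cell_type": "markdown",
+   "id": "sketch-numpy-mean-variance",
+   "metadata": {},
+   "source": [
+    "The hand-written formula above divides by `n`, so it is the population variance. As a hedged cross-check, the sketch below compares it with `numpy.var`, whose default `ddof=0` uses the same divisor; `ddof=1` would give the sample variance instead. The input is the five values of `sp_diff_1[:5]` displayed earlier, used here only as illustration, and the variable names are hypothetical.\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "diffs = [0.0, 100.0, 20.0, 0.0, 780.0]  # sp_diff_1[:5] as displayed earlier\n",
+    "\n",
+    "mean = np.mean(diffs)\n",
+    "pop_var = np.var(diffs)              # ddof=0: divide by n, like the manual formula\n",
+    "sample_var = np.var(diffs, ddof=1)   # divide by n - 1\n",
+    "\n",
+    "# The manual formula from the cell above, applied to the same values.\n",
+    "manual = sum((x - mean) ** 2 for x in diffs) / len(diffs)\n",
+    "\n",
+    "assert np.isclose(pop_var, manual)\n",
+    "assert sample_var > pop_var\n",
+    "print(mean, pop_var, sample_var)\n",
+    "```\n",
+    "\n",
+    "Note that `pandas.Series.var` defaults to `ddof=1`, so it would not match the manual formula unless `ddof=0` is passed."
+   ]
+  },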
+  {
+   "cell_type": "code",
+   "execution_count": 83,
+   "id": "358545ff-2fcf-4c99-9049-4eaf6dd110bd",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_knn_saleprice = pd.DataFrame.from_dict({'1%_saleprice': [m1, var_res1],\n",
+    " '5%_saleprice': [m5, var_res5],\n",
+    " '10%_saleprice': [m10, var_res10]}, orient='index')\n",
+    "df_knn_saleprice.columns=['diff. list Mean(KNN)', 'diff. list Var.(KNN)']"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 84,
+   "id": "3714c8f9-58db-40a7-b5a2-6bb7e788b734",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>diff. list Mean(KNN)</th>\n",
+       "      <th>diff. list Var.(KNN)</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>1%_saleprice</th>\n",
+       "      <td>105.000</td>\n",
+       "      <td>5.210500e+04</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>5%_saleprice</th>\n",
+       "      <td>163.012</td>\n",
+       "      <td>4.601896e+04</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>10%_saleprice</th>\n",
+       "      <td>470.800</td>\n",
+       "      <td>3.667553e+06</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "               diff. list Mean(KNN)  diff. list Var.(KNN)\n",
+       "1%_saleprice                105.000          5.210500e+04\n",
+       "5%_saleprice                163.012          4.601896e+04\n",
+       "10%_saleprice               470.800          3.667553e+06"
+      ]
+     },
+     "execution_count": 84,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "df_knn_saleprice"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fd7608a8-c5fb-425c-a340-af01801ee349",
+   "metadata": {},
+   "source": [
+    "#### Calculate the mean and variance of the list of differences: KNN imputation, SalePrice scaled"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 85,
+   "id": "bb03017f-3d91-48d9-8ebf-7cb5c25fadc3",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "The mean of 1% is 0.0 and varience 1% is 0.0\n",
+      "The mean of 5% is 1.2498264129982007e-05 and varience 5% is 7.654123706876951e-09\n",
+      "The mean of 10% is 1.2498264129982007e-05 and varience 10% is 2.9738417673284677e-06\n"
+     ]
+    }
+   ],
+   "source": [
+    "m1 = sum(sp_scaled_diff_1) / len(sp_scaled_diff_1)\n",
+    "\n",
+    "# population variance (divide by n), computed with a generator expression\n",
+    "var_res1 = sum((xi - m1) ** 2 for xi in sp_scaled_diff_1) / len(sp_scaled_diff_1)\n",
+    "\n",
+    "m5 = sum(sp_scaled_diff_5) / len(sp_scaled_diff_5)\n",
+    "\n",
+    "# population variance (divide by n), computed with a generator expression\n",
+    "var_res5 = sum((xii - m5) ** 2 for xii in sp_scaled_diff_5) / len(sp_scaled_diff_5)\n",
+    "\n",
+    "\n",
+    "m10 = sum(sp_scaled_diff_10) / len(sp_scaled_diff_10)\n",
+    "\n",
+    "# population variance (divide by n), computed with a generator expression\n",
+    "var_res10 = sum((xiii - m10) ** 2 for xiii in sp_scaled_diff_10) / len(sp_scaled_diff_10)\n",
+    "\n",
+    "print(f\"The mean of 1% is {m1} and variance 1% is {var_res1}\")\n",
+    "print(f\"The mean of 5% is {m5} and variance 5% is {var_res5}\")\n",
+    "# report m10 here (the original line printed m5 a second time)\n",
+    "print(f\"The mean of 10% is {m10} and variance 10% is {var_res10}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 86,
+   "id": "290d8db2-c9f4-4028-ab44-ad68c9e7b3c5",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_knn_saleprice_scaled = pd.DataFrame.from_dict({'1%_saleprice': [m1, var_res1],\n",
+    " '5%_saleprice': [m5, var_res5],\n",
+    " '10%_saleprice': [m10, var_res10]}, orient='index')\n",
+    "df_knn_saleprice_scaled.columns=['diff. list Mean(KNN) scaled', 'diff. list Var.(KNN) scaled']"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 87,
+   "id": "89347fd7-d87d-42bb-b375-a75417c395de",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>diff. list Mean(KNN) scaled</th>\n",
+       "      <th>diff. list Var.(KNN) scaled</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>1%_saleprice</th>\n",
+       "      <td>0.000000</td>\n",
+       "      <td>0.000000e+00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>5%_saleprice</th>\n",
+       "      <td>0.000012</td>\n",
+       "      <td>7.654124e-09</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>10%_saleprice</th>\n",
+       "      <td>0.000265</td>\n",
+       "      <td>2.973842e-06</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "               diff. list Mean(KNN) scaled  diff. list Var.(KNN) scaled\n",
+       "1%_saleprice                      0.000000                 0.000000e+00\n",
+       "5%_saleprice                      0.000012                 7.654124e-09\n",
+       "10%_saleprice                     0.000265                 2.973842e-06"
+      ]
+     },
+     "execution_count": 87,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "df_knn_saleprice_scaled"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c984dc69-f85f-4f1b-8c94-4afb48c1c8db",
+   "metadata": {},
+   "source": [
+    "### Perform MEAN imputation"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 88,
+   "id": "008bc14f-45e7-42d8-b843-2fee7bcf26c2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_saleprice2 = df_saleprice.copy(deep=True)\n",
+    "df_saleprice_scaled2 = df_saleprice_scaled.copy(deep=True)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 89,
+   "id": "bd71dc1a-f137-46ed-bf2b-f3d87fd4b6a0",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "                           column_name  percent_missing\n",
+      "SalePrice                    SalePrice              0.0\n",
+      "sp_copy_1_percent    sp_copy_1_percent              1.0\n",
+      "sp_copy_5_percent    sp_copy_5_percent              5.0\n",
+      "sp_copy_10_percent  sp_copy_10_percent             10.0\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(get_percent_missing(df_saleprice2))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 90,
+   "id": "46237cfd-6361-466f-b66f-32f5940149d6",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "                           column_name  percent_missing\n",
+      "SalePrice                    SalePrice              0.0\n",
+      "sp_copy_1_percent    sp_copy_1_percent              1.0\n",
+      "sp_copy_5_percent    sp_copy_5_percent              5.0\n",
+      "sp_copy_10_percent  sp_copy_10_percent             10.0\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(get_percent_missing(df_saleprice_scaled2))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "64465299-5620-47b9-a28d-afb5494f279e",
+   "metadata": {},
+   "source": [
+    "#### Impute the column mean into the missing values for saleprice and saleprice_scaled"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 91,
+   "id": "28cf6b75-eebf-4758-94ec-4b3536f2c659",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_saleprice2['sp_copy_1_percent'] = df_saleprice2['sp_copy_1_percent'].fillna(df_saleprice2['sp_copy_1_percent'].mean())\n",
+    "df_saleprice2['sp_copy_5_percent'] = df_saleprice2['sp_copy_5_percent'].fillna(df_saleprice2['sp_copy_5_percent'].mean())\n",
+    "df_saleprice2['sp_copy_10_percent'] = df_saleprice2['sp_copy_10_percent'].fillna(df_saleprice2['sp_copy_10_percent'].mean())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 92,
+   "id": "2409dd8c-3cd0-4742-b0ac-14dea1fdb504",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_saleprice_scaled2['sp_copy_1_percent'] = df_saleprice_scaled2['sp_copy_1_percent'].fillna(df_saleprice_scaled2['sp_copy_1_percent'].mean())\n",
+    "df_saleprice_scaled2['sp_copy_5_percent'] = df_saleprice_scaled2['sp_copy_5_percent'].fillna(df_saleprice_scaled2['sp_copy_5_percent'].mean())\n",
+    "df_saleprice_scaled2['sp_copy_10_percent'] = df_saleprice_scaled2['sp_copy_10_percent'].fillna(df_saleprice_scaled2['sp_copy_10_percent'].mean())"
+   ]
+  },
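+  {
+   "cell_type": "markdown",
+   "id": "sketch-simpleimputer-mean",
+   "metadata": {},
+   "source": [
+    "Each `fillna(...mean())` call above replaces missing entries with the mean of the non-missing values in the same column. A minimal sketch of the same idea with scikit-learn's `SimpleImputer(strategy='mean')` follows. It assumes scikit-learn is installed and uses made-up toy data; it is an alternative illustration, not the method this notebook uses.\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "from sklearn.impute import SimpleImputer\n",
+    "\n",
+    "# Toy column (hypothetical values) with two missing entries.\n",
+    "toy = pd.DataFrame({'sp_copy': [100.0, np.nan, 300.0, np.nan]})\n",
+    "\n",
+    "# pandas: the mean skips NaN by default, so the fill value is 200.0.\n",
+    "via_fillna = toy['sp_copy'].fillna(toy['sp_copy'].mean())\n",
+    "\n",
+    "# scikit-learn: learn the column mean in fit, apply it in transform.\n",
+    "imputer = SimpleImputer(strategy='mean')\n",
+    "via_sklearn = imputer.fit_transform(toy[['sp_copy']]).ravel()\n",
+    "\n",
+    "assert np.allclose(via_fillna.to_numpy(), via_sklearn)\n",
+    "print(via_sklearn)\n",
+    "```\n",
+    "\n",
+    "A fitted imputer can be reused on new data with `transform`, which keeps the fill value fixed to the mean learned during `fit`."
+   ]
+  },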
+  {
+   "cell_type": "markdown",
+   "id": "62377754-b682-45e5-8faa-1a4a186bd3c7",
+   "metadata": {},
+   "source": [
+    "#### After MEAN imputation - Saleprice and saleprice scaled"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 93,
+   "id": "6c448556-55f4-4685-aed2-6b67d5ad8a2a",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "                           column_name  percent_missing\n",
+      "SalePrice                    SalePrice              0.0\n",
+      "sp_copy_1_percent    sp_copy_1_percent              0.0\n",
+      "sp_copy_5_percent    sp_copy_5_percent              0.0\n",
+      "sp_copy_10_percent  sp_copy_10_percent              0.0\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(get_percent_missing(df_saleprice2))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 94,
+   "id": "d9775fbf-7a72-4352-b446-488e9d25b6a2",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "                           column_name  percent_missing\n",
+      "SalePrice                    SalePrice              0.0\n",
+      "sp_copy_1_percent    sp_copy_1_percent              0.0\n",
+      "sp_copy_5_percent    sp_copy_5_percent              0.0\n",
+      "sp_copy_10_percent  sp_copy_10_percent              0.0\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(get_percent_missing(df_saleprice_scaled2))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 95,
+   "id": "136f87e6-a4af-4229-b36a-695f712deee5",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>SalePrice</th>\n",
+       "      <th>sp_copy_1_percent</th>\n",
+       "      <th>sp_copy_5_percent</th>\n",
+       "      <th>sp_copy_10_percent</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>436</th>\n",
+       "      <td>116000</td>\n",
+       "      <td>116000.0</td>\n",
+       "      <td>116000.0</td>\n",
+       "      <td>116000.000000</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>21</th>\n",
+       "      <td>139400</td>\n",
+       "      <td>139400.0</td>\n",
+       "      <td>139400.0</td>\n",
+       "      <td>139400.000000</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>618</th>\n",
+       "      <td>314813</td>\n",
+       "      <td>314813.0</td>\n",
+       "      <td>314813.0</td>\n",
+       "      <td>314813.000000</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>207</th>\n",
+       "      <td>141000</td>\n",
+       "      <td>141000.0</td>\n",
+       "      <td>141000.0</td>\n",
+       "      <td>182369.783333</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>366</th>\n",
+       "      <td>159000</td>\n",
+       "      <td>159000.0</td>\n",
+       "      <td>159000.0</td>\n",
+       "      <td>159000.000000</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "     SalePrice  sp_copy_1_percent  sp_copy_5_percent  sp_copy_10_percent\n",
+       "436     116000           116000.0           116000.0       116000.000000\n",
+       "21      139400           139400.0           139400.0       139400.000000\n",
+       "618     314813           314813.0           314813.0       314813.000000\n",
+       "207     141000           141000.0           141000.0       182369.783333\n",
+       "366     159000           159000.0           159000.0       159000.000000"
+      ]
+     },
+     "execution_count": 95,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "df_saleprice2.sample(5)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 96,
+   "id": "784cb61c-78f8-4b31-b709-379c50024dca",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>SalePrice</th>\n",
+       "      <th>sp_copy_1_percent</th>\n",
+       "      <th>sp_copy_5_percent</th>\n",
+       "      <th>sp_copy_10_percent</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>457</th>\n",
+       "      <td>0.307041</td>\n",
+       "      <td>0.307041</td>\n",
+       "      <td>0.307041</td>\n",
+       "      <td>0.201890</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>876</th>\n",
+       "      <td>0.135190</td>\n",
+       "      <td>0.135190</td>\n",
+       "      <td>0.135190</td>\n",
+       "      <td>0.135190</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>361</th>\n",
+       "      <td>0.152895</td>\n",
+       "      <td>0.152895</td>\n",
+       "      <td>0.152895</td>\n",
+       "      <td>0.152895</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>682</th>\n",
+       "      <td>0.191779</td>\n",
+       "      <td>0.191779</td>\n",
+       "      <td>0.191779</td>\n",
+       "      <td>0.201890</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>523</th>\n",
+       "      <td>0.208096</td>\n",
+       "      <td>0.208096</td>\n",
+       "      <td>0.208096</td>\n",
+       "      <td>0.208096</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "     SalePrice  sp_copy_1_percent  sp_copy_5_percent  sp_copy_10_percent\n",
+       "457   0.307041           0.307041           0.307041            0.201890\n",
+       "876   0.135190           0.135190           0.135190            0.135190\n",
+       "361   0.152895           0.152895           0.152895            0.152895\n",
+       "682   0.191779           0.191779           0.191779            0.201890\n",
+       "523   0.208096           0.208096           0.208096            0.208096"
+      ]
+     },
+     "execution_count": 96,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "df_saleprice_scaled2.sample(5)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "33c1f3b7-5afc-45cb-8b43-9682ec87156d",
+   "metadata": {},
+   "source": [
+    "#### Create List of differences for saleprice and saleprice_scaled Dataframes"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 97,
+   "id": "d2faf410-f83e-4ccb-89d4-e6f8c7adffbb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# create lists of absolute differences between imputed and original values (mean imputation, unscaled)\n",
+    "\n",
+    "sp_mean_diff_1 = []\n",
+    "sp_mean_diff_5 = []\n",
+    "sp_mean_diff_10 = []\n",
+    "count = 0\n",
+    "\n",
+    "for i in sp_1_idx:\n",
+    "    count +=1\n",
+    "    diff1 = abs(df_saleprice2['sp_copy_1_percent'][i] - df_saleprice2['SalePrice'][i])\n",
+    "    sp_mean_diff_1.append(diff1)\n",
+    "    \n",
+    "\n",
+    "for i in sp_5_idx:\n",
+    "    diff5 = abs(df_saleprice2['sp_copy_5_percent'][i] - df_saleprice2['SalePrice'][i])\n",
+    "    sp_mean_diff_5.append(diff5)\n",
+    "\n",
+    "for i in sp_10_idx:\n",
+    "    diff10 = abs(df_saleprice2['sp_copy_10_percent'][i] - df_saleprice2['SalePrice'][i])\n",
+    "    sp_mean_diff_10.append(diff10)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 98,
+   "id": "789b07c5-530a-4111-8c97-f5297f7da5e4",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "10\n",
+      "50\n",
+      "100\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(len(sp_mean_diff_1))\n",
+    "print(len(sp_mean_diff_5))\n",
+    "print(len(sp_mean_diff_10))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 99,
+   "id": "4fec222c-2420-41af-9e2a-d9773e1d6259",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# create lists of absolute differences between imputed and original values (mean imputation, scaled)\n",
+    "\n",
+    "sp_scaled_mean_diff_1 = []\n",
+    "sp_scaled_mean_diff_5 = []\n",
+    "sp_scaled_mean_diff_10 = []\n",
+    "count = 0\n",
+    "\n",
+    "for i in sp_1_idx:\n",
+    "    count +=1\n",
+    "    diff1 = abs(df_saleprice_scaled2['sp_copy_1_percent'][i] - df_saleprice_scaled2['SalePrice'][i])\n",
+    "    sp_scaled_mean_diff_1.append(diff1)\n",
+    "    \n",
+    "\n",
+    "for i in sp_5_idx:\n",
+    "    diff5 = abs(df_saleprice_scaled2['sp_copy_5_percent'][i] - df_saleprice_scaled2['SalePrice'][i])\n",
+    "    sp_scaled_mean_diff_5.append(diff5)\n",
+    "\n",
+    "for i in sp_10_idx:\n",
+    "    diff10 = abs(df_saleprice_scaled2['sp_copy_10_percent'][i] - df_saleprice_scaled2['SalePrice'][i])\n",
+    "    sp_scaled_mean_diff_10.append(diff10)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 100,
+   "id": "de9bf1de-68fe-4894-915a-7069b386123f",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "10\n",
+      "50\n",
+      "100\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(len(sp_scaled_mean_diff_1))\n",
+    "print(len(sp_scaled_mean_diff_5))\n",
+    "print(len(sp_scaled_mean_diff_10))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f7b93757-d1a7-41a1-85fa-3ee77734be5b",
+   "metadata": {},
+   "source": [
+    "#### Calculate the mean and variance of the list of differences: MEAN imputation, SalePrice"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 101,
+   "id": "c60d3aad-33f0-48f4-8bb0-f8af45e33e1e",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "The mean of 1% is 47198.61696969698 and varience 1% is 634546571.3543438\n",
+      "The mean of 5% is 54438.20686315788 and varience 5% is 1768876209.3358026\n",
+      "The mean of 10% is 54438.20686315788 and varience 10% is 2875290913.3009353\n"
+     ]
+    }
+   ],
+   "source": [
+    "m1 = sum(sp_mean_diff_1) / len(sp_mean_diff_1)\n",
+    "\n",
+    "# population variance (divide by n), computed with a generator expression\n",
+    "var_res1 = sum((xi - m1) ** 2 for xi in sp_mean_diff_1) / len(sp_mean_diff_1)\n",
+    "\n",
+    "m5 = sum(sp_mean_diff_5) / len(sp_mean_diff_5)\n",
+    "\n",
+    "# population variance (divide by n), computed with a generator expression\n",
+    "var_res5 = sum((xii - m5) ** 2 for xii in sp_mean_diff_5) / len(sp_mean_diff_5)\n",
+    "\n",
+    "\n",
+    "m10 = sum(sp_mean_diff_10) / len(sp_mean_diff_10)\n",
+    "\n",
+    "# population variance (divide by n), computed with a generator expression\n",
+    "var_res10 = sum((xiii - m10) ** 2 for xiii in sp_mean_diff_10) / len(sp_mean_diff_10)\n",
+    "\n",
+    "print(f\"The mean of 1% is {m1} and variance 1% is {var_res1}\")\n",
+    "print(f\"The mean of 5% is {m5} and variance 5% is {var_res5}\")\n",
+    "# report m10 here (the original line printed m5 a second time)\n",
+    "print(f\"The mean of 10% is {m10} and variance 10% is {var_res10}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 102,
+   "id": "e7f6e5cf-4eaa-4bfe-add2-fc7f600941b7",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_mean_saleprice = pd.DataFrame.from_dict({'1%_saleprice': [m1, var_res1],\n",
+    " '5%_saleprice': [m5, var_res5],\n",
+    " '10%_saleprice': [m10, var_res10]}, orient='index')\n",
+    "df_mean_saleprice.columns=['diff. list Mean(MI)', 'diff. list Var.(MI)']"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 103,
+   "id": "cc37eeaf-e3cd-4a83-870d-fab7037eeffe",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>diff. list Mean(MI)</th>\n",
+       "      <th>diff. list Var.(MI)</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>1%_saleprice</th>\n",
+       "      <td>47198.616970</td>\n",
+       "      <td>6.345466e+08</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>5%_saleprice</th>\n",
+       "      <td>54438.206863</td>\n",
+       "      <td>1.768876e+09</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>10%_saleprice</th>\n",
+       "      <td>58045.636667</td>\n",
+       "      <td>2.875291e+09</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "               diff. list Mean(MI)  diff. list Var.(MI)\n",
+       "1%_saleprice          47198.616970         6.345466e+08\n",
+       "5%_saleprice          54438.206863         1.768876e+09\n",
+       "10%_saleprice         58045.636667         2.875291e+09"
+      ]
+     },
+     "execution_count": 103,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "df_mean_saleprice"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f405f073-1b45-47e8-873b-7a9d34ad0e5c",
+   "metadata": {},
+   "source": [
+    "#### Mean and variance of the absolute differences: mean imputation on scaled SalePrice"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 104,
+   "id": "2516b4f7-6b79-4636-9bd5-0738343ea355",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "The mean of 1% is 0.0 and varience 1% is 0.0\n",
+      "The mean of 5% is 0.0016175777048509216 and varience 5% is 5.557201947380946e-05\n",
+      "The mean of 10% is 0.0016175777048509216 and varience 10% is 0.004250732648521598\n"
+     ]
+    }
+   ],
+   "source": [
+    "m1 = sum(sp_scaled_mean_diff_1) / len(sp_scaled_mean_diff_1)\n",
+    "\n",
+    "# population variance (divide by n) using a generator expression\n",
+    "var_res1 = sum((xi - m1) ** 2 for xi in sp_scaled_mean_diff_1) / len(sp_scaled_mean_diff_1)\n",
+    "\n",
+    "m5 = sum(sp_scaled_mean_diff_5) / len(sp_scaled_mean_diff_5)\n",
+    "\n",
+    "# population variance (divide by n) using a generator expression\n",
+    "var_res5 = sum((xii - m5) ** 2 for xii in sp_scaled_mean_diff_5) / len(sp_scaled_mean_diff_5)\n",
+    "\n",
+    "\n",
+    "m10 = sum(sp_scaled_mean_diff_10) / len(sp_scaled_mean_diff_10)\n",
+    "\n",
+    "# population variance (divide by n) using a generator expression\n",
+    "var_res10 = sum((xiii - m10) ** 2 for xiii in sp_scaled_mean_diff_10) / len(sp_scaled_mean_diff_10)\n",
+    "\n",
+    "print(f\"The mean of 1% is {m1} and variance 1% is {var_res1}\")\n",
+    "print(f\"The mean of 5% is {m5} and variance 5% is {var_res5}\")\n",
+    "print(f\"The mean of 10% is {m10} and variance 10% is {var_res10}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 105,
+   "id": "fe6a93b8-d6cb-4d7d-856b-ab4ee8fe78fc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_mean_saleprice_scaled = pd.DataFrame.from_dict({'1%_saleprice_scaled': [m1, var_res1],\n",
+    " '5%_saleprice_scaled': [m5, var_res5],\n",
+    " '10%_saleprice_scaled': [m10, var_res10]}, orient='index')\n",
+    "df_mean_saleprice_scaled.columns=['diff. list Mean(MI) scaled', 'diff. list Var.(MI) scaled']"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 106,
+   "id": "e74c35ed-7c2d-44ab-b6c2-4d81c2c6b6bb",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>diff. list Mean(MI) scaled</th>\n",
+       "      <th>diff. list Var.(MI) scaled</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>1%_saleprice_scaled</th>\n",
+       "      <td>0.000000</td>\n",
+       "      <td>0.000000</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>5%_saleprice_scaled</th>\n",
+       "      <td>0.001618</td>\n",
+       "      <td>0.000056</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>10%_saleprice_scaled</th>\n",
+       "      <td>0.018922</td>\n",
+       "      <td>0.004251</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "                      diff. list Mean(MI) scaled  diff. list Var.(MI) scaled\n",
+       "1%_saleprice_scaled                     0.000000                    0.000000\n",
+       "5%_saleprice_scaled                     0.001618                    0.000056\n",
+       "10%_saleprice_scaled                    0.018922                    0.004251"
+      ]
+     },
+     "execution_count": 106,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "df_mean_saleprice_scaled"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "876b979a-f5c4-43a7-9ead-d5d866bef078",
+   "metadata": {},
+   "source": [
+    "# 2.2 Housing Data Results - KNN and MEAN"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 107,
+   "id": "e90e9486-280d-4e96-b16a-0c3314eaedc9",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<table><tr style=\"background-color:white;\"><td><div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>diff. list Mean(KNN)</th>\n",
+       "      <th>diff. list Var.(KNN)</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>1%_saleprice</th>\n",
+       "      <td>105.000</td>\n",
+       "      <td>5.210500e+04</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>5%_saleprice</th>\n",
+       "      <td>163.012</td>\n",
+       "      <td>4.601896e+04</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>10%_saleprice</th>\n",
+       "      <td>470.800</td>\n",
+       "      <td>3.667553e+06</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div></td><td><div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>diff. list Mean(KNN) scaled</th>\n",
+       "      <th>diff. list Var.(KNN) scaled</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>1%_saleprice</th>\n",
+       "      <td>0.000000</td>\n",
+       "      <td>0.000000e+00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>5%_saleprice</th>\n",
+       "      <td>0.000012</td>\n",
+       "      <td>7.654124e-09</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>10%_saleprice</th>\n",
+       "      <td>0.000265</td>\n",
+       "      <td>2.973842e-06</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div></td><td><div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>diff. list Mean(MI)</th>\n",
+       "      <th>diff. list Var.(MI)</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>1%_saleprice</th>\n",
+       "      <td>47198.616970</td>\n",
+       "      <td>6.345466e+08</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>5%_saleprice</th>\n",
+       "      <td>54438.206863</td>\n",
+       "      <td>1.768876e+09</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>10%_saleprice</th>\n",
+       "      <td>58045.636667</td>\n",
+       "      <td>2.875291e+09</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div></td><td><div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>diff. list Mean(MI) scaled</th>\n",
+       "      <th>diff. list Var.(MI) scaled</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>1%_saleprice_scaled</th>\n",
+       "      <td>0.000000</td>\n",
+       "      <td>0.000000</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>5%_saleprice_scaled</th>\n",
+       "      <td>0.001618</td>\n",
+       "      <td>0.000056</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>10%_saleprice_scaled</th>\n",
+       "      <td>0.018922</td>\n",
+       "      <td>0.004251</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div></td></tr></table>"
+      ],
+      "text/plain": [
+       "<IPython.core.display.HTML object>"
+      ]
+     },
+     "execution_count": 107,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "multi_table([df_knn_saleprice, df_knn_saleprice_scaled, df_mean_saleprice, df_mean_saleprice_scaled])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e07a4e01-7e4e-4bdb-b6c7-ef2424fc6a80",
+   "metadata": {},
+   "source": [
+    "Result: Scaling the data before imputation produced smaller differences between the original and imputed values for both methods. Mean imputation was less accurate than KNN imputation in every case (at 10% missing, a mean absolute difference of about 58,045.6 versus 470.8 on the unscaled data, and 0.018922 versus 0.000265 on the scaled data), but its results still improved when the data was scaled first. KNN imputation outperformed mean imputation throughout, and its best results came from the scaled dataset."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "977be574-18b2-4f80-a019-2a86227a14d6",
+   "metadata": {},
+   "source": [
+    "# Conclusion\n",
+    "1. KNN imputation performed better than mean imputation.\n",
+    "2. Imputing on the scaled dataset instead of the non-scaled dataset improved the results further; the imputed values were almost identical to the originals."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "764c9bdb-78dc-4287-a527-0e14ff58a5e9",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.10"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/notebooks/Imputation_best_practices/readme.md b/notebooks/Imputation_best_practices/readme.md
new file mode 100644
index 0000000..b4d57c1
--- /dev/null
+++ b/notebooks/Imputation_best_practices/readme.md
@@ -0,0 +1 @@
+This folder contains the notebook and the data used to demonstrate effective imputation practices with KNN and mean imputation.

	number
823	0.925249
266	0.077479
959	0.897447
493	0.259423
768	0.193178
105	0.174632
610	0.456349
824	0.688290
968	0.493667
849	0.368834
	number	number_copy_1_percent	number_copy_5_percent	number_copy_10_percent
0	0.438564	0.438564	0.438564	0.438564
1	0.836801	0.836801	0.836801	0.836801
2	0.798077	0.798077	0.798077	0.798077
3	0.269161	0.269161	0.269161	0.269161
4	0.830948	0.830948	0.830948	0.830948
...	...	...	...	...
995	0.920130	0.920130	0.920130	0.920130
996	0.007397	0.007397	0.007397	0.007397
997	0.163360	0.163360	0.163360	0.163360
998	0.553700	0.553700	0.553700	0.553700
999	0.771442	0.771442	0.771442	0.771442
	number	number_copy_1_percent	number_copy_5_percent	number_copy_10_percent
701	0.244629	0.244629	0.244629	0.244629
39	0.517202	0.517202	0.517202	0.517202
335	0.100813	0.100813	0.100813	0.100813
204	0.277534	0.277534	0.277534	0.277534
391	0.859032	0.859032	0.857231	0.859032
203	0.252622	0.252622	0.252622	0.252622
144	0.844587	0.844587	0.844587	0.844587
201	0.431603	0.431603	0.431603	0.431603
749	0.848537	0.848537	0.848537	0.848240
497	0.464531	0.464531	0.464531	0.464531
	number	number_copy_1_percent	number_copy_5_percent	number_copy_10_percent
293	0.583231	0.583231	0.583231	0.583231
461	0.867035	0.867035	0.867035	0.867035
875	0.676228	0.676228	0.676228	0.676228
999	0.771442	0.771442	0.771442	0.771442
75	0.909050	0.909050	0.909050	0.909050
98	0.629583	0.629583	0.629583	0.629583
381	0.181614	0.181614	0.181614	0.181614
592	0.523109	0.523109	0.523109	0.523109
155	0.038074	0.038074	0.038074	0.038074
630	0.869200	0.869200	0.869200	0.869200
	Id	MSSubClass	MSZoning	LotFrontage	LotArea	Street	Alley	LotShape	LandContour	Utilities	...	PoolQC	Fence	MiscFeature	MoSold	YrSold	SaleType	SaleCondition	SalePrice
740	741	70	RM	60.0	9600	Pave	Grvl	Reg	Lvl	AllPub	...	NaN	GdPrv	NaN	5	2007	WD	Abnorml	132000
1209	1210	20	RL	85.0	10182	Pave	NaN	IR1	Lvl	AllPub	...	NaN	NaN	NaN	5	2006	New	Partial	290000
64	65	60	RL	NaN	9375	Pave	NaN	Reg	Lvl	AllPub	...	NaN	GdPrv	NaN	2	2009	WD	Normal	219500
208	209	60	RL	NaN	14364	Pave	NaN	IR1	Low	AllPub	...	NaN	NaN	NaN	4	2007	WD	Normal	277000
436	437	50	RM	40.0	4400	Pave	NaN	Reg	Lvl	AllPub	...	NaN	NaN	NaN	10	2006	WD	Normal	116000
19	20	20	RL	70.0	7560	Pave	NaN	Reg	Lvl	AllPub	...	NaN	MnPrv	NaN	5	2009	COD	Abnorml	139000
1449	1450	180	RM	21.0	1533	Pave	NaN	Reg	Lvl	AllPub	...	NaN	NaN	NaN	8	2006	WD	Abnorml	92000
449	450	50	RM	50.0	6000	Pave	NaN	Reg	Lvl	AllPub	...	NaN	NaN	NaN	6	2007	WD	Normal	120000
1185	1186	50	RL	60.0	9738	Pave	NaN	Reg	Lvl	AllPub	...	NaN	NaN	NaN	3	2006	WD	Normal	104900
1023	1024	120	RL	43.0	3182	Pave	NaN	Reg	Lvl	AllPub	...	NaN	NaN	NaN	5	2008	WD	Normal	191000
	SalePrice	sp_copy_1_percent	sp_copy_5_percent	sp_copy_10_percent
0	208500	208500	208500	208500
1	181500	181500	181500	181500
2	223500	223500	223500	223500
3	140000	140000	140000	140000
4	250000	250000	250000	250000
	SalePrice	sp_copy_1_percent	sp_copy_5_percent	sp_copy_10_percent
0	0.241078	0.241078	0.241078	0.241078
1	0.203583	0.203583	0.203583	0.203583
2	0.261908	0.261908	0.261908	0.261908
3	0.145952	0.145952	0.145952	0.145952
4	0.298709	0.298709	0.298709	0.298709
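The scaled `SalePrice` values above are consistent with min-max scaling to the range [0, 1]. A minimal sketch of that transform follows. The helper `min_max_scale` is hypothetical, and the column minimum (34900) and maximum (755000) are assumptions: they are not shown in the output above and were chosen because they reproduce the five scaled values listed.

```python
def min_max_scale(values, lo, hi):
    """Map each value linearly so that lo -> 0.0 and hi -> 1.0."""
    span = hi - lo
    return [(v - lo) / span for v in values]


# First five SalePrice values from the unscaled table.
sale_prices = [208500, 181500, 223500, 140000, 250000]

# ASSUMPTION: column minimum and maximum; not stated in the output above.
ASSUMED_MIN, ASSUMED_MAX = 34900, 755000

scaled = min_max_scale(sale_prices, ASSUMED_MIN, ASSUMED_MAX)
rounded = [round(s, 6) for s in scaled]
print(rounded)
```

Because the transform is linear, a difference measured on the scaled column equals the raw difference divided by `hi - lo`, so the scaled and unscaled difference tables are expressed in different units.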
	SalePrice	sp_copy_1_percent	sp_copy_5_percent	sp_copy_10_percent
0	208500.0	208500.0	208500.0	208500.0
1	181500.0	181500.0	181500.0	181500.0
2	223500.0	223500.0	223500.0	223500.0
3	140000.0	140000.0	140000.0	140000.0
4	250000.0	250000.0	250000.0	250000.0
	diff. list Mean(KNN)	diff. list Var.(KNN)
1%_saleprice	105.000	5.210500e+04
5%_saleprice	163.012	4.601896e+04
10%_saleprice	470.800	3.667553e+06
	diff. list Mean(KNN) scaled	diff. list Var.(KNN) scaled
1%_saleprice	0.000000	0.000000e+00
5%_saleprice	0.000012	7.654124e-09
10%_saleprice	0.000265	2.973842e-06
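
The summary tables above report the mean and variance of the absolute differences between original and imputed values at the masked positions. The sketch below shows one way to compute such a pair with the standard library instead of repeating the loop for each missing percentage. The helper name `summarize_abs_diff`, the masked indices, and the imputed values are made up for illustration; only the original prices come from the tables above. It uses the population variance (divide by n), matching the notebook's formula.

```python
from statistics import mean, pvariance


def summarize_abs_diff(original, imputed, indices):
    """Return (mean, population variance) of |imputed - original| at the given indices."""
    diffs = [abs(imputed[i] - original[i]) for i in indices]
    return mean(diffs), pvariance(diffs)


# Original values: first five SalePrice rows shown earlier.
original = [208500, 181500, 223500, 140000, 250000]

# HYPOTHETICAL imputed column: rows 1 and 3 were masked and then filled in.
imputed = [208500, 181000, 223500, 141000, 250000]
masked_rows = [1, 3]

mean_diff, var_diff = summarize_abs_diff(original, imputed, masked_rows)
print(f"mean of differences: {mean_diff}, variance of differences: {var_diff}")
```

Calling the helper once per missing percentage and per imputation method would give the rows of a table like the ones above, and it avoids the copy-paste slip of printing one percentage's mean under another's label.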