Add storybook analysis tool

hamdaankhalid · hamdaankhalid · commit a07c31f9b61e · 2022-09-19T14:58:21.000-07:00
diff --git a/exploratory-data-analysis.md b/exploratory-data-analysis.md
@@ -27,7 +27,7 @@ Another reason for weighted mean use if when the data collected does not equally
 
 - Median is considered robust because it only accounts for the middle value in the dataset, no matter how high or low the extreme values will be the element ordering does not change and hence not get affected by the extreme values on both ends. The weighted median is also robust for similar reasons.
 
-- A common choice for robust metrics are medians and trimmed mean. A common choice percent of trimming for mean is the top and bottom 10%. The trimmed meanis often thought of as the compromise between median and the mean, since it is robust to extreme values is the data but uses more data to calculate the estimate for location.
+- Common choices for robust metrics are the median and the trimmed mean. A common trimming percentage for the trimmed mean is the top and bottom 10%. The trimmed mean is often thought of as a compromise between the median and the mean, since it is robust to extreme values in the data but uses more of the data to calculate the estimate of location.
 
 ## Variabililty and Estimates for Variability
 - Another dimension to explore your dataset/features is the variability/dispersion/how tightly coupled or spread out the values are.
diff --git a/projects/storybook/Makefile b/projects/storybook/Makefile
diff --git a/projects/storybook/readme.md b/projects/storybook/readme.md
@@ -0,0 +1,24 @@
+# Storybook
+
+- A C++ library for non-visual exploratory data analysis of a given CSV file with non-categorical variables.
+
+## Features
+Given a dataset, produce a summary of the following:
+- Location Estimates
+  - Mean/Average (Sum of the values divided by the number of values)
+  - Median/50th percentile (The value such that one half of the data lies below it)
+  - Percentile/Quantile (The value such that P percent of the data lies below it)
+  - Trimmed Mean/Truncated mean (Mean after dropping a fixed number of extreme values)
+  
+- Variability
+  - Deviations/errors/residuals: The difference between observed values and the estimate of location.
+  - Variance/mean-squared-error: Sum of squared deviations from the mean divided by n - 1 where n is the number of data values.
+  - Standard Deviation: Square root of variance.
+  - Mean Absolute Deviation/L1 Norm/Manhattan Norm: Mean of absolute values of the deviations from the mean.
+  - Range: Difference between largest and smallest value in a dataset.
+  - Order Statistics / Rank: Metrics based on the data values sorted from smallest to largest.
+  - Percentile/Quantile: The value such that P percent of the data lies below it.
+  - Interquartile Range/IQR: Difference between 75th percentile and 25th percentile.
+
+- Correlation
+  - Pearson's correlation coefficient amongst the numerical columns.
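
Note on the README feature list above: the location and variability estimates it describes could be sketched roughly as follows. This is a minimal illustration, not code from this commit. All function names (`mean`, `percentile`, `median`, `trimmed_mean`, `variance`, `standard_deviation`, `mean_absolute_deviation`, `value_range`, `iqr`) are hypothetical, and the percentile uses linear interpolation between ranks, which is only one of several common conventions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical helpers; none of these names appear in the commit.

// Mean: sum of the values divided by the number of values.
double mean(const std::vector<double>& xs) {
  if (xs.empty()) throw std::invalid_argument("empty input");
  return std::accumulate(xs.begin(), xs.end(), 0.0) /
         static_cast<double>(xs.size());
}

// Percentile for p in [0, 100], interpolating linearly between the two
// closest ranks of the sorted data (an assumed convention).
double percentile(std::vector<double> xs, double p) {
  if (xs.empty() || p < 0.0 || p > 100.0)
    throw std::invalid_argument("bad input");
  std::sort(xs.begin(), xs.end());
  const double pos = p / 100.0 * static_cast<double>(xs.size() - 1);
  const auto lo = static_cast<std::size_t>(std::floor(pos));
  const auto hi = static_cast<std::size_t>(std::ceil(pos));
  return xs[lo] + (pos - static_cast<double>(lo)) * (xs[hi] - xs[lo]);
}

// Median is the 50th percentile.
double median(const std::vector<double>& xs) { return percentile(xs, 50.0); }

// Trimmed mean: drop k values from each end of the sorted data, then average.
double trimmed_mean(std::vector<double> xs, std::size_t k) {
  if (xs.size() <= 2 * k) throw std::invalid_argument("trim too large");
  std::sort(xs.begin(), xs.end());
  const auto off = static_cast<std::ptrdiff_t>(k);
  return mean(std::vector<double>(xs.begin() + off, xs.end() - off));
}

// Sample variance: sum of squared deviations from the mean, divided by n - 1.
double variance(const std::vector<double>& xs) {
  if (xs.size() < 2) throw std::invalid_argument("need at least two values");
  const double m = mean(xs);
  double ss = 0.0;
  for (double x : xs) ss += (x - m) * (x - m);
  return ss / static_cast<double>(xs.size() - 1);
}

// Standard deviation: square root of the variance.
double standard_deviation(const std::vector<double>& xs) {
  return std::sqrt(variance(xs));
}

// Mean absolute deviation: mean of |x - mean|.
double mean_absolute_deviation(const std::vector<double>& xs) {
  const double m = mean(xs);
  double sum = 0.0;
  for (double x : xs) sum += std::fabs(x - m);
  return sum / static_cast<double>(xs.size());
}

// Range: largest value minus smallest value.
double value_range(const std::vector<double>& xs) {
  if (xs.empty()) throw std::invalid_argument("empty input");
  const auto mm = std::minmax_element(xs.begin(), xs.end());
  return *mm.second - *mm.first;
}

// Interquartile range: 75th percentile minus 25th percentile.
double iqr(const std::vector<double>& xs) {
  return percentile(xs, 75.0) - percentile(xs, 25.0);
}
```

For the column `{1, 2, 3, 4, 100}`, the mean is pulled up to 22 by the single outlier, while the median and the trimmed mean with `k = 1` both stay at 3, which is the robustness property the notes describe.
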
diff --git a/projects/storybook/src/main.cpp b/projects/storybook/src/main.cpp
@@ -0,0 +1,3 @@
+int main() {
+  return 0;
+}
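
Note: `main.cpp` is only a stub in this commit, so the Correlation feature listed in the README has no implementation yet. A minimal sketch of Pearson's correlation coefficient between two numerical columns might look like the following. The function name `pearson` and the choice to throw on constant columns are assumptions for illustration, not part of the commit.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: Pearson's r between two equally sized columns.
// r = sum((x - mx)(y - my)) / sqrt(sum((x - mx)^2) * sum((y - my)^2))
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
  if (x.size() != y.size() || x.size() < 2)
    throw std::invalid_argument("columns must match and hold >= 2 values");

  const double n = static_cast<double>(x.size());

  // Column means.
  double mx = 0.0, my = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    mx += x[i];
    my += y[i];
  }
  mx /= n;
  my /= n;

  // Sums of cross products and squared deviations.
  double sxy = 0.0, sxx = 0.0, syy = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double dx = x[i] - mx;
    const double dy = y[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }

  // The coefficient is undefined when either column has zero variance.
  if (sxx == 0.0 || syy == 0.0)
    throw std::invalid_argument("constant column");

  return sxy / std::sqrt(sxx * syy);
}
```

Applying this to every pair of numerical columns would give the correlation matrix the README alludes to; the result always lies in the interval from -1 to 1.
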