Commit a07c31f

Add storybook analysis tool
1 parent 1be8770 commit a07c31f

4 files changed, +28 -1 lines changed
exploratory-data-analysis.md

+1 -1
@@ -27,7 +27,7 @@ Another reason for weighted mean use if when the data collected does not equally
 
 - Median is considered robust because it only accounts for the middle value in the dataset, no matter how high or low the extreme values will be the element ordering does not change and hence not get affected by the extreme values on both ends. The weighted median is also robust for similar reasons.
 
-- A common choice for robust metrics are medians and trimmed mean. A common choice percent of trimming for mean is the top and bottom 10%. The trimmed meanis often thought of as the compromise between median and the mean, since it is robust to extreme values is the data but uses more data to calculate the estimate for location.
+- A common choice for robust metrics are medians and trimmed mean. A common choice percent of trimming for mean is the top and bottom 10%. The trimmed mean is often thought of as the compromise between median and the mean, since it is robust to extreme values is the data but uses more data to calculate the estimate for location.
 
 ## Variabililty and Estimates for Variability
 - Another dimension to explore your dataset/features is the variability/dispersion/how tightly coupled or spread out the values are.
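The robustness argument in this hunk can be checked numerically. The following is a minimal C++ sketch, not code from this commit: the function names `mean`, `median`, and `trimmed_mean` are hypothetical, and the trimming rule (drop `floor(n * fraction)` values from each end) is an assumption, since the notes only say "top and bottom 10%".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty sample.
double mean(const std::vector<double>& values) {
    assert(!values.empty());
    return std::accumulate(values.begin(), values.end(), 0.0) /
           static_cast<double>(values.size());
}

// Median: middle value of the sorted sample, or the average of the two
// middle values when the count is even. Works on a copy of the input.
double median(std::vector<double> values) {
    assert(!values.empty());
    std::sort(values.begin(), values.end());
    const std::size_t n = values.size();
    return n % 2 == 1 ? values[n / 2]
                      : (values[n / 2 - 1] + values[n / 2]) / 2.0;
}

// Trimmed mean: drop floor(n * fraction) values from each end of the sorted
// sample (assumed rule), then average the rest. fraction = 0.10 trims the
// top and bottom 10%.
double trimmed_mean(std::vector<double> values, double fraction) {
    assert(!values.empty() && fraction >= 0.0 && fraction < 0.5);
    std::sort(values.begin(), values.end());
    const std::size_t n = values.size();
    const std::size_t k = static_cast<std::size_t>(n * fraction);
    const double sum =
        std::accumulate(values.begin() + k, values.end() - k, 0.0);
    return sum / static_cast<double>(n - 2 * k);
}
```

With a sample such as `{1, 2, 3, 4, 5, 6, 7, 8, 9, 1000}`, the single extreme value pulls the mean far from the bulk of the data, while the median and the 10% trimmed mean stay close to the middle values.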

projects/storybook/Makefile

Whitespace-only changes.

projects/storybook/readme.md

+24
@@ -0,0 +1,24 @@
+# Storybook
+
+- A C++ library for non-visual exploratory data analysis on a given CSV file with non-categorical variables.
+
+## Features
+Given a dataset, it summarizes the following:
+- Location Estimates
+  - Mean/AVG (Sum of values divided by the number of values).
+  - Median/50th percentile (The value such that one half of the data lies below it).
+  - Percentile/Quantile (The value such that P percent of the data lies below it).
+  - Trimmed Mean/Truncated Mean (Mean after dropping a fixed number of extreme values).
+
+- Variability
+  - Deviations/errors/residuals: The difference between observed values and the estimate of location.
+  - Variance/mean-squared-error: Sum of squared deviations from the mean divided by n - 1, where n is the number of data values.
+  - Standard Deviation: Square root of the variance.
+  - Mean Absolute Deviation/L1 Norm/Manhattan Norm: Mean of the absolute values of the deviations from the mean.
+  - Range: Difference between the largest and smallest value in a dataset.
+  - Order Statistics/Rank: Metrics based on the data values sorted from largest to smallest.
+  - Percentile/Quantile: The value such that P percent of the data lies below it.
+  - Interquartile Range/IQR: Difference between the 75th percentile and the 25th percentile.
+
+- Correlation
+  - Pearson's correlation coefficient amongst the numerical columns.
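The variability definitions in this README map directly onto short functions. The sketch below is illustrative only and is not part of the commit: all function names are hypothetical, and the percentile uses linear interpolation between the closest ranks, which is one of several common conventions that the README does not specify.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

double mean(const std::vector<double>& v) {
    assert(!v.empty());
    double sum = 0.0;
    for (double x : v) sum += x;
    return sum / static_cast<double>(v.size());
}

// Sample variance: sum of squared deviations from the mean, divided by n - 1.
double variance(const std::vector<double>& v) {
    assert(v.size() >= 2);
    const double m = mean(v);
    double ss = 0.0;
    for (double x : v) ss += (x - m) * (x - m);
    return ss / static_cast<double>(v.size() - 1);
}

double standard_deviation(const std::vector<double>& v) {
    return std::sqrt(variance(v));
}

// Mean of the absolute deviations from the mean.
double mean_absolute_deviation(const std::vector<double>& v) {
    const double m = mean(v);
    double total = 0.0;
    for (double x : v) total += std::fabs(x - m);
    return total / static_cast<double>(v.size());
}

// Difference between the largest and smallest value.
double value_range(const std::vector<double>& v) {
    assert(!v.empty());
    const auto mm = std::minmax_element(v.begin(), v.end());
    return *mm.second - *mm.first;
}

// Percentile p in [0, 100], sorted ascending, with linear interpolation
// between the two closest ranks (an assumed convention).
double percentile(std::vector<double> v, double p) {
    assert(!v.empty() && p >= 0.0 && p <= 100.0);
    std::sort(v.begin(), v.end());
    const double pos = p / 100.0 * static_cast<double>(v.size() - 1);
    const std::size_t lower = static_cast<std::size_t>(pos);
    const std::size_t upper = std::min(lower + 1, v.size() - 1);
    return v[lower] + (pos - static_cast<double>(lower)) * (v[upper] - v[lower]);
}

// Interquartile range: 75th percentile minus 25th percentile.
double iqr(const std::vector<double>& v) {
    return percentile(v, 75.0) - percentile(v, 25.0);
}
```

Dividing by n - 1 rather than n follows the README's definition of variance; a different percentile convention would change `iqr` slightly on small samples.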

projects/storybook/src/main.cpp

+3
@@ -0,0 +1,3 @@
+int main() {
+    return 0;
+}
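The `main.cpp` stub does not compute anything yet, so the correlation feature listed in the README has no implementation in this commit. As one possible starting point, Pearson's correlation coefficient between two numeric columns could be sketched as follows. The function name `pearson` and the assumption that columns are already parsed into `std::vector<double>` are hypothetical and do not come from the commit.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson's r = sum((x - mx)(y - my)) / sqrt(sum((x - mx)^2) * sum((y - my)^2)).
// Requires two columns of equal length with at least two values each.
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());

    // Column means.
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        mx += x[i];
        my += y[i];
    }
    mx /= n;
    my /= n;

    // Sums of cross products and squared deviations.
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - mx;
        const double dy = y[i] - my;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }

    // r is undefined when either column is constant.
    assert(sxx > 0.0 && syy > 0.0);
    return sxy / std::sqrt(sxx * syy);
}
```

Calling `pearson` on every pair of numeric columns would give the correlation matrix the README describes; a real tool would likely report constant columns as an error instead of asserting.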
