data-and-sampling-distributions.md
+15
@@ -43,3 +43,18 @@ Bootstrap can be used with multivariate data, where rows are samples as units. A
43
43
- z-score: The result of standardizing an individual data point
44
44
- Standard Normal: A normal distribution with mean = 0 and standard deviation = 1.
45
45
- QQ plot: A plot to visualize how close a sample distribution is to a specified distribution, e.g. to the normal distribution.
46
+

47
+
- To compare data to a standard normal distribution, you subtract the mean and then divide by the standard deviation; this is also called normalization or standardization. The transformed value is termed a z-score, and the standard normal distribution is sometimes called the z-distribution.
48
+
- A QQ plot orders the z-scores from low to high and plots each value's z-score on the y-axis; the x-axis is the corresponding quantile of a normal distribution for that value's rank. Since the data is normalized, the units correspond to the number of standard deviations away from the mean. If the points fall roughly on the diagonal line, the sample distribution can be considered close to normal.
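
The two steps above (standardize, then pair sorted z-scores with normal quantiles) can be sketched with the Python standard library. This is a minimal illustration, not a reference implementation: the `sample` values are made up, the helper names `z_scores` and `qq_points` are hypothetical, and the plotting position `(i + 0.5) / n` is one common convention among several.

```python
from statistics import NormalDist, mean, stdev


def z_scores(values):
    """Standardize: subtract the mean, then divide by the standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - m) / s for v in values]


def qq_points(values):
    """Return (theoretical quantile, sorted z-score) pairs for a normal QQ plot."""
    z = sorted(z_scores(values))
    n = len(z)
    std_normal = NormalDist(mu=0, sigma=1)
    # Assumed plotting position: (i + 0.5) / n for the i-th smallest value.
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, z))


# Made-up sample, for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9]
points = qq_points(sample)
for x, y in points:
    print(f"{x:6.2f} {y:6.2f}")
```

Plotting `x` against `y` and comparing the points with the line `y = x` gives the visual check described above.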
49
+
50
+
## Long-Tailed Distribution
51
+
- Data is generally not normally distributed.
52
+
- Tail: The long narrow portion of a frequency distribution, where relatively extreme values occur at a low frequency.
53
+
- Skew: Where one tail of a distribution is longer than the other.
54
+
- Sometimes a distribution is highly skewed, such as with income data.
55
+
- Nassim Taleb has proposed the black swan theory, which predicts that anomalous events, such as a stock market crash, are more likely to occur than would be predicted by the normal distribution.
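
As a rough illustration of skew and of how thin the normal tails are, the sketch below computes a moment-based skewness coefficient and the probability a standard normal assigns to values more than five standard deviations from the mean. The `incomes` values are invented, the helper name `skewness` is hypothetical, and the population (divide-by-n) formula is an assumed choice; other estimators exist.

```python
from statistics import NormalDist, mean, pstdev


def skewness(values):
    """Moment-based skewness: mean cubed deviation divided by sd cubed."""
    m = mean(values)
    s = pstdev(values)  # population standard deviation (n denominator)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)


# Invented income-like data (in thousands) with one long right tail.
incomes = [20, 22, 25, 27, 30, 32, 35, 40, 60, 250]
symmetric = [1, 2, 3, 4, 5]

print("skewness of incomes:  ", skewness(incomes))
print("skewness of symmetric:", skewness(symmetric))

# Probability of landing more than 5 standard deviations from the mean
# if the data really followed a normal distribution.
normal_tail = 2 * (1 - NormalDist().cdf(5))
print("P(|Z| > 5) under the normal:", normal_tail)
```

A positive skewness indicates a longer right tail. The tiny normal tail probability shows why a model based on the normal distribution treats extreme events as nearly impossible.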
56
+
57
+
## T-Distribution
58
+
- The t-distribution is a normally shaped distribution, except that it is a bit thicker and longer on the tails. It is used extensively in depicting distributions of sample statistics. The larger the sample, the more normally shaped the t-distribution becomes.
59
+
- n: sample size.
60
+
- Degrees of freedom: A parameter that allows the t-distribution to adjust to different sample sizes, statistics, and number of groups.
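
To make the "thicker tails" claim concrete, the sketch below evaluates the t-distribution density at a point in the tail for several degrees of freedom and compares it with the standard normal density. The helper `t_pdf` is a hypothetical name that implements the standard closed-form density using `math.lgamma`; the chosen point `t = 3` and the list of degrees of freedom are arbitrary. For a single sample mean, the degrees of freedom are commonly taken to be n - 1, which is stated here as background rather than taken from the notes above.

```python
import math
from statistics import NormalDist


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma avoids the overflow that math.gamma hits for large df.
    log_coef = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(t * t / df))


tail_point = 3.0
normal_density = NormalDist().pdf(tail_point)
densities = {df: t_pdf(tail_point, df) for df in (2, 5, 30, 1000)}

for df, d in densities.items():
    print(f"df={df:5d}  t density at 3: {d:.5f}")
print(f"normal density at 3:     {normal_density:.5f}")
```

The tail density shrinks toward the normal value as the degrees of freedom grow, which matches the statement that larger samples make the t-distribution more normally shaped.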