Fix R summary for pandoc
PavelCz authored Apr 21, 2022
1 parent 3d9d9b6 commit b44800f
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions Data Analysis and Visualization in R/R.md
Expand Up @@ -297,7 +297,7 @@ To obtain a partition from a hierarchical clustering, a threshold can be decided

##### Rand index

$$ R = \frac{a+b}{{n}\choose{2}}$$
$$ R = \frac{a+b}{{{n}\choose{2}}}$$

* $S$ is a set of $n$ elements;
* $X$ is a partition of $S$ into $k$ sets;
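
The Rand index above can be computed directly in R. A minimal sketch, using hypothetical label vectors `x` and `y` for the two partitions (the helper name `rand_index` and the example data are illustrative, not from the notes):

```r
# Sketch: Rand index for two partitions of the same n elements,
# each given as a vector of cluster labels.
rand_index <- function(x, y) {
  n <- length(x)
  pairs <- lower.tri(matrix(0, n, n))        # mask selecting each unordered pair once
  same_x <- outer(x, x, "==")[pairs]         # pair together in partition X?
  same_y <- outer(y, y, "==")[pairs]         # pair together in partition Y?
  a <- sum(same_x & same_y)                  # together in both partitions
  b <- sum(!same_x & !same_y)                # separated in both partitions
  (a + b) / choose(n, 2)
}

x <- c(1, 1, 2, 2)
y <- c(1, 1, 1, 2)
rand_index(x, y)  # 0.5
```

Here $a = 1$ (the pair 1–2), $b = 2$ (pairs 1–4 and 2–4), and $\binom{4}{2} = 6$, giving $R = 3/6 = 0.5$.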
Expand Down Expand Up @@ -508,7 +508,7 @@ Binary classification has outcome $k$ which can only take values of 0 or 1. Lin

The system is rewritten so that $\epsilon_i$ (the error), following a normal distribution, yields the probability of observing a value $y_i$ under a normal distribution with mean $\mu_i$. The expectation is then $E(y_i | x_i) = \mu_i$.

Logistic regression models the conditional expectation of the outcome conditioned on the features. The expectation $\mu$ in a binary classification is the probability of class 1 (predicted when $\mu > 0.5$): the real numbers used in linear regression are mapped to the $[0, 1]$ interval using the logistic (sigmoid) function $ \lambda(t) = \frac{1}{1+e^{-t}}$, whose inverse is the logit.
Logistic regression models the conditional expectation of the outcome conditioned on the features. The expectation $\mu$ in a binary classification is the probability of class 1 (predicted when $\mu > 0.5$): the real numbers used in linear regression are mapped to the $[0, 1]$ interval using the logistic (sigmoid) function $\lambda(t) = \frac{1}{1+e^{-t}}$, whose inverse is the logit.

Logistic regression is one instance of a generalized linear model; the same framework covers other distributions (Poisson, Gamma) by combining a probability distribution from the exponential family, a linear predictor, and a link function (the inverse of the activation function).
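
In R, this model is fit with the base `glm()` function. A minimal sketch on simulated data (the toy data and variable names are assumptions for illustration; `plogis()` is base R's logistic function $1/(1+e^{-t})$):

```r
# Sketch: logistic regression via glm() on simulated binary outcomes.
set.seed(1)
x <- rnorm(100)
y <- rbinom(100, size = 1, prob = plogis(2 * x))  # true P(y = 1 | x) = logistic(2x)

fit <- glm(y ~ x, family = binomial(link = "logit"))

mu <- predict(fit, type = "response")  # conditional expectation, mapped into [0, 1]
pred_class <- as.integer(mu > 0.5)     # predict class 1 when mu > 0.5
```

Swapping `binomial` for `poisson` or `Gamma` in the `family` argument gives the other generalized linear models mentioned above, each with its own default link.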

Expand Down Expand Up @@ -571,4 +571,4 @@ Cross-validation identifies optimal mode complexity without needing test data, m

This method rests on the assumption that training and test samples are independently and identically distributed, which is not always the case: data sometimes comes in clusters, and measurements within a cluster are correlated.

Performing cross-validation at the level of individual data points will favor models that learn the clusters, so it needs to be performed at the cluster level, which requires application knowledge and possibly data visualization techniques.
Performing cross-validation at the level of individual data points will favor models that learn the clusters, so it needs to be performed at the cluster level, which requires application knowledge and possibly data visualization techniques.
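
Cluster-level cross-validation amounts to assigning folds to whole clusters rather than to individual rows. A minimal sketch, assuming a vector of cluster identifiers is available (the helper name `cluster_folds` is hypothetical):

```r
# Sketch: assign cross-validation folds at the cluster level, so that all
# rows of a cluster always land in the same fold.
cluster_folds <- function(cluster_id, k = 5) {
  clusters <- unique(cluster_id)
  fold_of_cluster <- sample(rep_len(1:k, length(clusters)))  # one fold per cluster
  names(fold_of_cluster) <- clusters
  fold_of_cluster[as.character(cluster_id)]  # fold per row, constant within a cluster
}

cid <- rep(1:10, each = 5)   # 10 clusters of 5 correlated measurements each
folds <- cluster_folds(cid, k = 3)
```

Holding out `folds == i` then removes entire clusters from training, so the model cannot score well merely by memorizing cluster membership.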
