---
title: 'how logistic regression works'
tags: 'journal'
date: 'Apr 1, 2025'
---

sketching out logistic regression for my interview, because i need to get the fundamentals down.

the problem is that we want to predict binary outcomes (0, 1), but we can't do that with linear regression, which predicts continuous values.

how do we get a model that outputs probabilities between 0 and 1? and how can we set a decision boundary to produce binary outcomes?

the answer is the sigmoid function.

$p(x) = \frac{1}{1 + e^{-z}}$ where $z = \beta_0 + \beta_1X_1 + \beta_2X_2 + ... + \beta_nX_n$

this bounds the output between 0 and 1, and we can create a decision boundary at $p(x) = 0.5$, which is where $z = 0$.
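
a minimal sketch of that mapping (numpy, with made-up coefficients and feature values purely for illustration):

```python
import numpy as np

def sigmoid(z):
    # squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical coefficients and a single example with two features
beta = np.array([-1.0, 2.0, 0.5])    # beta_0, beta_1, beta_2
x = np.array([1.0, 0.8, -0.3])       # leading 1 pairs with the intercept beta_0

z = x @ beta            # the linear combination z
p = sigmoid(z)          # probability between 0 and 1
label = int(p >= 0.5)   # decision boundary at p = 0.5, i.e. z = 0
print(z, p, label)
```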

but what are we modeling here? the probabilities?

the key insight is that logistic regression doesn't directly model probabilities in a linear way - it models the log-odds.

why log-odds? because:

1. probability constraints: probability must be between 0 and 1, which isn't compatible with linear modeling (which produces unbounded values)

2. log-odds transformation: when we take $\log\left(\frac{p}{1-p}\right)$, we transform the bounded 0-1 range into an unbounded range ($-\infty$ to $+\infty$)

3. linear relationship: this lets us model the log-odds as a linear function of the features:

   $\log\left(\frac{p}{1-p}\right) = z = \beta_0 + \beta_1X_1 + \beta_2X_2 + ... + \beta_nX_n$

the magic happens in this transformation. consider:

- if $p = 0.5$, log-odds $= 0$
- if $p > 0.5$, log-odds $> 0$
- if $p < 0.5$, log-odds $< 0$
- as $p$ approaches $1$, log-odds approaches $+\infty$
- as $p$ approaches $0$, log-odds approaches $-\infty$

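a quick numeric check of those properties (plain numpy; the probe values are arbitrary):

```python
import numpy as np

def log_odds(p):
    # the logit: maps (0, 1) onto the whole real line
    return np.log(p / (1 - p))

for p in [0.001, 0.25, 0.5, 0.75, 0.999]:
    print(p, round(log_odds(p), 3))
# 0.5 maps to 0, p > 0.5 goes positive, p < 0.5 goes negative,
# and the values head towards +/- infinity as p nears 1 or 0
```
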
so we're essentially saying:

1. we want to model probability $p$
2. but we can't directly use linear regression on $p$ (bounded)
3. so we transform $p$ to log-odds (unbounded)
4. model log-odds linearly
5. transform back to probability using the sigmoid
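
step 5 works because the sigmoid is exactly the inverse of the log-odds transform - solving the log-odds equation for $p$:

$$\log\left(\frac{p}{1-p}\right) = z \;\Rightarrow\; \frac{p}{1-p} = e^{z} \;\Rightarrow\; p = \frac{e^{z}}{1 + e^{z}} = \frac{1}{1 + e^{-z}}$$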

this is why the coefficients in logistic regression represent changes in log-odds, and we can exponentiate them ($e^{\beta}$) to get odds ratios.
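
for example, with a hypothetical coefficient $\beta_1 = 0.7$, the odds ratio is $e^{0.7} \approx 2.01$: each one-unit increase in $X_1$ roughly doubles the odds of $y = 1$, holding the other features fixed.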

but how do we actually estimate the coefficients ($\beta_0, \beta_1, ..., \beta_n$)? we need the values that maximize the probability of observing our training data.

to do that, we use maximum likelihood estimation (MLE).

we use MLE to find the coefficients that make our observed data most likely:

first we need the likelihood function. to calculate it, we take the probability the model assigns to each data point's actual outcome and multiply them all together.

for binary classification, the likelihood is

$L(\beta) = \prod_i p(x_i)^{y_i} \cdot (1-p(x_i))^{(1-y_i)}$

where $y_i$ is the true label (0 or 1) of data point $i$

to make optimization easier, we take the log, which turns the product into a sum - this is the log-likelihood:

$$\log(L(\beta)) = \sum_i \left[ y_i \cdot \log(p(x_i)) + (1-y_i) \cdot \log(1-p(x_i)) \right]$$
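
a small sketch of that computation, with made-up labels and predicted probabilities:

```python
import numpy as np

def log_likelihood(y, p):
    # sum of y*log(p) + (1-y)*log(1-p) over all data points
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 0, 1, 1, 0])              # true labels
p = np.array([0.9, 0.2, 0.7, 0.6, 0.4])    # model's predicted probabilities

print(log_likelihood(y, p))                # always <= 0; closer to 0 is better
```

maximizing this is the same thing as minimizing the binary cross-entropy loss, which is how most libraries phrase it.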

and unlike linear regression, which has a closed-form solution, logistic regression is fit with iterative methods like gradient descent.

the goal is to find the $\beta$ values that maximize this log-likelihood - that is, the coefficients under which the observed data is most probable.
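
a bare-bones sketch of that fit, assuming a small synthetic dataset and plain batch gradient ascent on the log-likelihood (equivalently, gradient descent on its negative):

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic data: 200 points, 2 features, labels drawn from a known "true" model
X = np.hstack([np.ones((200, 1)), rng.normal(size=(200, 2))])  # column of 1s for the intercept
true_beta = np.array([-0.5, 2.0, -1.0])
y = (rng.random(200) < 1 / (1 + np.exp(-X @ true_beta))).astype(float)

beta = np.zeros(3)                    # start from all-zero coefficients
lr = 0.1
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ beta))   # current predicted probabilities
    grad = X.T @ (y - p)              # gradient of the log-likelihood w.r.t. beta
    beta += lr * grad / len(y)        # step uphill on the log-likelihood

print(beta)                           # should land near true_beta
```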

---

more resources

- visualization by [MLU explain](https://mlu-explain.github.io/logistic-regression/)
- [Concise Implementation of Softmax Regression — Dive into Deep Learning 1.0.3 documentation](https://d2l.ai/chapter_linear-classification/softmax-regression-concise.html)