
Commit f27b2c9: changed variable names
1 parent 831312c

17 files changed, +454 -461 lines

doc/pub/week37/html/._week37-bs022.html
Lines changed: 2 additions & 2 deletions

@@ -339,8 +339,8 @@ <h2 id="stochastic-gradient-descent" class="anchor">Stochastic Gradient Descent
 sum over \( n \) data points \( \{\mathbf{x}_i\}_{i=1}^n \),
 </p>
 $$
-C(\mathbf{\beta}) = \sum_{i=1}^n c_i(\mathbf{x}_i,
-\mathbf{\beta}).
+C(\mathbf{\theta}) = \sum_{i=1}^n c_i(\mathbf{x}_i,
+\mathbf{\theta}).
 $$

doc/pub/week37/html/._week37-bs023.html
Lines changed: 2 additions & 2 deletions

@@ -334,8 +334,8 @@ <h2 id="computation-of-gradients" class="anchor">Computation of gradients </h2>
 computed as a sum over \( i \)-gradients
 </p>
 $$
-\nabla_\beta C(\mathbf{\beta}) = \sum_i^n \nabla_\beta c_i(\mathbf{x}_i,
-\mathbf{\beta}).
+\nabla_\theta C(\mathbf{\theta}) = \sum_i^n \nabla_\theta c_i(\mathbf{x}_i,
+\mathbf{\theta}).
 $$

 <p>Stochasticity/randomness is introduced by only taking the
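
To make the two sums in these hunks concrete, here is a rough illustrative sketch (not code from the commit; the quadratic per-point cost, the toy data and all names are assumptions):

import numpy as np

# Hypothetical per-point cost and gradient for a linear model y_i ~ x_i^T theta
def c_i(x_i, y_i, theta):
    return (y_i - x_i @ theta)**2

def grad_c_i(x_i, y_i, theta):
    return -2.0*(y_i - x_i @ theta)*x_i

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))        # five data points, two features
y = rng.normal(size=5)
theta = np.zeros(2)

# C(theta) and its gradient assembled as sums over the data points
C = sum(c_i(X[i], y[i], theta) for i in range(len(y)))
grad_C = sum(grad_c_i(X[i], y[i], theta) for i in range(len(y)))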

doc/pub/week37/html/._week37-bs024.html
Lines changed: 4 additions & 4 deletions

@@ -344,10 +344,10 @@ <h2 id="sgd-example" class="anchor">SGD example </h2>
 picked at random in each gradient descent step
 </p>
 $$
-\nabla_{\beta}
-C(\mathbf{\beta}) = \sum_{i=1}^n \nabla_\beta c_i(\mathbf{x}_i,
-\mathbf{\beta}) \rightarrow \sum_{i \in B_k}^n \nabla_\beta
-c_i(\mathbf{x}_i, \mathbf{\beta}).
+\nabla_{\theta}
+C(\mathbf{\theta}) = \sum_{i=1}^n \nabla_\theta c_i(\mathbf{x}_i,
+\mathbf{\theta}) \rightarrow \sum_{i \in B_k}^n \nabla_\theta
+c_i(\mathbf{x}_i, \mathbf{\theta}).
 $$

doc/pub/week37/html/._week37-bs025.html
Lines changed: 2 additions & 2 deletions

@@ -332,8 +332,8 @@ <h2 id="the-gradient-step" class="anchor">The gradient step </h2>

 <p>Thus a gradient descent step now looks like </p>
 $$
-\beta_{j+1} = \beta_j - \gamma_j \sum_{i \in B_k}^n \nabla_\beta c_i(\mathbf{x}_i,
-\mathbf{\beta})
+\theta_{j+1} = \theta_j - \gamma_j \sum_{i \in B_k}^n \nabla_\theta c_i(\mathbf{x}_i,
+\mathbf{\theta})
 $$

 <p>where \( k \) is picked at random with equal
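
A minimal end-to-end sketch of this minibatch update (illustrative only: the OLS-style per-point cost, the toy data, the fixed step length gamma and the minibatch layout are assumptions, not code from the repository):

import numpy as np

rng = np.random.default_rng(1)
n, M = 100, 5                        # n data points, minibatch size M
m = n // M                           # number of minibatches
X = np.c_[np.ones(n), rng.normal(size=n)]
y = 4.0 + 3.0*X[:, 1] + rng.normal(size=n)

theta = rng.normal(size=2)
gamma = 0.01                         # fixed step length, for simplicity

for j in range(1000):
    k = rng.integers(m)              # pick minibatch B_k at random
    Xk = X[k*M:(k+1)*M]
    yk = y[k*M:(k+1)*M]
    # gradient of the summed per-point costs c_i over B_k, for c_i = (y_i - x_i^T theta)^2
    grad = -2.0*Xk.T @ (yk - Xk @ theta)
    theta = theta - gamma*grad

With a fixed gamma the iterates keep fluctuating around the minimum; the decaying step length in the later ._week37-bs029.html hunk addresses exactly this.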

doc/pub/week37/html/._week37-bs027.html
Lines changed: 1 addition & 1 deletion

@@ -338,7 +338,7 @@ <h2 id="when-do-we-stop" class="anchor">When do we stop? </h2>
 that we are close to a local/global minimum. However, we could also
 evaluate the cost function at this point, store the result and
 continue the search. If the test kicks in at a later stage we can
-compare the values of the cost function and keep the \( \beta \) that
+compare the values of the cost function and keep the \( \theta \) that
 gave the lowest value.
 </p>
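
The "store the result and keep the best theta" idea can be sketched as follows (a placeholder quadratic cost stands in for C(theta), and a random draw stands in for one full SGD run):

import numpy as np

def cost(theta):
    # placeholder for the real cost function C(theta)
    return float(np.sum(theta**2))

rng = np.random.default_rng(2)
best_cost, best_theta = np.inf, None
for restart in range(5):
    theta = rng.normal(size=2)       # stands in for the theta returned by one SGD run
    c = cost(theta)                  # evaluate the cost at this candidate
    if c < best_cost:                # keep the theta that gave the lowest value
        best_cost, best_theta = c, theta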

doc/pub/week37/html/._week37-bs029.html
Lines changed: 3 additions & 3 deletions

@@ -332,10 +332,10 @@ <h2 id="time-decay-rate" class="anchor">Time decay rate </h2>

 <p>As an example, let \( e = 0,1,2,3,\cdots \) denote the current epoch and let \( t_0, t_1 > 0 \) be two fixed numbers. Furthermore, let \( t = e \cdot m + i \) where \( m \) is the number of minibatches and \( i=0,\cdots,m-1 \). Then the function $$\gamma_j(t; t_0, t_1) = \frac{t_0}{t+t_1} $$ goes to zero as the number of epochs gets large. I.e. we start with a step length \( \gamma_j (0; t_0, t_1) = t_0/t_1 \) which decays in <em>time</em> \( t \).</p>

-<p>In this way we can fix the number of epochs, compute \( \beta \) and
+<p>In this way we can fix the number of epochs, compute \( \theta \) and
 evaluate the cost function at the end. Repeating the computation will
 give a different result since the scheme is random by design. Then we
-pick the final \( \beta \) that gives the lowest value of the cost
+pick the final \( \theta \) that gives the lowest value of the cost
 function.
 </p>

@@ -364,7 +364,7 @@ <h2 id="time-decay-rate" class="anchor">Time decay rate </h2>
 for i in range(m):
     k = np.random.randint(m) #Pick the k-th minibatch at random
     #Compute the gradient using the data in minibatch Bk
-    #Compute new suggestion for beta
+    #Compute new suggestion for theta
     t = epoch*m+i
     gamma_j = step_length(t,t0,t1)
     j += 1
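
A self-contained sketch of the decaying step length and the loop in this hunk (the values of n_epochs, m, t0 and t1 are illustrative; the gradient and parameter update are left as comments, as in the notes themselves):

import numpy as np

def step_length(t, t0, t1):
    # gamma(t; t0, t1) = t0/(t + t1): starts at t0/t1 and decays with time t
    return t0/(t + t1)

n_epochs, m = 50, 10          # number of epochs and number of minibatches (illustrative)
t0, t1 = 1.0, 10.0            # fixed schedule parameters (illustrative)

j = 0
for epoch in range(n_epochs):
    for i in range(m):
        k = np.random.randint(m)           # pick the k-th minibatch at random
        # compute the gradient using the data in minibatch Bk
        # compute the new suggestion for theta
        t = epoch*m + i
        gamma_j = step_length(t, t0, t1)   # decaying step length
        j += 1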

doc/pub/week37/html/._week37-bs041.html
Lines changed: 2 additions & 2 deletions

@@ -333,10 +333,10 @@ <h2 id="rmsprop-adaptive-learning-rates" class="anchor">RMSProp: Adaptive Learni
 Uses a decaying average of squared gradients (instead of a cumulative sum):
 </p>
 $$
-v_t = \beta_2\, v_{t-1} + (1-\beta_2)\, (\nabla L(w_t))^2,
+v_t = \theta_2\, v_{t-1} + (1-\theta_2)\, (\nabla L(w_t))^2,
 $$

-<p>with \( \beta_2 \) typically \( 0.9 \) (or \( 0.99 \)).</p>
+<p>with \( \theta_2 \) typically \( 0.9 \) (or \( 0.99 \)).</p>
 <ol>
 <li> Update: \( w_{t+1} = w_t - \frac{\alpha}{\sqrt{v_t + \epsilon}} \nabla L(w_t) \).</li>
 <li> Recent gradients have more weight, so \( v_t \) adapts to the current landscape.</li>
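
A small sketch of this decaying-average update on a toy quadratic loss (alpha, the decay factor rho, the value of eps and the number of iterations are example choices, not taken from the notes):

import numpy as np

def grad_L(w):
    # gradient of the toy loss L(w) = sum(w**2)
    return 2.0*w

w = np.array([1.0, -2.0])
v = np.zeros_like(w)
alpha, rho, eps = 0.01, 0.9, 1e-8      # rho plays the role of the decay factor above

for t in range(500):
    g = grad_L(w)
    v = rho*v + (1.0 - rho)*g**2       # decaying average of squared gradients
    w = w - alpha/np.sqrt(v + eps)*g   # per-parameter adaptive step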

doc/pub/week37/html/._week37-bs043.html
Lines changed: 3 additions & 3 deletions

@@ -340,13 +340,13 @@ <h2 id="rms-prop" class="anchor">RMS prop </h2>
 \begin{align}
 \mathbf{g}_t &= \nabla_\theta E(\boldsymbol{\theta})
 \tag{3}\\
-\mathbf{s}_t &=\beta \mathbf{s}_{t-1} +(1-\beta)\mathbf{g}_t^2 \nonumber \\
+\mathbf{s}_t &=\theta \mathbf{s}_{t-1} +(1-\theta)\mathbf{g}_t^2 \nonumber \\
 \boldsymbol{\theta}_{t+1}&=&\boldsymbol{\theta}_t - \eta_t { \mathbf{g}_t \over \sqrt{\mathbf{s}_t +\epsilon}}, \nonumber
 \end{align}
 $$

-<p>where \( \beta \) controls the averaging time of the second moment and is
-typically taken to be about \( \beta=0.9 \), \( \eta_t \) is a learning rate
+<p>where \( \theta \) controls the averaging time of the second moment and is
+typically taken to be about \( \theta=0.9 \), \( \eta_t \) is a learning rate
 typically chosen to be \( 10^{-3} \), and \( \epsilon\sim 10^{-8} \) is a
 small regularization constant to prevent divergences. Multiplication
 and division by vectors is understood as an element-wise operation. It

doc/pub/week37/html/._week37-bs044.html
Lines changed: 5 additions & 5 deletions

@@ -355,16 +355,16 @@ <h2 id="adam-optimizer-https-arxiv-org-abs-1412-6980" class="anchor"><a href="ht
 \begin{align}
 \mathbf{g}_t &= \nabla_\theta E(\boldsymbol{\theta})
 \tag{4}\\
-\mathbf{m}_t &= \beta_1 \mathbf{m}_{t-1} + (1-\beta_1) \mathbf{g}_t \nonumber \\
-\mathbf{s}_t &=\beta_2 \mathbf{s}_{t-1} +(1-\beta_2)\mathbf{g}_t^2 \nonumber \\
-\boldsymbol{\mathbf{m}}_t&={\mathbf{m}_t \over 1-\beta_1^t} \nonumber \\
-\boldsymbol{\mathbf{s}}_t &={\mathbf{s}_t \over1-\beta_2^t} \nonumber \\
+\mathbf{m}_t &= \theta_1 \mathbf{m}_{t-1} + (1-\theta_1) \mathbf{g}_t \nonumber \\
+\mathbf{s}_t &=\theta_2 \mathbf{s}_{t-1} +(1-\theta_2)\mathbf{g}_t^2 \nonumber \\
+\boldsymbol{\mathbf{m}}_t&={\mathbf{m}_t \over 1-\theta_1^t} \nonumber \\
+\boldsymbol{\mathbf{s}}_t &={\mathbf{s}_t \over1-\theta_2^t} \nonumber \\
 \boldsymbol{\theta}_{t+1}&=\boldsymbol{\theta}_t - \eta_t { \boldsymbol{\mathbf{m}}_t \over \sqrt{\boldsymbol{\mathbf{s}}_t} +\epsilon}, \nonumber \\
 \tag{5}
 \end{align}
 $$

-<p>where \( \beta_1 \) and \( \beta_2 \) set the memory lifetime of the first and
+<p>where \( \theta_1 \) and \( \theta_2 \) set the memory lifetime of the first and
 second moment and are typically taken to be \( 0.9 \) and \( 0.99 \)
 respectively, and \( \eta \) and \( \epsilon \) are identical to RMSprop.
 </p>
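
And a corresponding sketch of the Adam update in equations (4)-(5), again on a toy quadratic cost (the decay factors 0.9 and 0.99 and eta = 1e-3 follow the typical values quoted above; rho1 and rho2 below stand for the two decay factors, and everything else is an assumption):

import numpy as np

def grad_E(theta):
    # gradient of the toy cost E(theta) = sum(theta**2)
    return 2.0*theta

theta = np.array([1.0, -2.0])
m_t = np.zeros_like(theta)             # first moment
s_t = np.zeros_like(theta)             # second moment
eta, rho1, rho2, eps = 1e-3, 0.9, 0.99, 1e-8

for t in range(1, 2001):
    g = grad_E(theta)
    m_t = rho1*m_t + (1.0 - rho1)*g            # decaying average of gradients
    s_t = rho2*s_t + (1.0 - rho2)*g**2         # decaying average of squared gradients
    m_hat = m_t/(1.0 - rho1**t)                # bias corrections
    s_hat = s_t/(1.0 - rho2**t)
    theta = theta - eta*m_hat/(np.sqrt(s_hat) + eps)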

doc/pub/week37/html/._week37-bs047.html
Lines changed: 2 additions & 2 deletions

@@ -356,8 +356,8 @@ <h2 id="sneaking-in-automatic-differentiation-using-autograd" class="anchor">Sne
 import matplotlib.pyplot as plt
 from autograd import grad

-def CostOLS(beta):
-    return (1.0/n)*np.sum((y-X @ beta)**2)
+def CostOLS(theta):
+    return (1.0/n)*np.sum((y-X @ theta)**2)

 n = 100
 x = 2*np.random.rand(n,1)
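
For context, a self-contained variant of this snippet that also runs plain gradient descent on CostOLS with the autograd gradient (only the imports, CostOLS, n and x appear in the hunk; the data model, the design matrix, the learning rate and the loop are illustrative assumptions, and np is assumed to be autograd's NumPy wrapper):

import autograd.numpy as np            # autograd's NumPy wrapper, so grad can trace the cost
from autograd import grad

n = 100
x = 2*np.random.rand(n, 1)
y = 4 + 3*x + np.random.randn(n, 1)    # assumed linear data model
X = np.c_[np.ones((n, 1)), x]          # design matrix with an intercept column

def CostOLS(theta):
    return (1.0/n)*np.sum((y - X @ theta)**2)

training_gradient = grad(CostOLS)      # automatic gradient of the cost w.r.t. theta

theta = np.random.randn(2, 1)
eta = 0.1
for _ in range(1000):
    theta = theta - eta*training_gradient(theta)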
