Commit 9221e3f

update

1 parent 42ce0a5 commit 9221e3f

96 files changed: +10458 additions, -6854 deletions
Two binary files changed (3.08 KB and 21.1 KB); binary content is not shown.

doc/LectureNotes/_build/html/_sources/week47.ipynb

Lines changed: 342 additions & 145 deletions
Large diffs are not rendered by default.

doc/LectureNotes/_build/html/searchindex.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

doc/LectureNotes/_build/html/week47.html

Lines changed: 117 additions & 0 deletions
@@ -448,6 +448,9 @@ <h2> Contents </h2>
448448
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#gating-mechanism-long-short-term-memory-lstm">Gating mechanism: Long Short Term Memory (LSTM)</a></li>
449449
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#implementing-a-memory-cell-in-a-neural-network">Implementing a memory cell in a neural network</a></li>
450450
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#lstm-details">LSTM details</a></li>
451+
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#lstm-cell-and-gates">LSTM Cell and Gates</a></li>
452+
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#core-lstm-equations">Core LSTM Equations</a></li>
453+
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#gate-intuition-and-dynamics">Gate Intuition and Dynamics</a></li>
451454
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#basic-layout-all-figures-from-raschka-et-al">Basic layout (All figures from Raschka <em>et al.,</em>)</a></li>
452455
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#id1">LSTM details</a></li>
453456
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#comparing-with-a-standard-rnn">Comparing with a standard RNN</a></li>
@@ -462,6 +465,11 @@ <h2> Contents </h2>
462465
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#forget-and-input">Forget and input</a></li>
463466
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#id2">Basic layout</a></li>
464467
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#output-gate">Output gate</a></li>
468+
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#lstm-implementation-code-example">LSTM Implementation (Code Example)</a></li>
469+
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#example-modeling-dynamical-systems">Example: Modeling Dynamical Systems</a></li>
470+
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#example-biological-sequences">Example: Biological Sequences</a></li>
471+
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#training-tips-and-variants">Training Tips and Variants</a></li>
472+
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#lstm-summary">LSTM Summary</a></li>
465473
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#summary-of-lstm">Summary of LSTM</a></li>
466474
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#lstm-implementation-using-tensorflow">LSTM implementation using TensorFlow</a></li>
467475
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#and-the-corresponding-one-with-pytorch">And the corresponding one with PyTorch</a></li>
@@ -728,6 +736,7 @@ <h2>PyTorch: Defining a Simple RNN, using Tensorflow<a class="headerlink" href="
728736
<p>This recurrent neural network uses the TensorFlow/Keras SimpleRNN, which is the counterpart to PyTorch’s nn.RNN.
729737
In this code we have used</p>
730738
<ol class="arabic simple">
739+
<li><p>sequence<span class="math notranslate nohighlight">\(\_\)</span>length is the number of time steps in each input sequence fed to the recurrent network, i.e. how many ordered observations each sample in our dataset contains (see the sketch after this list).</p></li>
731740
<li><p>return_sequences=False makes it output only the last hidden state, which is fed to the classifier. Also, we have</p></li>
732741
<li><p>from_logits=True matches the PyTorch CrossEntropyLoss.</p></li>
733742
</ol>
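<p>As a minimal sketch of how these three arguments fit together (not the lecture code itself; the data arrays and the choices of 10 time steps, one feature and three classes are assumptions made only for illustration):</p>
<div class="cell docutils container">
<div class="cell_input docutils container">
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense
from tensorflow.keras.losses import SparseCategoricalCrossentropy

sequence_length = 10                 # number of time steps per input sequence (assumed)
n_features, n_classes = 1, 3         # assumed sizes, for illustration only

# toy data: 100 sequences of shape (sequence_length, n_features)
X = np.random.rand(100, sequence_length, n_features)
y = np.random.randint(0, n_classes, size=100)

model = Sequential([
    SimpleRNN(16, return_sequences=False,        # output only the last hidden state
              input_shape=(sequence_length, n_features)),
    Dense(n_classes)                             # raw logits, no softmax
])
# from_logits=True mirrors the PyTorch CrossEntropyLoss on raw logits
model.compile(optimizer=&#39;adam&#39;,
              loss=SparseCategoricalCrossentropy(from_logits=True),
              metrics=[&#39;accuracy&#39;])
model.fit(X, y, epochs=5, batch_size=16, verbose=0)
</pre></div>
</div>
</div>
</div>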
@@ -1083,6 +1092,44 @@ <h2>LSTM details<a class="headerlink" href="#lstm-details" title="Link to this h
10831092
long-term memory, and a hidden state <span class="math notranslate nohighlight">\(h\)</span> which can be thought of as
10841093
the short-term memory.</p>
10851094
</section>
1095+
<section id="lstm-cell-and-gates">
1096+
<h2>LSTM Cell and Gates<a class="headerlink" href="#lstm-cell-and-gates" title="Link to this heading">#</a></h2>
1097+
<ol class="arabic simple">
1098+
<li><p>Each LSTM cell contains a memory cell <span class="math notranslate nohighlight">\(C_t\)</span> and three gates (forget <span class="math notranslate nohighlight">\(f_t\)</span>, input <span class="math notranslate nohighlight">\(i_t\)</span>, output <span class="math notranslate nohighlight">\(o_t\)</span>) that control information flow.</p></li>
1099+
<li><p><strong>Forget gate</strong> (<span class="math notranslate nohighlight">\(f_t\)</span>): chooses which information to erase from the previous cell state <span class="math notranslate nohighlight">\(C_{t-1}\)</span>.</p></li>
1100+
<li><p><strong>Input gate</strong> (<span class="math notranslate nohighlight">\(i_t\)</span>): decides which new information <span class="math notranslate nohighlight">\(\tilde{C}_t\)</span> to add to the cell state.</p></li>
1101+
<li><p><strong>Output gate</strong> (<span class="math notranslate nohighlight">\(o_t\)</span>): controls which parts of the cell state become the output <span class="math notranslate nohighlight">\(h_t\)</span>.</p></li>
1102+
<li><p>The cell-state update: <span class="math notranslate nohighlight">\(C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t\)</span> (a small worked example follows after this list).</p></li>
1103+
</ol>
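<p>As a small worked example of the cell-state update, taken per component and with numbers chosen only for illustration: take <span class="math notranslate nohighlight">\(f_t = 0.9\)</span>, <span class="math notranslate nohighlight">\(C_{t-1} = 2\)</span>, <span class="math notranslate nohighlight">\(i_t = 0.1\)</span> and <span class="math notranslate nohighlight">\(\tilde{C}_t = 1\)</span>. Then</p>
<div class="math notranslate nohighlight">
\[
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t = 0.9\cdot 2 + 0.1\cdot 1 = 1.9,
\]</div>
<p>so a forget gate close to one keeps most of the old memory, while the small input gate admits only a little of the new candidate.</p>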
1104+
</section>
1105+
<section id="core-lstm-equations">
1106+
<h2>Core LSTM Equations<a class="headerlink" href="#core-lstm-equations" title="Link to this heading">#</a></h2>
1107+
<p><strong>The gate computations and state updates are given by:</strong></p>
1108+
<div class="math notranslate nohighlight">
1109+
\[\begin{split}
1110+
\begin{align*}
1111+
f_t &amp;= \sigma(W_f [h_{t-1}, x_t] + b_f), \\
1112+
i_t &amp;= \sigma(W_i [h_{t-1}, x_t] + b_i), \\
1113+
\tilde{C}_t &amp;= \tanh(W_C [h_{t-1}, x_t] + b_C), \\
1114+
C_t &amp;= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t, \\
1115+
o_t &amp;= \sigma(W_o [h_{t-1}, x_t] + b_o), \\
1116+
h_t &amp;= o_t \odot \tanh(C_t).
1117+
\end{align*}
1118+
\end{split}\]</div>
1119+
<ol class="arabic simple">
1120+
<li><p>Here <span class="math notranslate nohighlight">\(\sigma\)</span> is the sigmoid function and <span class="math notranslate nohighlight">\(\odot\)</span> denotes the elementwise (Hadamard) product; see <a class="reference external" href="https://jaketae.github.io/study/dissecting-lstm/#:~:text=%5C%5B%5Cbegin,align">jaketae.github.io</a>.</p></li>
1121+
<li><p>These equations define how the LSTM retains and updates its memory and produces outputs; a minimal NumPy sketch follows after this list.</p></li>
1122+
</ol>
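<p>A minimal NumPy sketch of these equations for a single time step; the weight shapes, the random toy data and the stacking of the gate weights into dictionaries are assumptions made only for illustration:</p>
<div class="cell docutils container">
<div class="cell_input docutils container">
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    # each W[k] acts on the concatenation [h_{t-1}, x_t]
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W[&#39;f&#39;] @ z + b[&#39;f&#39;])       # forget gate
    i_t = sigmoid(W[&#39;i&#39;] @ z + b[&#39;i&#39;])       # input gate
    C_tilde = np.tanh(W[&#39;C&#39;] @ z + b[&#39;C&#39;])   # candidate memory
    C_t = f_t * C_prev + i_t * C_tilde             # cell-state update
    o_t = sigmoid(W[&#39;o&#39;] @ z + b[&#39;o&#39;])       # output gate
    h_t = o_t * np.tanh(C_t)                       # new hidden (short-term) state
    return h_t, C_t

# assumed toy sizes: 3 inputs, 4 hidden units
n_in, n_h = 3, 4
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(n_h, n_h + n_in)) for k in &#39;fiCo&#39;}
b = {k: np.zeros(n_h) for k in &#39;fiCo&#39;}
h, C = np.zeros(n_h), np.zeros(n_h)
for x in rng.normal(size=(5, n_in)):               # five time steps
    h, C = lstm_step(x, h, C, W, b)
</pre></div>
</div>
</div>
</div>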
1123+
</section>
1124+
<section id="gate-intuition-and-dynamics">
1125+
<h2>Gate Intuition and Dynamics<a class="headerlink" href="#gate-intuition-and-dynamics" title="Link to this heading">#</a></h2>
1126+
<ol class="arabic simple">
1127+
<li><p>Forget gate <span class="math notranslate nohighlight">\(f_t\)</span> acts as a soft “erase” signal: <span class="math notranslate nohighlight">\(f_t \approx 0\)</span> forgets, <span class="math notranslate nohighlight">\(f_t \approx 1\)</span> retains previous memory.</p></li>
1128+
<li><p>Input gate <span class="math notranslate nohighlight">\(i_t\)</span> scales how much new candidate memory <span class="math notranslate nohighlight">\(\tilde{C}_t\)</span> is written.</p></li>
1129+
<li><p>Output gate <span class="math notranslate nohighlight">\(o_t\)</span> determines how much of the cell’s memory flows into the hidden state <span class="math notranslate nohighlight">\(h_t\)</span>.</p></li>
1130+
<li><p>By controlling these gates, the LSTM effectively retains long-term information when needed; a small numerical illustration follows after this list.</p></li>
1131+
</ol>
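<p>To make the gate roles concrete, a tiny numerical illustration with gate values fixed by hand rather than learned:</p>
<div class="cell docutils container">
<div class="cell_input docutils container">
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>import numpy as np

C = np.array([1.0, -2.0])            # previous cell state
C_tilde = np.array([0.5, 0.5])       # candidate memory

# forget gate near 1, input gate near 0: old memory is retained
C_keep = 0.99 * C + 0.0 * C_tilde    # -&gt; [ 0.99, -1.98]

# forget gate near 0, input gate near 1: memory is overwritten
C_new = 0.0 * C + 1.0 * C_tilde      # -&gt; [ 0.5,  0.5]

print(C_keep, C_new)
</pre></div>
</div>
</div>
</div>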
1132+
</section>
10861133
<section id="basic-layout-all-figures-from-raschka-et-al">
10871134
<h2>Basic layout (All figures from Raschka <em>et al.,</em>)<a class="headerlink" href="#basic-layout-all-figures-from-raschka-et-al" title="Link to this heading">#</a></h2>
10881135
<!-- dom:FIGURE: [figslides/LSTM1.png, width=700 frac=1.0] -->
@@ -1213,6 +1260,68 @@ <h2>Output gate<a class="headerlink" href="#output-gate" title="Link to this hea
12131260
\end{split}\]</div>
12141261
<p>where <span class="math notranslate nohighlight">\(\mathbf{W_o,U_o}\)</span> are the weights of the output gate and <span class="math notranslate nohighlight">\(\mathbf{b_o}\)</span> is the bias of the output gate.</p>
12151262
</section>
1263+
<section id="lstm-implementation-code-example">
1264+
<h2>LSTM Implementation (Code Example)<a class="headerlink" href="#lstm-implementation-code-example" title="Link to this heading">#</a></h2>
1265+
<ol class="arabic simple">
1266+
<li><p>Using high-level libraries (Keras, PyTorch) simplifies LSTM usage.</p></li>
1267+
<li><p>Below we define and train a Keras LSTM on a univariate time series:</p></li>
1268+
</ol>
1269+
<div class="cell docutils container">
1270+
<div class="cell_input docutils container">
1271+
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>from tensorflow.keras.models import Sequential
1272+
from tensorflow.keras.layers import LSTM, Dense
1273+
1274+
# X_train shape: (samples, timesteps, 1)
1275+
model = Sequential([
1276+
    LSTM(32, input_shape=(None, 1)),
1277+
    Dense(1)
1278+
])
1279+
model.compile(optimizer=&#39;adam&#39;, loss=&#39;mse&#39;)
1280+
model.fit(X_train, y_train, epochs=20, batch_size=16)
1281+
</pre></div>
1282+
</div>
1283+
</div>
1284+
</div>
1285+
<p>The model learns to map sequences to outputs; input sequences can be constructed via sliding windows.</p>
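<p>One possible way to build such windows, assuming a one-dimensional NumPy array of measurements (the sine signal and the window length of 30 are assumptions for illustration):</p>
<div class="cell docutils container">
<div class="cell_input docutils container">
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>import numpy as np

def make_windows(series, window):
    # series: 1D array; returns X of shape (n, window, 1) and y of shape (n,)
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])     # target: the next value
    X = np.array(X)[..., np.newaxis]     # add the feature dimension
    return X, np.array(y)

series = np.sin(np.linspace(0, 20, 500))        # assumed toy signal
X_train, y_train = make_windows(series, window=30)
# X_train has shape (470, 30, 1), ready for the Keras model above
</pre></div>
</div>
</div>
</div>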
1286+
</section>
1287+
<section id="example-modeling-dynamical-systems">
1288+
<h2>Example: Modeling Dynamical Systems<a class="headerlink" href="#example-modeling-dynamical-systems" title="Link to this heading">#</a></h2>
1289+
<ol class="arabic simple">
1290+
<li><p>LSTMs can learn complex time evolution of physical systems (e.g. Lorenz attractor, fluid dynamics) from data.</p></li>
1291+
<li><p>They can serve as data-driven surrogates for ODE/PDE solvers, for instance trained on RK4-generated time series; a small sketch follows after this list.</p></li>
1292+
<li><p>For example, an LSTM surrogate has been reported to forecast 36-hour lake hydrodynamics (velocity, temperature) with <span class="math notranslate nohighlight">\(&lt;6\%\)</span> error.</p></li>
1293+
<li><p>Such models dramatically speed up predictions compared to full numerical simulation.</p></li>
1294+
</ol>
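<p>A minimal sketch of the surrogate idea: generate a trajectory with an RK4 integrator, window it, and fit an LSTM to predict the next state. The damped-pendulum equations, step size and network size here are assumptions chosen only for illustration:</p>
<div class="cell docutils container">
<div class="cell_input docutils container">
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def rk4_step(f, y, dt):
    k1 = f(y)
    k2 = f(y + 0.5 * dt * k1)
    k3 = f(y + 0.5 * dt * k2)
    k4 = f(y + dt * k3)
    return y + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

# damped pendulum (small-angle approximation): y = [theta, omega]
f = lambda y: np.array([y[1], -0.1 * y[1] - y[0]])

y, dt, traj = np.array([1.0, 0.0]), 0.05, []
for _ in range(2000):
    traj.append(y)
    y = rk4_step(f, y, dt)
traj = np.array(traj)                            # shape (2000, 2)

window = 20
X = np.array([traj[i:i + window] for i in range(len(traj) - window)])
Y = traj[window:]                                # next state to predict

model = Sequential([LSTM(32, input_shape=(window, 2)), Dense(2)])
model.compile(optimizer=&#39;adam&#39;, loss=&#39;mse&#39;)
model.fit(X, Y, epochs=10, batch_size=32, verbose=0)
</pre></div>
</div>
</div>
</div>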
1295+
</section>
1296+
<section id="example-biological-sequences">
1297+
<h2>Example: Biological Sequences<a class="headerlink" href="#example-biological-sequences" title="Link to this heading">#</a></h2>
1298+
<ol class="arabic simple">
1299+
<li><p>Biological sequences (DNA/RNA/proteins) are effectively categorical time series.</p></li>
1300+
<li><p>LSTMs capture sequence motifs and long-range dependencies (akin to language models).</p></li>
1301+
<li><p>They are widely used in genomics and proteomics, for example for predicting protein function or gene expression; a small sketch follows after this list.</p></li>
1302+
<li><p>They naturally handle variable-length input by processing one element at a time.</p></li>
1303+
</ol>
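<p>As a sketch of how DNA sequences could be fed to an LSTM (the sequences, labels and network sizes below are made up for illustration): map bases to integers, pad to a common length, and let an embedding layer with masking feed the LSTM:</p>
<div class="cell docutils container">
<div class="cell_input docutils container">
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# toy DNA sequences of different lengths with binary labels (assumed data)
seqs = [&#39;ATGCGTA&#39;, &#39;GGCAT&#39;, &#39;TTACGGA&#39;, &#39;CATG&#39;]
labels = np.array([1, 0, 1, 0])

base_to_int = {&#39;A&#39;: 1, &#39;C&#39;: 2, &#39;G&#39;: 3, &#39;T&#39;: 4}   # 0 is reserved for padding
encoded = [[base_to_int[b] for b in s] for s in seqs]
X = pad_sequences(encoded, padding=&#39;post&#39;)        # pad to the longest sequence

model = Sequential([
    Embedding(input_dim=5, output_dim=8, mask_zero=True),   # masking ignores the padding
    LSTM(16),
    Dense(1, activation=&#39;sigmoid&#39;)
])
model.compile(optimizer=&#39;adam&#39;, loss=&#39;binary_crossentropy&#39;, metrics=[&#39;accuracy&#39;])
model.fit(X, labels, epochs=5, verbose=0)
</pre></div>
</div>
</div>
</div>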
1304+
</section>
1305+
<section id="training-tips-and-variants">
1306+
<h2>Training Tips and Variants<a class="headerlink" href="#training-tips-and-variants" title="Link to this heading">#</a></h2>
1307+
<ol class="arabic simple">
1308+
<li><p>Preprocess time series (normalize features, windowing); handle variable lengths (padding/truncation).</p></li>
1309+
<li><p>Experiment with network depth, hidden units, and regularization (dropout) to avoid overfitting.</p></li>
1310+
<li><p>Consider bidirectional LSTM or stacking multiple LSTM layers for complex patterns.</p></li>
1311+
<li><p>The GRU is a simpler gated RNN that merges the forget and input gates into a single update gate.</p></li>
1312+
<li><p>Monitor gradients during training; use gradient clipping to stabilize learning if needed (see the sketch after this list).</p></li>
1313+
</ol>
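<p>Several of these tips can be combined in a few lines; this is a sketch, and the layer sizes, dropout rate and clipping threshold are arbitrary choices rather than recommendations:</p>
<div class="cell docutils container">
<div class="cell_input docutils container">
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Bidirectional, Dropout, Dense
from tensorflow.keras.optimizers import Adam

model = Sequential([
    # stacked, bidirectional LSTM layers; the first returns the full sequence
    Bidirectional(LSTM(64, return_sequences=True), input_shape=(None, 1)),
    Dropout(0.2),                      # regularization against overfitting
    LSTM(32),
    Dense(1)
])
# clipnorm applies gradient clipping to stabilize training
model.compile(optimizer=Adam(learning_rate=1e-3, clipnorm=1.0), loss=&#39;mse&#39;)
model.summary()
</pre></div>
</div>
</div>
</div>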
1314+
</section>
1315+
<section id="lstm-summary">
1316+
<h2>LSTM Summary<a class="headerlink" href="#lstm-summary" title="Link to this heading">#</a></h2>
1317+
<ol class="arabic simple">
1318+
<li><p>LSTMs extend RNNs with gated cells to remember long-term context, addressing RNN gradient issues.</p></li>
1319+
<li><p>Core update: <span class="math notranslate nohighlight">\(C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t\)</span>, output <span class="math notranslate nohighlight">\(h_t = o_t \odot \tanh(C_t)\)</span>.</p></li>
1320+
<li><p>Implementation is straightforward in libraries such as Keras and PyTorch, requiring only a few lines of code.</p></li>
1321+
<li><p>Applications span science and engineering: forecasting dynamical systems, analyzing DNA/proteins, etc.</p></li>
1322+
<li><p>For more details, see Goodfellow <em>et al.</em> (2016), <em>Deep Learning</em>, chapter 10.</p></li>
1323+
</ol>
1324+
</section>
12161325
<section id="summary-of-lstm">
12171326
<h2>Summary of LSTM<a class="headerlink" href="#summary-of-lstm" title="Link to this heading">#</a></h2>
12181327
<p>LSTMs provide a basic approach for modeling long-range dependencies in sequences.
@@ -2391,6 +2500,9 @@ <h2>Dimensionality reduction<a class="headerlink" href="#dimensionality-reductio
23912500
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#gating-mechanism-long-short-term-memory-lstm">Gating mechanism: Long Short Term Memory (LSTM)</a></li>
23922501
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#implementing-a-memory-cell-in-a-neural-network">Implementing a memory cell in a neural network</a></li>
23932502
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#lstm-details">LSTM details</a></li>
2503+
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#lstm-cell-and-gates">LSTM Cell and Gates</a></li>
2504+
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#core-lstm-equations">Core LSTM Equations</a></li>
2505+
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#gate-intuition-and-dynamics">Gate Intuition and Dynamics</a></li>
23942506
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#basic-layout-all-figures-from-raschka-et-al">Basic layout (All figures from Raschka <em>et al.,</em>)</a></li>
23952507
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#id1">LSTM details</a></li>
23962508
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#comparing-with-a-standard-rnn">Comparing with a standard RNN</a></li>
@@ -2405,6 +2517,11 @@ <h2>Dimensionality reduction<a class="headerlink" href="#dimensionality-reductio
24052517
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#forget-and-input">Forget and input</a></li>
24062518
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#id2">Basic layout</a></li>
24072519
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#output-gate">Output gate</a></li>
2520+
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#lstm-implementation-code-example">LSTM Implementation (Code Example)</a></li>
2521+
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#example-modeling-dynamical-systems">Example: Modeling Dynamical Systems</a></li>
2522+
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#example-biological-sequences">Example: Biological Sequences</a></li>
2523+
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#training-tips-and-variants">Training Tips and Variants</a></li>
2524+
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#lstm-summary">LSTM Summary</a></li>
24082525
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#summary-of-lstm">Summary of LSTM</a></li>
24092526
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#lstm-implementation-using-tensorflow">LSTM implementation using TensorFlow</a></li>
24102527
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#and-the-corresponding-one-with-pytorch">And the corresponding one with PyTorch</a></li>
