doc/LectureNotes/_build/html/week47.html: 117 additions, 0 deletions
@@ -448,6 +448,9 @@ <h2> Contents </h2>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#gating-mechanism-long-short-term-memory-lstm">Gating mechanism: Long Short Term Memory (LSTM)</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#implementing-a-memory-cell-in-a-neural-network">Implementing a memory cell in a neural network</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#comparing-with-a-standard-rnn">Comparing with a standard RNN</a></li>
@@ -462,6 +465,11 @@ <h2> Contents </h2>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#forget-and-input">Forget and input</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#summary-of-lstm">Summary of LSTM</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#lstm-implementation-using-tensorflow">LSTM implementation using TensorFlow</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#and-the-corresponding-one-with-pytorch">And the corresponding one with PyTorch</a></li>
@@ -728,6 +736,7 @@ <h2>PyTorch: Defining a Simple RNN, using Tensorflow<a class="headerlink" href="
<p>This recurrent neural network uses the TensorFlow/Keras SimpleRNN, which is the counterpart to PyTorch’s nn.RNN.
In this code we have used</p>
<ol class="arabic simple">
<li><p>sequence_length, the number of time steps in each input sequence fed into the recurrent neural network, that is, the number of ordered observations in each sample of our dataset.</p></li>
<li><p>return_sequences=False, which makes the layer output only the last hidden state; this state is fed to the classifier.</p></li>
<li><p>from_logits=True, which matches PyTorch’s CrossEntropyLoss: both expect raw, unnormalized logits.</p></li>
</ol>
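<p>As a minimal sketch of how these three settings fit together (the synthetic data, shapes, and layer sizes below are illustrative assumptions, not the lecture’s code):</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>
import numpy as np
import tensorflow as tf

# Illustrative data: 100 samples, 20 time steps, 1 feature, 2 classes.
sequence_length = 20
X = np.random.rand(100, sequence_length, 1).astype("float32")
y = np.random.randint(0, 2, size=(100,))

model = tf.keras.Sequential([
    # return_sequences=False: only the last hidden state is passed on.
    tf.keras.layers.SimpleRNN(32, return_sequences=False,
                              input_shape=(sequence_length, 1)),
    tf.keras.layers.Dense(2),  # raw logits, no softmax
])

# from_logits=True mirrors PyTorch's CrossEntropyLoss, which also
# consumes unnormalized logits.
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=16)
</pre></div></div>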
@@ -1083,6 +1092,44 @@ <h2>LSTM details<a class="headerlink" href="#lstm-details" title="Link to this h
long-term memory, and a hidden state <span class="math notranslate nohighlight">\(h\)</span> which can be thought of as
the short-term memory.</p>
</section>
<section id="lstm-cell-and-gates">
<h2>LSTM Cell and Gates<a class="headerlink" href="#lstm-cell-and-gates" title="Link to this heading">#</a></h2>
<ol class="arabic simple">
<li><p>Each LSTM cell contains a memory cell <span class="math notranslate nohighlight">\(C_t\)</span> and three gates (forget <span class="math notranslate nohighlight">\(f_t\)</span>, input <span class="math notranslate nohighlight">\(i_t\)</span>, output <span class="math notranslate nohighlight">\(o_t\)</span>) that control information flow.</p></li>
<li><p><strong>Forget gate</strong> (<span class="math notranslate nohighlight">\(f_t\)</span>): chooses which information to erase from the previous cell state <span class="math notranslate nohighlight">\(C_{t-1}\)</span>.</p></li>
<li><p><strong>Input gate</strong> (<span class="math notranslate nohighlight">\(i_t\)</span>): decides which new information <span class="math notranslate nohighlight">\(\tilde{C}_t\)</span> to add to the cell state.</p></li>
<li><p><strong>Output gate</strong> (<span class="math notranslate nohighlight">\(o_t\)</span>): controls which parts of the cell state become the output <span class="math notranslate nohighlight">\(h_t\)</span>.</p></li>
<li><p>The input gate <span class="math notranslate nohighlight">\(i_t\)</span> scales how much new candidate memory <span class="math notranslate nohighlight">\(\tilde{C}_t\)</span> is written.</p></li>
<li><p>The output gate <span class="math notranslate nohighlight">\(o_t\)</span> determines how much of the cell’s memory flows into the hidden state <span class="math notranslate nohighlight">\(h_t\)</span>.</p></li>
<li><p>By controlling these gates, the LSTM keeps long-term information available when it is needed.</p></li>
<h2>Basic layout (All figures from Raschka <em>et al.</em>)<a class="headerlink" href="#basic-layout-all-figures-from-raschka-et-al" title="Link to this heading">#</a></h2>
@@ -1213,6 +1260,68 @@ <h2>Output gate<a class="headerlink" href="#output-gate" title="Link to this hea
\end{split}\]</div>
<p>where <span class="math notranslate nohighlight">\(\mathbf{W_o,U_o}\)</span> are the weights of the output gate and <span class="math notranslate nohighlight">\(\mathbf{b_o}\)</span> is the bias of the output gate.</p>
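<p>Collecting the forget, input, and output gate equations into a single time-step update, a minimal NumPy sketch looks as follows (the dict-of-gates layout and all names are our own illustrative choices, not the lecture’s code):</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, C_prev, W, U, b):
    """One LSTM time step; W, U, b map gate names to weight arrays."""
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])      # forget gate
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])      # input gate
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])      # output gate
    C_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # candidate memory
    C_t = f_t * C_prev + i_t * C_tilde   # erase with f_t, write with i_t
    h_t = o_t * np.tanh(C_t)             # expose part of the memory as output
    return h_t, C_t

# Toy dimensions: 3 input features, 4 hidden units.
rng = np.random.default_rng(0)
W = {g: rng.normal(size=(4, 3)) for g in "fioc"}
U = {g: rng.normal(size=(4, 4)) for g in "fioc"}
b = {g: np.zeros(4) for g in "fioc"}
h, C = lstm_step(rng.normal(size=3), np.zeros(4), np.zeros(4), W, U, b)
</pre></div></div>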
</section>
<section id="lstm-implementation-code-example">
<h2>LSTM Implementation (Code Example)<a class="headerlink" href="#lstm-implementation-code-example" title="Link to this heading">#</a></h2>
<p>The model learns to map sequences to outputs; input sequences can be constructed via sliding windows.</p>
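<p>A small sketch of the sliding-window construction (the helper function and the sine-wave data are illustrative assumptions):</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>
import numpy as np

def sliding_windows(series, window):
    """Split a 1D series into (input window, next value) training pairs."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i : i + window])
        y.append(series[i + window])
    # Keras/PyTorch LSTMs expect shape (samples, time steps, features).
    return np.array(X)[..., None], np.array(y)

t = np.linspace(0, 20, 500)
X, y = sliding_windows(np.sin(t), window=30)
print(X.shape, y.shape)   # (470, 30, 1) (470,)
</pre></div></div>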
</section>
<section id="example-modeling-dynamical-systems">
<h2>Example: Modeling Dynamical Systems<a class="headerlink" href="#example-modeling-dynamical-systems" title="Link to this heading">#</a></h2>
<ol class="arabic simple">
<li><p>LSTMs can learn the complex time evolution of physical systems (e.g. the Lorenz attractor, fluid dynamics) from data.</p></li>
<li><p>They can serve as data-driven surrogates for ODE/PDE solvers, trained on RK4-generated time series (see the sketch after this list).</p></li>
<li><p>For example, an LSTM surrogate has accurately forecast 36-hour lake hydrodynamics (velocity, temperature) with less than <span class="math notranslate nohighlight">\(6\%\)</span> error.</p></li>
<li><p>Such models dramatically speed up predictions compared to full numerical simulation.</p></li>
</ol>
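<p>A compact sketch of the surrogate idea: generate Lorenz-attractor data with an RK4 integrator and fit an LSTM to predict the next state. All step sizes, window lengths, and hyperparameters below are illustrative assumptions:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>
import numpy as np
import tensorflow as tf

# Lorenz system and a fixed-step RK4 integrator.
def lorenz(u, sigma=10.0, rho=28.0, beta=8.0/3.0):
    x, y, z = u
    return np.array([sigma*(y - x), x*(rho - z) - y, x*y - beta*z])

def rk4(u, dt):
    k1 = lorenz(u)
    k2 = lorenz(u + 0.5*dt*k1)
    k3 = lorenz(u + 0.5*dt*k2)
    k4 = lorenz(u + dt*k3)
    return u + dt*(k1 + 2*k2 + 2*k3 + k4)/6.0

dt, steps = 0.01, 5000
traj = np.empty((steps, 3))
u = np.array([1.0, 1.0, 1.0])
for n in range(steps):
    traj[n] = u
    u = rk4(u, dt)

# One-step-ahead training pairs from sliding windows.
window = 50
X = np.stack([traj[i:i+window] for i in range(steps - window)])
y = traj[window:]

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(window, 3)),
    tf.keras.layers.Dense(3),   # predicts the next (x, y, z) state
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=64, verbose=0)
</pre></div></div>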
</section>
<section id="example-biological-sequences">
<h2>Example: Biological Sequences<a class="headerlink" href="#example-biological-sequences" title="Link to this heading">#</a></h2>
<ol class="arabic simple">
<li><p>Biological sequences (DNA/RNA/proteins) are effectively categorical time series (see the sketch after this list).</p></li>
<li><p>LSTMs capture sequence motifs and long-range dependencies (akin to language models).</p></li>
<li><p>They are widely used in genomics and proteomics (e.g., protein function, gene expression).</p></li>
<li><p>They naturally handle variable-length input by processing one element at a time.</p></li>
</ol>
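<p>A toy sketch of DNA as a categorical time series: the sequences and labels below are made up, and the model sizes are arbitrary illustrative choices:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>
import numpy as np
import tensorflow as tf

alphabet = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode(seq):
    """Integer-encode a DNA string symbol by symbol."""
    return np.array([alphabet[ch] for ch in seq])

seqs = ["ACGTACGT", "TTGACCGA", "CCGTAGGA"]   # made-up fragments
X = np.stack([encode(s) for s in seqs])       # shape (3, 8)
y = np.array([0, 1, 0])                       # made-up labels

model = tf.keras.Sequential([
    # Embedding maps each of the 4 symbols to a dense vector.
    tf.keras.layers.Embedding(input_dim=4, output_dim=8),
    tf.keras.layers.LSTM(16),
    tf.keras.layers.Dense(2),   # raw logits for 2 classes
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.fit(X, y, epochs=2, verbose=0)
</pre></div></div>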
</section>
<section id="training-tips-and-variants">
<h2>Training Tips and Variants<a class="headerlink" href="#training-tips-and-variants" title="Link to this heading">#</a></h2>
<ol class="arabic simple">
<li><p>Preprocess time series (normalize features, apply windowing) and handle variable lengths via padding or truncation.</p></li>
<li><p>Experiment with network depth, the number of hidden units, and regularization (dropout) to avoid overfitting.</p></li>
<li><p>Consider a bidirectional LSTM, or stack multiple LSTM layers, for complex patterns.</p></li>
<li><p>The GRU is a simpler gated RNN that combines the forget and input gates into a single update gate.</p></li>
<li><p>Monitor gradients during training; use gradient clipping to stabilize learning if needed (see the sketch after this list).</p></li>
</ol>
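<p>A sketch combining several of these tips in Keras; the layer sizes, dropout rate, and clipnorm value are arbitrary illustrative choices:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>
import tensorflow as tf

model = tf.keras.Sequential([
    # Stacked LSTMs: every layer except the last returns full sequences.
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64, return_sequences=True),
        input_shape=(None, 1)),          # None allows variable-length input
    tf.keras.layers.Dropout(0.2),        # regularization against overfitting
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),
])

# clipnorm rescales any gradient whose norm exceeds 1.0 (gradient clipping).
model.compile(optimizer=tf.keras.optimizers.Adam(clipnorm=1.0), loss="mse")
</pre></div></div>
<p>Setting return_sequences=True on every LSTM layer except the last is what makes stacking work: each layer hands a full sequence to the next.</p>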
1314
+
</section>
<section id="lstm-summary">
<h2>LSTM Summary<a class="headerlink" href="#lstm-summary" title="Link to this heading">#</a></h2>
<ol class="arabic simple">
<li><p>LSTMs extend RNNs with gated cells that remember long-term context, addressing the vanishing-gradient problems of plain RNNs.</p></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#gating-mechanism-long-short-term-memory-lstm">Gating mechanism: Long Short Term Memory (LSTM)</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#implementing-a-memory-cell-in-a-neural-network">Implementing a memory cell in a neural network</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#comparing-with-a-standard-rnn">Comparing with a standard RNN</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#summary-of-lstm">Summary of LSTM</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#lstm-implementation-using-tensorflow">LSTM implementation using TensorFlow</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#and-the-corresponding-one-with-pytorch">And the corresponding one with PyTorch</a></li>