episodes/5-transfer-learning.md (114 additions, 0 deletions)

@@ -17,7 +17,7 @@

An example: let's say that you want to train a model to classify images of different dog breeds. You could make use of a pre-trained network that learned how to classify images of dogs and cats. The pre-trained network will not know anything about different dog breeds, but it will have captured some general knowledge of what dogs look like at a high level, and, at a low level, all the different features (eyes, ears, paws, fur) that make up an image of a dog. Further training this model on your dog breed dataset is a much easier task than training from scratch, because the model can reuse the general knowledge captured in the pre-trained network.

![](episodes/fig/05-transfer_learning.png){alt="Diagram of transfer learning: a network pre-trained on one task is reused and further trained on a new, related task."}
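To make this concrete, here is a minimal Keras sketch of the idea, using the DenseNet121 base that appears later in this episode; the 224×224 input size is an assumption for illustration, not taken from the lesson:

```python
from tensorflow import keras

# Load a base network pre-trained on ImageNet, without its original
# classification head (include_top=False). The input shape is assumed
# here; match it to your own images.
base_model = keras.applications.DenseNet121(
    weights='imagenet',
    include_top=False,
    input_shape=(224, 224, 3),
)

# Freeze the base so its general-purpose features stay fixed while a
# new, task-specific head is trained on top.
base_model.trainable = False
```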

<!--
Edit this plot using the Mermaid live editor:
1. Open this link that includes the source code of the chart to open the live editor web interface:
@@ -258,6 +258,120 @@
::::
:::

::: challenge
## Fine-Tune the Top Block of the Pretrained Model

So far, we've trained only the custom head while keeping the DenseNet121 base frozen. Let's now **unfreeze just the top layer group** of the base model and observe how performance changes.

### 1. Unfreeze top layers
Unfreeze the final convolutional block of the base model with the following loop, which turns training on from the first layer whose name contains `conv5` onward:

```python
# 1. Unfreeze the top block of the base model:
# everything from the first 'conv5' layer onward becomes trainable
set_trainable = False
for layer in base_model.layers:
    if 'conv5' in layer.name:
        set_trainable = True
    layer.trainable = set_trainable
```
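Before recompiling, it can help to verify what the loop actually unfroze. A quick optional check (assuming at least one layer name contains `conv5`):

```python
# Sanity check: list the layers that are now trainable.
trainable_names = [layer.name for layer in base_model.layers if layer.trainable]
print(f"{len(trainable_names)} of {len(base_model.layers)} layers unfrozen, "
      f"starting at '{trainable_names[0]}'")
```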

### 2. Recompile the model
Any time you change layer trainability, you **must recompile** the model.

Use the same optimizer and loss function as before:
- `optimizer='adam'`
- `loss=SparseCategoricalCrossentropy(from_logits=True)`
- `metrics=['accuracy']`

### 3. Retrain the model
Retrain the model using the same setup as before:

- `batch_size=32`
- `epochs=30`
- Early stopping with `patience=5`
- Pass in the validation set using `validation_data`
- Store the result in a new variable called `history_finetune`

> You can reuse your `early_stopper` callback or redefine it.

### 4. Compare with baseline (head only)
Plot the **validation accuracy** for both the baseline and fine-tuned models.

**Questions to reflect on:**
- Did unfreezing part of the base model improve validation accuracy?
- Did training time increase significantly?
- Is there any evidence of overfitting?

:::: solution
## Solution
```python
# 1. Unfreeze top block of base model
# (trainable from the first 'conv5' layer onward, as in the challenge code)
set_trainable = False
for layer in base_model.layers:
    if 'conv5' in layer.name:
        set_trainable = True
    layer.trainable = set_trainable

# 2. Recompile the model (required after changing layer trainability)
model.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# 3. Retrain the model with the same setup as before
early_stopper = keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=5)
history_finetune = model.fit(train_images, train_labels,
                             batch_size=32,
                             epochs=30,
                             validation_data=(val_images, val_labels),
                             callbacks=[early_stopper])

# 4. Plot validation accuracy of both runs for comparison
import pandas as pd
import matplotlib.pyplot as plt

def plot_two_histories(h1, h2, label1='Frozen', label2='Finetuned'):
    """Plot the validation-accuracy curves of two Keras training histories."""
    df1 = pd.DataFrame(h1.history)
    df2 = pd.DataFrame(h2.history)
    plt.plot(df1['val_accuracy'], label=label1)
    plt.plot(df2['val_accuracy'], label=label2)
    plt.xlabel("Epochs")
    plt.ylabel("Validation Accuracy")
    plt.legend()
    plt.title("Validation Accuracy: Frozen vs. Finetuned")
    plt.show()

plot_two_histories(history, history_finetune)
```

![](episodes/fig/05-frozen_vs_finetuned.png){alt="A comparison of the accuracy on the validation set for both the frozen and the fine-tuned setup."}
> **Collaborator:** It would be good to have alt image descriptions everywhere, e.g.
>
> Suggested change:
> ```diff
> -![](episodes/fig/05-frozen_vs_finetuned.png)
> +![](episodes/fig/05-frozen_vs_finetuned.png){alt="A comparison of the accuracy on the validation set for both the frozen and the fine-tuned setup."}
> ```


**Discussion of results**: Validation accuracy improved across all epochs compared to the frozen baseline. Training time also increased slightly, but the model was able to adapt better to the new dataset by fine-tuning the top convolutional block.

This makes sense: by unfreezing the last part of the base model, you allow it to adapt its high-level features to the new domain, while keeping the earlier, general-purpose feature detectors intact.
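One way to see this division of labour is to compare how many parameters are now trainable versus frozen. A small sketch (using `numpy`, and assuming the recompiled `model` from the solution above):

```python
import numpy as np

# Count parameters that will be updated during fine-tuning vs. those
# that stay fixed at their pre-trained values.
trainable = sum(int(np.prod(w.shape)) for w in model.trainable_weights)
frozen = sum(int(np.prod(w.shape)) for w in model.non_trainable_weights)
print(f"Trainable parameters: {trainable:,} | Frozen parameters: {frozen:,}")
```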


**What happens if you unfreeze too many layers?**
If you unfreeze most or all of the base model:

- Training time increases significantly because more weights are being updated.
- The model may forget some of the general-purpose features it learned during pretraining. This is called "catastrophic forgetting."
- Overfitting becomes more likely, especially if your dataset is small or noisy.
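A common way to limit these risks, standard practice rather than something from this episode's setup, is to recompile with a much smaller learning rate before fine-tuning, so the unfrozen weights only change gradually:

```python
# A minimal sketch: recompile with a reduced learning rate for fine-tuning.
# The value 1e-5 is a typical starting point, not a number from this lesson.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-5),
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
```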


### When does this approach work best?

Fine-tuning a few top layers is a good middle ground. You're adapting the model without retraining everything from scratch. If your dataset is small or very different from the original ImageNet data, you should be careful not to unfreeze too many layers.

For most use cases:
- Freeze most layers
- Unfreeze the top block or two
- Avoid full fine-tuning unless you have lots of data and compute

::::
:::

## Concluding: The power of transfer learning
In many domains, large networks are available that have been trained on vast amounts of data, such as in computer vision and natural language processing. Using transfer learning, you can benefit from the knowledge that was captured from another machine learning task. In many fields, transfer learning will outperform models trained from scratch, especially if your dataset is small or of poor quality.

Binary file added episodes/fig/05-frozen_vs_finetuned.png
> **Collaborator:** The image does not look very convincing, both curves end at a similar value for validation accuracy (~0.6). This seems to require some explanation.