This repository implements and extends the StyTr² architecture for neural style transfer using Vision Transformers. Our work focuses on improving robustness under challenging style inputs—particularly abstract art—by introducing novel separability loss functions. Our paper can be found in the repository as `Style_Transfer_ViT.pdf`.
Neural style transfer synthesizes images that combine the structural content of one image with the artistic style of another. Transformer-based approaches like StyTr² achieve high fidelity but can produce noisy outputs or struggle with abstract styles. We address these issues by adding two explicit separability losses—L_sep1 and L_sep2—which encourage content and style features to occupy distinct subspaces during training.
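The exact formulations of L_sep1 and L_sep2 are given in the paper. As a rough illustration of the mechanism, a margin-based separability penalty on pooled features might look like the sketch below; the cosine-similarity formulation, the pooling, and the margin value are illustrative assumptions, not the paper's definitions:

```python
import torch
import torch.nn.functional as F

def separability_loss(content_feat: torch.Tensor,
                      style_feat: torch.Tensor,
                      margin: float = 0.5) -> torch.Tensor:
    """Illustrative margin loss that pushes content and style features apart.

    Penalizes the cosine similarity between pooled content and style
    features whenever it exceeds the margin; the loss is zero once the
    two feature sets are sufficiently separated. The real L_sep1/L_sep2
    definitions live in the paper; this only sketches the general idea.
    """
    # Pool spatial dimensions to one vector per sample: (B, C, H, W) -> (B, C)
    c = content_feat.flatten(2).mean(dim=2)
    s = style_feat.flatten(2).mean(dim=2)
    cos = F.cosine_similarity(c, s, dim=1)  # (B,)
    return F.relu(cos - margin).mean()      # hinge: only penalize overlap
```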
- **Separability Losses**
  - `L_sep1`: Content-focused margin loss
  - `L_sep2`: Joint content-style margin loss
 
- **PyTorch Lightning Integration**: modular, reproducible training pipeline with multi-GPU support (see the sketch after this list).
- **Empirical Validation**: benchmarked against StyTr² on both abstract and natural image datasets.
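A minimal sketch of how these pieces can slot into a LightningModule; the class name, loss weights, and the plain MSE content/style losses below are placeholders for the repository's actual implementation:

```python
import pytorch_lightning as pl
import torch
import torch.nn.functional as F

class StyleTransferModule(pl.LightningModule):
    """Hypothetical training wrapper; names and weights are illustrative."""

    def __init__(self, model, vgg, lambda_c=1.0, lambda_s=1.0, lambda_sep=1.0):
        super().__init__()
        self.model = model   # transformer encoder/decoder stack
        self.vgg = vgg       # frozen VGG19 feature extractor (loss only)
        self.lambda_c, self.lambda_s, self.lambda_sep = lambda_c, lambda_s, lambda_sep

    def training_step(self, batch, batch_idx):
        content, style = batch
        output = self.model(content, style)
        # Compare VGG features of the output against the two inputs.
        f_out, f_c, f_s = self.vgg(output), self.vgg(content), self.vgg(style)
        loss_c = F.mse_loss(f_out, f_c)
        loss_s = F.mse_loss(f_out, f_s)
        loss_sep = separability_loss(f_c, f_s)  # sketched earlier
        loss = (self.lambda_c * loss_c + self.lambda_s * loss_s
                + self.lambda_sep * loss_sep)
        self.log_dict({"loss_c": loss_c, "loss_s": loss_s, "loss_sep": loss_sep})
        return loss

    def configure_optimizers(self):
        # Skip the frozen VGG weights.
        params = [p for p in self.parameters() if p.requires_grad]
        return torch.optim.Adam(params, lr=1e-4)
```

With the model wrapped this way, multi-GPU training reduces to trainer configuration, e.g. `pl.Trainer(accelerator="gpu", devices=2)`.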
- Vision Transformer encoders for content and style
- CAPE (Content-Aware Positional Encoding)
- Transformer decoder for content-guided stylization
- Frozen VGG19 feature extractor for loss computation (sketched below)
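The frozen VGG19 extractor can be built directly from torchvision. A minimal sketch, assuming a single feature tap at relu4_1 (a common choice in style-transfer losses, not necessarily the repository's exact layer set):

```python
import torch
import torchvision

class VGGFeatures(torch.nn.Module):
    """VGG19 truncated at relu4_1 (layer index 20), frozen for loss features."""

    def __init__(self):
        super().__init__()
        weights = torchvision.models.VGG19_Weights.IMAGENET1K_V1
        vgg = torchvision.models.vgg19(weights=weights)
        self.slice = torch.nn.Sequential(*list(vgg.features.children())[:21])
        # Freeze the weights but do NOT wrap forward in no_grad: gradients
        # must still flow back through VGG to the generated image.
        for p in self.parameters():
            p.requires_grad_(False)
        self.eval()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.slice(x)
```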
- Random Image Sample Dataset (pankajkumar2002 on Kaggle)
  - 3,000 content images (150×150)
- Abstract Art Images (greg115 on Kaggle)
  - 8,145 abstract style references (512×512)
Both datasets are public, diverse, and high-quality.
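The two datasets ship at different native resolutions, so both are resized to a common training size before batching. A minimal loading sketch; the directory layout, `*.jpg` extension, and 256×256 training resolution are assumptions:

```python
from pathlib import Path

import torch
from PIL import Image
from torchvision import transforms

class FlatImageDataset(torch.utils.data.Dataset):
    """Loads every image in one directory; reused for content and style sets."""

    def __init__(self, root: str, size: int = 256):
        self.paths = sorted(Path(root).glob("*.jpg"))
        self.tf = transforms.Compose([
            # 150x150 content / 512x512 style -> one common resolution
            transforms.Resize((size, size)),
            transforms.ToTensor(),
        ])

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, i: int) -> torch.Tensor:
        return self.tf(Image.open(self.paths[i]).convert("RGB"))
```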
- **Dependencies**: all required packages are installed automatically when you open the notebook in Colab. See `requirements.txt` for a static list.
- **Running in Colab**:
  - Upload `training_pipeline.ipynb` to Colab.
  - Run all cells; the initial cells install dependencies.
  - Follow the notebook prompts to train or evaluate models.
 
- **Running locally**: all functionality is contained within the notebook. Open it in any Jupyter environment and the training pipeline comes preconfigured with all necessary hyperparameters.
| Model | Content Loss (L_c) | Style Loss (L_s) |
|---|---|---|
| StyTr² (baseline) | 2.0125 | 1.9512 |
| StyTr² + L_sep1 | 1.9909 | 1.9630 |
| StyTr² + L_sep2 | 2.0253 | 1.9555 |
- L_sep1 yields the best content preservation and fastest convergence.
- L_sep2 improves resilience to noise under abstract styles.
Example visualizations from our best models are included in the repository.
- **Chaitanya Tatipigari**: project lead; proposed separability losses; implemented L_sep1 and L_sep2; PyTorch Lightning setup; VGG19 feature extractor.
- **Alec Kain**: dataset preparation; conceptualized separability loss mechanism; implemented CAPE, transformer encoders, and patching; explored L1 loss alternatives.
- **Tyler J. Church**: developed transformer and CNN decoders; managed hyperparameter testing; integrated data modules.

