- Built a multimodal storybook generation system that turns images into narrative PDFs
- Combined BLIP image captioning with fine-tuned KoT5 for scene-level story generation
- Processed 50,001 images with structured metadata for training and evaluation
- Delivered an end-to-end pipeline with a Streamlit demo and PDF export
This repository presents a practical exploration of multimodal narrative generation, focusing on how visual information can be transformed into coherent, scene-level stories. Rather than treating image-to-text as a single-step task, the project decomposes storytelling into captioning, structured context integration, and Transformer-based text generation.
TalesRunner emphasizes pipeline design and system integration, demonstrating how pre-trained vision–language models and fine-tuned language models can be combined to produce user-facing, end-to-end AI applications.
- Overview
- Project Timeline
- Key Features
- System Architecture
- Implementation Details
- Model Training
- Demo
- Team
TalesRunner is a full-stack AI project that transforms visual input into text-based stories. Users upload images (up to 10), optionally provide additional scene information, and the system automatically:
- Generates captions using BLIP
- Merges captions with structured metadata
- Produces narrative paragraphs using a fine-tuned KoT5 language model
- Compiles images + text into a PDF storybook
The project focuses on building a practical multimodal pipeline using pre-trained models, fine-tuned LLMs, and a user-friendly demo interface.
Jan–Feb 2025 (Jan 11 – Feb 15, approx. 5 weeks)
| Week | Period | Focus & Milestones |
|---|---|---|
| 1 | Jan 11 – Jan 14 | Project scoping, task definition, first ideation |
| 2 | Jan 16 – Jan 19 | Second ideation, system design refinement |
| 3 | Jan 19 – Jan 26 | Dataset construction, baseline LM review |
| 4 | Jan 27 – Feb 3 | Dataset finalization, KoT5 fine-tuning |
| 5 | Feb 3 – Feb 10 | KoT5 fine-tuning, inference pipeline implementation |
| 6 | Feb 10 – Feb 15 | Inference demo, final integration and project wrap-up |
- Multimodal story generation pipeline combining BLIP captions and KoT5 text generation
- Structured metadata extraction from AI Hub annotations
- Custom input format with special tokens to guide narrative generation
- Fine-tuned KoT5 model with Bayesian hyperparameter optimization
- Streamlit demo enabling interactive storybook creation
- PDF export for final story compilation
- Image Upload: the user provides 1–10 images in order.
- Captioning (BLIP): BLIP generates an initial natural-language caption for each image.
- Metadata Integration: user-provided fields and extracted annotations are combined with the BLIP captions.
- Story Generation (KoT5): the fine-tuned KoT5 model outputs a paragraph for each scene.
- PDF Assembly: images and story paragraphs are compiled into a downloadable PDF (see the sketch after this list).
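For orientation, the stages above can be sketched as a simple sequential loop. The helper functions below are hypothetical stand-ins, not the repository's actual modules.

```python
# Minimal, illustrative sketch of the pipeline flow; the helper functions here
# are hypothetical stand-ins, not the repository's actual modules.
from typing import Dict, List, Tuple

def caption_image(path: str) -> str:
    # In the real system, a BLIP model produces this caption.
    return f"a placeholder caption for {path}"

def build_model_input(caption: str, meta: Dict[str, str]) -> str:
    # Merge the BLIP caption with user-provided / extracted metadata.
    fields = " ".join(f"{k}: {v}" for k, v in meta.items())
    return f"{caption} {fields}"

def generate_paragraph(model_input: str) -> str:
    # In the real system, the fine-tuned KoT5 model generates this paragraph.
    return f"[story paragraph generated from: {model_input}]"

def run_pipeline(images: List[str], metadata: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    pages = []
    for path, meta in zip(images, metadata):
        paragraph = generate_paragraph(build_model_input(caption_image(path), meta))
        pages.append((path, paragraph))
    return pages  # these (image, paragraph) pairs are then laid out into the PDF

print(run_pipeline(["scene1.jpg"], [{"character": "a fox", "setting": "forest"}]))
```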
- Source: AI Hub Fairy Tale Illustration Dataset (50,001 samples)
- Each sample includes:
  - an image (`.jpg`)
  - a metadata file (`.json`)
- BLIP generates captions for all images (see the captioning sketch after this list)
- Annotation fields extracted:
  - Required: caption, name, i_action, classification
  - Optional: character, setting, emotion, causality, outcome, prediction
- Combined to create `dataset_train.csv` and `dataset_val.csv`
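As a reference point, BLIP captioning can be run with Hugging Face transformers roughly as below. The exact checkpoint used in this project is not stated here, so `Salesforce/blip-image-captioning-base` is only an assumed example.

```python
# Sketch of BLIP captioning with Hugging Face transformers. The checkpoint name
# is an assumption; the project may use a different BLIP variant.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("scene1.jpg").convert("RGB")          # any scene image
inputs = processor(images=image, return_tensors="pt")
caption_ids = model.generate(**inputs, max_new_tokens=30)
caption = processor.decode(caption_ids[0], skip_special_tokens=True)
print(caption)
```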
- Special tokens mark structured fields
- Required fields validated for completeness
- Optional fields replaced with `<empty>` if missing
- Field order randomized per sample to prevent positional bias
- Row-wise seed ensures reproducibility
- Task prefix added to guide KoT5 generation (see the input-construction sketch after this list)
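A minimal sketch of how one training input might be assembled from these rules. The special-token names, task-prefix wording, and example values are illustrative assumptions, not the project's exact format.

```python
# Illustrative construction of one training input; token names, prefix wording,
# and example values are hypothetical, not the project's exact format.
import random

REQUIRED = ["caption", "name", "i_action", "classification"]
OPTIONAL = ["character", "setting", "emotion", "causality", "outcome", "prediction"]
TASK_PREFIX = "generate story: "  # hypothetical prefix wording

def build_input(row: dict, row_id: int) -> str:
    # Required fields must be present; missing optional fields become <empty>.
    for field in REQUIRED:
        assert row.get(field), f"missing required field: {field}"
    fields = {f: row.get(f) or "<empty>" for f in REQUIRED + OPTIONAL}

    # Randomize field order per sample to avoid positional bias, seeding the
    # shuffle with the row id so it is reproducible across runs.
    keys = list(fields)
    random.Random(row_id).shuffle(keys)

    # Mark each field with a special token and prepend the task prefix.
    body = " ".join(f"<{k}> {fields[k]}" for k in keys)
    return TASK_PREFIX + body

example = {"caption": "a fox walks through the forest", "name": "fox",
           "i_action": "walking", "classification": "animal", "emotion": "curious"}
print(build_input(example, row_id=42))
```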
- Baseline models reviewed: KoGPT-2, KoT5
- KoT5 selected due to stronger generalization and encoder–decoder flexibility
- Added special tokens to tokenizer vocab
- Aligned embedding matrix with extended vocabulary
- Applied masking so structural tokens do not affect attention scores
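A minimal sketch of the tokenizer and embedding changes using Hugging Face transformers. Here `t5-small` stands in for the actual KoT5 checkpoint, the token list is an assumption, and the attention-mask zeroing shown is only one way the structural-token masking could be applied.

```python
# Sketch of extending the tokenizer/embeddings for structural tokens.
# "t5-small" is a stand-in for the project's KoT5 checkpoint; the token list
# and the masking approach below are assumptions.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

CHECKPOINT = "t5-small"  # stand-in; the project fine-tunes a KoT5 checkpoint
SPECIAL_TOKENS = ["<caption>", "<name>", "<i_action>", "<classification>", "<empty>"]

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = T5ForConditionalGeneration.from_pretrained(CHECKPOINT)

# Register the structural tokens and grow the embedding matrix to match.
tokenizer.add_special_tokens({"additional_special_tokens": SPECIAL_TOKENS})
model.resize_token_embeddings(len(tokenizer))

# One way to keep structural tokens from influencing attention scores:
# zero out their positions in the attention mask.
enc = tokenizer("generate story: <caption> a fox walks <empty>", return_tensors="pt")
special_ids = torch.tensor(tokenizer.convert_tokens_to_ids(SPECIAL_TOKENS))
is_special = torch.isin(enc["input_ids"], special_ids)
enc["attention_mask"] = enc["attention_mask"].masked_fill(is_special, 0)
```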
- Hyperparameter search using Bayesian Optimization
- Optimizer: AdamW
- Scheduler: Warmup + Linear decay
- Early stopping applied
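The optimizer, scheduler, and early-stopping pieces could be wired together roughly as in the skeleton below; the Bayesian hyperparameter search itself is not shown, and the learning rate, warmup ratio, and patience are placeholder values rather than the tuned ones.

```python
# Training-loop skeleton: AdamW + linear warmup/decay + early stopping on val loss.
# Hyperparameter values are placeholders, not the tuned settings.
import torch
from transformers import get_linear_schedule_with_warmup

def train(model, train_loader, val_loader, total_steps, device="cpu"):
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=int(0.1 * total_steps),
        num_training_steps=total_steps,
    )

    best_val, patience, bad_epochs = float("inf"), 3, 0
    for epoch in range(100):
        model.train()
        for batch in train_loader:
            loss = model(**{k: v.to(device) for k, v in batch.items()}).loss
            loss.backward()
            optimizer.step()
            scheduler.step()
            optimizer.zero_grad()

        # Early stopping: track validation loss and stop after `patience` bad epochs.
        model.eval()
        with torch.no_grad():
            val_loss = sum(model(**{k: v.to(device) for k, v in b.items()}).loss.item()
                           for b in val_loader) / max(len(val_loader), 1)
        if val_loss < best_val:
            best_val, bad_epochs = val_loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
```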
Best decoding parameters found during evaluation:

```
num_beams = 3
length_penalty = 0.8
repetition_penalty = 1.5
no_repeat_ngram_size = 3
```
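These settings map directly onto Hugging Face `generate()` arguments; the sketch below shows how they would be applied at inference time, with `t5-small` standing in for the fine-tuned KoT5 checkpoint and the input text as a placeholder.

```python
# Applying the selected decoding parameters with Hugging Face generate();
# "t5-small" is a stand-in for the fine-tuned KoT5 model.
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer("generate story: <caption> a fox walks through the forest",
                   return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_new_tokens=128,
    num_beams=3,
    length_penalty=0.8,
    repetition_penalty=1.5,
    no_repeat_ngram_size=3,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```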
- BERTScore
- METEOR
- CIDEr
- SPICE
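Of the metrics above, BERTScore and METEOR can be computed with the Hugging Face `evaluate` library as sketched below; CIDEr and SPICE are typically computed with COCO-caption tooling (e.g., pycocoevalcap) and are omitted here. The example strings are placeholders.

```python
# Sketch of computing two of the reported metrics with the `evaluate` library.
# CIDEr and SPICE are usually computed with COCO-caption tooling and are omitted.
import evaluate

predictions = ["the fox walked quietly through the forest"]
references = ["a fox walks through the quiet forest"]

bertscore = evaluate.load("bertscore")
meteor = evaluate.load("meteor")

# For the project's Korean outputs, lang="ko" would be used instead.
print(bertscore.compute(predictions=predictions, references=references, lang="en"))
print(meteor.compute(predictions=predictions, references=references))
```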
KoT5 outperformed KoGPT-2 in narrative quality, coherence, and content relevance.
A Streamlit demo provides an interactive interface for story generation.
- Image upload page
- Metadata auto-filling and keyword suggestion
- Real-time inference using the fine-tuned KoT5 model
- PDF generation
To run locally:
```bash
streamlit run app.py
```

GPU recommended due to reliance on pre-trained models.
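A minimal illustration of the upload-and-generate flow in Streamlit is sketched below; this is not the actual `app.py`, and the real demo runs BLIP captioning, KoT5 inference, and PDF export at the marked step.

```python
# Minimal illustrative Streamlit flow (not the actual app.py): upload images,
# collect optional scene info, and display generated paragraphs.
import streamlit as st

st.title("TalesRunner storybook demo")

uploaded = st.file_uploader("Upload 1-10 scene images", type=["jpg", "png"],
                            accept_multiple_files=True)
scene_info = st.text_area("Optional scene information (one line per image)")

if uploaded and st.button("Generate story"):
    for i, file in enumerate(uploaded[:10]):
        st.image(file, caption=f"Scene {i + 1}")
        # In the real app, BLIP captioning, KoT5 generation, and PDF export run here.
        st.write(f"(generated paragraph for scene {i + 1} would appear here)")
```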
- Doeun Kim — Dataset construction, KoT5 fine-tuning, model training & validation
- Yujin Shin — Annotation preprocessing, inference pipeline, Streamlit demo
- Junga Woo — Baseline model experiments (KoGPT/KoT5), decoding parameter search
- Soobin Cha (PM) — Project management, model baselines, inference UI