I sought to understand the basic principles and approaches of modeling images and videos with PyTorch. These are all elementary learnings and products, but the implementations increase in complexity as they move from a single simple image to video.
In this example, I conceived of a problem that is trivial for a neural network but where a statistical model may struggle -- completing a grid of binary values. I presented the neural network with training images such as the one below, and evaluated performance by the quality of its predictions on the held-out portion of the image.
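As a rough sketch of the kind of data involved -- the grid size, mask placement, and helper name here are my own assumptions rather than the notebook's -- a masked binary grid could be generated like this:

```python
import torch

def make_example(size=16, holdout=4, p=0.5, generator=None):
    """Create a random binary grid and a mask marking a held-out square.

    The model sees the grid with the held-out region zeroed; quality is
    judged only on the held-out pixels. Sizes are illustrative.
    """
    grid = torch.bernoulli(torch.full((1, size, size), p), generator=generator)
    mask = torch.zeros_like(grid)
    mask[:, -holdout:, -holdout:] = 1.0   # hold out the bottom-right corner
    visible = grid * (1 - mask)           # input with the held-out region removed
    return visible, grid, mask
```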
It was not difficult to obtain an architecture that could accomplish this task, even one as overwrought and facile as the one I first attempted. I could conceive of ways to approach this with statistical models, but not with such generality; that generality is, of course, the benefit of neural networks.
I attempt to replicate a proper U-Net architecture at the end of this document.
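For reference, the core pattern a U-Net relies on -- downsampling, upsampling, and concatenating skip connections -- can be sketched minimally as below. The channel counts and depth are illustrative assumptions, not the configuration used in the notebook.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """A two-level U-Net-style encoder/decoder with a single skip connection."""

    def __init__(self, in_ch=1, out_ch=1, base=16):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU(),
            nn.Conv2d(base, base, 3, padding=1), nn.ReLU(),
        )
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Sequential(
            nn.Conv2d(base, base * 2, 3, padding=1), nn.ReLU(),
        )
        self.up = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec = nn.Sequential(
            nn.Conv2d(base * 2, base, 3, padding=1), nn.ReLU(),  # base*2 after concatenating the skip
            nn.Conv2d(base, out_ch, 3, padding=1),
        )

    def forward(self, x):
        skip = self.enc(x)                # full-resolution features
        mid = self.mid(self.down(skip))   # half-resolution features
        up = self.up(mid)                 # back to full resolution
        return self.dec(torch.cat([up, skip], dim=1))
```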
The objective of the unet.ipynb
document was to generate missing portions of an image. This provided lessons in the mechanics of data management, training, evaluation, and prediction.
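A training step for this kind of inpainting might score only the held-out pixels. The following is a sketch that reuses the hypothetical make_example and TinyUNet pieces above; it is not the notebook's actual loop.

```python
import torch

model = TinyUNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(1000):
    visible, target, mask = make_example()
    visible, target, mask = visible[None], target[None], mask[None]  # add batch dim
    pred = model(visible)
    # Evaluate only the held-out region; the visible pixels are already known.
    loss = ((pred - target) ** 2 * mask).sum() / mask.sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```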
The next objective -- that of nn_video.ipynb
-- was to see how I could build a neural network that accepted many images as input and produced a single image as output. These images are temporally sequenced and dependent -- a video.
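One straightforward way to map a stack of frames to a single image is a small Conv3d network that collapses the time axis. The shapes and layer sizes below are assumptions for illustration, not the notebook's architecture.

```python
import torch
import torch.nn as nn

class FramesToImage(nn.Module):
    """Map a clip of shape (N, C, T, H, W) to a single image (N, 1, H, W)."""

    def __init__(self, in_ch=1, hidden=16, frames=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_ch, hidden, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(hidden, hidden, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Collapse the temporal dimension with a convolution spanning all frames.
        self.collapse = nn.Conv3d(hidden, 1, kernel_size=(frames, 1, 1))

    def forward(self, clip):
        return self.collapse(self.features(clip)).squeeze(2)

clip = torch.randn(2, 1, 8, 32, 32)   # batch of 2 clips, 8 frames each
print(FramesToImage()(clip).shape)    # torch.Size([2, 1, 32, 32])
```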
In this document, I simulate a "blob" that moves over space -- nothing more than a two-dimensional probability distribution -- and deposits "events" across the space conditional on its movement over time.
The task was to estimate the total number of events given the blob's motion -- and to devise data loaders and model architectures to handle this appropriately.
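A simulation along these lines might look like the sketch below, with my own assumptions about the random walk and the Poisson event rate; the notebook's parameterization may differ. The per-frame densities stand in for the blob's motion, and the event frames (or their sum) serve as the target.

```python
import torch

def simulate_blob(frames=8, size=32, rate=5.0, sigma=3.0, generator=None):
    """Random-walk a 2D Gaussian "blob" and deposit Poisson events under it.

    Returns the per-frame blob densities (the motion) and the event frames.
    """
    ys, xs = torch.meshgrid(torch.arange(size), torch.arange(size), indexing="ij")
    pos = torch.tensor([size / 2.0, size / 2.0])
    densities, events = [], []
    for _ in range(frames):
        pos = pos + torch.randn(2, generator=generator)   # random-walk step
        d = torch.exp(-((ys - pos[0]) ** 2 + (xs - pos[1]) ** 2) / (2 * sigma ** 2))
        d = d / d.sum()                                    # normalize to a distribution
        densities.append(d)
        events.append(torch.poisson(rate * d, generator=generator))
    return torch.stack(densities), torch.stack(events)
```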
This document displays a first attempt at this model architecture, highlights a roadblock, and includes some of the debugging efforts to locate the problem in the architecture.
This document -- largely without commentary -- takes the learnings from nn_video.ipynb
and produces a model framework that I deemed acceptable in the context of the task. It generates a smoothly varying distribution of the "events" rather than the noisy estimates of the older
implementations. Its major improvements were the use of BatchNorm3d and a loss function that evaluated prediction quality at multiple output resolutions (rather than only the single final output resolution).
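One way to realize such a multi-resolution loss -- not necessarily the notebook's exact implementation -- is to score average-pooled copies of the prediction against equally pooled targets:

```python
import torch
import torch.nn.functional as F

def multiscale_loss(pred, target, scales=(1, 2, 4)):
    """Average MSE over several resolutions of the same prediction/target pair.

    Coarser scales reward getting the broad shape of the distribution right,
    while the finest scale still penalizes pixel-level noise.
    Expects (N, C, H, W) tensors.
    """
    total = 0.0
    for s in scales:
        if s == 1:
            p, t = pred, target
        else:
            p = F.avg_pool2d(pred, kernel_size=s)
            t = F.avg_pool2d(target, kernel_size=s)
        total = total + F.mse_loss(p, t)
    return total / len(scales)
```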