diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/.gitignore b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/.gitignore
deleted file mode 100644
index edab6bd..0000000
--- a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/.gitignore
+++ /dev/null
@@ -1,16 +0,0 @@
-.ipynb_checkpoints/
-
-evaluation_test/__pycache__
-evaluation_test/gsoc_application.pdf
-evaluation_test/data/*.pt*
-
-pose_extraction/ilya_poses
-pose_extraction/.ipynb_checkpoints
-pose_extraction/3d_pose_extraction/*.json
-pose_extraction/3d_pose_extraction/test/vis/
-
-models/.ipynb_checkpoints/
-models/__pycache__/
-models/NRI_particles_test_*
-models/graphs
-models/best_weights/nri_particles*
diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/README.md b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/README.md
deleted file mode 100644
index b3760c9..0000000
--- a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/README.md
+++ /dev/null
@@ -1,64 +0,0 @@
-# AI-Generated Choreography - from Solos to Duets
-
-This repository is dedicated to the development of the project [AI-Generated Choreography - from Solos to Duets](https://humanai.foundation/gsoc/2024/proposal_ChoreoAI1.html). Here, you will find all the documentation and code for implementing my pipelines.
-
-Special thanks to my supervisors, [Mariel Pettee](https://marielpettee.com/) and [Ilya Vidrin](https://www.ilyavidrin.com/), for all their guidance, and to my work partner, Zixuan Wang, for developing [her pipeline](https://github.com/humanai-foundation/ChoreoAI/tree/main/ChoreoAI_Zixuan_Wang) alongside mine.
-
-If you don't want to dive into the code right away and would rather get an overview of the entire project, check out my [blog posts on Medium](https://medium.com/@luisvz).
-
-## Duet ChorAIgraphy: Using GNNs to Study Duets
-
-Duet ChorAIgraphy aims to implement a pipeline that uses Graph Neural Networks (GNNs) to study dance duets. The project focuses on **Interpretability of Movements:** a pipeline that learns the connections between the dancers' bodies across different dance sequences.
-
-The pipeline is discussed in more detail below, along with a presentation of key results. For a more comprehensive explanation of the model, refer to the `models` directory, which contains the implementation as well as complete documentation.
-
-## Repository Hierarchy
-
-- The root of the repository naturally provides an overview of the project and showcases sample generations from the developed model.
-
-- The `evaluation_test` folder contains the application for the contributor position. It includes a `README` file with detailed information on the implementations for the selection process test, along with the results obtained. The folder also contains the notebook with the source code for the developed components. The `README` was later expanded as the project development continued.
-
-- The `pose_extraction` folder contains details about the pose extraction pipeline used to create the project dataset. It includes an installation process guide, the raw data used, and information on the development and execution of the idea.
-
-- The `models` folder contains the project's core, including installation instructions, pipeline designs, and running guidelines for the interpretability-of-movements model. It also has detailed results for a thorough evaluation of the quality of the final agent.
-
-## Sample Generations
-
-The project has produced a wide range of results. Its creative and subjective nature offers many opportunities for exploration, even in the most unexpected outputs. However, while this subjectivity allows for different perspectives, it also makes a precise evaluation of the model challenging. Below are examples of connections (both undirected and directed) computed by the model in various scenarios. For a deeper study of what the model is doing, please check the [documentation under `models`](https://github.com/Luizerko/ai_choreo/blob/master/models/README.md).
-
-Example of the sampled edge distribution. The black edges represent connections between the dancers, with darker edges indicating higher confidence in their importance for reconstruction. In this typical case, for a sampling of 6 particles, 3 edges were selected, two with slightly higher importance, though all show a high confidence level (above 80%).
-
-Examples of multiple edges connected to the same particle. On the left, an undirected example shows simpler movement for a sampling of 6 particles, while on the right, a directed example features more complex movement for a sampling of 10 particles. It is worth noting that, for the animation on the right, the model recognizes how the motion of one dancer's feet influences various parts of the other dancer's body, which aligns with the movement being performed: the dynamic spin of the blue dancer guides the red dancer's response.
-
-Examples of connections within opposition tendencies. On the left, the undirected example shows multiple connections between the lower torsos of both dancers, first leaning in opposite directions and then gravitating toward each other, illustrating the full range of the stretched-string analogy. This example is notable for the model's emphasis on opposition, sampling many edges for the reconstruction of these sequences. On the right, in the simpler directed example, one can see that the edges connecting particles moving apart are stronger than those linking particles moving closer together (as seen in the blue dancer's hand).
-
-Two final examples are presented not for specific details, but to highlight and appreciate interesting visualizations generated by the model.
-
-For a quick and easy way to generate and explore new visualizations, check out the [Colab Notebook](https://colab.research.google.com/drive/1KhX-Ppn9-BxAO4EX0BtfqPohdA09I6z5?usp=sharing).
diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/assets/beauty_1.gif b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/assets/beauty_1.gif
deleted file mode 100644
index 93d0515..0000000
Binary files a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/assets/beauty_1.gif and /dev/null differ
diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/assets/beauty_2.gif b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/assets/beauty_2.gif
deleted file mode 100644
index 2a00e53..0000000
Binary files a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/assets/beauty_2.gif and /dev/null differ
diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/assets/duet_choraigraphy_logo.png b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/assets/duet_choraigraphy_logo.png
deleted file mode 100644
index 1f1b5fc..0000000
Binary files a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/assets/duet_choraigraphy_logo.png and /dev/null differ
diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/assets/possible_logo.png b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/assets/possible_logo.png
deleted file mode 100644
index 66e3c2e..0000000
Binary files a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/assets/possible_logo.png and /dev/null differ
diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/README.md b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/README.md
deleted file mode 100644
index bc2b3f7..0000000
--- a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/README.md
+++ /dev/null
@@ -1,174 +0,0 @@
-# Evaluation Test Report
-
-This file contains all the information regarding the development of the evaluation test.
-
-## Setup
-
-Before using the notebook, you should set up your environment:
-
-```
-conda create -n <env-name> python=3.10
-conda activate <env-name>
-```
-
-After creating and activating the conda environment, install all the dependencies by running:
-
-```
-pip install -r requirements.txt
-```
-
-Alternatively, if the dependency versions don't match your system (very likely because of the CUDA version), you can manually install the core packages listed in `requirements.in`; installing them will pull in compatible versions of the packages pinned in `requirements.txt`.
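-
-For example, assuming `requirements.in` follows pip's requirements format (as such files typically do), you can point pip at it directly and let the resolver pick compatible versions:
-
-```
-pip install -r requirements.in
-```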
-
-## Loading Data
-
-This is the pre-test part of the project, which consists of replicating [Mariel's code](https://github.com/mariel-pettee/choreo-graph/blob/main/functions/load_data.py) to properly load and preprocess the [provided data](https://github.com/mariel-pettee/choreo-graph/tree/main/data). Here you can find code to load the data, put everything into a manageable data structure and format, preprocess the joint positions so that they fit in the same unit cube (since we are interested in relative rather than absolute motion), and finally compute the edges.
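-
-As a quick orientation, here is a minimal usage sketch of the notebook's `load_data` function (the unpacked names follow its actual return values):
-
-```python
-# Raw and (x,y)-centered arrays, per-file dictionaries, and per-file frame counts
-ds_all, ds_all_centered, datasets, datasets_centered, ds_counts = load_data(pattern="data/mariel_*.npy")
-
-# Positions and velocities are stacked depth-wise: (timesteps, 53 joints, 6 dims)
-print(ds_all.shape)
-```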
-
-## Visualizing Dance
-
-This is the first part of the test, in which I effectively started developing. Here I instantiated the `MarielDataset` class, experimented with the data for quite a while to understand what each part actually represented, then built a static visualization scheme to make sure everything was in order, and finally animated a sequence from the original dataset.
-
-Note: I did not include the experimentation parts in this section of the notebook because I didn't want to make it even longer.
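-
-For reference, instantiating the dataset looks roughly like this (the arguments follow the notebook's `MarielDataset` signature):
-
-```python
-dataset = MarielDataset(reduced_joints=False, xy_centering=True, seq_len=128)
-sample = dataset[0]    # a torch_geometric Data object
-print(sample.x.shape)  # node features: (n_joints, n_dim * seq_len)
-```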
-
-Visualizing a sequence from the original dataset.
-
-## Training Generative Model
-
-This is the second part of the test and the most difficult one. To make the descriptions clearer, I have separated them into different sections:
-
-### Implementation
-
-I decided to go with the LSTM-VAE model. This decision was based on two main reasons:
-
-- I wanted to replicate the ideas used in the [provided paper](https://arxiv.org/pdf/1907.05297.pdf). They had resulted in a good model, as described in the paper, and I could reuse the given hyperparameters, which made the optimization search space much smaller. Since optimizing NNs can often prove quite a challenge, this seemed like a good idea considering the time schedule.
-
-- I have much more experience with LSTMs than with GNNs, so I thought I should stick to models I'm more familiar with because of the time limitations. I figured I could explore GNN models further during the development of the real project if accepted.
-
-### Architecture and Optimization
-
-- One encoder with 3 LSTM layers (384 nodes) and two separate branches of linear layers (256 nodes each, for the latent space): one for the mean and one for the log-variance.
-
-- One decoder with 1 linear layer (384 nodes) with a ReLU activation for the latent-space samples, followed by 3 LSTM layers (159 nodes for the output).
-
-The model was trained with the Adam optimizer for 200 epochs, with early stopping triggered after 3 validation losses above the best validation loss up to that point. I also used the KL-divergence weight provided in the paper (0.0001). Finally, I added 0.2 dropout to the LSTM layers for some extra regularization.
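-
-A minimal PyTorch sketch of this architecture and loss, based on my reading of the dimensions above (class and variable names are mine, not taken from the actual implementation; I assume 159 = 53 joints x 3 coordinates per frame):
-
-```python
-import torch
-import torch.nn as nn
-
-class LSTMVAE(nn.Module):
-    def __init__(self, input_dim=159, hidden_dim=384, latent_dim=256, dropout=0.2):
-        super().__init__()
-        # Encoder: 3 LSTM layers (384 nodes) + two linear branches (256 nodes each)
-        self.encoder = nn.LSTM(input_dim, hidden_dim, num_layers=3,
-                               batch_first=True, dropout=dropout)
-        self.to_mu = nn.Linear(hidden_dim, latent_dim)
-        self.to_logvar = nn.Linear(hidden_dim, latent_dim)
-        # Decoder: 1 linear layer (384 nodes) with ReLU + 3 LSTM layers (159 outputs)
-        self.from_latent = nn.Sequential(nn.Linear(latent_dim, hidden_dim), nn.ReLU())
-        self.decoder = nn.LSTM(hidden_dim, input_dim, num_layers=3,
-                               batch_first=True, dropout=dropout)
-
-    def forward(self, x):
-        h, _ = self.encoder(x)                         # (batch, seq_len, 384)
-        mu, logvar = self.to_mu(h), self.to_logvar(h)
-        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
-        recon, _ = self.decoder(self.from_latent(z))
-        return recon, mu, logvar
-
-def vae_loss(recon, x, mu, logvar, kl_weight=1e-4):
-    # Reconstruction term plus KL divergence weighted by 0.0001, as in the paper
-    rec = nn.functional.mse_loss(recon, x)
-    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
-    return rec + kl_weight * kl
-```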
-
-I expanded the dataset using noise-based data augmentation: I took around 10,000 random sequences out of the almost 40,000 provided ones and added Gaussian noise scaled by 0.01 to the joint coordinates, trying to make the model a bit more robust. I wanted to do even more augmentation, but due to my GPU limitations, this was the best I could do. Finally, I used 90% of the data for training and 10% for validation, both randomly sampled from the dataset and shuffled afterwards, with a batch size of 64.
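-
-The augmentation step might look like the following sketch, assuming the sequences are stored in a single NumPy array (function and parameter names are mine):
-
-```python
-import numpy as np
-
-rng = np.random.default_rng(42)
-
-def augment_with_noise(sequences, n_augmented=10_000, noise_scale=0.01):
-    # Pick random sequences and perturb joint coordinates with scaled Gaussian noise
-    idx = rng.choice(len(sequences), size=n_augmented, replace=False)
-    noisy = sequences[idx] + noise_scale * rng.standard_normal(sequences[idx].shape)
-    return np.concatenate([sequences, noisy], axis=0)
-```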
-
-### Comments and Results
-
-Even though I had greatly reduced the hyperparameter space by replicating the provided paper, I still ended up having to train the model multiple times to find the best hyperparameters.
-
-Furthermore, I had issues with the validation loss that made me replicate the experiments an enormous number of times. I had a decreasing validation loss, as expected, but it was still orders of magnitude larger than the training loss. I think this problem is mostly related to the model being a bit too complex for the amount of data I had ($\frac{2}{3}$ of the data size from the paper, with much less augmentation due to GPU limitations). I tried reducing the number of LSTM layers to make the model simpler, but found it was not capable of capturing the complexity of the dance sequences, mostly generating sequences in which the figure stands almost still. The image below shows the loss curves for the training and validation datasets throughout the training process:
-
-The best alternative I could think of was to reduce the sequence length drastically (64 instead of 128), both to expand the dataset and to make the sequences much simpler to learn. This indeed reduced the validation loss quite a lot, but generated very bad sequences (almost random joint positions and movement). In the end, I did not have time to properly evaluate all the hyperparameter possibilities for this reduced-sequence model and went back to the original sequence-length implementation, which at least had much better results.
-
-Even with all these issues, I managed to train the model and obtain some very interesting results. Some sequences from the original dataset are accurately reconstructed by the model. Others, even if not perfectly reconstructed, still clearly show that the model captured the essence of their movements: a sequence that rotates, for example, keeps rotating in its reconstructed version, and a sequence that lifts a leg keeps this movement in the reconstruction as well.
-
-When it comes to generating new sequences, the model is quite sensitive to the standard deviation used. When the latent space is sampled from a standard normal distribution, the model generates interesting sequences, but with fewer movements than the original ones. When the latent space is sampled with a higher standard deviation, the sequences tend to be more creative, but it is also common to see joints getting lost in space (many points converging to the same coordinates, or points moving shakily).
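-
-Sampling new sequences with a scaled standard deviation might look like this, reusing the hypothetical `LSTMVAE` sketch above:
-
-```python
-def generate(model, seq_len=128, latent_dim=256, std=1.0):
-    # std=1.0 samples the prior; larger values trade stability for creativity
-    z = std * torch.randn(1, seq_len, latent_dim)
-    recon, _ = model.decoder(model.from_latent(z))
-    return recon  # (1, seq_len, 159) joint coordinates over time
-```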
-
-Finally, one behavior I did not manage to fix was the initial state of the joints. Even in the best reconstructed/generated sequences, the joints start in strange positions, making the animation glitch into proper positions during the first milliseconds before a proper sequence of movements begins.
-
-In the GIFs below I show some of the obtained results.
-
-
-Original sequence.
-
-Reconstructed sequence.
-
-Original sequence that doesn't reconstruct well.
-
-Not-so-good reconstructed sequence. The core movements are still captured by the reconstruction, though.
-
-Generated sequence.
-
-My interpretation of the generated sequence.
-
-Another generated sequence.
-
-My interpretation of another generated sequence.
-
-Generated sequence with a slightly larger standard deviation ($1.5\sigma$).
-
-Generated sequence with an even larger standard deviation ($2\sigma$).
-
-## Why This Project?
-
-Growing up in the northeast of Brazil, I had art and communication at the center of my life. Surrounded by the richness of Brazilian music and dance from a young age, I quickly connected with the arts. While I'm not as skilled as many dancers, I strongly believe in the power of dance to bring warmth to any environment, and I always engage in it with joy. My talents, though, lean more toward communication, making me a natural-born chatterbox, always wanting to learn from others and share parts of my own journey. I think the mixture of my cultural background with my academic and professional trajectory is exactly what connects me to this project and truly makes me want to be a part of it.
-
-Turning to approaches for the project, I would first focus on building the dataset. I would use the same methods employed in the provided paper to preprocess the dance sequences, but now separating the joints of the two dancers into different groups and encoding the interactions between nodes of the two. Focusing on the proposed methods, we could:
-
-- Draw an edge between joints from different dancers that share some common property. Very close nodes with either the same or opposite velocity vectors, and nodes in symmetric positions, are examples of what these properties could be. This form of encoding fits a GNN perfectly and could help the model understand the sequences while capturing the relationship between the dancers (see the sketch after this list).
-
-- Use the same approach for computing node pairs as before, but rather than encoding pairs with an edge, encode them into the dance sequence via a special connection token. These tokens could be used by a transformer model to capture the relationship between nodes over time, understanding a sequence as a combination of interacting joint pairs.
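-
-A hedged sketch of the first idea, connecting joints of different dancers that are close and have aligned or opposed velocities (all names and thresholds are mine, purely illustrative):
-
-```python
-import numpy as np
-
-def duet_edges(pos_a, vel_a, pos_b, vel_b, dist_thr=0.1, cos_thr=0.9):
-    # pos_*/vel_*: (n_joints, 3) arrays for one dancer each at a single timestep
-    edges = []
-    for i in range(len(pos_a)):
-        for j in range(len(pos_b)):
-            if np.linalg.norm(pos_a[i] - pos_b[j]) < dist_thr:
-                denom = np.linalg.norm(vel_a[i]) * np.linalg.norm(vel_b[j]) + 1e-8
-                cos = np.dot(vel_a[i], vel_b[j]) / denom
-                if abs(cos) > cos_thr:  # cos ~ 1: same direction; cos ~ -1: opposite
-                    edges.append((i, j))
-    return edges
-```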
\ No newline at end of file
diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/bad_original_seq.gif b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/bad_original_seq.gif
deleted file mode 100644
index 003127e..0000000
Binary files a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/bad_original_seq.gif and /dev/null differ
diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/bad_recon_seq.gif b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/bad_recon_seq.gif
deleted file mode 100644
index 406ed9d..0000000
Binary files a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/bad_recon_seq.gif and /dev/null differ
diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/generated_seq.gif b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/generated_seq.gif
deleted file mode 100644
index cf8ad3d..0000000
Binary files a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/generated_seq.gif and /dev/null differ
diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/generated_seq2.gif b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/generated_seq2.gif
deleted file mode 100644
index dffe7a2..0000000
Binary files a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/generated_seq2.gif and /dev/null differ
diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/generated_seq2_me.gif b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/generated_seq2_me.gif
deleted file mode 100644
index c8b8605..0000000
Binary files a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/generated_seq2_me.gif and /dev/null differ
diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/generated_seq_2std.gif b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/generated_seq_2std.gif
deleted file mode 100644
index 28bcc4f..0000000
Binary files a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/generated_seq_2std.gif and /dev/null differ
diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/generated_seq_me.gif b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/generated_seq_me.gif
deleted file mode 100644
index 8b796a8..0000000
Binary files a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/generated_seq_me.gif and /dev/null differ
diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/generated_seq_shaky.gif b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/generated_seq_shaky.gif
deleted file mode 100644
index 44cb28a..0000000
Binary files a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/generated_seq_shaky.gif and /dev/null differ
diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/loss_graphs.png b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/loss_graphs.png
deleted file mode 100644
index fec126a..0000000
Binary files a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/loss_graphs.png and /dev/null differ
diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/original_seq.gif b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/original_seq.gif
deleted file mode 100644
index a276cb8..0000000
Binary files a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/original_seq.gif and /dev/null differ
diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/recon_seq.gif b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/recon_seq.gif
deleted file mode 100644
index 3bd5d8a..0000000
Binary files a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/recon_seq.gif and /dev/null differ
diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/visualizing_sequence.gif b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/visualizing_sequence.gif
deleted file mode 100644
index d2bd575..0000000
Binary files a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/assets/visualizing_sequence.gif and /dev/null differ
diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/best_model.pth b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/best_model.pth
deleted file mode 100644
index 0f9a809..0000000
Binary files a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/best_model.pth and /dev/null differ
diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/data/mariel_betternot_and_retrograde.npy b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/data/mariel_betternot_and_retrograde.npy
deleted file mode 100644
index 29058fc..0000000
Binary files a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/data/mariel_betternot_and_retrograde.npy and /dev/null differ
diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/data/mariel_beyond.npy b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/data/mariel_beyond.npy
deleted file mode 100644
index 5a1f829..0000000
Binary files a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/data/mariel_beyond.npy and /dev/null differ
diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/data/mariel_chunli.npy b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/data/mariel_chunli.npy
deleted file mode 100644
index d2c23e4..0000000
Binary files a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/data/mariel_chunli.npy and /dev/null differ
diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/data/mariel_honey.npy b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/data/mariel_honey.npy
deleted file mode 100644
index a5963b4..0000000
Binary files a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/data/mariel_honey.npy and /dev/null differ
diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/data/mariel_knownbetter.npy b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/data/mariel_knownbetter.npy
deleted file mode 100644
index 28d11af..0000000
Binary files a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/data/mariel_knownbetter.npy and /dev/null differ
diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/data/mariel_penelope.npy b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/data/mariel_penelope.npy
deleted file mode 100644
index d6f62a9..0000000
Binary files a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/data/mariel_penelope.npy and /dev/null differ
diff --git a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/evaluation_test.ipynb b/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/evaluation_test.ipynb
deleted file mode 100644
index d2458a4..0000000
--- a/ChoreoAI_Duet_ChorAIgraphy_Luis_Zerkowski/evaluation_test/evaluation_test.ipynb
+++ /dev/null
@@ -1,460406 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "id": "87397e08-8c81-4d5f-ba76-77fb9d35423d",
- "metadata": {},
- "source": [
- "# Loading Data"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "73a7892e-e6f6-47c7-b15e-f21ccdb5682f",
- "metadata": {},
- "source": [
- "
This is the pre-test part of the project that consists of replicating [Mariel's code](https://github.com/mariel-pettee/choreo-graph/blob/main/functions/load_data.py) to properly load and preprocess the [provided data](https://github.com/mariel-pettee/choreo-graph/tree/main/data). In here you can find code to load data, put everything in a very handable data structure and format, preprocess the joint positions so that they belong to the same unit cube (since we are interested in relative motion instead of absolute motion), and finally compute the edges.
"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 1,
- "id": "a09d7a1a-638f-4244-87fa-c56420cfd6c3",
- "metadata": {},
- "outputs": [],
- "source": [
- "import torch\n",
- "from torch_geometric.data import Data\n",
- "import numpy as np\n",
- "from glob import glob\n",
- "import os"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "id": "b59d503a-91a3-4c15-9b87-4774e7f19c17",
- "metadata": {},
- "outputs": [],
- "source": [
- "point_labels = ['ARIEL','C7','CLAV','LANK','LBHD','LBSH','LBWT','LELB','LFHD','LFRM','LFSH','LFWT','LHEL','LIEL','LIHAND','LIWR','LKNE','LKNI','LMT1','LMT5','LOHAND','LOWR','LSHN','LTHI','LTOE','LUPA','MBWT','MFWT','RANK','RBHD','RBSH','RBWT','RELB','RFHD','RFRM','RFSH','RFWT','RHEL','RIEL','RIHAND','RIWR','RKNE','RKNI','RMT1','RMT5','ROHAND','ROWR','RSHN','RTHI','RTOE','RUPA','STRN','T10']\n",
- "\n",
- "reduced_joint_names = ['ARIEL','CLAV','RFSH','LFSH','RIEL','LIEL','RIWR','LIWR','RKNE','LKNE','RTOE','LTOE','LHEL','RHEL','RFWT','LFWT','LBWT','RBWT']\n",
- "\n",
- "skeleton_lines = [\n",
- "# ( (start group), (end group) ),\n",
- " (('LHEL',), ('LTOE',)), # toe to heel\n",
- " (('RHEL',), ('RTOE',)),\n",
- " (('LMT1',), ('LMT5',)), # horizontal line across foot\n",
- " (('RMT1',), ('RMT5',)), \n",
- " (('LHEL',), ('LMT1',)), # heel to sides of feet\n",
- " (('LHEL',), ('LMT5',)),\n",
- " (('RHEL',), ('RMT1',)),\n",
- " (('RHEL',), ('RMT5',)),\n",
- " (('LTOE',), ('LMT1',)), # toe to sides of feet\n",
- " (('LTOE',), ('LMT5',)),\n",
- " (('RTOE',), ('RMT1',)),\n",
- " (('RTOE',), ('RMT5',)),\n",
- " (('LKNE',), ('LHEL',)), # heel to knee\n",
- " (('RKNE',), ('RHEL',)),\n",
- " (('LFWT',), ('RBWT',)), # connect pelvis\n",
- " (('RFWT',), ('LBWT',)), \n",
- " (('LFWT',), ('RFWT',)), \n",
- " (('LBWT',), ('RBWT',)),\n",
- " (('LFWT',), ('LBWT',)), \n",
- " (('RFWT',), ('RBWT',)), \n",
- " (('LFWT',), ('LTHI',)), # pelvis to thighs\n",
- " (('RFWT',), ('RTHI',)), \n",
- " (('LBWT',), ('LTHI',)), \n",
- " (('RBWT',), ('RTHI',)), \n",
- " (('LKNE',), ('LTHI',)), \n",
- " (('RKNE',), ('RTHI',)), \n",
- " (('CLAV',), ('LFSH',)), # clavicle to shoulders\n",
- " (('CLAV',), ('RFSH',)), \n",
- " (('STRN',), ('LFSH',)), # sternum & T10 (back sternum) to shoulders\n",
- " (('STRN',), ('RFSH',)), \n",
- " (('T10',), ('LFSH',)), \n",
- " (('T10',), ('RFSH',)), \n",
- " (('C7',), ('LBSH',)), # back clavicle to back shoulders\n",
- " (('C7',), ('RBSH',)), \n",
- " (('LFSH',), ('LBSH',)), # front shoulders to back shoulders\n",
- " (('RFSH',), ('RBSH',)), \n",
- " (('LFSH',), ('RBSH',)),\n",
- " (('RFSH',), ('LBSH',)),\n",
- " (('LFSH',), ('LUPA',),), # shoulders to upper arms\n",
- " (('RFSH',), ('RUPA',),), \n",
- " (('LBSH',), ('LUPA',),), \n",
- " (('RBSH',), ('RUPA',),), \n",
- " (('LIWR',), ('LIHAND',),), # wrist to hand\n",
- " (('RIWR',), ('RIHAND',),),\n",
- " (('LOWR',), ('LOHAND',),), \n",
- " (('ROWR',), ('ROHAND',),),\n",
- " (('LIWR',), ('LOWR',),), # across the wrist \n",
- " (('RIWR',), ('ROWR',),), \n",
- " (('LIHAND',), ('LOHAND',),), # across the palm \n",
- " (('RIHAND',), ('ROHAND',),), \n",
- " (('LFHD',), ('LBHD',)), # draw lines around circumference of the head\n",
- " (('LBHD',), ('RBHD',)),\n",
- " (('RBHD',), ('RFHD',)),\n",
- " (('RFHD',), ('LFHD',)),\n",
- " (('LFHD',), ('ARIEL',)), # connect circumference points to top of head\n",
- " (('LBHD',), ('ARIEL',)),\n",
- " (('RBHD',), ('ARIEL',)),\n",
- " (('RFHD',), ('ARIEL',)),\n",
- "]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "id": "b6653ad1-51f2-42dc-9760-3c410ccd0632",
- "metadata": {},
- "outputs": [],
- "source": [
- "class MarielDataset(torch.utils.data.Dataset):\n",
- " 'Characterizes a dataset for PyTorch'\n",
- " def __init__(self, reduced_joints=False, xy_centering=True, seq_len=128, predicted_timesteps=1, file_path=\"data/mariel_*.npy\", no_overlap=False):\n",
- " 'Initialization'\n",
- " self.file_path = file_path\n",
- " self.seq_len = seq_len\n",
- " self.no_overlap = no_overlap\n",
- " self.reduced_joints = reduced_joints # use a meaningful subset of joints\n",
- " self.data = load_data(pattern=file_path) \n",
- " self.xy_centering = xy_centering\n",
- " self.n_joints = 53\n",
- " self.n_dim = 6\n",
- " self.predicted_timesteps = predicted_timesteps\n",
- " \n",
- " print(\"\")\n",
- " \n",
- " if self.no_overlap == True:\n",
- " print(\"Generating non-overlapping sequences...\") \n",
- " else:\n",
- " print(\"Generating overlapping sequences...\")\n",
- " \n",
- " if self.xy_centering == True: \n",
- " print(\"Using (x,y)-centering...\")\n",
- " else: \n",
- " print(\"Not using (x,y)-centering...\")\n",
- " \n",
- " if self.reduced_joints == True: \n",
- " print(\"Reducing joints...\")\n",
- " else:\n",
- " print(\"Using all joints...\")\n",
- "\n",
- " def __len__(self):\n",
- " 'Denotes the total number of samples'\n",
- " if self.xy_centering: \n",
- " data = self.data[1] # choose index 1, for the (x,y)-centered phrases\n",
- " else: \n",
- " data = self.data[0] # choose index 0, for data without (x,y)-centering\n",
- " \n",
- " if self.no_overlap == True:\n",
- " # number of complete non-overlapping phrases\n",
- " return int(len(data)/self.seq_len)\n",
- " else:\n",
- " # number of overlapping phrases up until the final complete phrase\n",
- " return len(data)-self.seq_len \n",
- "\n",
- " def __getitem__(self, index):\n",
- " 'Generates one sample of data' \n",
- " edge_index, is_skeleton_edge, reduced_joint_indices = edges(reduced_joints=self.reduced_joints, seq_len=self.seq_len)\n",
- " \n",
- " if self.xy_centering == True: \n",
- " data = self.data[1] # choose index 1, for the (x,y)-centered phrases\n",
- " else: \n",
- " data = self.data[0] # choose index 0, for data without (x,y)-centering\n",
- "\n",
- " if self.reduced_joints == True: \n",
- " data = data[:,reduced_joint_indices,:] # reduce number of joints if desired\n",
- " \n",
- " if self.no_overlap == True: \n",
- " # non-overlapping phrases\n",
- " index = index*self.seq_len\n",
- " sequence = data[index:index+self.seq_len]\n",
- " prediction_target = data[index:index+self.seq_len+self.predicted_timesteps]\n",
- " else: \n",
- " # overlapping phrases\n",
- " sequence = data[index:index+self.seq_len]\n",
- " prediction_target = data[index:index+self.seq_len+self.predicted_timesteps]\n",
- "\n",
- " sequence = np.transpose(sequence, [1,0,2]) # put n_joints first\n",
- " sequence = sequence.reshape((data.shape[1],self.n_dim*self.seq_len)) # flatten n_dim*seq_len into one dimension (i.e. node feature)\n",
- " prediction_target = np.transpose(prediction_target, [1,0,2]) # put n_joints first\n",
- " prediction_target = prediction_target.reshape((data.shape[1],self.n_dim*(self.seq_len+self.predicted_timesteps))) \n",
- "\n",
- " # Convert to torch objects\n",
- " sequence = torch.Tensor(sequence)\n",
- " prediction_target = torch.Tensor(prediction_target)\n",
- " edge_attr = torch.Tensor(is_skeleton_edge)\n",
- " \n",
- " return Data(x=sequence, y=prediction_target, edge_index=edge_index.t().contiguous(), edge_attr=edge_attr)\n",
- "\n",
- "def load_data(pattern=\"data/mariel_*.npy\"):\n",
- " # load up the six datasets, performing some minimal preprocessing beforehand\n",
- " datasets = {}\n",
- " ds_all = []\n",
- " \n",
- " exclude_points = [26,53]\n",
- " point_mask = np.ones(55, dtype=bool)\n",
- " point_mask[exclude_points] = 0\n",
- " \n",
- " for f in sorted(glob(pattern)):\n",
- " ds_name = os.path.basename(f)[7:-4]\n",
- " ds = np.load(f).transpose((1,0,2))\n",
- " ds = ds[500:-500, point_mask]\n",
- " ds[:,:,2] *= -1\n",
- " datasets[ds_name] = ds\n",
- " ds_all.append(ds)\n",
- "\n",
- " ds_counts = np.array([ds.shape[0] for ds in ds_all])\n",
- " ds_offsets = np.zeros_like(ds_counts)\n",
- " ds_offsets[1:] = np.cumsum(ds_counts[:-1])\n",
- "\n",
- " ds_all = np.concatenate(ds_all)\n",
- " print(\"Original numpy dataset contains {:,} timesteps of {} joints with {} dimensions each.\".format(ds_all.shape[0], ds_all.shape[1], ds_all.shape[2]))\n",
- "\n",
- " low,hi = np.quantile(ds_all, [0.01,0.99], axis=(0,1))\n",
- " xy_min = min(low[:2])\n",
- " xy_max = max(hi[:2])\n",
- " xy_range = xy_max-xy_min\n",
- " ds_all[:,:,:2] -= xy_min\n",
- " ds_all *= 2/xy_range\n",
- " ds_all[:,:,:2] -= 1.0\n",
- "\n",
- " ### It's also useful to have these datasets centered, i.e. with the x and y offsets subtracted from each individual frame:\n",
- " ds_all_centered = ds_all.copy()\n",
- " ds_all_centered[:,:,:2] -= ds_all_centered[:,:,:2].mean(axis=1,keepdims=True)\n",
- "\n",
- " datasets_centered = {}\n",
- " for ds in datasets:\n",
- " datasets[ds][:,:,:2] -= xy_min\n",
- " datasets[ds] *= 2/xy_range\n",
- " datasets[ds][:,:,:2] -= 1.0\n",
- " datasets_centered[ds] = datasets[ds].copy()\n",
- " datasets_centered[ds][:,:,:2] -= datasets[ds][:,:,:2].mean(axis=1,keepdims=True)\n",
- " \n",
- " ### Calculate velocities (first velocity is always 0)\n",
- " velocities = np.vstack([np.zeros((1,53,3)),np.array([35*(ds_all[t+1,:,:] - ds_all[t,:,:]) for t in range(len(ds_all)-1)])]) # (delta_x/y/z per frame) * (35 frames/sec)\n",
- " \n",
- " ### Stack positions above velocities\n",
- " ds_all = np.dstack([ds_all,velocities]) # stack along the 3rd dimension, i.e. \"depth-wise\"\n",
- " ds_all_centered = np.dstack([ds_all_centered,velocities]) # stack along the 3rd dimension, i.e. \"depth-wise\"\n",
- "\n",
- " for data in [ds_all, ds_all_centered]:\n",
- " # Normalize locations & velocities (separately) to [-1, 1]\n",
- " loc_min = np.min(data[:,:,:3])\n",
- " loc_max = np.max(data[:,:,:3])\n",
- " vel_min = np.min(data[:,:,3:])\n",
- " vel_max = np.max(data[:,:,3:])\n",
- " print(\"loc_min:\",loc_min,\"loc_max:\",loc_max)\n",
- " print(\"vel_min:\",vel_min,\"vel_max:\",vel_max)\n",
- " data[:,:,:3] = (data[:,:,:3] - loc_min) * 2 / (loc_max - loc_min) - 1\n",
- " data[:,:,3:] = (data[:,:,3:] - vel_min) * 2 / (vel_max - vel_min) - 1\n",
- " \n",
- " return ds_all, ds_all_centered, datasets, datasets_centered, ds_counts\n",
- "\n",
- "def edges(reduced_joints, seq_len):\n",
- " ### Define a subset of joints if we want to train on fewer joints that still capture meaningful body movement:\n",
- " if reduced_joints == True:\n",
- " reduced_joint_indices = [point_labels.index(joint_name) for joint_name in reduced_joint_names]\n",
- " edge_index = np.array([(i,j) for i in reduced_joint_indices for j in reduced_joint_indices if i!=j])\n",
- " else:\n",
- " reduced_joint_indices = None\n",
- " edge_index = np.array([(i,j) for i in range(53) for j in range(53) if i!=j]) # note: no self-loops!\n",
- "\n",
- " skeleton_idxs = []\n",
- " for g1,g2 in skeleton_lines:\n",
- " entry = []\n",
- " entry.append([point_labels.index(l) for l in g1][0])\n",
- " entry.append([point_labels.index(l) for l in g2][0])\n",
- " skeleton_idxs.append(entry)\n",
- " \n",
- " is_skeleton_edge = [] \n",
- " for edge in np.arange(edge_index.shape[0]): \n",
- " if [edge_index[edge][0],edge_index[edge][1]] in skeleton_idxs: \n",
- " is_skeleton_edge.append(torch.tensor(1.0))\n",
- " else:\n",
- " is_skeleton_edge.append(torch.tensor(0.0))\n",
- " \n",
- " is_skeleton_edge = np.array(is_skeleton_edge)\n",
- " copies = np.tile(is_skeleton_edge, (seq_len,1)) # create copies of the 1D array for every timestep\n",
- " skeleton_edges_over_time = torch.tensor(np.transpose(copies))\n",
- " \n",
- " if reduced_joints == True: \n",
- " ### Need to remake these lists to include only nodes 0-18 now\n",
- " edge_index = np.array([(i,j) for i in np.arange(len(reduced_joint_indices)) for j in np.arange(len(reduced_joint_indices)) if i!=j])\n",
- " \n",
- " return torch.tensor(edge_index, dtype=torch.long), skeleton_edges_over_time, reduced_joint_indices"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "09a4c628-6817-499b-bdca-9c15a7b17341",
- "metadata": {},
- "source": [
- "# Visualizing Dance"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "2d03d608-79b7-4138-b221-eede8ba3c721",
- "metadata": {},
- "source": [
- "
This is the first part of the test, in which I effectively started developing. In here I instantiated the `MarielDataset` class, experimented for quite a while with the data to understand what each part actually represented, then built up a static visualization scheme to make sure everything was in order and finally animated a sequence from the original dataset.\n",
- "\n",
- "Note: I did not include the experimentation parts to this section of the notebook because I didn't want to make it even longer.