
# Detecting Distribution Shift in Speech Audio

This repo contains tools for detecting covariate shift in speech audio, along with an example application on the VOiCES dataset. Our approach uses pretrained speech featurizers and aggregation methods to embed waveforms into fixed-length vectors.

*Figure: the embedding pipeline.*
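As a rough sketch of this pipeline (not the repo's own classes), the snippet below stands in librosa MFCCs for the pretrained featurizer and mean pooling over time for the aggregator; the file paths, sample rate, and feature dimensions are illustrative assumptions.

```python
import librosa
import numpy as np

def embed_waveform(path, sr=16000, n_mfcc=40):
    """Embed a single .wav file into a fixed-length vector.

    Stand-in featurizer: MFCCs. Stand-in aggregator: mean pooling over time.
    The repo's embeddors and aggregators use pretrained speech models instead.
    """
    waveform, _ = librosa.load(path, sr=sr)                         # load and resample
    feats = librosa.feature.mfcc(y=waveform, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, n_frames)
    return feats.mean(axis=1)                                       # shape (n_mfcc,)

# wav_paths is a placeholder list of .wav file paths:
# embeddings = np.stack([embed_waveform(p) for p in wav_paths])
```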

We then follow the approach of *Failing Loudly* (Rabanser et al., 2019) to detect distribution shift: an untrained autoencoder first reduces the dimensionality of the embeddings, and a non-parametric two-sample hypothesis test is then applied to samples of source and target data.
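As a rough illustration of this detection step (not the repo's detection_utils wrappers), alibi_detect's MMD detector can be paired with an untrained-autoencoder preprocessing step as sketched below; the placeholder embeddings, encoding dimension, and TensorFlow backend choice are assumptions.

```python
import numpy as np
from functools import partial
from alibi_detect.cd import MMDDrift
from alibi_detect.cd.tensorflow import UAE, preprocess_drift

# Placeholder embeddings standing in for source (reference) and target samples.
x_ref = np.random.randn(200, 512).astype(np.float32)
x_test = np.random.randn(200, 512).astype(np.float32)

# Untrained autoencoder (UAE) for dimensionality reduction, as in Failing Loudly.
uae = UAE(shape=x_ref.shape[1:], enc_dim=32)  # enc_dim is an assumed value
preprocess_fn = partial(preprocess_drift, model=uae, batch_size=128)

# MMD two-sample test between reference and test embeddings.
detector = MMDDrift(x_ref, backend='tensorflow', p_val=0.05, preprocess_fn=preprocess_fn)
result = detector.predict(x_test)
print(result['data']['is_drift'], result['data']['p_val'])
```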

## Contents

- `embeddors.py`: Classes wrapping models that convert waveforms into sequences of feature vectors.
- `aggregators.py`: Classes for aggregating sequences of feature vectors into single, fixed-length vector embeddings.
- `data_utils.py`: Tools for combining featurizers and aggregators into the embedding pipeline and applying it to lists of waveforms or .wav files.
- `detection_utils.py`: Functional wrappers around the alibi_detect implementation of MMD two-sample testing, for single and repeated tests (see the sketch after this list).
- `hypothesis_test.ipynb`: A notebook that walks through loading the VOiCES dataset, preprocessing the waveforms into embeddings, and evaluating the performance of distribution shift detection.
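For reference, a generic way to estimate a detection rate by repeating the MMD test on random subsamples might look like the sketch below; the function name, subsample size, and number of runs are illustrative placeholders rather than the repo's detection_utils.py API.

```python
import numpy as np
from alibi_detect.cd import MMDDrift

def detection_rate(x_source, x_target, sample_size=100, n_runs=20, p_val=0.05, seed=0):
    """Fraction of repeated MMD two-sample tests that flag drift.

    Each run draws fresh random subsamples of the source and target embeddings.
    Illustrative only; the repo's detection_utils.py provides its own wrappers.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_runs):
        ref = x_source[rng.choice(len(x_source), size=sample_size, replace=False)]
        test = x_target[rng.choice(len(x_target), size=sample_size, replace=False)]
        detector = MMDDrift(ref, backend='tensorflow', p_val=p_val)
        hits += detector.predict(test)['data']['is_drift']
    return hits / n_runs
```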

## Dependencies and prerequisites

To follow the experiments in `hypothesis_test.ipynb` you will need to download the VOiCES dataset; instructions for doing so can be found here. We recommend using the devkit subset for expediency.

The following packages are required:

- alibi-detect
- torch
- fairseq
- tensorflow
- tensorflow_hub
- librosa
- pandas
- seaborn
- wget

All of these can be installed using pip:

pip install -r requirements.txt