transcript.fish

Unofficial No Such Thing As A Fish episode transcripts.

Running the web app locally

Install JS dependencies

Run npm install

Start dev server

Run npm start

Load locally running app

Open http://localhost:5173/?deployed=true to load all assets from remote hosts. (Try this first)

OR

Open http://localhost:5173/ to use local assets.

To download episodes from the RSS feed, transcribe them, and add them to the database

TODO: Add instructions for creating database with migrations.sql
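Until those instructions are written, here is a minimal sketch of one way to do it, assuming the project uses a local SQLite database; the episodes.db filename is a guess, not the project's actual configuration:

```python
# Hypothetical sketch: create a local SQLite database and apply migrations.sql to it.
import sqlite3

with open("migrations.sql") as f, sqlite3.connect("episodes.db") as db:
    db.executescript(f.read())
```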

Create virtual env

Run python -m venv venv

Activate virtual env

Run source venv/bin/activate

Install Python dependencies

Run pip install -r requirements.txt

Download and transcribe a specific episode by number

Run npm run convert 146

Download and transcribe all untranscribed episodes

Run npm run convert

Warning: This will take a long time

NOTE: The first time this script is run, it needs to download the Whisper model, which requires local_files_only to be temporarily set to False. After this, the option can be changed back to True.
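As a rough illustration, assuming whisper.py loads the model with faster-whisper's WhisperModel (which accepts a local_files_only argument), the toggle looks something like this; the variable names are hypothetical, not the repo's exact code:

```python
# Hypothetical sketch of the model setup in whisper.py.
from faster_whisper import WhisperModel

model_size = "large-v2"

# Set local_files_only=False for the very first run so the model weights can be
# downloaded, then switch it back to True to avoid network lookups on later runs.
model = WhisperModel(model_size, local_files_only=True)
```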

Changing the Whisper model size for faster but less accurate transcription

In whisper.py, change model_size to your preferred model. See available models.
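For example, sticking with the hypothetical whisper.py sketch above:

```python
# Pick a smaller model for faster, less accurate transcription.
# model_size = "large-v2"  # default; about 1x (realtime) speed on an M1 Mac with 64GB of RAM
model_size = "small.en"    # about 3x speed, lower accuracy
```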

NOTE: By default this uses the large-v2 Whisper model. On an M1 Mac with 64GB of RAM this transcribes at about 1x speed. This means an hour-long episode gets transcribed in about an hour.

So, as of 8 February 2025:

select sum(duration) from episodes
-- 1555237
   1,555,237.0 seconds
÷         60.0 seconds
÷         60.0 minutes
÷         24.0 hours
-----------------------
=         18.0 days

The good news is that changing to the medium.en, small.en, or tiny.en model increases this speed dramatically, though accuracy goes down; small.en transcribes at about 3x speed, for example.

The other good news is that the convert script is idempotent: you can kill it (Ctrl + C) and restart it at any time, and it will pick back up after the last fully transcribed episode. You can safely run it over and over without creating any duplicates.
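In other words, the resume logic has roughly this shape (a hypothetical sketch; the real table and column names in the project may differ):

```python
import sqlite3

def untranscribed_episodes(db: sqlite3.Connection, all_episode_numbers: list[int]) -> list[int]:
    # Episodes that already have rows in the (hypothetical) transcripts table are
    # skipped, which is why the script can be killed and restarted without
    # redoing work or creating duplicates.
    done = {row[0] for row in db.execute("SELECT DISTINCT episode FROM transcripts")}
    return [n for n in all_episode_numbers if n not in done]
```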

NOTE: This script also downloads all the audio files for the episodes, as well as each episode's album art. As of 8 February 2025 this amounts to 568 episodes, ~24.2GB of audio, and ~190MB of images.

Split database into chunks

Run npm run split

Deploying

Sync database, audio, images, and fonts to (Cloudflare) R2

Requires rclone and jq to be installed.

Run npm run sync