Timestamps in `parakeet_runner` #16545

mattjcly · 2026-01-12T17:33:19Z

Summary

Enables timestamp computation in parakeet_runner through a --timestamps flag (none|token|word|segment|all). Follows the reference implementation from NVIDIA-NeMo/NeMo, which the nvidia/parakeet-tdt-0.6b-v3 model card on HF cites as the way to run Parakeet and compute timestamps.

Requires meta-pytorch/tokenizers#163 for the id_to_piece method on Tokenizers.
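For orientation, here is a rough sketch of where raw token timestamps come from. It assumes a simplified view of TDT (Token-and-Duration Transducer) greedy decoding: each step emits either a token or blank together with a predicted duration in encoder frames, a token spans frames [t, t + duration], and t then advances by that duration. The names `Emission`, `TokenFrames` and `assign_token_frames` are hypothetical. The rule that a blank always advances at least one frame is also an assumption. This is not the runner's decoder. The token output further down is consistent with this reading; for example, "," at 0.8s - 0.8s has duration 0.

```cpp
#include <vector>

// One greedy-decode step: a token id (or blank) plus its predicted duration
// in encoder frames. Hypothetical type for illustration.
struct Emission {
  int token_id;
  int duration;
};

// Frame span assigned to a non-blank token. Hypothetical type.
struct TokenFrames {
  int token_id;
  int start_frame;
  int end_frame;
};

// Walks the decode steps, tracking the current encoder frame t.
std::vector<TokenFrames> assign_token_frames(const std::vector<Emission>& steps,
                                             int blank_id) {
  std::vector<TokenFrames> out;
  int t = 0;
  for (const Emission& s : steps) {
    if (s.token_id == blank_id) {
      // Assumption: a blank always moves forward by at least one frame.
      t += s.duration > 0 ? s.duration : 1;
      continue;
    }
    out.push_back({s.token_id, t, t + s.duration});
    t += s.duration;  // duration 0: the next token is emitted on the same frame
  }
  return out;
}
```

With blank_id=8192 (from the runner log) and 0.08 s per frame, the steps {blank,7}, {a,2}, {b,1}, {c,0}, {blank,2}, {d,2} yield frames [7,9], [9,10], [10,10] and [12,14]. These correspond to 0.56-0.72, 0.72-0.8, 0.8-0.8 and 0.96-1.12 s, the spans of All, right, "," and b in the token output.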

Test plan

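Outputs the same transcription and timestamps as NVIDIA-NeMo/NeMo for the audio files basketball.wav and audio.wav (basketball.wav is shown below). The only differences are in output formatting: NeMo prints raw floats such as 16.080000000000002s and 20.0s and wraps tokens as ['All'], while parakeet_runner prints 16.08s, 20s and All.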
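Every timestamp in both outputs is a multiple of 0.08 s. The runner log reports window_stride=0.010000 and encoder_subsampling_factor=8, so one encoder frame covers 0.01 × 8 = 0.08 s. A plausible reading is that both tools compute essentially the same doubles and differ only in printing. Python's float repr is the shortest string that round-trips, hence 16.080000000000002. The runner's 16.08s and 20s are consistent with default iostream formatting: six significant digits, with trailing zeros dropped. A minimal sketch of that reading, with hypothetical helper names:

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: encoder frame index -> seconds.
// One encoder frame spans window_stride * subsampling_factor seconds.
double frame_to_seconds(std::int64_t frame, double window_stride,
                        std::int64_t subsampling_factor) {
  return static_cast<double>(frame) * window_stride *
         static_cast<double>(subsampling_factor);
}

// Hypothetical helper: print with default ostream formatting
// (precision 6, %g-style), which hides float noise and drops ".0".
std::string format_seconds(double seconds) {
  std::ostringstream os;
  os << seconds << 's';
  return os.str();
}
```

For example, frame 201 formats as "16.08s" and frame 250 as "20s". These match the runner's segment lines for "Tied at 120." and "He drives, stops, and pops.".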

NeMo (basketball.wav)

import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-tdt-0.6b-v3")

output = asr_model.transcribe(['basketball.wav'], timestamps=True)

# by default, timestamps are enabled for char, word and segment level
word_timestamps = output[0].timestamp['word'] # word level timestamps for first sample
segment_timestamps = output[0].timestamp['segment'] # segment level timestamps
char_timestamps = output[0].timestamp['char'] # char level timestamps

for stamp in segment_timestamps:
    print(f"{stamp['start']}s - {stamp['end']}s : {stamp['segment']}")

for stamp in word_timestamps:
    print(f"{stamp['start']}s - {stamp['end']}s : {stamp['word']}")

for stamp in char_timestamps:
    print(f"{stamp['start']}s - {stamp['end']}s : {stamp['char']}")

Output:

Transcribing: 1it [00:01,  1.82s/it]
0.56s - 2.88s : All right, ballerina for the Nuggets and Mavs tonight.
2.96s - 3.6s : Late fourth quarter.
3.68s - 4.88s : Joker to MPJ.
4.96s - 5.68s : Back to you.
5.92s - 7.92s : Joker spins, fakes, scores.
8.08s - 10.24s : Was tied at 118 with two minutes to go.
10.32s - 11.52s : Then the Mavs by two.
11.6s - 12.48s : Joker drives.
12.56s - 14.64s : He'll miss but clean up his own mess.
14.8s - 16.080000000000002s : Tied at 120.
16.240000000000002s - 17.44s : Then time running out.
17.68s - 19.44s : Jamal to MPJ.
20.0s - 22.080000000000002s : He drives, stops, and pops.
22.16s - 23.04s : He had 17.
23.12s - 25.44s : The Nuggets had the lead with 6.5 to go.
25.6s - 26.8s : Mavs had a last chance.
26.88s - 31.28s : Kyrie Irving scored 43 points, but didn't hit the game winner.
31.44s - 32.24s : Joker got it.
32.32s - 34.56s : The Nuggets win at 122-120.
34.800000000000004s - 36.480000000000004s : Incredible stat line for Jokic.
36.72s - 37.36s : One of a kind.
37.6s - 40.88s : 37 points, 18 rebounds, 15 assists.
41.04s - 42.24s : He is simply the best.
42.4s - 45.92s : And the Nuggets have won four in a row and six of their last seven.
0.56s - 0.72s : All
0.72s - 0.8s : right,
0.96s - 1.76s : ballerina
1.76s - 1.84s : for
1.84s - 1.92s : the
1.92s - 2.24s : Nuggets
2.24s - 2.32s : and
2.32s - 2.64s : Mavs
2.64s - 2.88s : tonight.
2.96s - 3.2s : Late
3.2s - 3.44s : fourth
3.44s - 3.6s : quarter.
3.68s - 4.16s : Joker
4.16s - 4.32s : to
4.32s - 4.88s : MPJ.
4.96s - 5.2s : Back
5.2s - 5.44s : to
5.44s - 5.68s : you.
5.92s - 6.48s : Joker
6.48s - 6.88s : spins,
6.96s - 7.36s : fakes,
7.5200000000000005s - 7.92s : scores.
8.08s - 8.32s : Was
8.32s - 8.64s : tied
8.64s - 8.72s : at
8.72s - 9.36s : 118
9.36s - 9.52s : with
9.52s - 9.68s : two
9.68s - 9.92s : minutes
9.92s - 10.0s : to
10.0s - 10.24s : go.
10.32s - 10.56s : Then
10.56s - 10.64s : the
10.64s - 11.120000000000001s : Mavs
11.120000000000001s - 11.200000000000001s : by
11.28s - 11.52s : two.
11.6s - 12.16s : Joker
12.16s - 12.48s : drives.
12.56s - 12.96s : He'll
12.96s - 13.200000000000001s : miss
13.280000000000001s - 13.44s : but
13.44s - 13.68s : clean
13.84s - 13.92s : up
13.92s - 14.08s : his
14.08s - 14.32s : own
14.32s - 14.64s : mess.
14.8s - 15.200000000000001s : Tied
15.200000000000001s - 15.36s : at
15.36s - 16.080000000000002s : 120.
16.240000000000002s - 16.48s : Then
16.48s - 16.72s : time
16.88s - 17.28s : running
17.28s - 17.44s : out.
17.68s - 18.400000000000002s : Jamal
18.400000000000002s - 18.72s : to
18.72s - 19.44s : MPJ.
20.0s - 20.240000000000002s : He
20.240000000000002s - 20.72s : drives,
20.96s - 21.28s : stops,
21.36s - 21.6s : and
21.6s - 22.080000000000002s : pops.
22.16s - 22.32s : He
22.400000000000002s - 22.64s : had
22.64s - 23.04s : 17.
23.12s - 23.28s : The
23.28s - 23.68s : Nuggets
23.68s - 23.84s : had
23.84s - 23.92s : the
23.92s - 24.240000000000002s : lead
24.240000000000002s - 24.400000000000002s : with
24.400000000000002s - 25.12s : 6.5
25.12s - 25.28s : to
25.28s - 25.44s : go.
25.6s - 26.16s : Mavs
26.16s - 26.32s : had
26.32s - 26.400000000000002s : a
26.400000000000002s - 26.560000000000002s : last
26.560000000000002s - 26.8s : chance.
26.88s - 27.36s : Kyrie
27.52s - 28.0s : Irving
28.16s - 28.560000000000002s : scored
28.560000000000002s - 29.2s : 43
29.2s - 29.6s : points,
29.76s - 30.080000000000002s : but
30.080000000000002s - 30.48s : didn't
30.48s - 30.64s : hit
30.64s - 30.72s : the
30.72s - 30.96s : game
30.96s - 31.28s : winner.
31.44s - 32.0s : Joker
32.0s - 32.160000000000004s : got
32.160000000000004s - 32.24s : it.
32.32s - 32.480000000000004s : The
32.480000000000004s - 32.88s : Nuggets
32.88s - 33.12s : win
33.12s - 33.28s : at
33.28s - 34.56s : 122-120.
34.800000000000004s - 35.36s : Incredible
35.36s - 35.6s : stat
35.6s - 35.84s : line
35.84s - 36.0s : for
36.0s - 36.480000000000004s : Jokic.
36.72s - 36.88s : One
36.88s - 37.04s : of
37.04s - 37.12s : a
37.12s - 37.36s : kind.
37.6s - 38.160000000000004s : 37
38.160000000000004s - 38.56s : points,
38.72s - 39.2s : 18
39.2s - 39.68s : rebounds,
39.84s - 40.32s : 15
40.32s - 40.88s : assists.
41.04s - 41.2s : He
41.2s - 41.44s : is
41.44s - 41.76s : simply
41.76s - 42.0s : the
42.0s - 42.24s : best.
42.4s - 42.56s : And
42.56s - 42.64s : the
42.64s - 43.12s : Nuggets
43.12s - 43.28s : have
43.28s - 43.52s : won
43.52s - 43.76s : four
43.76s - 43.92s : in
43.92s - 44.0s : a
44.0s - 44.24s : row
44.24s - 44.56s : and
44.56s - 44.96s : six
44.96s - 45.2s : of
45.2s - 45.36s : their
45.36s - 45.6s : last
45.6s - 45.92s : seven.
0.56s - 0.72s : ['All']
0.72s - 0.8s : ['right']
0.8s - 0.8s : [',']
0.96s - 1.12s : ['b']
1.12s - 1.28s : ['all']
1.28s - 1.36s : ['er']
1.6s - 1.76s : ['ina']
1.76s - 1.84s : ['for']
1.84s - 1.92s : ['the']
1.92s - 2.0s : ['N']
2.0s - 2.08s : ['ug']
2.08s - 2.16s : ['g']
2.16s - 2.24s : ['ets']
2.24s - 2.32s : ['and']
2.32s - 2.4s : ['M']
2.4s - 2.48s : ['av']
2.48s - 2.64s : ['s']
2.64s - 2.72s : ['ton']
2.72s - 2.88s : ['ight']
2.88s - 2.88s : ['.']
2.96s - 3.04s : ['L']
3.04s - 3.2s : ['ate']
3.2s - 3.2800000000000002s : ['fo']
3.2800000000000002s - 3.36s : ['urt']
3.36s - 3.44s : ['h']
3.44s - 3.52s : ['quar']
3.52s - 3.6s : ['ter']
3.6s - 3.6s : ['.']
3.68s - 3.84s : ['J']
3.84s - 4.0s : ['ok']
4.0s - 4.16s : ['er']
4.16s - 4.32s : ['to']
4.32s - 4.48s : ['M']
4.48s - 4.64s : ['P']
4.64s - 4.88s : ['J']
4.88s - 4.88s : ['.']
4.96s - 5.04s : ['B']
5.04s - 5.2s : ['ack']
5.2s - 5.44s : ['to']
5.44s - 5.68s : ['you']
5.68s - 5.68s : ['.']
5.92s - 6.16s : ['J']
6.16s - 6.32s : ['ok']
6.32s - 6.48s : ['er']
6.48s - 6.72s : ['sp']
6.72s - 6.88s : ['ins']
6.88s - 6.88s : [',']
6.96s - 7.12s : ['fak']
7.28s - 7.36s : ['es']
7.36s - 7.36s : [',']
7.5200000000000005s - 7.76s : ['sc']
7.76s - 7.92s : ['ores']
7.92s - 7.92s : ['.']
8.08s - 8.32s : ['Was']
8.32s - 8.64s : ['tied']
8.64s - 8.72s : ['at']
8.72s - 8.8s : ['']
8.8s - 8.88s : ['1']
8.96s - 9.040000000000001s : ['1']
9.120000000000001s - 9.36s : ['8']
9.36s - 9.52s : ['with']
9.52s - 9.68s : ['two']
9.68s - 9.76s : ['minut']
9.76s - 9.92s : ['es']
9.92s - 10.0s : ['to']
10.0s - 10.24s : ['go']
10.24s - 10.24s : ['.']
10.32s - 10.4s : ['T']
10.4s - 10.56s : ['hen']
10.56s - 10.64s : ['the']
10.64s - 10.8s : ['M']
10.8s - 10.96s : ['av']
10.96s - 11.120000000000001s : ['s']
11.120000000000001s - 11.200000000000001s : ['by']
11.28s - 11.52s : ['two']
11.52s - 11.52s : ['.']
11.6s - 11.76s : ['J']
11.76s - 12.0s : ['ok']
12.0s - 12.16s : ['er']
12.16s - 12.24s : ['d']
12.24s - 12.32s : ['ri']
12.32s - 12.48s : ['ves']
12.48s - 12.48s : ['.']
12.56s - 12.72s : ['He']
12.72s - 12.72s : ["'"]
12.8s - 12.96s : ['ll']
12.96s - 13.200000000000001s : ['miss']
13.280000000000001s - 13.44s : ['but']
13.44s - 13.52s : ['c']
13.52s - 13.6s : ['le']
13.6s - 13.68s : ['an']
13.84s - 13.92s : ['up']
13.92s - 14.08s : ['his']
14.08s - 14.16s : ['o']
14.16s - 14.32s : ['wn']
14.32s - 14.48s : ['m']
14.48s - 14.64s : ['ess']
14.64s - 14.64s : ['.']
14.8s - 14.96s : ['T']
14.96s - 15.200000000000001s : ['ied']
15.200000000000001s - 15.36s : ['at']
15.36s - 15.44s : ['']
15.44s - 15.6s : ['1']
15.68s - 15.92s : ['2']
15.92s - 16.080000000000002s : ['0']
16.080000000000002s - 16.080000000000002s : ['.']
16.240000000000002s - 16.32s : ['T']
16.32s - 16.48s : ['hen']
16.48s - 16.72s : ['time']
16.88s - 17.04s : ['run']
17.04s - 17.28s : ['ning']
17.28s - 17.44s : ['out']
17.44s - 17.44s : ['.']
17.68s - 17.84s : ['J']
17.84s - 18.080000000000002s : ['am']
18.080000000000002s - 18.400000000000002s : ['al']
18.400000000000002s - 18.72s : ['to']
18.72s - 18.88s : ['M']
18.88s - 19.04s : ['P']
19.2s - 19.44s : ['J']
19.44s - 19.44s : ['.']
20.0s - 20.240000000000002s : ['He']
20.240000000000002s - 20.400000000000002s : ['d']
20.400000000000002s - 20.64s : ['ri']
20.64s - 20.72s : ['ves']
20.72s - 20.72s : [',']
20.96s - 21.12s : ['stop']
21.12s - 21.28s : ['s']
21.28s - 21.28s : [',']
21.36s - 21.6s : ['and']
21.6s - 21.84s : ['po']
21.84s - 22.080000000000002s : ['ps']
22.080000000000002s - 22.080000000000002s : ['.']
22.16s - 22.32s : ['He']
22.400000000000002s - 22.64s : ['had']
22.64s - 22.72s : ['']
22.72s - 22.88s : ['1']
22.88s - 23.04s : ['7']
23.04s - 23.04s : ['.']
23.12s - 23.28s : ['The']
23.28s - 23.36s : ['N']
23.36s - 23.44s : ['ug']
23.44s - 23.6s : ['g']
23.6s - 23.68s : ['ets']
23.68s - 23.84s : ['had']
23.84s - 23.92s : ['the']
23.92s - 24.080000000000002s : ['le']
24.080000000000002s - 24.240000000000002s : ['ad']
24.240000000000002s - 24.400000000000002s : ['with']
24.400000000000002s - 24.48s : ['']
24.48s - 24.72s : ['6']
24.72s - 24.72s : ['.']
24.88s - 25.12s : ['5']
25.12s - 25.28s : ['to']
25.28s - 25.44s : ['go']
25.44s - 25.44s : ['.']
25.6s - 25.76s : ['M']
25.76s - 26.0s : ['av']
26.0s - 26.16s : ['s']
26.16s - 26.32s : ['had']
26.32s - 26.400000000000002s : ['a']
26.400000000000002s - 26.560000000000002s : ['last']
26.560000000000002s - 26.64s : ['ch']
26.64s - 26.8s : ['ance']
26.8s - 26.8s : ['.']
26.88s - 27.04s : ['K']
27.04s - 27.28s : ['y']
27.28s - 27.36s : ['rie']
27.52s - 27.76s : ['Ir']
27.76s - 27.92s : ['v']
27.92s - 28.0s : ['ing']
28.16s - 28.32s : ['sc']
28.32s - 28.400000000000002s : ['or']
28.400000000000002s - 28.560000000000002s : ['ed']
28.560000000000002s - 28.72s : ['']
28.72s - 28.88s : ['4']
28.88s - 29.2s : ['3']
29.2s - 29.28s : ['po']
29.28s - 29.44s : ['in']
29.44s - 29.6s : ['ts']
29.6s - 29.6s : [',']
29.76s - 30.080000000000002s : ['but']
30.080000000000002s - 30.240000000000002s : ['did']
30.240000000000002s - 30.32s : ['n']
30.32s - 30.32s : ["'"]
30.48s - 30.48s : ['t']
30.48s - 30.64s : ['hit']
30.64s - 30.72s : ['the']
30.72s - 30.8s : ['g']
30.8s - 30.96s : ['ame']
30.96s - 31.04s : ['w']
31.04s - 31.2s : ['inn']
31.2s - 31.28s : ['er']
31.28s - 31.28s : ['.']
31.44s - 31.68s : ['J']
31.68s - 31.84s : ['ok']
31.84s - 32.0s : ['er']
32.0s - 32.160000000000004s : ['got']
32.160000000000004s - 32.24s : ['it']
32.24s - 32.24s : ['.']
32.32s - 32.480000000000004s : ['The']
32.480000000000004s - 32.56s : ['N']
32.56s - 32.64s : ['ug']
32.64s - 32.8s : ['g']
32.8s - 32.88s : ['ets']
32.88s - 32.96s : ['w']
32.96s - 33.12s : ['in']
33.12s - 33.28s : ['at']
33.28s - 33.36s : ['']
33.36s - 33.52s : ['1']
33.52s - 33.76s : ['2']
33.76s - 33.92s : ['2']
33.92s - 33.92s : ['-']
34.0s - 34.08s : ['1']
34.24s - 34.4s : ['2']
34.4s - 34.56s : ['0']
34.56s - 34.56s : ['.']
34.800000000000004s - 34.88s : ['In']
34.88s - 34.96s : ['c']
34.96s - 35.04s : ['re']
35.04s - 35.12s : ['di']
35.12s - 35.36s : ['ble']
35.36s - 35.6s : ['stat']
35.6s - 35.68s : ['l']
35.68s - 35.84s : ['ine']
35.84s - 36.0s : ['for']
36.0s - 36.160000000000004s : ['J']
36.160000000000004s - 36.32s : ['ok']
36.32s - 36.480000000000004s : ['ic']
36.480000000000004s - 36.480000000000004s : ['.']
36.72s - 36.800000000000004s : ['O']
36.800000000000004s - 36.88s : ['ne']
36.88s - 37.04s : ['of']
37.04s - 37.12s : ['a']
37.12s - 37.36s : ['kind']
37.36s - 37.36s : ['.']
37.6s - 37.68s : ['']
37.68s - 37.84s : ['3']
37.92s - 38.160000000000004s : ['7']
38.160000000000004s - 38.32s : ['po']
38.32s - 38.4s : ['in']
38.4s - 38.56s : ['ts']
38.56s - 38.56s : [',']
38.72s - 38.800000000000004s : ['']
38.800000000000004s - 38.96s : ['1']
38.96s - 39.2s : ['8']
39.2s - 39.36s : ['re']
39.36s - 39.44s : ['bo']
39.44s - 39.6s : ['und']
39.6s - 39.68s : ['s']
39.68s - 39.68s : [',']
39.84s - 39.92s : ['']
39.92s - 40.08s : ['1']
40.08s - 40.32s : ['5']
40.32s - 40.480000000000004s : ['ass']
40.480000000000004s - 40.64s : ['ist']
40.64s - 40.88s : ['s']
40.88s - 40.88s : ['.']
41.04s - 41.2s : ['He']
41.2s - 41.44s : ['is']
41.44s - 41.6s : ['simpl']
41.6s - 41.76s : ['y']
41.76s - 42.0s : ['the']
42.0s - 42.24s : ['best']
42.24s - 42.24s : ['.']
42.4s - 42.56s : ['And']
42.56s - 42.64s : ['the']
42.64s - 42.72s : ['N']
42.72s - 42.88s : ['ug']
42.88s - 43.04s : ['g']
43.04s - 43.12s : ['ets']
43.12s - 43.28s : ['have']
43.28s - 43.36s : ['w']
43.36s - 43.52s : ['on']
43.52s - 43.6s : ['fo']
43.6s - 43.76s : ['ur']
43.76s - 43.92s : ['in']
43.92s - 44.0s : ['a']
44.0s - 44.08s : ['ro']
44.08s - 44.24s : ['w']
44.24s - 44.56s : ['and']
44.56s - 44.72s : ['si']
44.72s - 44.96s : ['x']
44.96s - 45.2s : ['of']
45.2s - 45.36s : ['their']
45.36s - 45.6s : ['last']
45.6s - 45.76s : ['se']
45.76s - 45.92s : ['ven']
45.92s - 45.92s : ['.']

`parakeet_runner` (basketball.wav)

./cmake-out/examples/models/parakeet/parakeet_runner --model_path examples/models/parakeet/parakeet_tdt_exports/parakeet_tdt.pte --tokenizer_path /Users/matt/Workspace/executorch/examples/models/parakeet/parakeet_tdt_exports/tokenizer.model --audio_path basketball.wav --timestamps all

Output:

-> % ./cmake-out/examples/models/parakeet/parakeet_runner --model_path examples/models/parakeet/parakeet_tdt_exports/parakeet_tdt.pte --tokenizer_path /Users/matt/Workspace/executorch/examples/models/parakeet/parakeet_tdt_exports/tokenizer.model --audio_path /Users/matt/Documents/parakeet_test_audio/basketball.wav --timestamps all
I tokenizers:regex.cpp:27] Registering override fallback regex
I 00:00:00.005332 executorch:main.cpp:717] Loading model from: examples/models/parakeet/parakeet_tdt_exports/parakeet_tdt.pte
I 00:00:00.005729 executorch:main.cpp:733] Loading audio from: /Users/matt/Documents/parakeet_test_audio/basketball.wav
I 00:00:00.008822 executorch:wav_loader.h:98] WAV header detected, getting raw audio data.
I 00:00:00.008832 executorch:wav_loader.h:105] RIFF Header: RIFF
I 00:00:00.008835 executorch:wav_loader.h:106] Chunk Size: 1476488
I 00:00:00.008836 executorch:wav_loader.h:113] WAVE Header: WAVE
I 00:00:00.008840 executorch:wav_loader.h:120] Format Header: fmt 
I 00:00:00.008841 executorch:wav_loader.h:121] Format Chunk Size: 16
I 00:00:00.008843 executorch:wav_loader.h:122] Audio Format: 1
I 00:00:00.008845 executorch:wav_loader.h:123] Number of Channels: 1
I 00:00:00.008846 executorch:wav_loader.h:124] Sample Rate: 16000
I 00:00:00.008848 executorch:wav_loader.h:125] Byte Rate: 32000
I 00:00:00.008849 executorch:wav_loader.h:126] Block Align: 2
I 00:00:00.008851 executorch:wav_loader.h:127] Bits per Sample: 16
I 00:00:00.008852 executorch:wav_loader.h:132] Subchunk2Size: 1476418
I 00:00:00.009391 executorch:wav_loader.h:226] Loaded 738209 audio samples from WAV file: /Users/matt/Documents/parakeet_test_audio/basketball.wav
I 00:00:00.009403 executorch:main.cpp:736] Loaded 738209 audio samples
I 00:00:00.009432 executorch:main.cpp:747] Running preprocessor...
I 00:00:00.033216 executorch:cpuinfo_utils.cpp:71] Reading file /sys/devices/soc0/image_version
I 00:00:00.033243 executorch:cpuinfo_utils.cpp:87] Failed to open midr file /sys/devices/soc0/image_version
I 00:00:00.065268 executorch:main.cpp:772] Mel spectrogram shape: [1, 128, 4614], mel_len: 4613
I 00:00:00.065280 executorch:main.cpp:775] Running encoder...
I 00:01:19.425587 executorch:main.cpp:792] Encoder output shape: [1, 1024, 577], len=577
I 00:01:19.425620 executorch:main.cpp:833] Model metadata: vocab_size=8192, blank_id=8192, num_rnn_layers=2, pred_hidden=640, sample_rate=16000, window_stride=0.010000, encoder_subsampling_factor=8
I 00:01:19.425627 executorch:main.cpp:835] Running TDT greedy decode...
I 00:01:32.149805 executorch:main.cpp:845] Decoded 289 tokens
I 00:01:32.149823 executorch:main.cpp:848] Loading tokenizer from: /Users/matt/Workspace/executorch/examples/models/parakeet/parakeet_tdt_exports/tokenizer.model
E tokenizers:hf_tokenizer.cpp:82] Error parsing json file: [json.exception.parse_error.101] parse error at line 2, column 1: syntax error while parsing value - invalid literal; last read: '<U+000A><U+000E>'
E tokenizers:tiktoken.cpp:59] invalid tiktoken line: 
I 00:01:32.154983 executorch:llm_runner_helper.cpp:77] Loaded Sentencepiece tokenizer
Transcribed text: All right, ballerina for the Nuggets and Mavs tonight. Late fourth quarter. Joker to MPJ. Back to you. Joker spins, fakes, scores. Was tied at 118 with two minutes to go. Then the Mavs by two. Joker drives. He'll miss but clean up his own mess. Tied at 120. Then time running out. Jamal to MPJ. He drives, stops, and pops. He had 17. The Nuggets had the lead with 6.5 to go. Mavs had a last chance. Kyrie Irving scored 43 points, but didn't hit the game winner. Joker got it. The Nuggets win at 122-120. Incredible stat line for Jokic. One of a kind. 37 points, 18 rebounds, 15 assists. He is simply the best. And the Nuggets have won four in a row and six of their last seven.
I 00:01:32.155043 executorch:main.cpp:867] Computing timestamps...
I 00:01:32.155828 executorch:main.cpp:873] Derived supported_punctuation size=11

Segment timestamps:
0.56s - 2.88s : All right, ballerina for the Nuggets and Mavs tonight.
2.96s - 3.6s : Late fourth quarter.
3.68s - 4.88s : Joker to MPJ.
4.96s - 5.68s : Back to you.
5.92s - 7.92s : Joker spins, fakes, scores.
8.08s - 10.24s : Was tied at 118 with two minutes to go.
10.32s - 11.52s : Then the Mavs by two.
11.6s - 12.48s : Joker drives.
12.56s - 14.64s : He'll miss but clean up his own mess.
14.8s - 16.08s : Tied at 120.
16.24s - 17.44s : Then time running out.
17.68s - 19.44s : Jamal to MPJ.
20s - 22.08s : He drives, stops, and pops.
22.16s - 23.04s : He had 17.
23.12s - 25.44s : The Nuggets had the lead with 6.5 to go.
25.6s - 26.8s : Mavs had a last chance.
26.88s - 31.28s : Kyrie Irving scored 43 points, but didn't hit the game winner.
31.44s - 32.24s : Joker got it.
32.32s - 34.56s : The Nuggets win at 122-120.
34.8s - 36.48s : Incredible stat line for Jokic.
36.72s - 37.36s : One of a kind.
37.6s - 40.88s : 37 points, 18 rebounds, 15 assists.
41.04s - 42.24s : He is simply the best.
42.4s - 45.92s : And the Nuggets have won four in a row and six of their last seven.

Word timestamps:
0.56s - 0.72s : All
0.72s - 0.8s : right,
0.96s - 1.76s : ballerina
1.76s - 1.84s : for
1.84s - 1.92s : the
1.92s - 2.24s : Nuggets
2.24s - 2.32s : and
2.32s - 2.64s : Mavs
2.64s - 2.88s : tonight.
2.96s - 3.2s : Late
3.2s - 3.44s : fourth
3.44s - 3.6s : quarter.
3.68s - 4.16s : Joker
4.16s - 4.32s : to
4.32s - 4.88s : MPJ.
4.96s - 5.2s : Back
5.2s - 5.44s : to
5.44s - 5.68s : you.
5.92s - 6.48s : Joker
6.48s - 6.88s : spins,
6.96s - 7.36s : fakes,
7.52s - 7.92s : scores.
8.08s - 8.32s : Was
8.32s - 8.64s : tied
8.64s - 8.72s : at
8.72s - 9.36s : 118
9.36s - 9.52s : with
9.52s - 9.68s : two
9.68s - 9.92s : minutes
9.92s - 10s : to
10s - 10.24s : go.
10.32s - 10.56s : Then
10.56s - 10.64s : the
10.64s - 11.12s : Mavs
11.12s - 11.2s : by
11.28s - 11.52s : two.
11.6s - 12.16s : Joker
12.16s - 12.48s : drives.
12.56s - 12.96s : He'll
12.96s - 13.2s : miss
13.28s - 13.44s : but
13.44s - 13.68s : clean
13.84s - 13.92s : up
13.92s - 14.08s : his
14.08s - 14.32s : own
14.32s - 14.64s : mess.
14.8s - 15.2s : Tied
15.2s - 15.36s : at
15.36s - 16.08s : 120.
16.24s - 16.48s : Then
16.48s - 16.72s : time
16.88s - 17.28s : running
17.28s - 17.44s : out.
17.68s - 18.4s : Jamal
18.4s - 18.72s : to
18.72s - 19.44s : MPJ.
20s - 20.24s : He
20.24s - 20.72s : drives,
20.96s - 21.28s : stops,
21.36s - 21.6s : and
21.6s - 22.08s : pops.
22.16s - 22.32s : He
22.4s - 22.64s : had
22.64s - 23.04s : 17.
23.12s - 23.28s : The
23.28s - 23.68s : Nuggets
23.68s - 23.84s : had
23.84s - 23.92s : the
23.92s - 24.24s : lead
24.24s - 24.4s : with
24.4s - 25.12s : 6.5
25.12s - 25.28s : to
25.28s - 25.44s : go.
25.6s - 26.16s : Mavs
26.16s - 26.32s : had
26.32s - 26.4s : a
26.4s - 26.56s : last
26.56s - 26.8s : chance.
26.88s - 27.36s : Kyrie
27.52s - 28s : Irving
28.16s - 28.56s : scored
28.56s - 29.2s : 43
29.2s - 29.6s : points,
29.76s - 30.08s : but
30.08s - 30.48s : didn't
30.48s - 30.64s : hit
30.64s - 30.72s : the
30.72s - 30.96s : game
30.96s - 31.28s : winner.
31.44s - 32s : Joker
32s - 32.16s : got
32.16s - 32.24s : it.
32.32s - 32.48s : The
32.48s - 32.88s : Nuggets
32.88s - 33.12s : win
33.12s - 33.28s : at
33.28s - 34.56s : 122-120.
34.8s - 35.36s : Incredible
35.36s - 35.6s : stat
35.6s - 35.84s : line
35.84s - 36s : for
36s - 36.48s : Jokic.
36.72s - 36.88s : One
36.88s - 37.04s : of
37.04s - 37.12s : a
37.12s - 37.36s : kind.
37.6s - 38.16s : 37
38.16s - 38.56s : points,
38.72s - 39.2s : 18
39.2s - 39.68s : rebounds,
39.84s - 40.32s : 15
40.32s - 40.88s : assists.
41.04s - 41.2s : He
41.2s - 41.44s : is
41.44s - 41.76s : simply
41.76s - 42s : the
42s - 42.24s : best.
42.4s - 42.56s : And
42.56s - 42.64s : the
42.64s - 43.12s : Nuggets
43.12s - 43.28s : have
43.28s - 43.52s : won
43.52s - 43.76s : four
43.76s - 43.92s : in
43.92s - 44s : a
44s - 44.24s : row
44.24s - 44.56s : and
44.56s - 44.96s : six
44.96s - 45.2s : of
45.2s - 45.36s : their
45.36s - 45.6s : last
45.6s - 45.92s : seven.

Token timestamps:
0.56s - 0.72s : All
0.72s - 0.8s : right
0.8s - 0.8s : ,
0.96s - 1.12s : b
1.12s - 1.28s : all
1.28s - 1.36s : er
1.6s - 1.76s : ina
1.76s - 1.84s : for
1.84s - 1.92s : the
1.92s - 2s : N
2s - 2.08s : ug
2.08s - 2.16s : g
2.16s - 2.24s : ets
2.24s - 2.32s : and
2.32s - 2.4s : M
2.4s - 2.48s : av
2.48s - 2.64s : s
2.64s - 2.72s : ton
2.72s - 2.88s : ight
2.88s - 2.88s : .
2.96s - 3.04s : L
3.04s - 3.2s : ate
3.2s - 3.28s : fo
3.28s - 3.36s : urt
3.36s - 3.44s : h
3.44s - 3.52s : quar
3.52s - 3.6s : ter
3.6s - 3.6s : .
3.68s - 3.84s : J
3.84s - 4s : ok
4s - 4.16s : er
4.16s - 4.32s : to
4.32s - 4.48s : M
4.48s - 4.64s : P
4.64s - 4.88s : J
4.88s - 4.88s : .
4.96s - 5.04s : B
5.04s - 5.2s : ack
5.2s - 5.44s : to
5.44s - 5.68s : you
5.68s - 5.68s : .
5.92s - 6.16s : J
6.16s - 6.32s : ok
6.32s - 6.48s : er
6.48s - 6.72s : sp
6.72s - 6.88s : ins
6.88s - 6.88s : ,
6.96s - 7.12s : fak
7.28s - 7.36s : es
7.36s - 7.36s : ,
7.52s - 7.76s : sc
7.76s - 7.92s : ores
7.92s - 7.92s : .
8.08s - 8.32s : Was
8.32s - 8.64s : tied
8.64s - 8.72s : at
8.72s - 8.8s : 
8.8s - 8.88s : 1
8.96s - 9.04s : 1
9.12s - 9.36s : 8
9.36s - 9.52s : with
9.52s - 9.68s : two
9.68s - 9.76s : minut
9.76s - 9.92s : es
9.92s - 10s : to
10s - 10.24s : go
10.24s - 10.24s : .
10.32s - 10.4s : T
10.4s - 10.56s : hen
10.56s - 10.64s : the
10.64s - 10.8s : M
10.8s - 10.96s : av
10.96s - 11.12s : s
11.12s - 11.2s : by
11.28s - 11.52s : two
11.52s - 11.52s : .
11.6s - 11.76s : J
11.76s - 12s : ok
12s - 12.16s : er
12.16s - 12.24s : d
12.24s - 12.32s : ri
12.32s - 12.48s : ves
12.48s - 12.48s : .
12.56s - 12.72s : He
12.72s - 12.72s : '
12.8s - 12.96s : ll
12.96s - 13.2s : miss
13.28s - 13.44s : but
13.44s - 13.52s : c
13.52s - 13.6s : le
13.6s - 13.68s : an
13.84s - 13.92s : up
13.92s - 14.08s : his
14.08s - 14.16s : o
14.16s - 14.32s : wn
14.32s - 14.48s : m
14.48s - 14.64s : ess
14.64s - 14.64s : .
14.8s - 14.96s : T
14.96s - 15.2s : ied
15.2s - 15.36s : at
15.36s - 15.44s : 
15.44s - 15.6s : 1
15.68s - 15.92s : 2
15.92s - 16.08s : 0
16.08s - 16.08s : .
16.24s - 16.32s : T
16.32s - 16.48s : hen
16.48s - 16.72s : time
16.88s - 17.04s : run
17.04s - 17.28s : ning
17.28s - 17.44s : out
17.44s - 17.44s : .
17.68s - 17.84s : J
17.84s - 18.08s : am
18.08s - 18.4s : al
18.4s - 18.72s : to
18.72s - 18.88s : M
18.88s - 19.04s : P
19.2s - 19.44s : J
19.44s - 19.44s : .
20s - 20.24s : He
20.24s - 20.4s : d
20.4s - 20.64s : ri
20.64s - 20.72s : ves
20.72s - 20.72s : ,
20.96s - 21.12s : stop
21.12s - 21.28s : s
21.28s - 21.28s : ,
21.36s - 21.6s : and
21.6s - 21.84s : po
21.84s - 22.08s : ps
22.08s - 22.08s : .
22.16s - 22.32s : He
22.4s - 22.64s : had
22.64s - 22.72s : 
22.72s - 22.88s : 1
22.88s - 23.04s : 7
23.04s - 23.04s : .
23.12s - 23.28s : The
23.28s - 23.36s : N
23.36s - 23.44s : ug
23.44s - 23.6s : g
23.6s - 23.68s : ets
23.68s - 23.84s : had
23.84s - 23.92s : the
23.92s - 24.08s : le
24.08s - 24.24s : ad
24.24s - 24.4s : with
24.4s - 24.48s : 
24.48s - 24.72s : 6
24.72s - 24.72s : .
24.88s - 25.12s : 5
25.12s - 25.28s : to
25.28s - 25.44s : go
25.44s - 25.44s : .
25.6s - 25.76s : M
25.76s - 26s : av
26s - 26.16s : s
26.16s - 26.32s : had
26.32s - 26.4s : a
26.4s - 26.56s : last
26.56s - 26.64s : ch
26.64s - 26.8s : ance
26.8s - 26.8s : .
26.88s - 27.04s : K
27.04s - 27.28s : y
27.28s - 27.36s : rie
27.52s - 27.76s : Ir
27.76s - 27.92s : v
27.92s - 28s : ing
28.16s - 28.32s : sc
28.32s - 28.4s : or
28.4s - 28.56s : ed
28.56s - 28.72s : 
28.72s - 28.88s : 4
28.88s - 29.2s : 3
29.2s - 29.28s : po
29.28s - 29.44s : in
29.44s - 29.6s : ts
29.6s - 29.6s : ,
29.76s - 30.08s : but
30.08s - 30.24s : did
30.24s - 30.32s : n
30.32s - 30.32s : '
30.48s - 30.48s : t
30.48s - 30.64s : hit
30.64s - 30.72s : the
30.72s - 30.8s : g
30.8s - 30.96s : ame
30.96s - 31.04s : w
31.04s - 31.2s : inn
31.2s - 31.28s : er
31.28s - 31.28s : .
31.44s - 31.68s : J
31.68s - 31.84s : ok
31.84s - 32s : er
32s - 32.16s : got
32.16s - 32.24s : it
32.24s - 32.24s : .
32.32s - 32.48s : The
32.48s - 32.56s : N
32.56s - 32.64s : ug
32.64s - 32.8s : g
32.8s - 32.88s : ets
32.88s - 32.96s : w
32.96s - 33.12s : in
33.12s - 33.28s : at
33.28s - 33.36s : 
33.36s - 33.52s : 1
33.52s - 33.76s : 2
33.76s - 33.92s : 2
33.92s - 33.92s : -
34s - 34.08s : 1
34.24s - 34.4s : 2
34.4s - 34.56s : 0
34.56s - 34.56s : .
34.8s - 34.88s : In
34.88s - 34.96s : c
34.96s - 35.04s : re
35.04s - 35.12s : di
35.12s - 35.36s : ble
35.36s - 35.6s : stat
35.6s - 35.68s : l
35.68s - 35.84s : ine
35.84s - 36s : for
36s - 36.16s : J
36.16s - 36.32s : ok
36.32s - 36.48s : ic
36.48s - 36.48s : .
36.72s - 36.8s : O
36.8s - 36.88s : ne
36.88s - 37.04s : of
37.04s - 37.12s : a
37.12s - 37.36s : kind
37.36s - 37.36s : .
37.6s - 37.68s : 
37.68s - 37.84s : 3
37.92s - 38.16s : 7
38.16s - 38.32s : po
38.32s - 38.4s : in
38.4s - 38.56s : ts
38.56s - 38.56s : ,
38.72s - 38.8s : 
38.8s - 38.96s : 1
38.96s - 39.2s : 8
39.2s - 39.36s : re
39.36s - 39.44s : bo
39.44s - 39.6s : und
39.6s - 39.68s : s
39.68s - 39.68s : ,
39.84s - 39.92s : 
39.92s - 40.08s : 1
40.08s - 40.32s : 5
40.32s - 40.48s : ass
40.48s - 40.64s : ist
40.64s - 40.88s : s
40.88s - 40.88s : .
41.04s - 41.2s : He
41.2s - 41.44s : is
41.44s - 41.6s : simpl
41.6s - 41.76s : y
41.76s - 42s : the
42s - 42.24s : best
42.24s - 42.24s : .
42.4s - 42.56s : And
42.56s - 42.64s : the
42.64s - 42.72s : N
42.72s - 42.88s : ug
42.88s - 43.04s : g
43.04s - 43.12s : ets
43.12s - 43.28s : have
43.28s - 43.36s : w
43.36s - 43.52s : on
43.52s - 43.6s : fo
43.6s - 43.76s : ur
43.76s - 43.92s : in
43.92s - 44s : a
44s - 44.08s : ro
44.08s - 44.24s : w
44.24s - 44.56s : and
44.56s - 44.72s : si
44.72s - 44.96s : x
44.96s - 45.2s : of
45.2s - 45.36s : their
45.36s - 45.6s : last
45.6s - 45.76s : se
45.76s - 45.92s : ven
45.92s - 45.92s : .

pytorch-bot · 2026-01-12T17:33:24Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16545

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 55 Pending

As of commit 7ada1eb with merge base dbf3c37:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-01-12T17:34:07Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

mergennachin · 2026-01-13T02:33:37Z

@mattjcly please rebase on top of #16323

mattjcly · 2026-01-13T15:04:01Z

examples/models/parakeet/main.cpp

  if (!num_rnn_layers_result.ok() || !pred_hidden_result.ok() ||
      !vocab_size_result.ok() || !blank_id_result.ok() ||
-      !sample_rate_result.ok()) {
+      !sample_rate_result.ok() || !window_stride_result.ok() ||


Note that this will break compatibility with previously exported Parakeet models. I chose to do this because we're early in development, and to avoid adding a separate path that allows everything except timestamps when the new metadata isn't present.

Open to adding such a path if reviewers feel strongly.
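For comparison, the fallback path mentioned above might look roughly like the sketch below. It treats the new metadata as optional and, when the metadata is absent, reports that timestamps are unavailable instead of refusing to load the model. `TimestampMetadata` and `seconds_per_encoder_frame` are hypothetical names. How the runner actually reads metadata (the `*_result.ok()` checks in the diff) is not reproduced here.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical: metadata values as read from the exported model; a value is
// std::nullopt when the model was exported without it.
struct TimestampMetadata {
  std::optional<double> window_stride;                     // seconds per mel frame
  std::optional<std::int64_t> encoder_subsampling_factor;  // mel frames per encoder frame
};

// Seconds per encoder frame, or std::nullopt when timestamps cannot be
// computed for this model.
std::optional<double> seconds_per_encoder_frame(const TimestampMetadata& md) {
  if (!md.window_stride || !md.encoder_subsampling_factor) {
    return std::nullopt;
  }
  return *md.window_stride * static_cast<double>(*md.encoder_subsampling_factor);
}
```

A caller would then still print the transcript and skip only the timestamp output when this returns std::nullopt. That extra branch is the cost the PR avoids by requiring the metadata.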

Yeah, I don't mind BC-breaking at this early stage.

And it's in examples anyway, not in a core directory (like /extensions).

mergennachin

Thoughts on extracting out the abstraction for TimestampExtractor into its own file that could be used for other models in the future. The main runner is becoming pretty large now.

mergennachin · 2026-01-13T15:41:12Z

examples/models/parakeet/main.cpp

+
+    if (!word.empty() && (ends_with_delimiter || is_delimiter_word)) {
+      segment_words.push_back(word);
+      if (!segment_words.empty()) {


This check seems redundant right after the push_back.
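Here is a minimal sketch of segment grouping in which the emptiness check happens once, when a segment is flushed, so no check is needed right after push_back. It assumes a segment closes after any word ending in `.`, `?` or `!`, which is consistent with the segment boundaries in the output above. `WordSpan`, `SegmentSpan` and `group_words_into_segments` are hypothetical names, not the PR's code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct WordSpan {
  std::string text;
  double start;  // seconds
  double end;    // seconds
};

struct SegmentSpan {
  std::string text;
  double start;
  double end;
};

bool ends_with_delimiter(const std::string& word, const std::string& delimiters) {
  return !word.empty() && delimiters.find(word.back()) != std::string::npos;
}

std::vector<SegmentSpan> group_words_into_segments(
    const std::vector<WordSpan>& words, const std::string& delimiters = ".?!") {
  std::vector<SegmentSpan> segments;
  std::vector<const WordSpan*> current;

  // Emit pending words as one segment; this is the only emptiness check.
  auto flush = [&]() {
    if (current.empty()) return;
    SegmentSpan seg{"", current.front()->start, current.back()->end};
    for (std::size_t i = 0; i < current.size(); ++i) {
      if (i > 0) seg.text += ' ';
      seg.text += current[i]->text;
    }
    segments.push_back(seg);
    current.clear();
  };

  for (const WordSpan& word : words) {
    if (word.text.empty()) continue;
    current.push_back(&word);
    // `current` is non-empty here, so we can flush without re-checking.
    if (ends_with_delimiter(word.text, delimiters)) flush();
  }
  flush();  // trailing words with no closing delimiter
  return segments;
}
```

Feeding it the word spans for "Late fourth quarter. Joker to MPJ." gives two segments, 2.96-3.6 s and 3.68-4.88 s. This matches the segment output.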

mattjcly · 2026-01-13T19:46:44Z

Thoughts on extracting out the abstraction for TimestampExtractor into its own file that could be used for other models in the future. The main runner is becoming pretty large now.

Made the following adjustments:

Added examples/models/parakeet/timestamp_utils.{h,cpp}: analogous to https://github.com/NVIDIA-NeMo/NeMo/blob/bf583c980b70cecc184fa8a083a9c3ddb87f905e/nemo/collections/asr/parts/utils/timestamp_utils.py
Added examples/models/parakeet/tokenizer_utils.{h,cpp}: analogous to https://github.com/NVIDIA-NeMo/NeMo/blob/bf583c980b70cecc184fa8a083a9c3ddb87f905e/nemo/collections/asr/parts/utils/tokenizer_utils.py
Added examples/models/parakeet/types.h for the shared types

I decided against a TimestampExtractor class/abstraction for the time being and instead went for "modules of free functions". I didn't see anything that requires state, and I didn't want to over-abstract or over-generalize something that only Parakeet uses at the moment.

I think this strikes an appropriate middle ground for the current point in development. The code is modularized and can easily be extended if and when use with other models arises, while avoiding too deep an abstraction that may not fit other models. This is also why these utils live in the namespaces parakeet::timestamp_utils and parakeet::tokenizer_utils for now. They can easily be moved to a more common place as needed in the future.

This also keeps the ported NeMo code almost trivially mappable to the original NeMo files and methods.
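As an illustration of the "module of free functions" style, here is token-to-word grouping written as a stateless function in a namespace. It assumes SentencePiece pieces that mark a word start with U+2581 (▁), and it attaches supported punctuation to the preceding word. The `parakeet_sketch::timestamp_utils` namespace, `TokenSpan`, `WordSpan` and `group_tokens_into_words` are hypothetical, not the actual contents of timestamp_utils.h.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

namespace parakeet_sketch::timestamp_utils {

// SentencePiece word-boundary marker U+2581, UTF-8 encoded.
const std::string kWordBoundary = "\xE2\x96\x81";

struct TokenSpan {
  std::string piece;  // raw piece, e.g. "\u2581All" (assumed format)
  double start;
  double end;
};

struct WordSpan {
  std::string text;
  double start;
  double end;
};

inline bool starts_with(const std::string& s, const std::string& prefix) {
  return s.rfind(prefix, 0) == 0;
}

// Starts a new word at each boundary-marked piece, unless that piece is
// punctuation, which is glued onto the previous word instead.
std::vector<WordSpan> group_tokens_into_words(
    const std::vector<TokenSpan>& tokens,
    const std::unordered_set<std::string>& punctuation) {
  std::vector<WordSpan> words;
  for (const TokenSpan& tok : tokens) {
    const bool boundary = starts_with(tok.piece, kWordBoundary);
    const std::string text =
        boundary ? tok.piece.substr(kWordBoundary.size()) : tok.piece;
    const bool is_punct = punctuation.count(text) > 0;
    if (words.empty() || (boundary && !is_punct)) {
      words.push_back({text, tok.start, tok.end});
    } else {
      words.back().text += text;
      words.back().end = tok.end;
    }
  }
  return words;
}

}  // namespace parakeet_sketch::timestamp_utils
```

Under these assumptions, the tokens "▁b", "all", "er", "ina" become "ballerina" (0.96-1.76 s). A bare "▁" followed by "1", "1", "8" becomes "118" (8.72-9.36 s), matching the word output.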

@mergennachin does this seem reasonable to you? Open to discussion if you still feel there is a better approach.

mergennachin

See inline comment

mergennachin · 2026-01-13T20:10:11Z

examples/models/parakeet/README.md

 | `--audio_path` | Path to input audio file (.wav) |
 | `--tokenizer_path` | Path to tokenizer file (default: `tokenizer.json`) |
 | `--data_path` | Path to data file (.ptd) for delegate data (optional, required for CUDA) |
+| `--timestamps`     | Timestamp output mode: `none\|token\|word\|segment\|all` |


Can we default to one of the options?

Sure, defaulted to "segment" because the others can get pretty verbose in their output.

Can you say in README.md that segment is the default?
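One way the `--timestamps` value could be mapped onto output levels, with `segment` as the default agreed on above. The `TimestampMode` enum, `kDefaultTimestampMode`, `parse_timestamp_mode` and `should_print` are hypothetical names used only for illustration. Flag parsing itself is left out of the sketch.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical enum mirroring the README's none|token|word|segment|all values.
enum class TimestampMode { None, Token, Word, Segment, All };

// Default agreed on in review: segment level, since the others are verbose.
constexpr const char* kDefaultTimestampMode = "segment";

// Maps a --timestamps value to a mode. Returns std::nullopt for unknown
// values so the caller can report an error instead of silently guessing.
std::optional<TimestampMode> parse_timestamp_mode(const std::string& value) {
  if (value == "none") return TimestampMode::None;
  if (value == "token") return TimestampMode::Token;
  if (value == "word") return TimestampMode::Word;
  if (value == "segment") return TimestampMode::Segment;
  if (value == "all") return TimestampMode::All;
  return std::nullopt;
}

// True if timestamps at `level` should be printed under `mode`.
bool should_print(TimestampMode mode, TimestampMode level) {
  return mode != TimestampMode::None &&
         (mode == TimestampMode::All || mode == level);
}
```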

mergennachin · 2026-01-13T20:37:56Z

examples/models/parakeet/tokenizer_utils.cpp

+
+namespace parakeet::tokenizer_utils {
+
+std::unordered_set<std::string> derive_supported_punctuation(


Thoughts on moving this to timestamp_utils?

It's exclusively called once in main.cpp right before timestamp computation and passed to timestamp functions.

I personally think it fits a bit better in tokenizer_utils, since logically it only touches the tokenizer and does nothing timestamp related. Additionally it is in tokenizer_utils.py in NeMo reference implementation.

Contextually, though, I agree it is only needed in the parakeet_runner when we compute timestamps. Sort of a growing pain of having some level of abstraction/modularization before it is needed elsewhere?

I personally would leave it, but can move it if you feel it's important/a blocker. I don't feel strongly enough about it to oppose it as a blocker.
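To show why this step is tokenizer-only, here is a minimal sketch that reads nothing but vocabulary pieces. It assumes the pieces have already been collected, for instance with the `id_to_piece` method this PR depends on. The sketch is ASCII-only, so multi-byte UTF-8 punctuation would need a Unicode-aware classifier. The `_sketch` suffix marks it as hypothetical, not the PR's implementation.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_set>
#include <vector>

// Collects the ASCII punctuation characters that appear anywhere in the
// vocabulary pieces. Non-ASCII bytes (e.g. the UTF-8 bytes of "▁" or "¿")
// are skipped, so Unicode punctuation is NOT detected by this sketch.
std::unordered_set<std::string> derive_supported_punctuation_sketch(
    const std::vector<std::string>& pieces) {
  std::unordered_set<std::string> punctuation;
  for (const std::string& piece : pieces) {
    for (unsigned char c : piece) {
      if (c < 0x80 && std::ispunct(c)) {
        punctuation.insert(std::string(1, static_cast<char>(c)));
      }
    }
  }
  return punctuation;
}
```

Note that `std::ispunct` also matches characters such as `<` and `>`, so special tokens like `<unk>` would contribute entries. Real code may need to skip special tokens first.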

mattjcly requested review from larryliu0820, lucylq and mergennachin as code owners January 12, 2026 17:33

meta-cla bot added the CLA Signed label Jan 12, 2026

mergennachin added the ciflow/trunk label Jan 12, 2026

pytorch-bot bot removed the ciflow/trunk label Jan 12, 2026

mergennachin requested a review from JacobSzwejbka January 13, 2026 02:33

mattjcly added 10 commits January 13, 2026 09:57

Update tokenizers for id_to_piece

d402b49

Working with supported_punctuation export (don't want)

0396189

Derive supported punctuation

fa3e404

Some cleanups

62774fa

Add DecodedToken

afc3427

Same token id types, some small type cleanups

6313a49

Refs and decode token string overload

df2e8e8

Rename to FrameAlignedToken

3a4a2f1

Helper for tokens with text info

1a23c14

try-catch get_tokens_with_text_info

0c9768d

mattjcly force-pushed the matt/parakeet-timestamps-new-tokenizer-method branch from 831f714 to 0c9768d January 13, 2026 14:57

Remove duplicated mock

365896d
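The commits above introduce a `DecodedToken` type (later renamed `FrameAlignedToken`) and a helper for tokens with text info. The following rough sketch shows how such tokens could be merged into word-level spans. The struct fields and the `WordSpan`/`group_into_words` names are hypothetical. The word-boundary rule assumes SentencePiece-style pieces, where a leading `▁` (U+2581) starts a new word and punctuation pieces attach to the preceding word.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical shape of a decoded token aligned to encoder frames.
struct FrameAlignedToken {
  std::int64_t id;
  std::string piece;  // raw vocabulary piece, e.g. "▁Jok"
  std::int64_t start_frame;
  std::int64_t end_frame;
};

struct WordSpan {
  std::string text;
  std::int64_t start_frame;
  std::int64_t end_frame;
};

// SentencePiece word-boundary marker U+2581 ("▁") encoded as UTF-8.
const std::string kWordBoundary = "\xE2\x96\x81";

std::vector<WordSpan> group_into_words(
    const std::vector<FrameAlignedToken>& tokens,
    const std::unordered_set<std::string>& punctuation) {
  std::vector<WordSpan> words;
  for (const auto& tok : tokens) {
    const bool starts_word = tok.piece.rfind(kWordBoundary, 0) == 0;
    const std::string text =
        starts_word ? tok.piece.substr(kWordBoundary.size()) : tok.piece;
    const bool is_punct = punctuation.count(text) > 0;
    if (words.empty() || (starts_word && !is_punct)) {
      words.push_back(WordSpan{text, tok.start_frame, tok.end_frame});
    } else {
      // Continuation pieces and punctuation extend the previous word.
      words.back().text += text;
      words.back().end_frame = tok.end_frame;
    }
  }
  return words;
}
```

Word start/end frames could then be converted to seconds and grouped into segments in the same stateless style.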

mattjcly commented Jan 13, 2026

mergennachin reviewed Jan 13, 2026

mattjcly added 2 commits January 13, 2026 14:30

timestamp_utils/tokenizer_utils/types re-organization

08b82fd

Remove redundant check

5c27d9d

mattjcly requested a review from kirklandsign as a code owner January 13, 2026 19:38

mattjcly added 2 commits January 13, 2026 15:00

CMake lint fix

9504e37

Explicit pytorch tokenizers include

349d0b6

mergennachin approved these changes Jan 13, 2026

mattjcly added 2 commits January 13, 2026 16:38

default to segement timestamps

4b5b15a

Default segment in readme

7ada1eb

mergennachin merged commit 9510334 into pytorch:main Jan 13, 2026
140 checks passed

mattjcly deleted the matt/parakeet-timestamps-new-tokenizer-method branch January 13, 2026 23:05


pytorch-bot bot commented Jan 12, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16545

⏳ No Failures, 55 Pending

github-actions bot commented Jan 12, 2026

This PR needs a `release notes:` label

mergennachin commented Jan 13, 2026

mergennachin left a comment

mattjcly commented Jan 13, 2026

mergennachin left a comment
