Adds parser task using deep biaffine parser by kylebgorman · Pull Request #120 · CUNY-CL/udtube

kylebgorman · 2026-04-08T16:15:50Z

Draft.

Closes #72.

One major issue is that this requires us to use negative indices for specials, which breaks assumptions in the indexes. Will have to come back and fix this.

Known issues:

1. I don't think the metrics test is going to work; I will need to shift all the head indices by special.OFFSET.
2. I am not passing a parser mask. Do I need to? I think maybe yes.
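For illustration only, here is a minimal sketch of the two ideas in the known issues: shifting head indices past the reserved special indices, and masking out padded head candidates before picking a head. The value of `OFFSET`, the padding index, and all function names are assumptions made for this sketch; they are not taken from `special.py` or from the parser code in this PR.

```python
from typing import List

# Hypothetical values: two reserved special indices, with padding at index 0.
OFFSET = 2
PAD_IDX = 0


def encode_heads(heads: List[int], offset: int = OFFSET) -> List[int]:
    """Shifts head indices (0 = root) so they cannot collide with specials."""
    return [head + offset for head in heads]


def decode_heads(
    indices: List[int], offset: int = OFFSET, pad_idx: int = PAD_IDX
) -> List[int]:
    """Inverts encode_heads, dropping padding positions."""
    return [index - offset for index in indices if index != pad_idx]


def masked_argmax(scores: List[float], mask: List[bool]) -> int:
    """Picks the best-scoring head among the positions the mask allows."""
    masked = [
        score if keep else float("-inf") for score, keep in zip(scores, mask)
    ]
    return max(range(len(masked)), key=masked.__getitem__)


# Three tokens whose heads are root, token 1, and token 1, padded to length 5.
encoded = encode_heads([0, 1, 1]) + [PAD_IDX, PAD_IDX]
decoded = decode_heads(encoded)
# The last candidate is padding, so it is excluded despite its high score.
best = masked_argmax([0.1, 0.5, 9.0], [True, True, False])
```

Whether the parser actually needs a mask is still the open question above; the sketch only shows what one could look like if it does.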

reverse_edits has no effect in the model, so let's get rid of it there.

* Adds logging for vocabularies

  Sample output:

      INFO: 22-Feb-26 17:56:27 - UPOS vocabulary (21): '[PAD]', '[UNK]', '_', 'ADJ', 'ADP', 'ADV', 'AUX', 'CCONJ', 'DET', 'INTJ', 'NOUN', 'NUM', 'PART', 'PRON', 'PROPN', 'PUNCT', 'SCONJ', 'SYM', 'VERB', 'X', '_'
      INFO: 22-Feb-26 17:56:27 - XPOS vocabulary (53): '[PAD]', '[UNK]', '_', '$', "''", ',', '-LRB-', '-RRB-', '.', ':', 'ADD', 'AFX', 'CC', 'CD', 'DT', 'EX', 'FW', 'GW', 'HYPH', 'IN', 'JJ', 'JJR', 'JJS', 'LS', 'MD', 'NFP', 'NN', 'NNP', 'NNPS', 'NNS', 'PDT', 'POS', 'PRP', 'PRP$', 'RB', 'RBR', 'RBS', 'RP', 'SYM', 'TO', 'UH', 'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ', 'WDT', 'WP', 'WP$', 'WRB', '_', '``'
      INFO: 22-Feb-26 17:56:27 - Lemma vocabulary (533): [omitted]
      INFO: 22-Feb-26 17:56:27 - Features vocabulary (235): [omitted]

  Closes CUNY-CL#115.
* black update
* f-string fix
* driveby: silence more warnings
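A rough sketch of how log lines shaped like the sample output could be produced with the standard `logging` module. The helper name `format_vocabulary`, its `human_readable` flag, and the format strings are assumptions chosen to resemble the sample; this is not the code added in the commit.

```python
import logging
from typing import Sequence


def format_vocabulary(
    name: str, symbols: Sequence[str], human_readable: bool = True
) -> str:
    """Builds one log message describing a vocabulary (hypothetical helper)."""
    if human_readable:
        body = ", ".join(repr(symbol) for symbol in symbols)
    else:
        # Large or opaque vocabularies are summarized by size only.
        body = "[omitted]"
    return f"{name} vocabulary ({len(symbols)}): {body}"


# Format and date format are guesses that resemble the sample output.
logging.basicConfig(
    format="%(levelname)s: %(asctime)s - %(message)s",
    datefmt="%d-%b-%y %H:%M:%S",
    level=logging.INFO,
)
logging.info(format_vocabulary("UPOS", ["[PAD]", "[UNK]", "_", "ADJ"]))
logging.info(format_vocabulary("Lemma", ["a"] * 533, human_readable=False))
```

Using `repr` for each symbol keeps tags such as `''` unambiguous in the log, since Python switches to double quotes when a string contains single quotes.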

See Yoyodyne [#369](CUNY-CL/yoyodyne#369) for context. Closes CUNY-CL#79.

* Fix pooling layer regression in UDTubeEncoder.forward

  Special cases pooling_layers=1 to use last_hidden_state directly, avoiding unnecessary allocation of all hidden states. This seems to save a lot of GPU memory.

  A few drive-bys:

  1. suppress progress bar during test data generation
  2. add "not human-readable" to "[omitted]" when logging lemmas
  3. actually log features; why not?
  4. pass information about which heads to build to the data module too, so it logs properly
  5. removes _ from "special", since it doesn't require any special treatment in actuality; it's just another tag as far as we're concerned.
  6. Standardizes trailing """: it's on its own line if the comment is more than one line.
* regeneration last-minute fix
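The special case can be sketched roughly as follows. This is a toy illustration in plain Python, not the code in `UDTubeEncoder.forward`: `StubEncoder` is a hypothetical stand-in for the real encoder, nested lists stand in for tensors, and mean pooling over the last `pooling_layers` layers is an assumption about how the layers are combined.

```python
from typing import Dict, List

Layer = List[List[float]]  # One vector per token.


class StubEncoder:
    """Hypothetical stand-in for a transformer encoder; not a library class."""

    def __init__(self, layers: List[Layer]):
        self.layers = layers
        self.materialized = 0  # How many layers the last call returned.

    def __call__(self, output_hidden_states: bool) -> Dict[str, object]:
        if output_hidden_states:
            self.materialized = len(self.layers)
            return {
                "last_hidden_state": self.layers[-1],
                "hidden_states": tuple(self.layers),
            }
        self.materialized = 1
        return {"last_hidden_state": self.layers[-1]}


def pooled_encoding(encoder: StubEncoder, pooling_layers: int) -> Layer:
    """Averages the last `pooling_layers` layers, special-casing one layer."""
    if pooling_layers == 1:
        # Only the final layer is needed, so no other layers are requested.
        return encoder(output_hidden_states=False)["last_hidden_state"]
    hidden = encoder(output_hidden_states=True)["hidden_states"]
    selected = hidden[-pooling_layers:]
    # Mean over the selected layers, per token and per dimension.
    return [
        [sum(values) / len(selected) for values in zip(*token_vectors)]
        for token_vectors in zip(*selected)
    ]


# Three layers, one token, two dimensions.
encoder = StubEncoder([[[0.0, 0.0]], [[2.0, 4.0]], [[4.0, 8.0]]])
single = pooled_encoding(encoder, 1)
single_count = encoder.materialized
double = pooled_encoding(encoder, 2)
double_count = encoder.materialized
```

In the stub, `materialized` makes the difference visible: the one-layer path asks for a single layer, while any larger setting pulls in every layer before slicing.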

kylebgorman added 30 commits January 14, 2026 22:57

Adds metrics for parsing

d53531b

Beginning integration

749043b

Adds metrics test.

04b0b23

One major issue is that this requires us to use negative indices for specials, which breaks assumptions in the indexes. Will have to come back and fix this.

Draft of parser and its integration

28e1cdd

More work.

8b0d96c

Known issues: 1. I don't think the metrics test is going to work; I will need to shift all the head indices by special.OFFSET. 2. I am not passing a parser mask. Do I need to? I think maybe yes.

Applies shift to metrics test to avoid collisions.

f8defcb

Moves reverse_edits to data, where it belongs.

a499d51

It has no effect in the model, so let's get rid of it.

Days' debugging work

c506566

More work; still debugging

386bed6

Optimizes mmap instructions (CUNY-CL#116)

5057d25

Updates Black version

e80df85

Avoids "Crashed" status in sweeps. (CUNY-CL#118)

011cff4

See Yoyodyne [#369](CUNY-CL/yoyodyne#369) for context. Closes CUNY-CL#79.

Update special.py

9053124

fix typo

b75ba43

Optimizes mmap instructions (CUNY-CL#116)

7712ab4

Avoids "Crashed" status in sweeps. (CUNY-CL#118)

b5f2fd2

See Yoyodyne [#369](CUNY-CL/yoyodyne#369) for context. Closes CUNY-CL#79.

Beginning integration

f42e721

Adds metrics test.

bedb192

One major issue is that this requires us to use negative indices for specials, which breaks assumptions in the indexes. Will have to come back and fix this.

Draft of parser and its integration

3093858

More work.

63f290a

Known issues: 1. I don't think the metrics test is going to work; I will need to shift all the head indices by special.OFFSET. 2. I am not passing a parser mask. Do I need to? I think maybe yes.

Moves reverse_edits to data, where it belongs.

64ff892

It has no effect in the model, so let's get rid of it.

Days' debugging work

a133a31

More work; still debugging

f58654d

Optimizes mmap instructions (CUNY-CL#116)

b962e48

Manual merge

df55a73

kylebgorman added 2 commits April 7, 2026 17:25

README and bibliography

e2b916e

manual merge of upstream/master

b892d9e

kylebgorman wants to merge 32 commits into CUNY-CL:master from kylebgorman:parser2
