Adds parser task using deep biaffine parser #120
Draft
kylebgorman wants to merge 32 commits into
One major issue is that this requires us to use negative indices for specials, which breaks assumptions in the indexes. Will have to come back and fix this.
Known issues:

1. I don't think the metrics test is going to work; I will need to shift all the head indices by special.OFFSET.
2. I am not passing a parser mask. Do I need to? I think maybe yes.
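To make the two known issues concrete, here is a minimal sketch of shifting head indices past a block of reserved special indices and building a padding mask alongside them. The value of `OFFSET`, the padding index, and all function names are assumptions for illustration; the PR does not show what `special.OFFSET` actually is or how the mask would be passed.

```python
from typing import List, Tuple

# Hypothetical stand-in for special.OFFSET: the number of indices reserved
# for special symbols such as [PAD] and [UNK]. The real value is not shown.
OFFSET = 2
PAD_IDX = 0  # assumed index of the padding symbol


def encode_heads(heads: List[int]) -> List[int]:
    """Shifts CoNLL-U head indices (0 = root) past the reserved specials."""
    return [head + OFFSET for head in heads]


def decode_heads(indices: List[int]) -> List[int]:
    """Undoes the shift, e.g. before computing attachment metrics."""
    return [index - OFFSET for index in indices]


def pad_batch(batch: List[List[int]]) -> Tuple[List[List[int]], List[List[bool]]]:
    """Pads encoded head sequences and returns a mask (True = real token)."""
    max_len = max(len(sentence) for sentence in batch)
    padded = [s + [PAD_IDX] * (max_len - len(s)) for s in batch]
    mask = [[True] * len(s) + [False] * (max_len - len(s)) for s in batch]
    return padded, mask


# Two toy sentences with gold heads in CoNLL-U convention.
gold = [[2, 0, 2], [0, 1]]
padded, mask = pad_batch([encode_heads(heads) for heads in gold])
```

With a shift like this, specials keep non-negative indices and any metric that compares predicted and gold heads needs both sides shifted the same way; the mask keeps padded positions out of that comparison.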
It has no effect in the model, so let's get rid of it.
* Adds logging for vocabularies

  Sample output:

  INFO: 22-Feb-26 17:56:27 - UPOS vocabulary (21): '[PAD]', '[UNK]', '_', 'ADJ', 'ADP', 'ADV', 'AUX', 'CCONJ', 'DET', 'INTJ', 'NOUN', 'NUM', 'PART', 'PRON', 'PROPN', 'PUNCT', 'SCONJ', 'SYM', 'VERB', 'X', '_'
  INFO: 22-Feb-26 17:56:27 - XPOS vocabulary (53): '[PAD]', '[UNK]', '_', '$', "''", ',', '-LRB-', '-RRB-', '.', ':', 'ADD', 'AFX', 'CC', 'CD', 'DT', 'EX', 'FW', 'GW', 'HYPH', 'IN', 'JJ', 'JJR', 'JJS', 'LS', 'MD', 'NFP', 'NN', 'NNP', 'NNPS', 'NNS', 'PDT', 'POS', 'PRP', 'PRP$', 'RB', 'RBR', 'RBS', 'RP', 'SYM', 'TO', 'UH', 'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ', 'WDT', 'WP', 'WP$', 'WRB', '_', '``'
  INFO: 22-Feb-26 17:56:27 - Lemma vocabulary (533): [omitted]
  INFO: 22-Feb-26 17:56:27 - Features vocabulary (235): [omitted]

  Closes CUNY-CL#115.

* black update
* f-string fix
* driveby: silence more warnings
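For reference, log lines shaped like the sample output above could be produced roughly as follows. This is a sketch, not the code in the PR: `format_vocabulary` and `log_vocabulary` are hypothetical names, and the format and date strings are guesses inferred from the sample (in particular, reading `22-Feb-26` as day-month-year is an assumption).

```python
import logging
from typing import Sequence

# Assumed logging configuration, inferred from the sample output only.
logging.basicConfig(
    format="%(levelname)s: %(asctime)s - %(message)s",
    datefmt="%d-%b-%y %H:%M:%S",
    level=logging.INFO,
)


def format_vocabulary(name: str, symbols: Sequence[str]) -> str:
    """Renders a vocabulary as its name, its size, and its quoted symbols."""
    rendered = ", ".join(repr(symbol) for symbol in symbols)
    return f"{name} vocabulary ({len(symbols)}): {rendered}"


def log_vocabulary(
    name: str, symbols: Sequence[str], human_readable: bool = True
) -> None:
    """Logs the full vocabulary, or only its size if it is not readable."""
    if human_readable:
        logging.info(format_vocabulary(name, symbols))
    else:
        logging.info("%s vocabulary (%d): [omitted]", name, len(symbols))


log_vocabulary("UPOS", ["[PAD]", "[UNK]", "_", "ADJ", "ADP"])
log_vocabulary("Lemma", ["rule-1", "rule-2"], human_readable=False)
```

Using `repr` for each symbol is what yields the mixed quoting seen in the sample, where a symbol that itself contains single quotes is wrapped in double quotes.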
See Yoyodyne [#369](CUNY-CL/yoyodyne#369) for context. Closes CUNY-CL#79.
* Fix pooling layer regression in UDTubeEncoder.forward

  Special cases pooling_layers=1 to use last_hidden_state directly, avoiding unnecessary allocation of all hidden states. This seems to save a lot of GPU memory.

  A few drive-bys:

  1. suppress progress bar during test data generation
  2. add "not human-readable" to "[omitted]" when logging lemmas
  3. actually log features; why not?
  4. pass information about which heads to build to the data module too, so it logs properly
  5. removes _ from "special", since it doesn't require any special treatment in actuality; it's just another tag as far as we're concerned.
  6. Standardizes trailing """: it's on its own line if the comment is more than one line.

* regeneration last-minute fix
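The control flow behind the pooling fix in the commit above can be sketched in plain Python. Everything here is illustrative: nested lists stand in for tensors, `EncoderOutput`, `pool`, and `toy_encode` are hypothetical names, and averaging the last `pooling_layers` layers is an assumption about what the general case does. The only part taken from the commit message is that `pooling_layers=1` reads `last_hidden_state` directly and never asks for all hidden states.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Layer = List[List[float]]  # one layer of states, shaped [tokens][hidden]


@dataclass
class EncoderOutput:
    last_hidden_state: Layer
    hidden_states: Optional[List[Layer]] = None  # only set when requested


def pool(encode: Callable[[bool], EncoderOutput], pooling_layers: int) -> Layer:
    """Pools the last `pooling_layers` layers (mean pooling is assumed)."""
    if pooling_layers == 1:
        # Special case: no need to materialize every layer's states.
        return encode(False).last_hidden_state
    layers = encode(True).hidden_states[-pooling_layers:]
    # Averages elementwise across the selected layers.
    return [
        [sum(values) / len(layers) for values in zip(*rows)]
        for rows in zip(*layers)
    ]


requests: List[bool] = []  # records whether all layers were requested


def toy_encode(output_hidden_states: bool) -> EncoderOutput:
    """Stub encoder: three layers, one token, hidden size two."""
    requests.append(output_hidden_states)
    layers = [[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]]]
    return EncoderOutput(layers[-1], layers if output_hidden_states else None)


single = pool(toy_encode, 1)
averaged = pool(toy_encode, 2)
```

In the stub, the first call never requests the full stack of layers, which is the behavior the commit credits with the GPU memory savings.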
Draft.
Closes #72.