training with a subset of the training data #1093
Comments
Comment by sleepinyourhat I don't think there's an easy or obvious place to add this. If you need the tokenized text to determine distance, then it would make sense to add this to the training data iterator, but I can't guarantee that it'll be easy to identify the relevant code path for that. If you don't need to filter on the tokenized text, it might just make sense to create a filtered copy of the training data file, and use a new
Comment by lovodkin93
So what you're saying is that I should filter out the sentences from OntoNotes whose coreferent spans are too close to each other (if my goal is to work with coreference distances bigger than a certain size)?
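For reference, the "filtered copy of the training data file" idea could look roughly like the sketch below. It is not jiant code. It assumes the training file is JSON Lines, where each record has a `targets` list whose entries carry `span1` and `span2` as `[start, end)` token offsets; that layout, the definition of distance as the token gap between the two spans, and the helper names `span_distance`, `filter_records` and `write_filtered_copy` are all assumptions made for illustration, so check them against your actual data file.

```python
import json
from typing import Dict, Iterable, Iterator, List


def span_distance(span1: List[int], span2: List[int]) -> int:
    """Token gap between two [start, end) spans; 0 if they touch or overlap.

    Assumption: span ends are exclusive. Adjust if your data uses inclusive ends.
    """
    first, second = sorted([span1, span2], key=lambda s: s[0])
    return max(0, second[0] - first[1])


def filter_records(records: Iterable[Dict], min_distance: int) -> Iterator[Dict]:
    """Keep only targets whose spans are at least `min_distance` tokens apart.

    Records left with no targets are dropped entirely.
    """
    for record in records:
        kept = [
            target
            for target in record.get("targets", [])
            if span_distance(target["span1"], target["span2"]) >= min_distance
        ]
        if kept:
            yield {**record, "targets": kept}


def write_filtered_copy(src_path: str, dst_path: str, min_distance: int) -> int:
    """Write a filtered copy of a JSON Lines file; return the number of records kept."""
    n_written = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        records = (json.loads(line) for line in src if line.strip())
        for record in filter_records(records, min_distance):
            dst.write(json.dumps(record) + "\n")
            n_written += 1
    return n_written
```

Calling `write_filtered_copy` with a hypothetical input path, output path and threshold would produce a new file that can be used in place of the original training split, leaving the trainer untouched. Note that the sketch filters individual targets rather than whole sentences, so a sentence is kept as long as at least one of its span pairs is far enough apart.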
Issue by lovodkin93
Friday May 22, 2020 at 11:06 GMT
Originally opened as nyu-mll/jiant#1093
Hello,
I would like to probe the BERT module on the coreference task, but I would like training to use only examples where the span distance is bigger than a certain size.
My question is - should I add this constraint only in the train loop in the train function in /jiant/trainer.py? Or is there another part of the code that I need to update?
Thanks!