fix(tokenizers): discard citation from nominative reporter on overlap #237
Solves #221 and #174
Uses a list of problematic nominative reporters to resolve overlaps.
Due to the way we tokenize, an overlap was always resolved in favor of the first token. For nominative reporters, this meant that when a party name matched the reporter's name, a spurious CitationToken was found and the actual citation was discarded.
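A minimal sketch of this overlap rule, assuming tokens carry character offsets and a reporter name. The `Token` class, the `resolve_overlaps` function and the contents of `PROBLEMATIC_NOMINATIVE_REPORTERS` are hypothetical illustrations, not the PR's actual code or list. By default, the earlier of two overlapping tokens wins. The exception is when the earlier token is a citation from a listed nominative reporter, in which case the later token replaces it.

```python
from dataclasses import dataclass

# Hypothetical placeholder values; the actual list used by the PR is not shown here.
PROBLEMATIC_NOMINATIVE_REPORTERS = {"Black"}


@dataclass(frozen=True)
class Token:
    start: int          # character offset where the token begins
    end: int            # character offset just past the token
    kind: str           # hypothetical tag, e.g. "citation" or "other"
    reporter: str = ""  # reporter abbreviation, for citation tokens


def resolve_overlaps(tokens):
    """Return non-overlapping tokens, scanning left to right.

    Default rule: on overlap, the earlier token is kept.
    Exception: if the kept token is a citation from a problematic
    nominative reporter, it is discarded in favor of the new token.
    """
    result = []
    for tok in sorted(tokens, key=lambda t: (t.start, t.end)):
        if result and tok.start < result[-1].end:
            prev = result[-1]
            if prev.kind == "citation" and prev.reporter in PROBLEMATIC_NOMINATIVE_REPORTERS:
                # Likely a party name matching a reporter name: prefer the later token.
                result[-1] = tok
            # Otherwise the earlier token wins and `tok` is dropped.
            continue
        result.append(tok)
    return result
```

Because tokens are sorted by start offset, replacing the last kept token cannot create a new overlap with the one before it, so a single pass suffices.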
This could be solved more cleanly by tagging nominative reporters consistently in reporters-db.