
Smart span matching #26

Open
RichJackson opened this issue Jun 10, 2024 · 0 comments

Sometimes the smart span matching matches a very long string, plus all (or many) of its available substrings. This consumes a lot of resources for mostly bad matches.

E.g. the list of genes in the 'figure 1' caption in this article:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4589645/
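To make the resource cost concrete: the number of contiguous sub-spans grows quadratically with the length of the matched string, so a caption-length match produces thousands of candidates. A small illustrative calculation (the token counts are made up):

```python
# A span of n tokens has n * (n + 1) / 2 contiguous sub-spans, so a long
# matched string (e.g. a figure caption listing dozens of genes) explodes
# the number of candidate spans that have to be scored.
def num_subspans(n: int) -> int:
    return n * (n + 1) // 2

print(num_subspans(5))    # a short match: 15 candidates
print(num_subspans(100))  # a caption-length match: 5050 candidates
```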

Can we switch to using beam search on the predicted BIO labels, or some other approach? This seems especially worthwhile since most of these available sub-spans are already picked up by the ExplosionNERStep when they are actually relevant.
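The beam-search idea could look something like the sketch below: instead of enumerating every sub-span, decode only the top-k BIO label sequences from per-token scores, rejecting invalid transitions (an I- tag that does not continue a matching B-/I- tag). The label set and probabilities here are hypothetical, not Kazu's actual model output.

```python
import math

# Illustrative beam search over BIO labels (hypothetical label set).
LABELS = ["O", "B-GENE", "I-GENE"]

def valid_transition(prev: str, curr: str) -> bool:
    # An I- tag may only continue a B- or I- tag of the same entity type.
    if curr.startswith("I-"):
        return prev != "O" and prev[2:] == curr[2:]
    return True

def beam_search(log_probs, k=3):
    """log_probs: [num_tokens][num_labels] log-probabilities per token."""
    beams = [([], 0.0)]  # (label sequence, cumulative log-probability)
    for token_scores in log_probs:
        candidates = []
        for seq, score in beams:
            prev = seq[-1] if seq else "O"
            for label, lp in zip(LABELS, token_scores):
                if valid_transition(prev, label):
                    candidates.append((seq + [label], score + lp))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:k]
    return beams

# Two tokens that look like a gene mention: the best path is B-GENE, I-GENE.
log_probs = [
    [math.log(0.1), math.log(0.8), math.log(0.1)],
    [math.log(0.1), math.log(0.1), math.log(0.8)],
]
best_sequence, best_score = beam_search(log_probs)[0]
```

This keeps decoding linear in sentence length (times beam width) rather than quadratic in span candidates.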

Also, as per a conversation with @wonjininfo, we need to make some changes to use the new multi-label tinyber2 classifier:

    def get_softmax_predictions(self, loader: DataLoader) -> Tensor:
        """Get a namedtuple_values_indices consisting of confidence and labels
        for a given dataloader (i.e. run BERT).

        :param loader:
        :return:
        """
        results = torch.cat(
            [
                x.logits
                for x in self.trainer.predict(
                    model=self.model, dataloaders=loader, return_predictions=True
                )
            ]
        )
        # return raw logits here
        # softmax = self.softmax(results)
        # get confidence scores and label ints
        # confidence_and_labels_tensor = torch.max(softmax, dim=-1)
        return results

Set the smart span threshold to 0.0 (negative logits are negative results).
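For context on why 0.0 is the right threshold on raw logits: with a multi-label head, each label gets an independent sigmoid, and sigmoid(x) > 0.5 exactly when x > 0, so thresholding logits at 0.0 is equivalent to thresholding per-label confidence at 0.5. A minimal pure-Python check (the logit values are made up):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical per-label logits for one token.
logits = [2.1, -0.3, 0.7, -1.5]

predicted = [x > 0.0 for x in logits]          # threshold raw logits at 0.0
confident = [sigmoid(x) > 0.5 for x in logits]  # threshold confidence at 0.5
# predicted and confident are identical masks.
```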

Fix the smart span processor to detect spans properly.
