@nicolas-lelouche

Part of solving issue #3

Author

@nicolas-lelouche nicolas-lelouche left a comment

I deliberately chose version 0.2.4 since it's the version that's hardcoded here.

@rtbs-dev
Collaborator

rtbs-dev commented Jul 22, 2021

I wonder if there's an endpoint for SciSpacy that's equivalent to latest or something? That way we could use the example here to automate the process as nothing but dvc "data".

Edit: I think I got it! scispacy could be imported in the respective download command, and we can use their VERSION variable here. Then sub that into the download url?

The problem is, you have to update the dvc remote url if it changes! So the logic is like this:

  • Get the current scispacy version from scispacy.VERSION
  • Use that version to build the current AWS URL for X_LANG_MODEL
  • Check whether X_LANG_MODEL.dvc exists and, if so, whether it points to an old URL
  • Only update the .dvc stub if the -r option was passed to cv-download (which will now wrap dvc update?)
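
A minimal sketch of that decision logic, under stated assumptions: the URL template is my recollection of how scispacy release artifacts are laid out and should be checked against the scispacy README, `model_url` and `plan_download` are hypothetical helper names, and the stub is only scanned for the expected URL rather than parsed as YAML. The version is passed in as a plain string (in practice it would come from scispacy's VERSION variable) so the sketch runs without scispacy installed.

```python
from pathlib import Path

# Assumed layout of scispacy release URLs; verify before relying on it.
URL_TEMPLATE = (
    "https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/"
    "releases/v{version}/{model}-{version}.tar.gz"
)


def model_url(model: str, version: str) -> str:
    """Build the download URL for one model at one scispacy version."""
    return URL_TEMPLATE.format(model=model, version=version)


def plan_download(model: str, version: str, stub_dir: str, refresh: bool = False) -> str:
    """Decide what to do with the <model>.dvc stub (hypothetical helper).

    Returns one of:
      "import"     - no stub yet, so the model has to be imported first
      "up-to-date" - the stub already points at the URL for `version`
      "stale"      - the stub points elsewhere, but no refresh was requested
      "update"     - the stub points elsewhere and refresh (-r) was requested
    """
    stub = Path(stub_dir) / f"{model}.dvc"
    url = model_url(model, version)
    if not stub.exists():
        return "import"
    # Crude check: look for the expected URL anywhere in the stub's text.
    if url in stub.read_text():
        return "up-to-date"
    return "update" if refresh else "stale"
```

In this sketch, "import" would presumably map to something like `dvc import-url`, and "update" to rewriting the URL in the stub before running `dvc update`; how cv-download would actually wire those together is still open.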

@nicolas-lelouche
Author

nicolas-lelouche commented Jul 22, 2021

At first glance, I think this works.
One potential use case: a new scispacy version is out and a user wants to update their models. Should the command to download the models and the command to update them be the same? That seems to be the case if dvc update is wrapped in cv-download -r <model>.
Another issue is that models sometimes get added to scispacy. For example, the current cv-py version doesn't include the newest model, en_core_sci_scibert, which was added in version 0.4.0 (see release notes), so it is missing from the hardcoded models in both the __compatible__ variable and the validation array used at download.

Edit: in fact, I'd argue that this check is redundant if we use the __compatible__ variable as we're doing right now; a simple else should suffice.
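
One possible reading of that suggestion, as a rough sketch: keep a single collection of supported model names and let a plain else reject everything outside it, so adding a model such as en_core_sci_scibert means touching one place. `COMPATIBLE_MODELS` and `validate_model` are hypothetical stand-ins; the real structure of `__compatible__` in cv-py may differ.

```python
# Hypothetical stand-in for cv-py's __compatible__ variable.
COMPATIBLE_MODELS = ("en_core_sci_sm", "en_core_sci_md", "en_core_sci_lg")


def validate_model(name: str, compatible=COMPATIBLE_MODELS) -> str:
    """Return `name` if it is supported, otherwise raise ValueError."""
    if name in compatible:
        return name
    else:
        # Single rejection path: no second hardcoded list to keep in sync.
        raise ValueError(
            f"unsupported model {name!r}; expected one of {sorted(compatible)}"
        )
```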
