Description
Parent issue: CERNDocumentServer/cds-rdm#440
This issue is part of inveniosoftware/product-rdm#226 to add a GitLab integration. In order to reduce code duplication, the approach will be to adapt invenio-github and turn it into a "generic" module that supports any Version Control System (VCS) as long as it provides the necessary APIs and functionality. Implementations for specific VCSes like GitHub and GitLab will be provided in new contrib files.
Stage 1
Aiming to complete a fully functional, production-ready, well-documented MVP. We will regard it as complete when:
- it provides an end-user experience equivalent to the current GitHub integration, preferably as similar as possible.
- The current experience has some scalability issues, but for now we will avoid changing too much
- a clear migration script and guide are available and have been thoroughly tested with existing Zenodo data
- unit tests have been updated
- as many bugs as possible have been fixed
Work is split between several PRs to make reviewing easier. These are all being merged into the vcs-staging branch (so we can safely keep unreleasable code separate from master). Once all the PRs of the first stage are merged, we can merge into master with a single squash commit. Before merging into master, a maint-v3.x branch should be created to continue maintaining the old, GitHub-only version of this module. The first stage consists of:
- feat(vcs): rename `github` references to `vcs` #191
- feat(vcs): new data model #192
- feat(vcs): generic provider interface + contrib implementations #193
- feat(vcs): service layer #194
- feat(vcs): view handlers #195
- docs: start writing upgrade guide #196
- tests(vcs): compatibility for invenio-vcs #199
- compat for new VCS integration invenio-rdm-records#2128
- config: compat with new VCS integration invenio-app-rdm#3162
To see overall non-fragmented changes of invenio-vcs, please see my fork's master branch.
Todo for stage 1
- GitLab contrib. This is a priority as it's needed to test a lot of the other features (e.g. auth). It's very difficult to test e.g. OAuth without it.
- OAuth user ID correlation
- i.e. if the VCS provider uses the same OAuth server to authenticate the user as the Invenio instance, we should check the user IDs to make sure they match. This is useful for CDS-RDM where users will be able to link CERN GitLab, which uses the same CERN SSO.
- We could express this through a more versatile hook function that returns whether/not we should accept the authenticated user.
- Update: This can be done relatively easily by configuring a custom `info_serializer` handler in `invenio.cfg`. See the example for CDS: WIP: User ID validator for GitLab CERNDocumentServer/cds-rdm#554
- Sync VCS repositories straight into the `vcs_repositories` table instead of the OAuth remote user `extra_data` field.
- This will make querying a lot easier so we can paginate/search on the repository list page, which is currently very slow for users on e.g. GitLab instances where they have membership of thousands of repos due to group membership.
- Check duplication for organisational/team repos if multiple people activate them
- What happens if a user is deleted? How can we transfer the repos?
- The repo name should not be unique on its own. It is only unique as part of the tuple (provider_id, provider, name)
- UI bug with menu not being able to differentiate between multiple dynamically-registered entries
- Unit tests
- Documentation
- Migration script and guide
- Careful testing of DB migration for existing GitHub repos/releases
- Some UI pages have not been adapted and continue to throw errors
- JSONB extra_data in oauthclient
- Correct handling of dependency in InvenioRDM
- `invenio-vcs` is now an optional dependency, including in InvenioRDM. Whether the integration is enabled depends on whether the dependency is installed.
- However, some higher level bindings in `invenio-app-rdm` and `invenio-rdm-records` perform overrides on classes in `invenio-vcs` without checking that it's installed, which causes a crash if it isn't.
- We need to find a neat way of avoiding this issue
- Check permissions
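For the optional-dependency item above, one possible guard is to check whether the package is importable before applying any overrides. This is only a minimal sketch, not an approach agreed on in this issue: `is_installed` and `register_vcs_overrides` are hypothetical names, and the module name `invenio_vcs` is assumed from the package name `invenio-vcs`.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if the module can be found, without importing it."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised for a missing parent package or a module with a broken spec.
        return False


def register_vcs_overrides(apply_overrides, module_name: str = "invenio_vcs") -> bool:
    """Run `apply_overrides` only when the optional dependency is present.

    `apply_overrides` is a hypothetical callable standing in for the class
    overrides that the higher-level packages currently perform unconditionally.
    Returns True if the overrides were applied, False if they were skipped.
    """
    if not is_installed(module_name):
        return False  # integration disabled, nothing to override
    apply_overrides()
    return True
```

With a guard like this, an instance without the package would skip the overrides instead of crashing at import time.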
Stage 2
The following features will only be implemented in future PRs once Stage 1 has been fully completed and merged:
- Refresh token support
- In the existing GitHub impl we use access tokens which are non-expiring by default. This is a security issue in case of a database leak and is recommended against by RFC 6749.
- A PR exists (OAuth2 Token refresh implemented invenio-oauthclient#328) but needs some more work (last commit May 2024)
- Support for private repositories
- "Link-only" OAuth without an option to "sign in with" a remote
- React + API-based UI for pagination/search of repos, using OpenSearch
- See GitLab integration product-rdm#235 (reply in thread) for details
- Notifications on successful archive: GitLab integration product-rdm#226 (comment)
- Selecting community to directly publish the repo to (especially on community-mandatory instances): GitLab integration product-rdm#235 (comment)
- Propagate permissions so users who have access to a repo also have access to records created from releases
- Correct handling of orphaned repos
- Right now when a user disconnects their VCS accounts, we disable the hook on all repos they have access to. Obviously this is the wrong behaviour (e.g. what if there are other users still left connected that have access to the repo) so we need to implement something more logical.
- Copy changes from api: optimize sync process with batch task execution #197
- Allow customising the relation type on records created from repos on InvenioRDM.
- Currently, we add the repo as a Related Identifier with a relation type of `is supplement to`, which seems a little unusual. We should consider whether this is a reasonable default to have, and probably allow for more easily customising it via a config variable.
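For the orphaned-repos item above, one more logical rule might be to disable a hook only when the disconnecting user was the last connected user with access to that repo. The sketch below illustrates that rule under the assumption that we can build a mapping from repo ID to the set of connected user IDs. The function name and data shapes are hypothetical and not part of the current module.

```python
def repos_to_disable(access, disconnecting_user):
    """Return the sorted repo IDs that would be orphaned by a disconnect.

    `access` maps a repo ID to the set of connected user IDs that can access
    it. A repo is orphaned only if the disconnecting user is its sole
    remaining connected user.
    """
    return sorted(
        repo_id
        for repo_id, users in access.items()
        if users == {disconnecting_user}
    )


# Hypothetical data: user 1 shares repo-a with user 2 and is alone on repo-b.
access = {"repo-a": {1, 2}, "repo-b": {1}, "repo-c": {2}}
print(repos_to_disable(access, 1))  # ['repo-b']
```

Under this rule, the hook on `repo-a` stays enabled because user 2 is still connected.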
