Deployment decisions #1

Closed
phette23 opened this issue Jun 13, 2023 · 3 comments

phette23 commented Jun 13, 2023

Should we deploy locally or on GCP? Choosing GCP comes with a number of further decisions.

Reuse our Elasticsearch (ES) cluster or start a new one? Invenio is deprecating ES, so we have to use OpenSearch and cannot reuse our ES cluster.

See also #2 cloud storage research.

@phette23 phette23 added this to the MVP milestone Jun 13, 2023

phette23 commented Jul 7, 2023

We need a budget for piloting and a budget for running in production. Use the Google Cloud pricing calculator to estimate costs for compute, egress, storage, and database.

What do we need to know?

  • pilot storage space
  • lite storage space
  • full prod storage space
  • can we estimate the egress of the current VAULT, for example from its Apache logs?
  • ask on the Discord and ask CalTech: how much CPU etc. are people using?
  • are any AICAD schools running Invenio? Is Northwestern too big to be comparable? Reach out to both: write to the Prism address at NW, and Eli emailed the AICAD list.
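
One way to approach the egress question is to sum the response sizes recorded in the Apache access logs. The following is a minimal sketch, assuming the logs use Apache's Common or Combined Log Format, where the field after the status code is the response size in bytes (or `-` when no body was sent). The function name `egress_by_month` and the log path passed on the command line are hypothetical, and since this field excludes response headers the result is a lower bound rather than an exact egress figure.

```python
import re
import sys
from collections import defaultdict

# Common/Combined Log Format:
#   host ident user [day/Mon/year:time zone] "request" status bytes ...
# Assumption: the current server logs in one of these two formats.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[\d{2}/(?P<month>\w{3})/(?P<year>\d{4}):[^\]]+\] '
    r'"(?:[^"\\]|\\.)*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def egress_by_month(lines):
    """Sum logged response bytes per 'YYYY-Mon' key, skipping unparseable lines."""
    totals = defaultdict(int)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None or match.group("size") == "-":
            continue  # malformed line, or a response with no body
        key = f'{match.group("year")}-{match.group("month")}'
        totals[key] += int(match.group("size"))
    return dict(totals)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python egress.py /path/to/access.log  (path is a placeholder)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for month, total in egress_by_month(log_file).items():
            print(f"{month}: {total / 1024**3:.2f} GiB")
```

The monthly totals can then be entered as the egress estimate in the pricing calculator. Rotated or compressed logs would need to be concatenated or decompressed first.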

@phette23 phette23 self-assigned this Jul 13, 2023
@phette23

CalTech's readme notes that they use "a m6i.xlarge AWS EC2 instance with Ubuntu 20.04". According to AWS's product details, this instance type has 4 vCPUs and 16 GiB of RAM. Based on their docs, they run the main app, REST API, and Celery worker as services on the same machine.


phette23 commented Aug 2, 2023

Northwestern's Galter Health Sciences Library uses three nodes with these resources:

App: 2 vCPU, 8 GB RAM, 180 GB HDD
DB: 2 vCPU, 4 GB RAM, 60 GB HDD
OpenSearch: 2 vCPU, 6 GB RAM, 500 GB HDD

They run all the services (nginx, UI app, REST API, Celery worker, and Redis cache) on the app node. We would use GCP's Cloud SQL in place of the DB node.
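
To compare these figures with CalTech's single 4 vCPU / 16 GiB machine, it helps to total the Northwestern nodes with and without the DB node, since Cloud SQL would replace it. This is only a rough tally of the numbers quoted above; the `Node` class and `totals` helper are illustrative names, not part of any Invenio tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    vcpu: int
    ram_gb: int
    disk_gb: int


# Figures as reported for Northwestern's Galter Health Sciences Library.
NODES = [
    Node("App", vcpu=2, ram_gb=8, disk_gb=180),
    Node("DB", vcpu=2, ram_gb=4, disk_gb=60),
    Node("OpenSearch", vcpu=2, ram_gb=6, disk_gb=500),
]


def totals(nodes):
    """Return (vCPU, RAM in GB, disk in GB) summed over the given nodes."""
    return (
        sum(n.vcpu for n in nodes),
        sum(n.ram_gb for n in nodes),
        sum(n.disk_gb for n in nodes),
    )


all_nodes = totals(NODES)
# Exclude the DB node, assuming a managed database takes its place.
without_db = totals([n for n in NODES if n.name != "DB"])

print("All three nodes:", all_nodes)    # (6, 18, 740)
print("Without DB node:", without_db)   # (4, 14, 680)
```

Without the DB node, the remaining compute is 4 vCPU and 14 GB of RAM, which is close to CalTech's single-machine setup.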
