
Add other_doi_by_id_job queue to queue workers autoscaling policy#271

Open
wendelfabianchinsamy wants to merge 1 commit into master from add-other-doi-job-queue-to-queue-workers-auto-scaling-policy

Conversation

@wendelfabianchinsamy
Contributor

Purpose

closes: datacite/datacite#2453

Approach

Open Questions and Pre-Merge TODOs

Learning

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

  • New feature (non-breaking change which adds functionality)

  • Breaking change (fix or feature that would cause existing functionality to change)

Reviewer, please remember our guidelines:

  • Be humble in the language and feedback you give; ask, don't tell.
  • Prefer positive language over neutral when offering feedback, since neutral language can come across as negative.
  • Offer suggestions on how to improve the code, e.g. simplification or improved clarity.
  • Give reasons for the changes you are proposing.

@wendelfabianchinsamy
Contributor Author

@richardhallett this is my initial implementation. I will keep monitoring to see whether the thresholds need adjusting. Adding a new alarm seems like the route to take. I am open to suggestions or opinions on this approach.

alarm_description = "Scale up when the new queue gets large"
alarm_actions = [aws_appautoscaling_policy.queue-worker_scale_up.arn]

threshold = "3000000"
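Read together, these fragments imply a CloudWatch alarm on the new queue's depth, wired to the existing scale-up policy. A minimal sketch of what the full resource might look like (the alarm name, queue name, period, statistic, and evaluation settings are assumptions, not taken from the diff):

```hcl
# Sketch only: names and evaluation settings below are assumed, not from the PR.
resource "aws_cloudwatch_metric_alarm" "other_doi_queue_depth" {
  alarm_name          = "other-doi-by-id-job-queue-depth"
  alarm_description   = "Scale up when the new queue gets large"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = 1
  namespace           = "AWS/SQS"
  metric_name         = "ApproximateNumberOfMessagesVisible"
  period              = 300
  statistic           = "Sum"
  threshold           = "3000000"

  dimensions = {
    # Hypothetical queue name, for illustration.
    QueueName = "other_doi_by_id_job"
  }

  alarm_actions = [aws_appautoscaling_policy.queue-worker_scale_up.arn]
}
```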
Contributor

We're expecting 3 million messages as a baseline? That's a lot.

Contributor Author

There are 4 million messages in the queue at the moment. I've set the message lifetime to ten days, since these operations are not time sensitive.
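For context, a ten-day message lifetime on an SQS queue is set via `message_retention_seconds`. A sketch (the queue name is an assumption):

```hcl
# Sketch only: queue name is assumed.
resource "aws_sqs_queue" "other_doi_by_id_job" {
  name = "other_doi_by_id_job"

  # 10 days, in seconds (SQS allows up to 14 days).
  message_retention_seconds = 864000
}
```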

@richardhallett (Contributor) left a comment

The problem with this is that I think it's going to conflict with the existing scaling.

Let's say this triggers and scales up; the existing alarm, which checks the counts of the other queues, will then trigger a scale down, I'd imagine.

@wendelfabianchinsamy
Contributor Author

> The problem with this is that I think it's going to conflict with the existing scaling.
>
> Let's say this triggers and scales up; the existing alarm, which checks the counts of the other queues, will then trigger a scale down, I'd imagine.

Yes, I've been thinking about this as well. I wonder if we should set the desired count for queue workers to two, or maybe have a separate deployment of queue workers for this process. The latter seems wasteful.

@richardhallett
Contributor

If it's not time sensitive, maybe we don't scale at all; as you say, just increase the minimum to 2 to handle the slightly larger load.

Another possibility is to extend the existing scale_down metric to include the new queue. That way it will scale up based on your custom rule, but only scale down if the total across all queues is less than X (5000 currently).

Without thinking more about bigger architecture changes, I'd be reluctant to have a separate deployment of queue-workers. To me, the only way that makes sense is if it's an isolated service (which is a possibility, but more work).
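The second option can be expressed with CloudWatch metric math: a single scale-down alarm whose expression sums the visible messages across all queues, including the new one. A sketch (queue names, query ids, periods, and evaluation settings are assumptions; only the 5000 threshold comes from the discussion):

```hcl
# Sketch only: queue names and evaluation settings are assumed.
resource "aws_cloudwatch_metric_alarm" "queue_worker_scale_down" {
  alarm_name          = "queue-worker-scale-down"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 2
  threshold           = 5000

  # e1 sums the depth of an existing queue plus the new one.
  metric_query {
    id          = "e1"
    expression  = "m1 + m2"
    label       = "Total visible messages across queues"
    return_data = true
  }

  metric_query {
    id = "m1"
    metric {
      namespace   = "AWS/SQS"
      metric_name = "ApproximateNumberOfMessagesVisible"
      period      = 300
      stat        = "Sum"
      dimensions = {
        QueueName = "existing_job_queue" # hypothetical
      }
    }
  }

  metric_query {
    id = "m2"
    metric {
      namespace   = "AWS/SQS"
      metric_name = "ApproximateNumberOfMessagesVisible"
      period      = 300
      stat        = "Sum"
      dimensions = {
        QueueName = "other_doi_by_id_job" # hypothetical
      }
    }
  }

  alarm_actions = [aws_appautoscaling_policy.queue-worker_scale_down.arn]
}
```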

@wendelfabianchinsamy
Contributor Author

> If it's not time sensitive, maybe we don't scale at all; as you say, just increase the minimum to 2 to handle the slightly larger load.
>
> Another possibility is to extend the existing scale_down metric to include the new queue. That way it will scale up based on your custom rule, but only scale down if the total across all queues is less than X (5000 currently).
>
> Without thinking more about bigger architecture changes, I'd be reluctant to have a separate deployment of queue-workers. To me, the only way that makes sense is if it's an isolated service (which is a possibility, but more work).

Yes, let's go ahead with setting the desired count to two for now. We can continue monitoring and make a call at a later date. I will keep this PR open for future reference.
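The agreed interim change is small: raise the floor of the autoscaling target so at least two queue workers are always running. A sketch (the ECS cluster/service names and max_capacity are assumptions):

```hcl
# Sketch only: resource_id and max_capacity are assumed.
resource "aws_appautoscaling_target" "queue_worker" {
  service_namespace  = "ecs"
  resource_id        = "service/default/queue-worker" # hypothetical cluster/service
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 2
  max_capacity       = 10
}
```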


Development

Successfully merging this pull request may close these issues.

Creating queue worker autoscaling policy for new other doi job queues

2 participants