-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Enforce version limit in task queues #8788
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
service/matching/config.go
Outdated
|
|
||
| loadCause loadCause | ||
| loadCause loadCause | ||
| MaxVersionsInTaskQueue func() int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you keep these settings consistent in the four places they have to appear? like put this right underneath MaxTaskQueuesInDeployment for all of them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not all the versioning settings appear in per-tq config. but I moved this one around to make things more consistent.
|
|
||
| // Remove this deleted version | ||
| delete(deploymentData.Versions, dv.buildID) | ||
| deletedVersions = deletedVersions[1:] // Remove from slice to keep count accurate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A more idiomatic way will be to use a dedicated variable to know the total number of elements that were deleted.
Example:
cleaned := false
deletedRemaining := len(deletedVersions)
for _, dv := range deletedVersions {
totalCount := undeletedCount + deletedRemaining
if !dv.updateTime.Before(aWeekAgo) && totalCount <= maxVersions {
break
}
delete(deploymentData.Versions, dv.buildID)
deletedRemaining--
cleaned = true
}
Shivs11
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just some minor comments/concerns but not blocking!
| MatchingMaxVersionsInTaskQueue = NewNamespaceIntSetting( | ||
| "matching.maxVersionsInTaskQueue", | ||
| 200, | ||
| `MatchingMaxVersionsInTaskQueue represents the maximum number of versions that can be registered in a single task queue. | ||
| Should be larger than MatchingMaxVersionsInDeployment because a task queue can be in versions spanning across more than one deployments.`, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious: was 200 just chosen arbitrarily?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's twice MatchingMaxVersionsInDeployment so in case someone moves to another deployment, they'd get "maximum versions in deployment exceeded" before getting "maximum versions in task queue exceeded" which is harder for user to deal with because there is no direct way to see all the versions in a TQ.
| }, nil | ||
| } | ||
|
|
||
| // CleanupOldDeletedVersions removes versions deleted more than 7 days ago. Also removes more deleted versions if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: we should also mention the fact that older, stale versions are only deleted from this task queue iff there is some caller calling SyncDeploymentUserData for this specific task queue and is not a recurring/background GC process.
This makes me wonder that in the worst case, if no one were to call this Sync method, there could be a world where a task queue just has a bunch of versions and would still keep on serving tasks to pollers. This could happen if someone were to not make a new deployment and someone had made a bunch of deployments in the past 7 days (I could see this coming up in our internal dogfooding environments)
Can we/Should we make this an automatic process? How much additional work would that be? If not, maybe we should document this fact here on this function since if there is a bottleneck tomorrow, we know where to look at
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if no one were to call this Sync method, there could be a world where a task queue just has a bunch of versions and would still keep on serving tasks to pollers.
the tasks will be served if the version is not deleted only. and if it's not deleted, then event automatic CG would not touch it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, I get that - my concern was about to this being a bottleneck when it comes to delivery of tasks if noone has called the Sync call in like a week or something and the task queue has like 199 versions (Theoretically possible, but low probability)
| } | ||
| } | ||
|
|
||
| func (s *Versioning3Suite) skipFromVersion(version workerdeployment.DeploymentWorkflowVersion) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this guy seems to not be used anywhere
| for i := 0; i < maxVersions; i++ { | ||
| tvVersion := tv.WithDeploymentSeriesNumber(i).WithBuildIDNumber(i) | ||
| upsertVersions := make(map[string]*deploymentspb.WorkerDeploymentVersionData) | ||
| upsertVersions[tvVersion.BuildID()] = &deploymentspb.WorkerDeploymentVersionData{ | ||
| Status: enumspb.WORKER_DEPLOYMENT_VERSION_STATUS_INACTIVE, | ||
| } | ||
|
|
||
| deploymentName := tvVersion.DeploymentVersion().GetDeploymentName() | ||
| _, err := s.GetTestCluster().MatchingClient().SyncDeploymentUserData( | ||
| ctx, &matchingservice.SyncDeploymentUserDataRequest{ | ||
| NamespaceId: s.NamespaceID().String(), | ||
| TaskQueue: tv.TaskQueue().GetName(), | ||
| TaskQueueTypes: []enumspb.TaskQueueType{tqTypeWf}, | ||
| DeploymentName: deploymentName, | ||
| UpsertVersionsData: upsertVersions, | ||
| }, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rather than doing this, you could also just have 5 different pollers, each coming from different deployments, poll on the task queue right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I felt this way is more deterministic and less chance of flakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure i understand why using pollers would not make it deterministic though - you just start 5 pollers, wait for those to appear in the task queue and you then proceed with the test.
ca3e198 to
ac34047
Compare
ac34047 to
2d9f305
Compare
What changed?
Enforce a max limit on number of versions registered in a task queue.
Why?
Safety agains user data becoming too big.
How did you test it?
Potential risks
User's polls will be rejected if server cannot allow new version registration due to the limit.