Add concurrency test for async Versioning operations #8798

ShahabT · 2025-12-11T07:44:59Z

What changed?

Add concurrency test and refine some rough edges.

Why?

testing is good!

How did you test it?

Potential risks

None

Shivs11 · 2025-12-12T16:35:50Z

service/worker/workerdeployment/version_workflow.go

-	// First ensure deployment workflow is running
-	if !d.VersionState.StartedDeploymentWorkflow {
-		activityCtx := workflow.WithActivityOptions(ctx, defaultActivityOptions)
-		err := workflow.ExecuteActivity(activityCtx, d.a.StartWorkerDeploymentWorkflow, &deploymentspb.StartWorkerDeploymentRequest{
-			DeploymentName: d.VersionState.Version.DeploymentName,
-			RequestId:      d.newUUID(ctx),
-		}).Get(ctx, nil)
-		if err != nil {
-			return err


I wonder: could there ever be a case where the deployment workflow continues-as-new (CaNs) while the version workflow is processing an update?

In theory, this could happen. Imagine this scenario:

1. The user sends a setCurrent update.
2. The deployment workflow starts the async propagation of the change to the task queues by calling the version workflow.
3. The version workflow, when it gets here, sends a signal.
4. The deployment workflow CaNs because, from its own point of view, the state has changed.

I am not fully familiar with the signal "sending" semantics when a workflow is CaN'ing, but this could be important to consider, so I'm pasting it here.

ShahabT · 2025-12-13

Yes, it's possible, but CaN is atomic, which means there is no gap between closing the old execution and creating the new one. Both happen at the same time.

So the signal will go to either the old or the new execution; it can never see "not found".

Even if that were not the case, this check would not protect against the scenario you describe, because the CaN can happen right after the check and before the signal.

Shivs11 · 2025-12-12T16:40:31Z

service/worker/workerdeployment/version_workflow.go

+			d.logger.Error("Update canceled before worker deployment workflow started")
+			return serviceerror.NewDeadlineExceeded("Update canceled before worker deployment workflow started")


nit: We can improve the error message by saying something like:

Update canceled since the corresponding worker deployment workflow for this version did not start.

Shivs11 · 2025-12-12T17:25:05Z

service/worker/workerdeployment/workflow.go

-		// workflowVersion is set at workflow start based on the dynamic config of the worker
-		// that completes the first task. It remains constant for the lifetime of the run and
-		// only updates when the workflow performs continue-as-new.
+		// Tracks the version of the deployment workflow when a particular run of a workflow starts base on the dynamic config of the


run of a workflow starts based on the dynamic config of the
worker that completes the first task of the workflow. workflowVersion remains the same until the workflow CaNs, after which it will get another chance to pick the latest manager version.

ShahabT · 2025-12-13

This part was undone unintentionally; I will revert to the old comment, which was your suggestion.

Shivs11 · 2025-12-12T17:27:44Z

service/worker/workerdeployment/workflow.go

+		if !asyncMode {
+			// Erase summary drainage status immediately, so it is not draining/drained.
+			d.setDrainageStatus(newRampingVersion, enumspb.VERSION_DRAINAGE_STATUS_UNSPECIFIED, routingUpdateTime)
+		}


Slightly confused: I think we used to have this behavior of changing the version's drainage status almost immediately once we realized it was being promoted. Did we delete it by mistake?

ShahabT · 2025-12-13

No need for this, because syncVersion already takes care of updating the summary.

Shivs11 · 2025-12-12T17:31:35Z

tests/worker_deployment_test.go

+	//t.Run("sync", func(t *testing.T) {
+	//	suite.Run(t, &WorkerDeploymentSuite{workflowVersion: workerdeployment.InitialVersion})
+	//})
+	//t.Run("async", func(t *testing.T) {
+	//	suite.Run(t, &WorkerDeploymentSuite{workflowVersion: workerdeployment.AsyncSetCurrentAndRamping})
+	//})


reminder: remove the comments here

Add concurrency test for async Versioning operations

a227cc7

ShahabT requested review from a team as code owners December 11, 2025 07:45

Shivs11 reviewed Dec 12, 2025

Fix tests and some refactoring

4337879

ShahabT force-pushed the shahab/concurrency branch from bd56986 to 4337879 December 13, 2025 04:15
