Activity function tasks not evenly retrieved & processed by all active & healthy host instances. #2991

yeefung-hin · 2024-12-06T11:02:00Z

yeefung-hin
Dec 6, 2024

Hi, I need comments and advice on an odd scenario that I am currently encountering in my application.

I am running an Azure Durable Functions (ADF) function app using Microsoft.Azure.WebJobs.Extensions.DurableTask 2.9.0 (we are working on upgrading it) with 4 partitions on a Dedicated (App Service) plan with 15 host instances (plan type: Premium v3 P2V3). I noticed that there are always 4 host instances running at much higher CPU usage (e.g. 60% vs 10%) than the other 11 host instances.

I am using Application Insights, and I have tried to investigate what each host instance is doing with the following query:
requests
| extend requestType = case(name contains "Orchestrator", "Orchestrator", name contains "Activity", "Activity", "HTTP")
| extend hostInstanceId = tostring(customDimensions.HostInstanceId)
| where cloud_RoleName == @CloudRoleName and timestamp between (@StartDatetime .. @ToDatetime)
| summarize count() by hostInstanceId, requestType

Note: My application's function names end with an "Orchestrator" or "Activity" suffix.

From the resulting chart, I noticed that the 4 host instances handling orchestrator functions also process far more activity function tasks than the other host instances, and the difference is very large (13xxxx vs 6xxx).
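For anyone wanting to quantify the skew rather than read it off a chart, here is a minimal sketch in Python. It assumes the query result has been exported as (hostInstanceId, requestType, count) rows; the function name `activity_share_by_host` and the sample rows in the usage note are hypothetical and are not taken from the application described here.

```python
from collections import defaultdict


def activity_share_by_host(rows):
    """Summarize rows shaped like the output of
    `summarize count() by hostInstanceId, requestType`.

    rows: iterable of (host_instance_id, request_type, count) tuples.
    Returns {host: {"activity_share": float, "runs_orchestrators": bool}}.
    """
    activity = defaultdict(int)
    orchestrator_hosts = set()
    for host, request_type, count in rows:
        # Register every host, but only add Activity counts to its total.
        activity[host] += count if request_type == "Activity" else 0
        if request_type == "Orchestrator" and count > 0:
            orchestrator_hosts.add(host)

    total = sum(activity.values())
    return {
        host: {
            "activity_share": (n / total) if total else 0.0,
            "runs_orchestrators": host in orchestrator_hosts,
        }
        for host, n in activity.items()
    }
```

Calling it with made-up rows such as `[("host-a", "Orchestrator", 10), ("host-a", "Activity", 90), ("host-b", "Activity", 10)]` reports the fraction of activity executions each host handled and whether that host also ran orchestrators, which makes it easy to check whether the busy hosts are exactly the ones running orchestrations.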

I have searched for similar discussions, and the closest one I found is issue #804, but that issue has not been answered yet. Any suggestions on why this happens, or where I should start digging for the root cause?

Do let me know if you need more information. Thanks.

Answered by cgillum

yeefung-hin · 2024-12-18T03:16:22Z

yeefung-hin
Dec 18, 2024
Author

Hi, I just wanted to check in and see if there’s any update on this Q&A. Let me know if there’s anything I can do to assist. Thanks again for your time and efforts!

cgillum · 2024-12-18T16:14:59Z

cgillum
Dec 18, 2024
Maintainer

In many situations, this can be expected. Orchestrations can only run on workers that own one of the configured partitions. When these orchestrations schedule activities, the worker that scheduled the activity will try to read it from the queue immediately, without waiting for the polling interval, in order to minimize queue latency. As a result, workers that own partitions are more likely to execute activities than workers that don’t own partitions, especially when load is not high.
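The effect described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the Durable Task Framework's actual dispatch logic: `simulate_activity_distribution` and the `p_immediate` parameter (the chance that the scheduling worker wins the immediate read) are invented for illustration, and every activity that is not picked up immediately is assumed to land on a uniformly random worker.

```python
import random


def simulate_activity_distribution(num_workers=15, num_partitions=4,
                                   num_activities=10_000,
                                   p_immediate=0.8, seed=0):
    """Toy model: return the number of activities executed per worker.

    Assumptions (hypothetical, for illustration only):
    - workers 0..num_partitions-1 each own one partition and are the
      only workers that run orchestrations;
    - with probability p_immediate the worker that scheduled an
      activity dequeues it right away; otherwise any worker, chosen
      uniformly at random, picks it up on a later poll.
    """
    rng = random.Random(seed)
    owners = list(range(min(num_partitions, num_workers)))
    executed = [0] * num_workers
    for _ in range(num_activities):
        scheduler = rng.choice(owners)  # orchestration runs on a partition owner
        if rng.random() < p_immediate:
            executed[scheduler] += 1    # immediate read by the scheduling worker
        else:
            executed[rng.randrange(num_workers)] += 1  # regular polling
    return executed
```

With 15 workers and 4 partitions, any `p_immediate` well above zero concentrates activity executions on the 4 partition owners, which matches the shape of the distribution reported in the question. Lowering `p_immediate` is one rough way to mimic heavier load, where the scheduling worker is less likely to win the immediate read and the distribution flattens out.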

