Activity function tasks not evenly retrieved & processed by all active & healthy host instances. #2991
-
Hi, I need comments and advice on an odd scenario I am currently encountering in my application. I am running an Azure Durable Functions app using Microsoft.Azure.WebJobs.Extensions.DurableTask 2.9.0 (we are working on upgrading it) with 4 partitions, on a dedicated App Service plan with 15 host instances (plan type: Premium v3 P2V3). I noticed that there are always 4 host instances running at much higher CPU usage (e.g. 60% vs 10%) than the other 11 host instances.

I am using Application Insights, and I have tried to investigate what each host instance is doing using the following query:

Note: My application's function names have an "Orchestrator" or "Activity" suffix.

From the chart above, I noticed that the 4 host instances that handle orchestrator functions are also processing far more activity function tasks than the other host instances, and the difference is very large (13xxxx vs 6xxx). I have searched for similar discussions, and the closest one I found is issue #804, but it has not been answered yet.

Any suggestions on why this happens, or on how and where I should start digging for the root cause? Do let me know if you need more information. Thanks.
Replies: 2 comments
-
Hi, I just wanted to check in and see if there’s any update on this Q&A. Let me know if there’s anything I can do to assist. Thanks again for your time and efforts!
-
In many situations, this is expected. Orchestrations can only run on workers that own one of the configured partitions. When an orchestration schedules an activity, the worker that scheduled it tries to read it from the queue immediately, without waiting for the polling interval, in order to minimize queue latency. As a result, workers that own partitions are more likely to execute activities than workers that don’t own partitions, especially when load is not high. With 4 partitions, at most 4 of your 15 instances can own a partition, which is consistent with the 4 busier instances you observed.
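To make the effect concrete, here is a rough toy simulation in Python. It is not the Durable Task Framework's actual dispatch logic, only a sketch under stated assumptions: every name and parameter (`simulate`, `capacity`, `poll_interval`, `arrivals_per_tick`, and so on) is hypothetical, time advances in abstract ticks, the scheduling worker takes the activity right away if it has a free slot, and all other pickups happen only when a worker's poll comes around.

```python
import random
from collections import Counter


def simulate(num_workers=15, num_owners=4, num_activities=10_000,
             capacity=10, poll_interval=5, activity_duration=3,
             arrivals_per_tick=2, seed=0):
    """Toy model (hypothetical): count activities executed per worker.

    Workers 0..num_owners-1 own partitions and therefore run orchestrations.
    """
    rng = random.Random(seed)
    running = [[] for _ in range(num_workers)]  # finish ticks of in-flight activities
    executed = Counter()
    queue = 0       # activity messages waiting in the shared queue
    scheduled = 0
    tick = 0

    def start(worker):
        running[worker].append(tick + activity_duration)
        executed[worker] += 1

    while scheduled < num_activities or queue > 0:
        # Release slots whose activities have finished.
        for w in range(num_workers):
            running[w] = [t for t in running[w] if t > tick]

        # Partition owners schedule activities and try an immediate fetch.
        for _ in range(arrivals_per_tick):
            if scheduled >= num_activities:
                break
            owner = rng.randrange(num_owners)
            scheduled += 1
            if len(running[owner]) < capacity:
                start(owner)   # picked up without waiting for a poll
            else:
                queue += 1     # owner is full; message waits for a poller

        # Every worker polls the shared queue on its own staggered interval.
        for w in range(num_workers):
            if (tick + w) % poll_interval == 0:
                while queue > 0 and len(running[w]) < capacity:
                    queue -= 1
                    start(w)

        tick += 1

    return executed


if __name__ == "__main__":
    for rate in (2, 40):  # low load vs. high load
        counts = simulate(arrivals_per_tick=rate)
        owners = sum(counts[w] for w in range(4))
        print(f"arrivals/tick={rate}: owners={owners}, "
              f"others={sum(counts.values()) - owners}")
```

In this model, partition owners absorb essentially all activities while they have spare capacity, and the other workers only start to pick up work once the owners are saturated. If that matches what you see, lowering the per-instance activity concurrency or increasing load should spread activities more evenly, though the real behavior depends on the extension version and settings.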