Question about activity durations on Gantt charts #23
@junalmeida, Gantt charts are generated out of data that DurableClient.GetStatusAsync() returns as execution history. Items in that array have two DateTime fields - If I read the code and the data correctly, then yes, Yes, there are some other timestamps stored in the XXXHistory table, which we could potentially use to make Gantt charts more meaningful. The problem is that there is no documentation on those, so we can only make assumptions about their exact meaning. Would it be possible for you to provide a couple of examples of those lengthy activities, as they appear in your XXXHistory table? Some 'TaskScheduled' event (with all its timestamps) + some corresponding 'TaskCompleted' event (also with all its timestamps) - and how this activity appears on the 'Details' page and on the Gantt chart?
Thanks for that info, @junalmeida. But you mentioned that "10 minute limit" - what is that? Is that the functionTimeout setting that you've set to 10 minutes? Or how exactly are you configuring that (there were in fact lots of older ways to configure function execution timeouts, and most of them do not work anymore)?
This function lives on the basic serverless Consumption plan, which implies a 10-minute execution limit. Anything beyond that throws FunctionTimeoutException. That's why I don't think that 20-minute report is real, and to add, I also can't find any function reporting more than 10 min in Application Insights. I can try to find this exact instance for you.
@junalmeida, the default timeout for the Consumption plan is 5 minutes, not 10. This is why I was asking whether (and how) you're configuring a custom value for that timeout. 10 minutes is the maximum allowed value for that config setting. When your activity exceeds that timeout (either default or explicitly configured), it is indeed supposed to end up with a FunctionTimeoutException, like this: How many instances like this do you have? Is it just one instance or many? Are you sure this instance was actually run in Azure? Could it somehow happen that it was run on some devbox or any third-party machine (e.g. if your cloud environment and your devbox occasionally share the same Storage and the same TaskHub)?
It is configured for 10 minutes in its host.json file. And I can see a handful of activities shown as running for more than 10 minutes on the Gantt chart. And no, I do not share the dev execution environment with the prod taskhub.
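For readers following along, a host.json that raises the timeout to the 10-minute Consumption plan maximum mentioned above would typically look like the minimal sketch below. This is an assumed shape of the file for illustration, not the poster's actual configuration; `functionTimeout` takes a timespan string.

```json
{
  "version": "2.0",
  "functionTimeout": "00:10:00"
}
```

If the setting is omitted, the 5-minute Consumption plan default described earlier in the thread applies.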
So far I'm unable to reproduce the behavior you described. Can you elaborate on what platform/language/Functions version you're using?
Also, can you check that your instance is healthy by itself? Aka that there are no host crashes due to e.g. OutOfMemoryException and no other weird effects?
This project is written in C#, .NET Core 3.1. Az Functions v3.0.13, Durable Functions v2.5.1. The instance is healthy, I have no complaints and all jobs seem to be working. I can see no OOM in App Insights. Also, checking the performance reports, the worst call I have in the past 24 hours is 7 min, which is valid within the 10 min limit.
OK, after switching Microsoft.Azure.WebJobs.Extensions.DurableTask from Aka seems like some of those parallel activities are being queued and picked up for execution only after some others are finished. Even though this isn't being indicated by their Will try to play with it a little more. E.g. will try the latest version (since there's a slim hope that this weird behavior was fixed with this commit or any other commit). In the meantime, since it is definitely not a DfMon bug, can I ask you to raise an issue in azure-functions-durable-extension?
I can confirm that the behavior is the same with latest But I can also confirm that this behavior only takes place when the activity method is synchronous (aka if it holds a thread). If the method is asynchronous (e.g. marked with So I suggest that you check that your activity methods are implemented as asynchronous (returning Tasks), since it is a best practice anyway, especially for methods that can take that long to execute.
I do not have any activity which is not async. All return
But can it happen that they still hold a thread inside of them (by e.g. doing a Thread.Sleep() or something similar)?
Hm, no, I have no code that holds a thread intentionally. And the function you see above is not the first one that I've found "running" for longer than 10 min on the Gantt chart. This is also not consistent, as some instances are pretty fast and some are not; that's why I'm inclined to think this happens when the server is busy. This is another example, from a totally different function app and a different taskhub, also supposed to end within 10 min.
OK, so your orchestration actually starts lots of those activities. This is most likely the reason. Because the default value for The solution is to set that maxConcurrentActivityFunctions setting to something more substantial.
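As a rough illustration of that suggestion, in Durable Functions 2.x the maxConcurrentActivityFunctions setting lives under `extensions.durableTask` in host.json. The value 20 below is purely a placeholder chosen for the sketch, not a recommendation from this thread; a suitable number depends on how heavy the activities are and on the resources available to each instance.

```json
{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "maxConcurrentActivityFunctions": 20
    }
  }
}
```

Raising the limit lets more activities start on a single instance at once, so fewer of them sit in the queue while the Gantt chart keeps counting their duration.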
It looks like the duration is calculated using
@scale-tone ATM we are not considering moving to the Premium plan because, although we do indeed fire lots of activities, there are not many orchestrations happening simultaneously, and we do not require fast processing, so it is OK to wait a bit. I just feel like the Gantt chart could use more detailed reporting on "scheduled" vs "start time", as @bachuv just mentioned. We are also not considering increasing
Btw I have a question for you: I've noticed that the total run time for each activity on the Gantt chart is the sum of "waiting time" and "execution time". Is there a way for you to show those timings separately?
I say that because on the servers I have a 10-minute limit, and I can see some activities taking 20 min. So I suppose this is the sum of the waiting time (scheduled) and the actual execution time.
Originally posted by @junalmeida in #22 (comment)