-
Notifications
You must be signed in to change notification settings - Fork 13
Refine the doubling logic #418 #420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: MOODLE_403_STABLE
Are you sure you want to change the base?
Conversation
duyhuynhdev
commented
Dec 10, 2025
- Update the doubling logic according to the first approach, as the second approach still has corner cases where a single log event can dominate others if it has a large event count.
- Remove the timer callback and use the profiler flush callback instead. In addition, expose the new maxsample setting to support the flush callback.
- Update the relevant classes and unit tests to align with the new workflow and doubling logic.
- Add new unit tests to cover the updated behaviour.
- Fix CI complaints, including unsorted language files.
67e5801 to
70663e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @duyhuynhdev
This is a big change so we're going to need to test this thoroughly. I've started with comments from a code review, but I haven't tested this personally yet.
My main concerns so far is how this affects task detection and whether we are losing any sample data when we change the structure while combining components.
| // We want to prevent doubling up of processing, so skip if an existing process is still executing. | ||
| // The profile logs will be kept and processed the next time. | ||
| self::$logs[] = $log; | ||
| $this->logcount += $log->count(); | ||
| // Doubling sampling period if it reaches the limit. | ||
| if ($this->logcount >= $this->samplelimit) { | ||
| $this->on_reach_limit($manager); | ||
| $this->logcount = $this->logcount - $this->samplelimit; | ||
| } | ||
| if (self::$alreadyprofiling) { | ||
| debugging('tool_excimer: starting cron_processor::on_interval when previous call has not yet finished'); | ||
| if ($isfinal) { | ||
| // The final flush call when profiler is destroyed. | ||
| if ($log->count() < $this->samplelimit) { | ||
| // This should never happen. | ||
| debugging('tool_excimer: alreadyprofiling is true during final on_interval.'); | ||
| } | ||
| return; | ||
| } | ||
| self::$alreadyprofiling = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic to prevent double up of processing may not be needed anymore now that we've switched from timers to flush callbacks (I'm not sure if this is paused during the callbacks). I think it's worth looking into further because it would be great if we could remove $self::logs entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this stage, the exact underlying scenario isn’t clear because the documentation doesn’t cover this case. However, this likely happens when maxSamples is set too low, causing the process to continue running while callbacks are triggered repeatedly. In this scenario, self::logs can be used for stored new coming samples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously there were edge cases of the 10s interval overlapping, see #377
With your changes these should be much rarer as in the worst case the number of possible samples in on_interval should be very low. I'm not even sure if the profiler would be taking samples here.
I'm interested in what happens when max samples is set too low. The previous issue this fixed shouldn't be a concern if it only happens with bad config, but is there a chance of endless loops if this is set to something ridiculous like 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it should be possible, because we don’t have any constraint on the maxSamples value. That’s why I introduced the $logs array as a waiting list. New logs from the callback are collected in this array until the current process completes. I’m not sure how often overlaps will occur with the new logic, but $logs won’t impact the existing flow. On the contrary, it helps mitigate overlapping if it does occur.
|
|
||
| /** | ||
| * Merge excimer sample set | ||
| */ | ||
| private function merge_excimer_sample_set() { | ||
| $newsamples = []; | ||
| for ($i = 0; $i < count($this->samples); $i += 2) { | ||
| // Prevent sample limit is odd. | ||
| if ($i == count($this->samples) - 1) { | ||
| $newsamples[] = $this->samples[$i]; | ||
| } else { | ||
| if ($this->samples[$i]['eventcount'] >= $this->samples[$i + 1]['eventcount']) { | ||
| $trace = $this->samples[$i]['trace']; | ||
| } else { | ||
| $trace = $this->samples[$i + 1]['trace']; | ||
| } | ||
| $newsamples[] = [ | ||
| 'eventcount' => ($this->samples[$i]['eventcount'] + $this->samples[$i + 1]['eventcount']), | ||
| 'trace' => $trace, | ||
| ]; | ||
| } | ||
| } | ||
| $this->samples = $newsamples; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be useful to have comments explaining why we're doing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks I will add a description there
| $this->samples[] = [ | ||
| 'eventcount' => $eventcount, | ||
| 'trace' => $trace, | ||
| ]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What other information can $sample contain? We could be losing some data that is useful.
Ideally we keep these as a $sample after merging, but if that's not possible would need to update the docs of $samples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are only 2 type of samples:
- A task sample retrieved from ExcimerLogEntry
- A memory usage sample created from our code
For the first one , except the event count and trace , ExcimerLogEntry also has timestamp which is the floating point number of seconds since the ExcimerProfiler object was constructed. However, we will not use it anywhere so I won't collect it.
https://github.com/wikimedia/php-excimer/blob/master/stubs/ExcimerLogEntry.php
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, unfortunately the \ExcimerLogEntry class has private final function __construct() so we can't use that, so we are stuck with arrays or a new class. It's a bit messy, especially with memory using the same, but it works.
Might be worth linking the docs here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will update the docs and explaination
| public function process($log, manager $manager, bool $isfinal = false) { | ||
| // We want to prevent overlapping of processing, so skip if an existing process is still executing. | ||
| // The profile logs will be kept and processed the next time. | ||
| self::$logs[] = $log; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments about overlapping processing in cron.
|
|
||
| // Doubling sampling period if it reaches the limit. | ||
| $this->logcount += $log->count(); | ||
| if ($this->partialsave && $this->logcount >= $this->samplelimit) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want this applied to non-partial saves too? I believe it was previously applied there as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only get sampling once with non non-partial saves which means we won't set the callback for non-partial saves ( as previous) so we the sampling period doubling cannot be applied this case. However, the merge is still applied when we process the samples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that clarifies things and looks correct.
In that case, what happens if partial save is enabled and this is called by the flush callback when the ExcimerProfiler object is destroyed? Restarting it there seems bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$this->logcount is used to manage increases to the samplingperiod.Even in the final flush callback, the only affected element is the samplingperiod. So, it should not impact any profile, because the samplerate is no longer derived from the samplingperiod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm less concerned about the profile and more about the harm of restarting the profiler while it's being destroyed. It could be handled gracefully behind the scenes, but we should be able to detect this ourselves and explicitly handle the logic for not restarting it in this case.
bwalkerl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple more comments from testing. The transformations of the graphs after merging also doesn't look right to me (tested with partial saves to see before and after), so will need to test that more.
| $manager->get_timer()->setCallback(function () use ($manager) { | ||
| $manager->get_profiler()->setFlushCallback(function ($log) use ($manager) { | ||
| // Once overlapping has happened once, we prevent all future partial saving. | ||
| if (!$this->hasoverlapped) { | ||
| $this->process($manager, false); | ||
| $this->process($log, $manager); | ||
| } | ||
| }); | ||
| }, $this->maxsamples); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this change partial save never marks the profile as finished.
I'm also thinking that with this patch it might be OK to re-enable partial saves by default (and lighten the old warnings) as the number of samples won't rise quickly when the DB is having issues, which was the main reason we disabled it..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it is a bug. Let me fix it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the update partial save is no longer saving partial profiles every time there is a flush callback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will update it in the next commit
classes/sample_set.php
Outdated
| $trace = $this->samples[$i + 1]['trace']; | ||
| } | ||
| $newsamples[] = [ | ||
| 'eventcount' => ($this->samples[$i]['eventcount'] + $this->samples[$i + 1]['eventcount']), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the change in event counts the hover duration shown on the graph is incorrect after merging and needs to be updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has been updated from summing to halving as discussed
| $string['field_month'] = 'Month'; | ||
| $string['field_name'] = 'Name'; | ||
| $string['field_numsamples'] = 'Number of samples'; | ||
| $string['field_numsamples_value'] = '{$a->samples} samples ({$a->events} events) @ ~{$a->samplerate}ms'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to make this consistent with the hover display on the graph, which currently only uses 'samples' for events. We probably need to iron out the terminology since samples will have more meaning than events to most ends users.
I also think it would be better to keep the old display without events when the number of events is 0 (old profiles etc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
number of events should not be 0. If there is no numevents it will use the numsamples instead.
'events' => number_format($data['numevents'] ?? $data['numsamples'], 0, $decsep, $thousandssep),
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Old profiles prior to the upgrade are showing as 0 in the database to me, but this isn't a big concern as old profiles will be flushed out eventually. That main issue here is making the terminology consistent with the graph.
70663e9 to
6137ab7
Compare
6137ab7 to
52cfb71
Compare
bwalkerl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates @duyhuynhdev
I've left a couple more comments, plus responses to some of your other comments that have actions.
| * @package tool_excimer | ||
| */ | ||
| interface processor { | ||
| class processor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend making this an abstract class and keeping init as an abstract method
| $trace = $this->samples[$i + 1]['trace']; | ||
| } | ||
| $newsamples[] = [ | ||
| 'eventcount' => ceil(($this->samples[$i]['eventcount'] + $this->samples[$i + 1]['eventcount']) / 2), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is ceiling preferred here? Always rounding up will increase the margin of error. The number of samples won't be perfect unless we allow decimals, but maybe we can introduce some logic here to keep it more balanced.
I also still think we need more comments about what we're doing here.
| * @throws \dml_exception | ||
| */ | ||
| public function process(manager $manager, bool $isfinal) { | ||
| public function on_interval($log, manager $manager) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, on_interval should be renamed (in both classes) as we're no longer using intervals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed it to on_flush