Improving performance of Algorithm Backtrack #2
If you need that to be fast, instead of having
Don't sweat, you have at least 6x or so to optimize on that routine :)
@redknightlois, thanks a lot for this :). I don't think I quite understand everything you talked about, but I'll try to figure out as much as I can. I'll probably come back with some questions for you soon :). Thanks in advance.
Which part do you need more detail on?
I'll let you know soon once I have some time behind my code again :)).
Baseline for generating a Maze is about 3.81 seconds, with the lowest being 3.72. Currently back in action again :). I played around with your advice a bit:
I changed:
to:
This actually resulted in Mazes being generated a little bit slower (generation time 3.92 seconds).
What exactly do you mean by this? Every time I add a target I need to increase that targetCount to put the next one in the next slot. How would you propose this? (Or am I missing what you mean 😄)
You'd still have to check if you're on an edge in every step, right? Now I only check if I'm on the left edge if I want to add the left target. If I did all this beforehand, I'd just have to store the results in a bool and then still do an if-check like if (leftEdge || rightEdge || ...) before I can add the other targets. Or would you have a better idea?
Do you mean that I bitshift the x and y in a long? Or make a Union out of it? In the MazePointPos I've indeed set the struct layout as Sequential. But not in the default MazePoint. The reason for MazePointPos to be sequential is to save memory when using it in a big array.
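The "bitshift the x and y in a long, or make a Union out of it" idea being discussed could be sketched like this. This is a hypothetical illustration, not code from the repository: `PackedPoint` and its field names are my own, and the overlay only reads as shown on a little-endian machine:

```csharp
using System;
using System.Runtime.InteropServices;

// Explicit layout overlays two ints and a long in the same 8 bytes,
// so X/Y can also be read, stored, or compared as one 64-bit value.
[StructLayout(LayoutKind.Explicit)]
public struct PackedPoint
{
    [FieldOffset(0)] public int X;
    [FieldOffset(4)] public int Y;
    [FieldOffset(0)] public long Packed; // X in the low 32 bits, Y in the high 32 (little-endian)
}

public static class PackedPointDemo
{
    public static void Main()
    {
        var p = new PackedPoint { X = 3, Y = 5 };
        // On a little-endian machine the long view combines both fields.
        Console.WriteLine(p.Packed == ((long)5 << 32 | 3)); // True
    }
}
```

The single-long view can make equality checks and array copies cheaper, at the cost of endianness-dependent code.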
I'll see if I can do that. Would you have to do something special to get AVX working?
I did some more playing around and tried to store the x - 2 that occurs 3 times in a variable. This, however, didn't improve the times but rather slowed them down by about 5%. Before:
After:
Does anyone have any idea why this doesn't improve generation times?
Because you are using 4 extra registers, the JIT is probably doing a poorer allocation of registers and ends up paying twice for the stack store and retrieve. Post both assembler listings and I can tell you for sure.
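The snippets being compared were not preserved above, but the pattern under discussion might look like the following sketch (method and variable names such as `map` and `xLeft` are my assumptions, not the repository's code):

```csharp
using System;

public static class HoistDemo
{
    // Variant A (hypothetical reconstruction): recompute x - 2 at each use.
    // The JIT can usually fold the constant offset into the address
    // computation, so no extra register is consumed.
    public static void MarkRecompute(bool[,] map, int x, int y)
    {
        map[x - 2, y] = true;
        map[x - 2, y + 1] = true;
        map[x - 2, y + 2] = true;
    }

    // Variant B: hoist x - 2 into a local. The extra live value raises
    // register pressure, and in a hot loop that can force stack spills
    // that cost more than the saved subtraction.
    public static void MarkHoisted(bool[,] map, int x, int y)
    {
        int xLeft = x - 2;
        map[xLeft, y] = true;
        map[xLeft, y + 1] = true;
        map[xLeft, y + 2] = true;
    }

    public static void Main()
    {
        var a = new bool[8, 8];
        var b = new bool[8, 8];
        MarkRecompute(a, 4, 2);
        MarkHoisted(b, 4, 2);
        // Both variants produce identical results; only the codegen differs.
        Console.WriteLine(a[2, 2] && a[2, 3] && a[2, 4] && b[2, 2]); // True
    }
}
```

Both methods compute the same cells; the difference the thread observed is purely in register allocation, which is why only the assembler listings can settle it.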
@redknightlois, wow, that's some fast response time ^^. Thanks. If you have some time, just above here I also posted some additional questions to your first reply. Let me see if I can get the assembler code.
You can get away without the
If you now use a byte array, you can check for 0 in an unsafe way just reading it as a
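The original inline snippet was lost in extraction, but the technique named here — treating a byte array as wider integers to test for zero — might be sketched as follows (my illustration, not the thread's actual code; compiling it requires the `/unsafe` switch):

```csharp
using System;

public static class ZeroScan
{
    // Check whether 8 consecutive bytes are all zero with a single
    // 64-bit read instead of eight byte comparisons.
    public static unsafe bool EightBytesAreZero(byte[] data, int offset)
    {
        // Pin the array so the GC cannot move it while we hold the pointer.
        fixed (byte* p = &data[offset])
        {
            return *(long*)p == 0;
        }
    }

    public static void Main()
    {
        var data = new byte[16];
        Console.WriteLine(EightBytesAreZero(data, 0)); // True
        data[3] = 1;
        Console.WriteLine(EightBytesAreZero(data, 0)); // False
    }
}
```

The caller must ensure `offset + 8` stays within the array; a bounds check before the `fixed` block would make this production-safe.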
Try changing:

```csharp
targets[targetCount].X = xLeft;
targets[targetCount].Y = y;
```

into:

```csharp
ref var t = ref targets[targetCount];
t.X = xLeft;
t.Y = y;
```

There shouldn't be any difference, but I am not entirely sure if the JIT of 2.0 is able to reuse the
EDIT: You can also do the same with the map statements... twice per get if you need to keep register use down (because you hit the threshold).
This is doing exactly this:

```csharp
public readonly struct SomeObjectCallback : IAction
{
    private readonly SomeObject _myObjectRef;

    public SomeObjectCallback(SomeObject objectRef)
    {
        this._myObjectRef = objectRef;
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void Invoke(int step, int total, long x, long y)
    {
        this._myObjectRef.SomeMethodCallback(step, total, x, y);
    }
}
```
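A sketch of how such a struct callback is typically consumed: passing it through a generic type parameter constrained to `struct` lets the JIT specialize the method per callback type and inline (or eliminate) `Invoke`. The interface shape and all names below are assumptions mirroring the snippet above, not code from the repository:

```csharp
using System;
using System.Runtime.CompilerServices;

// Assumed interface shape, mirroring the Invoke signature above.
public interface IAction
{
    void Invoke(int step, int total, long x, long y);
}

// A no-op callback: because TAction is a struct type parameter, the JIT
// generates a specialized body per struct and can drop the empty call.
public readonly struct NoneAction : IAction
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void Invoke(int step, int total, long x, long y) { }
}

public static class CallbackDemo
{
    // Generic over the callback struct instead of taking a delegate:
    // no allocation, and no indirect call in the hot loop.
    public static int RunSteps<TAction>(int total, TAction action)
        where TAction : struct, IAction
    {
        int steps = 0;
        for (int i = 0; i < total; i++)
        {
            action.Invoke(i, total, 0, 0);
            steps++;
        }
        return steps;
    }

    public static void Main()
    {
        Console.WriteLine(RunSteps(5, new NoneAction())); // 5
    }
}
```

With a delegate parameter the call would stay virtual; the struct constraint is what makes the "remove logging at JIT time" effect possible.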
I'll wait with my further response until I've checked out all your advice. I did, however, create a SharpLab entry if you want to check out the emitted assembler code: Again, your help is really helpful, I'd love to learn more about this :)
About the

With usage of:

```
L0174: lea rax, [r15+0x10]
L0178: mov [rax], ecx
L017a: mov [rax+0x4], r13d
L017e: mov dword [rsp+0x54], 0x1
```

without:

```
L01dc: movsxd r8, eax
L01df: mov [r15+r8*8+0x10], ecx
L01e4: movsxd rcx, eax
L01e7: mov [r15+rcx*8+0x14], r13d
L01ec: inc eax
L01ee: mov [rsp+0x54], eax
```

See the difference? It is pretty difficult to get all that information from SharpLab, mainly because there is no marker showing which part of the code each assembler line belongs to... hunting for the proper line is hard. The first version uses far fewer physical resources, which, if you saw the talk Scratched Metal, is key to high-performance tight loops. Then again, that would depend on the targeted JIT and you have biggest offenders, but you can help older JITs with the ref...
Thanks for the elaborate explanation.
The reason I need the
If I don't know which directions are valid ones, I also can't randomly choose a valid direction. The only thing I can think of is predetermining all valid directions with 4 bools and then simply coding everything out. E.g.:
However, this would become a humongous loop because you'd have to do it for all random scenarios and for all the different scenarios where, for example, Left and Top are valid but Bottom and Right are not. Because then the
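One way to avoid enumerating every combination of valid directions is to pack the four validity checks into a bitmask and pick a random set bit. This is a sketch under assumed names and an assumed bit encoding, not code from the repository:

```csharp
using System;

public static class DirectionPicker
{
    // Hypothetical direction encoding, one bit each.
    public const int Left = 1, Right = 2, Up = 4, Down = 8;

    // Pick one of the set bits of validMask uniformly at random,
    // replacing a branch per valid/invalid combination.
    public static int PickRandom(int validMask, Random rng)
    {
        int count = 0;
        // Count set bits; on .NET Core 3.0+ BitOperations.PopCount would do.
        for (int m = validMask; m != 0; m &= m - 1) count++;
        if (count == 0) return 0; // no valid direction left

        int skip = rng.Next(count);          // index of the chosen set bit
        int mask = validMask;
        while (skip-- > 0) mask &= mask - 1; // clear the 'skip' lowest set bits
        return mask & -mask;                 // isolate the lowest remaining bit
    }

    public static void Main()
    {
        var rng = new Random(1337);
        int picked = PickRandom(Left | Up, rng);
        Console.WriteLine(picked == Left || picked == Up); // True
    }
}
```

The caller still computes the four edge/visited checks once per step, but the selection itself becomes branch-light instead of a case per combination.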
I'll definitely look into implementing this. Map statements will probably be harder because the map is implemented using a BitArray.
Thanks for the advice. In my case I don't really need the performance when an action is provided, though. But if I do for some other scenario, I'd definitely implement it like this. Thanks a lot for the help.
This is currently the fastest I managed to get: I did try to rewrite the whole algorithm to do fewer if checks:
This is quite a clever use of signaling interfaces, good job there ;)
Ah, I thought switch statements would be better. If I changed this to if checks, would that be better? I could either write out all cases in an else-if chain or write them with some bitwise checks, e.g. a bitwise check whether Left is valid, and if so, then checking whether Up is also valid, and so on. Which one would you suggest?
First, remove the exceptions; creating exceptions (even if they are not in a try-catch) is a good way to disrupt the code layout, unless you hit one of the JIT's optimizations for cold code. You will have to look at the generated assembler to know for sure whether the code is being moved away. Mhhh, definitely not switch, but I would rework the whole process instead... the amount of repeated code you are writing (even if cleverly encapsulated in functions) is a hint that the flow itself is potentially at fault here. Do you have an example I can run with a known (deterministic) output that I can check, so I avoid making mistakes? I want to try a few things myself which are probably too convoluted to explain from zero.
@redknightlois, I'll try to see what removing the exceptions does for performance, thanks for the tip. As for trying it yourself, you can simply clone the application and run the included ConsoleApplication. This piece of code will be executed with a deterministic seed:
@redknightlois, removing the exceptions didn't really seem to do much. However, rewriting everything as an if-else chain seemed to improve the speed somewhat:
Still, the problem there is a too-long method with lots of register spilling.
Yeah, I understand. V2 had the shorter method but uses more checks instead, so that's what's causing the slowness there.
I'd like to track the progress being made on improving the performance of the backtrack algorithm.
I've created a new class that uses generics + structs to remove logging statements at compile/JIT time to improve performance, but I'm definitely looking for more improvements.
Class with work in progress:
https://github.com/devedse/DeveMazeGeneratorCore/blob/master/src/DeveMazeGenerator/Generators/AlgorithmBacktrack2.cs