Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Element data optimizations #3287

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

tederis
Copy link
Member

@tederis tederis commented Jan 10, 2024

This PR is intended to improve the element data performance. The current implementation is far too inefficient because of a lot of string copying and the scourge of strlen invocations. There is also an excessive amount of access to std::map which makes it worse. Prior to the PR each setElementData call was producing at least 5 string copies on client and at least 7 copies(and a multiple of player count in worst case when using an element data subscription) on server side. There were also 14 strlen invocations(each has O(n) complexity).

Among the other improvements an additional ElementData's value check(newValue != oldValue) in method Packet_CustomData which can protect a server from PACKET_ID_CUSTOM_DATA packets spamming and the same for CElementRPCs::SetElementData method which also can protect a client from erroneous changes.

Simple(preliminary) measurements(local set/getElementData) show the performance gain nearly 2(in some cases 5, depends on an element data value) times.

@tederis tederis added enhancement New feature or request refactor labels Jan 10, 2024
@tederis
Copy link
Member Author

tederis commented Jan 10, 2024

Fun fact: 80% of the setElementData execution time is on(Client)ElementDataChange event processing with just one simple event handler.

@TriangleToo
Copy link

TriangleToo commented Jan 10, 2024

Fun fact: 80% of the setElementData execution time is on(Client)ElementDataChange event processing with just one simple event handler.

lol. I would understand it for an event that can be canceled, but not so much for an event that is not cancellable (like this one). Is there any way to optimize this? Just wondering..

@@ -279,12 +279,10 @@ void CClientEntity::SetID(ElementID ID)
}
}

CLuaArgument* CClientEntity::GetCustomData(const char* szName, bool bInheritData, bool* pbIsSynced)
CLuaArgument* CClientEntity::GetCustomData(const SString& strName, bool bInheritData, bool* pbIsSynced)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are truing to not use SString in new code. Discussion

Copy link
Member Author

@tederis tederis Jan 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I know. But in this partucular casу SString was chosen for the reason that it was already used in almost all associative containers. By using lets say std::string you would get an additional string copy (see SString(const std::string& strText) constructor) and potential problems with key mathching. So the usage of std::string would exceed the boundaries of this PR.

Copy link
Contributor

@Pirulax Pirulax Jan 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use std::string_view all the way, and only in the end (when inserting) create the SString?
This way for the getter you can get away without creating a string at all.
Also, since the code limits the string length to 128, you could use some kind of a fixed-length string and store the data directly inside it (So Caching works better).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::string_view uses 8 bytes.
Pointers and (I believe) references use 4 bytes.
So function use less registers. And the code can be optimized better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think not allocating any memory until absolutely necessary far outweighs using 2 registers instead of 1.......

Copy link
Member Author

@tederis tederis Jan 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you so against string_view I don't understand.

Im not totally against string_view. If you can pass simpler value, you should use it. Basic string compare function doesn't require string length. It's similar to removing unused data from function arguments.

std::hash struct requires string len. That is std::unordered_map will have to construct a new string from the characters buffer(const char*) which is obviously not the best behaviour. std::string_view does not have this drawback.

If you use std::map you can use take advantage of transparent lookup. Sadly unordered_map doesn't support this. Another way of doing this would be to use a fixed length string buffer as the key to avoid heap allocation (and by using an unodered_map we could increase cache locality)

Heterogeneous lookups for unordered maps should be available since C++20. Regretfully we're still with C++17.

As you must be aware we are using both (unordered hash and tree-based maps) in MTA. Thus the solution must be selected according to the specifics of the both. Fixed length strings cannot be efficient here due to the unpredictable length of such strings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you must be aware we are using both (unordered hash and tree-based maps) in MTA. Thus the solution must be selected according to the specifics of the both. Fixed length strings cannot be efficient here due to the unpredictable length of such strings.

See MAX_CUSTOMDATA_NAME_LENGTH.
So no name can be longer than 128 chars, so a 128 + 1 byte buffer would be enough.
A cache line is typically 64B, so keep that in mind. (But I think 128 + 1 for the char buffer, and uint8 for the length should be enough, this way the whole struct would be 132 bytes (I think, or 136 on 64b).
I'm unsure why you think fixed length strings would be inefficient. Tree based maps have extreme memory overhead for small values (std::string + CLuaArgument is barely a few bytes).
[Also, memory footprint can be lowered by merging my CLuaArgument refactor that uses std::variant (Thus the final size is quite a bit less)]

Heterogeneous lookups for unordered maps should be available since C++20. Regretfully we're still with C++17

So we can upgrade to c++20 (For the server/client dm project at least).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we could experiment with custom hash functions (that generate less collisions for our use-case), if we decide to use a hashmap.

Copy link
Member Author

@tederis tederis Jan 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See MAX_CUSTOMDATA_NAME_LENGTH.
So no name can be longer than 128 chars, so a 128 + 1 byte buffer would be enough.
A cache line is typically 64B, so keep that in mind. (But I think 128 + 1 for the char buffer, and uint8 for the length should be enough, this way the whole struct would be 132 bytes (I think, or 136 on 64b).
I'm unsure why you think fixed length strings would be inefficient. Tree based maps have extreme memory overhead for small values (std::string + CLuaArgument is barely a few bytes).
[Also, memory footprint can be lowered by merging my CLuaArgument refactor that uses std::variant (Thus the final size is quite a bit less)]

128 bytes is still a lot actually. On large servers such keys can eat up to 10 megabytes(or even more) of the ram. You might say that 10 mb does not deserve attention, but on client side it may be significant (especially on high-load custom maps).

Personally I would prefer the StringName(as like in Godot Engine, or FName in UE) approach that implies just one buffer for each unique string. This way we only need a unique number that can be used as a key within the maps.

Copy link
Contributor

@Pirulax Pirulax Jan 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is an awesome idea, never thought of that, you're a genius!
So we could both save RAM, and CPU!
Just have to have a string interning system (That is global to all elements).
We also need to use some kind of reference counting (shared_ptr perhaps? Though the control block is allocated separately, that's kinda a waste of memory in our case) - To free unused strings.

Not sure how good of a key the serial index of the string would be, in terms of load factor (we want to avoid list traversals of course).

Not sure how efficient the same interning would be for CLuaArgument.

Though, I do have a concern, we currently use case-sensitive strings, right? FName's are case-insensitive (which makes sense, less variation)

@tederis
Copy link
Member Author

tederis commented Jan 11, 2024

Fun fact: 80% of the setElementData execution time is on(Client)ElementDataChange event processing with just one simple event handler.

lol. I would understand it for an event that can be canceled, but not so much for an event that is not cancellable (like this one). Is there any way to optimize this? Just wondering..

Yes, events can be slightly optimized. But to get a noticeable performance gain there would need a lot of work(probably, a full reengineering of the event system).

@tederis
Copy link
Member Author

tederis commented Jan 11, 2024

I think function setElementDataSilence(element, key, silent)(suggest the better name) can be added for blocking element data events. Because in most of the cases those events are not needed but they still consume a lot of CPU.

@tederis
Copy link
Member Author

tederis commented Jan 12, 2024

Apart from Lua events (see notes about their performance above), the element data is as fast as possible now.

@tederis
Copy link
Member Author

tederis commented Jan 12, 2024

In theory google::dense_hash_map may give even better results but I'm afraid it will consume too much memory

@Allerek
Copy link
Contributor

Allerek commented Jan 12, 2024

I think function setElementDataSilence(element, key, silent)(suggest the better name) can be added for blocking element data events. Because in most of the cases those events are not needed but they still consume a lot of CPU.

better to just add argument to setElementData that tells if it should trigger an event

@Allerek
Copy link
Contributor

Allerek commented Jan 12, 2024

In theory google::dense_hash_map may give even better results but I'm afraid it will consume too much memory

These days we have a lot of memory in our computers. 16GB is the minimum for new pcs in few last years, 8GB is still common. I think people can handle that really. Lets not be so paranoic. Main problem of element-datas are their CPU usage.

@tederis
Copy link
Member Author

tederis commented Jan 12, 2024

I think function setElementDataSilence(element, key, silent)(suggest the better name) can be added for blocking element data events. Because in most of the cases those events are not needed but they still consume a lot of CPU.

better to just add argument to setElementData that tells if it should trigger an event

This brings up a question: what should we do when a client sends data to a server? Send this flag as well and dictate whether server should invoke an event? It can produce the following circle: client -> server -> other clients. One of the clients may decide to silently change the data without even possibility for detecting it at endpoints. Or we shouldn't allow clients do that?

@ds1-e
Copy link
Contributor

ds1-e commented Jan 12, 2024

These days we have a lot of memory in our computers. 16GB is the minimum for new pcs in few last years, 8GB is still common. I think people can handle that really. Lets not be so paranoic. Main problem of element-datas are their CPU usage.

But you are aware that MTA/GTA is only 32 bit?

@TriangleToo
Copy link

I think function setElementDataSilence(element, key, silent)(suggest the better name) can be added for blocking element data events. Because in most of the cases those events are not needed but they still consume a lot of CPU.

better to just add argument to setElementData that tells if it should trigger an event

This brings up a question: what should we do when a client sends data to a server? Send this flag as well and dictate whether server should invoke an event? It can produce the following circle: client -> server -> other clients. One of the clients may decide to silently change the data without even possibility for detecting it at endpoints. Or we shouldn't allow clients do that?

IMO this should definitely be an additional argument to setElementData instead of another function. It makes no sense to have it as a separate function (at least none that I can think of). The client should have this argument ignored when syncMode is set to true (synced with server) and same on the server logic - it should ignore such flag if sent from client.

One of the clients may decide to silently change the data without even possibility for detecting it at endpoints

This can be also solved by #3286 😎

@Pirulax
Copy link
Contributor

Pirulax commented Jan 14, 2024

newValue != oldValue doesn't work all that well.
Checking whenever 2 LuaArguments are the same isn't trivial, and the implementation is currently very shallow.
So keep that in mind.

@tederis
Copy link
Member Author

tederis commented Jan 31, 2024

Although this PR is fully functional I'm forced to admit that most of the new suggestions cannot be addressed without the C++20 support (see the discussion in comments above). A brief attempt to enable C++20 shows that a lot of things(even the singular deathmatch module) depend on previous standards. So there is many things that should be resolved before. I hope that work will be done in the foreseeable future.

@Einheit-101
Copy link

Am i correct that we can save a lot of performance if we just delete all "onClientElementDataChange" event handlers in all running scripts? I use this event a lot in my old WW2 server because its very handy but the performance is terrible, even on a Ryzen 5900X.

@tederis
Copy link
Member Author

tederis commented Feb 1, 2024

Am i correct that we can save a lot of performance if we just delete all "onClientElementDataChange" event handlers in all running scripts? I use this event a lot in my old WW2 server because its very handy but the performance is terrible, even on a Ryzen 5900X.

Events are the heavier part of the element data. Thus, we can get a significant performance gain by disabling them. My original idea was to add setElementDataSilence(element, key, silent) function for this. If you're asking if you can get that gain right now, then yes. The less event handles you have the less they impact the performance.

@Dutchman101
Copy link
Member

Are there any updates or foreseeable plans to continue this? Element data optimizations would be huge for MTA.
@tederis

@tederis
Copy link
Member Author

tederis commented Jul 1, 2024

Are there any updates or foreseeable plans to continue this? Element data optimizations would be huge for MTA. @tederis

As was already mentioned above, most of the suggestions cannot be addressed without the C++20 support. For example, one of the suggestions was using std::string instead of SString, but this would lead to the performance degradation without C++20. But there could be some workarounds that I'm going to consider. Perhaps, it would require an intermediate PR (for the string interning implementation). I'm also interested in the element data optimization, so I'm going to finish it this month.

@Fernando-A-Rocha
Copy link
Contributor

Fernando-A-Rocha commented Nov 6, 2024

@tederis
I support the addition of:

setElementDataSilent( element theElement, bool status [, string key ] )

This would allow the developer to disable on(Client)ElementDataChange event triggers for certain elements & their children, as well as specific keys.

Inspired by #3839

@Dutchman101
Copy link
Member

As was already mentioned above, most of the suggestions cannot be addressed without the C++20 support. For example, one of the suggestions was using std::string instead of SString, but this would lead to the performance degradation without C++20. But there could be some workarounds that I'm going to consider. Perhaps, it would require an intermediate PR (for the string interning implementation).

@tederis

Alright, that sounds like a good summary after looking at all passed and pending code reviews.
Let's merge this, so you can work on a follow-up PR for C++ 20 readiness and final implementations next, while most of the changes (that got in from this PR) can be stress tested during the upcoming mass-testing phase that will be organized very soon to prepare for the release of MTA 1.6.1.

@Dutchman101
Copy link
Member

Please resolve conflicts, so we can go ahead with that course of action.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request refactor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants