Element data optimizations #3287

tederis · 2024-01-10T09:16:29Z

This PR is intended to improve the element data performance. The current implementation is far too inefficient because of a lot of string copying and the scourge of strlen invocations. There is also an excessive amount of access to std::map which makes it worse. Prior to the PR each setElementData call was producing at least 5 string copies on client and at least 7 copies(and a multiple of player count in worst case when using an element data subscription) on server side. There were also 14 strlen invocations(each has O(n) complexity).

Among the other improvements an additional ElementData's value check(newValue != oldValue) in method Packet_CustomData which can protect a server from PACKET_ID_CUSTOM_DATA packets spamming and the same for CElementRPCs::SetElementData method which also can protect a client from erroneous changes.

Simple(preliminary) measurements(local set/getElementData) show the performance gain nearly 2(in some cases 5, depends on an element data value) times.

tederis · 2024-01-10T14:06:18Z

Fun fact: 80% of the setElementData execution time is on(Client)ElementDataChange event processing with just one simple event handler.

TriangleToo · 2024-01-10T16:31:51Z

Fun fact: 80% of the setElementData execution time is on(Client)ElementDataChange event processing with just one simple event handler.

lol. I would understand it for an event that can be canceled, but not so much for an event that is not cancellable (like this one). Is there any way to optimize this? Just wondering..

Client/mods/deathmatch/logic/CClientEntity.cpp

TheNormalnij · 2024-01-10T17:03:44Z

Client/mods/deathmatch/logic/CClientEntity.cpp

@@ -279,12 +279,10 @@ void CClientEntity::SetID(ElementID ID)
    }
 }

-CLuaArgument* CClientEntity::GetCustomData(const char* szName, bool bInheritData, bool* pbIsSynced)
+CLuaArgument* CClientEntity::GetCustomData(const SString& strName, bool bInheritData, bool* pbIsSynced)


We are truing to not use SString in new code. Discussion

Yes, I know. But in this partucular casу SString was chosen for the reason that it was already used in almost all associative containers. By using lets say std::string you would get an additional string copy (see SString(const std::string& strText) constructor) and potential problems with key mathching. So the usage of std::string would exceed the boundaries of this PR.

Why not use std::string_view all the way, and only in the end (when inserting) create the SString?
This way for the getter you can get away without creating a string at all.
Also, since the code limits the string length to 128, you could use some kind of a fixed-length string and store the data directly inside it (So Caching works better).

std::string_view uses 8 bytes.
Pointers and (I believe) references use 4 bytes.
So function use less registers. And the code can be optimized better.

I think not allocating any memory until absolutely necessary far outweighs using 2 registers instead of 1.......

Why are you so against string_view I don't understand.

Im not totally against string_view. If you can pass simpler value, you should use it. Basic string compare function doesn't require string length. It's similar to removing unused data from function arguments.

std::hash struct requires string len. That is std::unordered_map will have to construct a new string from the characters buffer(const char*) which is obviously not the best behaviour. std::string_view does not have this drawback.

If you use std::map you can use take advantage of transparent lookup. Sadly unordered_map doesn't support this. Another way of doing this would be to use a fixed length string buffer as the key to avoid heap allocation (and by using an unodered_map we could increase cache locality)

Heterogeneous lookups for unordered maps should be available since C++20. Regretfully we're still with C++17.

As you must be aware we are using both (unordered hash and tree-based maps) in MTA. Thus the solution must be selected according to the specifics of the both. Fixed length strings cannot be efficient here due to the unpredictable length of such strings.

As you must be aware we are using both (unordered hash and tree-based maps) in MTA. Thus the solution must be selected according to the specifics of the both. Fixed length strings cannot be efficient here due to the unpredictable length of such strings.

See MAX_CUSTOMDATA_NAME_LENGTH.
So no name can be longer than 128 chars, so a 128 + 1 byte buffer would be enough.
A cache line is typically 64B, so keep that in mind. (But I think 128 + 1 for the char buffer, and uint8 for the length should be enough, this way the whole struct would be 132 bytes (I think, or 136 on 64b).
I'm unsure why you think fixed length strings would be inefficient. Tree based maps have extreme memory overhead for small values (std::string + CLuaArgument is barely a few bytes).
[Also, memory footprint can be lowered by merging my CLuaArgument refactor that uses std::variant (Thus the final size is quite a bit less)]

Heterogeneous lookups for unordered maps should be available since C++20. Regretfully we're still with C++17

So we can upgrade to c++20 (For the server/client dm project at least).

Also, we could experiment with custom hash functions (that generate less collisions for our use-case), if we decide to use a hashmap.

See MAX_CUSTOMDATA_NAME_LENGTH.
So no name can be longer than 128 chars, so a 128 + 1 byte buffer would be enough.
A cache line is typically 64B, so keep that in mind. (But I think 128 + 1 for the char buffer, and uint8 for the length should be enough, this way the whole struct would be 132 bytes (I think, or 136 on 64b).
I'm unsure why you think fixed length strings would be inefficient. Tree based maps have extreme memory overhead for small values (std::string + CLuaArgument is barely a few bytes).
[Also, memory footprint can be lowered by merging my CLuaArgument refactor that uses std::variant (Thus the final size is quite a bit less)]

128 bytes is still a lot actually. On large servers such keys can eat up to 10 megabytes(or even more) of the ram. You might say that 10 mb does not deserve attention, but on client side it may be significant (especially on high-load custom maps).

Personally I would prefer the StringName(as like in Godot Engine, or FName in UE) approach that implies just one buffer for each unique string. This way we only need a unique number that can be used as a key within the maps.

That is an awesome idea, never thought of that, you're a genius!
So we could both save RAM, and CPU!
Just have to have a string interning system (That is global to all elements).
We also need to use some kind of reference counting (shared_ptr perhaps? Though the control block is allocated separately, that's kinda a waste of memory in our case) - To free unused strings.

Not sure how good of a key the serial index of the string would be, in terms of load factor (we want to avoid list traversals of course).

Not sure how efficient the same interning would be for CLuaArgument.

Though, I do have a concern, we currently use case-sensitive strings, right? FName's are case-insensitive (which makes sense, less variation)

tederis · 2024-01-11T03:26:18Z

Fun fact: 80% of the setElementData execution time is on(Client)ElementDataChange event processing with just one simple event handler.

lol. I would understand it for an event that can be canceled, but not so much for an event that is not cancellable (like this one). Is there any way to optimize this? Just wondering..

Yes, events can be slightly optimized. But to get a noticeable performance gain there would need a lot of work(probably, a full reengineering of the event system).

tederis · 2024-01-11T04:26:26Z

I think function setElementDataSilence(element, key, silent)(suggest the better name) can be added for blocking element data events. Because in most of the cases those events are not needed but they still consume a lot of CPU.

This reverts commit 347f9cc.

…asa-blue into element_data_optimize

This reverts commit 69b5960.

tederis · 2024-01-12T15:18:12Z

Apart from Lua events (see notes about their performance above), the element data is as fast as possible now.

tederis · 2024-01-12T15:58:54Z

In theory google::dense_hash_map may give even better results but I'm afraid it will consume too much memory

Allerek · 2024-01-12T16:15:59Z

I think function setElementDataSilence(element, key, silent)(suggest the better name) can be added for blocking element data events. Because in most of the cases those events are not needed but they still consume a lot of CPU.

better to just add argument to setElementData that tells if it should trigger an event

Allerek · 2024-01-12T16:17:19Z

In theory google::dense_hash_map may give even better results but I'm afraid it will consume too much memory

These days we have a lot of memory in our computers. 16GB is the minimum for new pcs in few last years, 8GB is still common. I think people can handle that really. Lets not be so paranoic. Main problem of element-datas are their CPU usage.

tederis · 2024-01-12T16:34:57Z

I think function setElementDataSilence(element, key, silent)(suggest the better name) can be added for blocking element data events. Because in most of the cases those events are not needed but they still consume a lot of CPU.

better to just add argument to setElementData that tells if it should trigger an event

This brings up a question: what should we do when a client sends data to a server? Send this flag as well and dictate whether server should invoke an event? It can produce the following circle: client -> server -> other clients. One of the clients may decide to silently change the data without even possibility for detecting it at endpoints. Or we shouldn't allow clients do that?

sacr1ficez · 2024-01-12T16:43:28Z

These days we have a lot of memory in our computers. 16GB is the minimum for new pcs in few last years, 8GB is still common. I think people can handle that really. Lets not be so paranoic. Main problem of element-datas are their CPU usage.

But you are aware that MTA/GTA is only 32 bit?

TriangleToo · 2024-01-12T21:24:25Z

I think function setElementDataSilence(element, key, silent)(suggest the better name) can be added for blocking element data events. Because in most of the cases those events are not needed but they still consume a lot of CPU.

better to just add argument to setElementData that tells if it should trigger an event

This brings up a question: what should we do when a client sends data to a server? Send this flag as well and dictate whether server should invoke an event? It can produce the following circle: client -> server -> other clients. One of the clients may decide to silently change the data without even possibility for detecting it at endpoints. Or we shouldn't allow clients do that?

IMO this should definitely be an additional argument to setElementData instead of another function. It makes no sense to have it as a separate function (at least none that I can think of). The client should have this argument ignored when syncMode is set to true (synced with server) and same on the server logic - it should ignore such flag if sent from client.

One of the clients may decide to silently change the data without even possibility for detecting it at endpoints

This can be also solved by #3286 😎

Pirulax · 2024-01-14T23:12:40Z

newValue != oldValue doesn't work all that well.
Checking whenever 2 LuaArguments are the same isn't trivial, and the implementation is currently very shallow.
So keep that in mind.

tederis · 2024-01-31T06:00:13Z

Although this PR is fully functional I'm forced to admit that most of the new suggestions cannot be addressed without the C++20 support (see the discussion in comments above). A brief attempt to enable C++20 shows that a lot of things(even the singular deathmatch module) depend on previous standards. So there is many things that should be resolved before. I hope that work will be done in the foreseeable future.

Einheit-101 · 2024-02-01T01:19:42Z

Am i correct that we can save a lot of performance if we just delete all "onClientElementDataChange" event handlers in all running scripts? I use this event a lot in my old WW2 server because its very handy but the performance is terrible, even on a Ryzen 5900X.

tederis · 2024-02-01T02:01:16Z

Am i correct that we can save a lot of performance if we just delete all "onClientElementDataChange" event handlers in all running scripts? I use this event a lot in my old WW2 server because its very handy but the performance is terrible, even on a Ryzen 5900X.

Events are the heavier part of the element data. Thus, we can get a significant performance gain by disabling them. My original idea was to add setElementDataSilence(element, key, silent) function for this. If you're asking if you can get that gain right now, then yes. The less event handles you have the less they impact the performance.

Dutchman101 · 2024-06-30T03:46:54Z

Are there any updates or foreseeable plans to continue this? Element data optimizations would be huge for MTA.
@tederis

tederis · 2024-07-01T05:56:06Z

Are there any updates or foreseeable plans to continue this? Element data optimizations would be huge for MTA. @tederis

As was already mentioned above, most of the suggestions cannot be addressed without the C++20 support. For example, one of the suggestions was using std::string instead of SString, but this would lead to the performance degradation without C++20. But there could be some workarounds that I'm going to consider. Perhaps, it would require an intermediate PR (for the string interning implementation). I'm also interested in the element data optimization, so I'm going to finish it this month.

Fernando-A-Rocha · 2024-11-06T11:36:18Z

@tederis
I support the addition of:

setElementDataSilent( element theElement, bool status [, string key ] )

This would allow the developer to disable on(Client)ElementDataChange event triggers for certain elements & their children, as well as specific keys.

Inspired by #3839

Dutchman101 · 2025-03-01T13:37:05Z

As was already mentioned above, most of the suggestions cannot be addressed without the C++20 support. For example, one of the suggestions was using std::string instead of SString, but this would lead to the performance degradation without C++20. But there could be some workarounds that I'm going to consider. Perhaps, it would require an intermediate PR (for the string interning implementation).

@tederis

Alright, that sounds like a good summary after looking at all passed and pending code reviews.
Let's merge this, so you can work on a follow-up PR for C++ 20 readiness and final implementations next, while most of the changes (that got in from this PR) can be stress tested during the upcoming mass-testing phase that will be organized very soon to prepare for the release of MTA 1.6.1.

Dutchman101 · 2025-03-01T13:37:35Z

Please resolve conflicts, so we can go ahead with that course of action.

tederis · 2025-04-19T09:38:17Z

This PR will be reimplemented and reopened soon.

Element data optimizations

c50fd5f

tederis added enhancement New feature or request refactor labels Jan 10, 2024

tederis added 2 commits January 10, 2024 16:41

Fix misprint

cbfed48

Merge branch 'master' into element_data_optimize

206d49e

TheNormalnij reviewed Jan 10, 2024

View reviewed changes

Client/mods/deathmatch/logic/CClientEntity.cpp Show resolved Hide resolved

TheNormalnij reviewed Jan 10, 2024

View reviewed changes

tederis added 5 commits January 12, 2024 21:57

Further optimizations

347f9cc

Revert "Further optimizations"

69b5960

This reverts commit 347f9cc.

Merge branch 'element_data_optimize' of https://github.com/tederis/mt…

02c8fd9

…asa-blue into element_data_optimize

Revert "Revert "Further optimizations""

5ee515a

This reverts commit 69b5960.

Remove CStaticFunctionDefinitions::GetElementData

cd554b6

Use weak argument for the data deletion

b22af37

Merge branch 'master' into element_data_optimize

9c560a9

Dutchman101 and others added 2 commits January 18, 2024 23:24

Merge branch 'master' into element_data_optimize

5ccc413

Merge branch 'master' into element_data_optimize

0ce4587

tederis and others added 2 commits June 19, 2024 10:56

Merge branch 'master' into element_data_optimize

0e257a0

Merge branch 'master' into element_data_optimize

14eafb4

tederis mentioned this pull request Oct 7, 2024

Add C++20 support for Deathmatch project #3745

Closed

tederis closed this Apr 19, 2025

tederis mentioned this pull request Apr 19, 2025

Add StringName #4169

Merged

Uh oh!

Element data optimizations #3287

Element data optimizations #3287

Uh oh!

Conversation

tederis commented Jan 10, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tederis commented Jan 10, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

TriangleToo commented Jan 10, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

TheNormalnij Jan 10, 2024

Choose a reason for hiding this comment

Uh oh!

tederis Jan 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Pirulax Jan 14, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

TheNormalnij Jan 14, 2024

Choose a reason for hiding this comment

Uh oh!

Pirulax Jan 15, 2024

Choose a reason for hiding this comment

Uh oh!

tederis Jan 19, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Pirulax Jan 19, 2024

Choose a reason for hiding this comment

Uh oh!

Pirulax Jan 19, 2024

Choose a reason for hiding this comment

Uh oh!

tederis Jan 20, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Pirulax Jan 20, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tederis commented Jan 11, 2024

Uh oh!

tederis commented Jan 11, 2024

Uh oh!

tederis commented Jan 12, 2024

Uh oh!

tederis commented Jan 12, 2024

Uh oh!

Allerek commented Jan 12, 2024

Uh oh!

Allerek commented Jan 12, 2024

Uh oh!

tederis commented Jan 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sacr1ficez commented Jan 12, 2024

Uh oh!

TriangleToo commented Jan 12, 2024

Uh oh!

Pirulax commented Jan 14, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tederis commented Jan 31, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Einheit-101 commented Feb 1, 2024

Uh oh!

tederis commented Feb 1, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tederis commented Jan 10, 2024 •

edited

Loading

tederis commented Jan 10, 2024 •

edited

Loading

TriangleToo commented Jan 10, 2024 •

edited

Loading

tederis Jan 11, 2024 •

edited

Loading

Pirulax Jan 14, 2024 •

edited

Loading

tederis Jan 19, 2024 •

edited

Loading

tederis Jan 20, 2024 •

edited

Loading

Pirulax Jan 20, 2024 •

edited

Loading

tederis commented Jan 12, 2024 •

edited

Loading

Pirulax commented Jan 14, 2024 •

edited

Loading

tederis commented Jan 31, 2024 •

edited

Loading

tederis commented Feb 1, 2024 •

edited

Loading

Fernando-A-Rocha commented Nov 6, 2024 •

edited

Loading