-
-
Notifications
You must be signed in to change notification settings - Fork 452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Element data optimizations #3287
base: master
Are you sure you want to change the base?
Conversation
Fun fact: 80% of the |
lol. I would understand it for an event that can be canceled, but not so much for an event that is not cancellable (like this one). Is there any way to optimize this? Just wondering.. |
@@ -279,12 +279,10 @@ void CClientEntity::SetID(ElementID ID) | |||
} | |||
} | |||
|
|||
CLuaArgument* CClientEntity::GetCustomData(const char* szName, bool bInheritData, bool* pbIsSynced) | |||
CLuaArgument* CClientEntity::GetCustomData(const SString& strName, bool bInheritData, bool* pbIsSynced) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are truing to not use SString in new code. Discussion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I know. But in this partucular casу SString
was chosen for the reason that it was already used in almost all associative containers. By using lets say std::string
you would get an additional string copy (see SString(const std::string& strText
) constructor) and potential problems with key mathching. So the usage of std::string would exceed the boundaries of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use std::string_view
all the way, and only in the end (when inserting) create the SString
?
This way for the getter you can get away without creating a string at all.
Also, since the code limits the string length to 128, you could use some kind of a fixed-length string and store the data directly inside it (So Caching works better).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::string_view uses 8 bytes.
Pointers and (I believe) references use 4 bytes.
So function use less registers. And the code can be optimized better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think not allocating any memory until absolutely necessary far outweighs using 2 registers instead of 1.......
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you so against
string_view
I don't understand.Im not totally against
string_view
. If you can pass simpler value, you should use it. Basic string compare function doesn't require string length. It's similar to removing unused data from function arguments.
std::hash
struct requires string len. That isstd::unordered_map
will have to construct a new string from the characters buffer(const char*) which is obviously not the best behaviour.std::string_view
does not have this drawback.If you use std::map you can use take advantage of transparent lookup. Sadly unordered_map doesn't support this. Another way of doing this would be to use a fixed length string buffer as the key to avoid heap allocation (and by using an
unodered_map
we could increase cache locality)
Heterogeneous lookups for unordered maps should be available since C++20. Regretfully we're still with C++17.
As you must be aware we are using both (unordered hash and tree-based maps) in MTA. Thus the solution must be selected according to the specifics of the both. Fixed length strings cannot be efficient here due to the unpredictable length of such strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you must be aware we are using both (unordered hash and tree-based maps) in MTA. Thus the solution must be selected according to the specifics of the both. Fixed length strings cannot be efficient here due to the unpredictable length of such strings.
See MAX_CUSTOMDATA_NAME_LENGTH
.
So no name can be longer than 128 chars, so a 128 + 1 byte buffer would be enough.
A cache line is typically 64B, so keep that in mind. (But I think 128 + 1 for the char buffer, and uint8 for the length should be enough, this way the whole struct would be 132 bytes (I think, or 136 on 64b).
I'm unsure why you think fixed length strings would be inefficient. Tree based maps have extreme memory overhead for small values (std::string + CLuaArgument is barely a few bytes).
[Also, memory footprint can be lowered by merging my CLuaArgument refactor that uses std::variant
(Thus the final size is quite a bit less)]
Heterogeneous lookups for unordered maps should be available since C++20. Regretfully we're still with C++17
So we can upgrade to c++20 (For the server/client dm project at least).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we could experiment with custom hash functions (that generate less collisions for our use-case), if we decide to use a hashmap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See MAX_CUSTOMDATA_NAME_LENGTH.
So no name can be longer than 128 chars, so a 128 + 1 byte buffer would be enough.
A cache line is typically 64B, so keep that in mind. (But I think 128 + 1 for the char buffer, and uint8 for the length should be enough, this way the whole struct would be 132 bytes (I think, or 136 on 64b).
I'm unsure why you think fixed length strings would be inefficient. Tree based maps have extreme memory overhead for small values (std::string + CLuaArgument is barely a few bytes).
[Also, memory footprint can be lowered by merging my CLuaArgument refactor that uses std::variant (Thus the final size is quite a bit less)]
128 bytes is still a lot actually. On large servers such keys can eat up to 10 megabytes(or even more) of the ram. You might say that 10 mb does not deserve attention, but on client side it may be significant (especially on high-load custom maps).
Personally I would prefer the StringName(as like in Godot Engine, or FName in UE) approach that implies just one buffer for each unique string. This way we only need a unique number that can be used as a key within the maps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is an awesome idea, never thought of that, you're a genius!
So we could both save RAM, and CPU!
Just have to have a string interning system (That is global to all elements).
We also need to use some kind of reference counting (shared_ptr
perhaps? Though the control block is allocated separately, that's kinda a waste of memory in our case) - To free unused strings.
Not sure how good of a key the serial index of the string would be, in terms of load factor (we want to avoid list traversals of course).
Not sure how efficient the same interning would be for CLuaArgument
.
Though, I do have a concern, we currently use case-sensitive strings, right? FName
's are case-insensitive (which makes sense, less variation)
Yes, events can be slightly optimized. But to get a noticeable performance gain there would need a lot of work(probably, a full reengineering of the event system). |
I think function |
Apart from Lua events (see notes about their performance above), the element data is as fast as possible now. |
In theory |
better to just add argument to setElementData that tells if it should trigger an event |
These days we have a lot of memory in our computers. 16GB is the minimum for new pcs in few last years, 8GB is still common. I think people can handle that really. Lets not be so paranoic. Main problem of element-datas are their CPU usage. |
This brings up a question: what should we do when a client sends data to a server? Send this flag as well and dictate whether server should invoke an event? It can produce the following circle: client -> server -> other clients. One of the clients may decide to silently change the data without even possibility for detecting it at endpoints. Or we shouldn't allow clients do that? |
But you are aware that MTA/GTA is only 32 bit? |
IMO this should definitely be an additional argument to
This can be also solved by #3286 😎 |
|
Although this PR is fully functional I'm forced to admit that most of the new suggestions cannot be addressed without the C++20 support (see the discussion in comments above). A brief attempt to enable C++20 shows that a lot of things(even the singular deathmatch module) depend on previous standards. So there is many things that should be resolved before. I hope that work will be done in the foreseeable future. |
Am i correct that we can save a lot of performance if we just delete all "onClientElementDataChange" event handlers in all running scripts? I use this event a lot in my old WW2 server because its very handy but the performance is terrible, even on a Ryzen 5900X. |
Events are the heavier part of the element data. Thus, we can get a significant performance gain by disabling them. My original idea was to add setElementDataSilence(element, key, silent) function for this. If you're asking if you can get that gain right now, then yes. The less event handles you have the less they impact the performance. |
Are there any updates or foreseeable plans to continue this? Element data optimizations would be huge for MTA. |
As was already mentioned above, most of the suggestions cannot be addressed without the C++20 support. For example, one of the suggestions was using std::string instead of SString, but this would lead to the performance degradation without C++20. But there could be some workarounds that I'm going to consider. Perhaps, it would require an intermediate PR (for the string interning implementation). I'm also interested in the element data optimization, so I'm going to finish it this month. |
Alright, that sounds like a good summary after looking at all passed and pending code reviews. |
Please resolve conflicts, so we can go ahead with that course of action. |
This PR is intended to improve the element data performance. The current implementation is far too inefficient because of a lot of string copying and the scourge of strlen invocations. There is also an excessive amount of access to std::map which makes it worse. Prior to the PR each setElementData call was producing at least 5 string copies on client and at least 7 copies(and a multiple of player count in worst case when using an element data subscription) on server side. There were also 14 strlen invocations(each has O(n) complexity).
Among the other improvements an additional ElementData's value check(newValue != oldValue) in method Packet_CustomData which can protect a server from PACKET_ID_CUSTOM_DATA packets spamming and the same for CElementRPCs::SetElementData method which also can protect a client from erroneous changes.
Simple(preliminary) measurements(local set/getElementData) show the performance gain nearly 2(in some cases 5, depends on an element data value) times.