WIP: pickle shelffile -> standard_address_space.sql #763
base: master
Interesting work.
|
What is the point of that change? The shelves stuff was just an ugly hack for very slow systems. Btw, sqlite is not typed at all: you can store whatever you want wherever you want. |
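To illustrate the typing remark: a minimal sketch using Python's standard `sqlite3` module, assuming an ordinary (non-STRICT) table. The table name `demo` and the sample values are made up for the example. A column declared `INTEGER` still accepts text and blobs, because SQLite applies type affinity rather than enforcing the declared type.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (value INTEGER)")

# Insert an int, a non-numeric string and a blob into the same INTEGER column.
conn.executemany(
    "INSERT INTO demo (value) VALUES (?)",
    [(42,), ("not a number",), (b"\x01\x02",)],
)

# typeof() reports the storage class SQLite actually used for each row.
stored_types = [
    row[0] for row in conn.execute("SELECT typeof(value) FROM demo ORDER BY rowid")
]
print(stored_types)
```

Any type guarantees therefore have to come from the code that writes to the database, not from the column declarations.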
@oroulet Also, writing user aspace to a pickle file is not an option for the following reasons:
Please comment on any issue, both implementation and ideas. |
I'm in support of something that we can build into address space persistence.
My only other input is that if we really go this direction it might be better to create an interface like history has. That way end user could potentially implement their own persistence layer. |
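A rough sketch of what such a pluggable persistence interface could look like. Every name here (`AddressSpaceStorage`, `DictStorage`, `load`, `save`) is hypothetical and not part of python-opcua. It only shows the shape of the idea: the server talks to an abstract backend, and the end user supplies the implementation.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class AddressSpaceStorage(ABC):
    """Hypothetical persistence interface; NOT an existing python-opcua class."""

    @abstractmethod
    def load(self, nodeid: str) -> Optional[bytes]:
        """Return the serialized node, or None if it is not stored."""

    @abstractmethod
    def save(self, nodeid: str, data: bytes) -> None:
        """Persist the serialized node under its node id."""


class DictStorage(AddressSpaceStorage):
    """Trivial in-memory backend, only here to show the contract."""

    def __init__(self) -> None:
        self._nodes: Dict[str, bytes] = {}

    def load(self, nodeid: str) -> Optional[bytes]:
        return self._nodes.get(nodeid)

    def save(self, nodeid: str, data: bytes) -> None:
        self._nodes[nodeid] = data


storage = DictStorage()
storage.save("ns=2;i=1001", b"serialized-node")
print(storage.load("ns=2;i=1001"))
```

A SQLite, XML or pickle backend would then be one more subclass, and the server would not need to know which one is in use.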
I am somewhat concerned about the divergence between the main branch and the parallel work that is done on the asyncio-server. I really would be able to do |
@zerox1212
That is a very interesting remark. |
I think history is better done separately. The OPC UA spec defines how history works in detail. If we are strictly keeping to the spec I do not know if it says anything about persisting server state. In the end I was just giving some inputs. I am not able to do much work on this library myself. However I can review some stuff about this topic because I was one of the people working on shelf, dump, and load because I needed to restore server state. I think in the end I actually used XML export/import for persistence. |
Maybe I'm wrong, but why not (optionally) storing user aspace in the same sql file as history? |
Because history SQL implementation is rather simplistic and is not a good model to build on. It makes one table per historized node. You would not want this for potentially thousands of nodes. Only nodes with history require this design. Plus if node names change or other such things data is orphaned in the database because there is not really a mechanism to remove data. |
opcua-asyncio is a Python 3.6-only fork, so we do not care about Python 2 there. I think the approach is correct, but yes, I am concerned about divergence. That is why I think large restructuring work should wait for that work to stabilize. It is better to work on improving opcua-asyncio than python-opcua.
Concerning the shelve stuff, I think the people wanting to save things in the opcua database have a fundamental misunderstanding of what opcua is. opcua is a communication protocol; saving things inside the communication protocol does not make sense at all. The underlying system should save its state, and opcua is just there to expose data from the underlying system. I am wondering if I made some errors in the API that make people think they should do that kind of thing...
But I agree that using a sqlite db to store the standard address space might be of interest: the standard address space slows down startup and uses memory for something that is ignored 99.99% of the time. But once again, this should be implemented in asyncio-opcua.
@brubbel if you have some time or want to learn some modern Python, then look at the fork, try it, and try to improve it. We also need to make a sync wrapper around it; there are a lot of things to think about here. Probably make some classes with decorators, and maybe the async Node should be called AsyncNode and the sync node Node, to stay compatible with python-opcua... or just accept that we are not completely API compatible anymore. |
I agree that opc-ua is not meant as storage space. However, it is my opinion that the library must provide a simple means to override some parts, such as the address space. With that in mind I think it is a good idea to support a strategy for that. I would like to start using opcua-asyncio for the following reasons:
However:
One other remark: |
I agree that this could be an improvement. If you do that, you need to provide some API to notify the system of changes in values so datachange events can be generated. Such a change will require quite a lot of work.
I have no idea 😢 I haven't had time to try it yet, but I looked at the changes and they all looked correct. |
@oroulet said OPC UA is not really meant for persistence (except history I guess...). For me it's still a valid request to want persistence of user nodes, mainly because it can make prototyping UA software with this library faster. In other words, I'm lazy and I don't want to write a persisting data layer and then create a UA address space that will basically be the exact same thing. Maybe my use case is just too special. An address space design that can be overridden would be a nice option so users like me can break the rules. :) |
So here it is, based on your comments:
Of course this address space is empty, but you can load from sqlite:
Now you can start to populate the address space with your own nodes, as usual.
(Don't use dump() regularly however, it is slow. Still need to implement on-the-fly writes)
@oroulet said:
In the proposed implementation, you still read and write to the server via the usual set/get_value API, so all is fine. @zerox1212 said
Just inherit from The current |
I haven't looked at this closely yet, but one more thing is that you should be really careful with performance; this is an area which we really had to optimize. I see that you replaced several dict accesses with functions; this might slow things down quite a lot, for example. It was also optimized in many places to avoid searching dicts. |
True, the first read will be slow as data is fetched from disk. After that there should be no difference. Also, the current implementation is synchronous and the async-opcua could really shine here as you can continue serving/processing other tasks while waiting for IO, or you could even restore cache during idle time. |
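The "slow first read, fast afterwards" behaviour can be sketched with a dict that falls back to SQLite on a cache miss. This is an illustration only, not the PR's implementation: the class `LazyNodeCache`, the table `nodes` and its columns are hypothetical names chosen for the example.

```python
import sqlite3


class LazyNodeCache(dict):
    """Hypothetical dict that falls back to SQLite on a cache miss."""

    def __init__(self, conn):
        super().__init__()
        self._conn = conn
        self.disk_reads = 0  # counts how often the database was queried

    def __missing__(self, nodeid):
        # Called by dict.__getitem__ only when the key is not cached yet.
        self.disk_reads += 1
        row = self._conn.execute(
            "SELECT data FROM nodes WHERE nodeid = ?", (nodeid,)
        ).fetchone()
        if row is None:
            raise KeyError(nodeid)
        self[nodeid] = row[0]  # keep it in memory for later reads
        return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (nodeid TEXT PRIMARY KEY, data BLOB)")
conn.execute("INSERT INTO nodes VALUES (?, ?)", ("i=84", b"root-folder"))

cache = LazyNodeCache(conn)
first = cache["i=84"]   # fetched from the database
second = cache["i=84"]  # served from memory
print(cache.disk_reads)
```

Note that `__missing__` is only triggered by `cache[key]`; `key in cache` and `cache.get(key)` look at the in-memory part only, which is one reason a real drop-in replacement needs more care than this sketch.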
Note to self: generate_nodeid may be a problem. (Did it ever work with the shelf? It looks only in memory if I'm correct.) python-opcua/opcua/server/address_space.py Lines 524 to 527 in 5ccf372
|
10f7a03 to 74e7cb3
TL;DR; address space Usage:
How it works:
Data becomes persistent by:
Advantages.
Speed |
Looks nice. Let's wait for @oroulet to review. If you don't define Only small points I see are that the |
Correct, everything remains as before when the user does not define aspace. That means no sqlite involved and standard address space re-generated at startup. Only the pre-generated standard_address_space.sql is added to the repo. My idea is to let the [WIP] hanging for a while until I've figured out most of the probable ;-) bugs and optimizations, but please try and comment. |
Added a shortcut for loading the standard address space.
|
Added examples/server-persistent.py
|
The idea of putting the standard address space in a sqlite db is not bad, that is fine, but I had a few questions:
|
So maybe you should read that value at startup and regenerate the address space if the current address space is different from the one in code? |
Wouldn't that require to first load the standard_address_space_partx.py files? Which is the main reason to use standard_address_space.sql on slow devices? Or maybe you meant not to clutter the git repository with standard_address_space.sql blob updates which are basically unchanged if the version stays the same? |
I may forget something, but my proposition was if running sql |
Oops, I did not see that you put a huge blob file in the repository. Yes, this is a very bad idea ;-). |
As written earlier, if you manage to implement that correctly, I think we should consider using sql as default to reduce startup time. But then we really need to make sure it does not cause any performance drawback for high-speed read/write on the client and server side. |
How would you make the link to GUID for nodes that are only on disk? All the things I have been circling around boil down to a unique identifier, or a hash thereof (which may introduce hash collisions if too short). One needs something uniquely identifiable, so in the end why not use the well-defined opc-ua binary nodeid representation, preferably numeric (if possible), so that numeric/twobyte/fourbyte are captured as the same node? That is also what is (implicitly) done when calculating the Python hash, because it ignores the NodeIdType. |
Could we look for a separate library that will hash in a predictable way, i.e. not salt the hash? |
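For context: Python's built-in `hash()` for `str` and `bytes` is randomized per process unless `PYTHONHASHSEED` is set, so it cannot be stored on disk. A non-salted hash does not strictly need an extra library. Below is a minimal pure-Python sketch of 32-bit FNV-1a, one well-known unsalted hash; the function name `fnv1a_32` and the sample input are chosen only for this example.

```python
def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: deterministic across processes and Python versions."""
    h = 0x811C9DC5  # FNV offset basis (32-bit)
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # FNV prime, truncated to 32 bits
    return h


# The same input always gives the same value, in every run.
digest = fnv1a_32(b"ns=2;i=1001")
print(hex(digest))
```

Whether a 32-bit value is wide enough is a separate question, since a short hash trades size for collision risk.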
Have been looking at fast hash algorithms like fnv-1a or murmur, but the only thing that they add is a risk for hash collisions and an extra dependency, and no way to calculate backwards to the original nodeid. If you have a lot of nodes, the risk of a collision increases dramatically, unless you go for hash sizes larger than the binary representation of the nodeid, so no gain there. 32-bit hash -> 1% collision rate for 10k nodes |
Even worse: |
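As a side note on the 32-bit estimate above, the figure can be checked with the usual birthday-bound approximation p ≈ 1 − exp(−n(n−1) / 2N), where n is the number of nodes and N the size of the hash space. The helper name `collision_probability` is made up for this sketch, and the formula assumes a uniformly distributed hash.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance of at least one collision among n_items hashes."""
    space = 2 ** hash_bits
    exponent = n_items * (n_items - 1) / (2.0 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-exponent)


p32 = collision_probability(10_000, 32)
p64 = collision_probability(10_000, 64)
print(f"32-bit: {p32:.4%}, 64-bit: {p64:.3e}")
```

For 10k nodes this gives a bit over 1% with a 32-bit hash, in line with the number quoted in the thread, while a 64-bit hash makes a collision very unlikely but is already as large as many binary nodeids.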
SQLite3 needs write access to read-only standard_address_space.sql file. Added the Py2.x users can't use the wrapper:
But will have to copy the standard_address_space.sql and do:
where /path/to/standard_address_space.sql is writable and also in a directory with write access. |
multithreading: sqlite does not allow interleaving write+commits between multiple threads. |
cc0708f to 447699f
Rebased PR to master. |
Refactoring the shelffile from a monolithic blob to sqlite3 db.
Advantages over pickle:
- Fast
- Small (25% of fill_address_space code)
- Not dependent on Python2/3 version
- Supports transactional read/(write TODO)
- Supports real-time persistence of user address-space (TODO)
- Strong typed: only INTEGER, TEXT and opc-ua to_binary() BLOB
- Drop-in replacement for memory-based AddressSpace
Built-in integrity check on generate_address_space.py dump.
Multiprocessing causes locking error when write and commits from multiple threads are interleaved. Make sure that commits are performed under the same Lock() session.
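A minimal sketch of the locking rule described in this commit message, assuming a single connection shared between threads via `check_same_thread=False`. The table layout and the helper `write_node` are hypothetical. The point is that the write and its commit happen inside the same lock, so no other thread can interleave its own statements.

```python
import sqlite3
import threading

# One shared connection; check_same_thread=False lets other threads use it.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE nodes (nodeid INTEGER PRIMARY KEY, data TEXT)")
write_lock = threading.Lock()


def write_node(nodeid, data):
    # Write and commit under the same lock, so transactions never interleave.
    with write_lock:
        conn.execute("INSERT INTO nodes VALUES (?, ?)", (nodeid, data))
        conn.commit()


threads = [
    threading.Thread(target=write_node, args=(i, "node-%d" % i))
    for i in range(20)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

with write_lock:
    row_count = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
print(row_count)
```

Reads go through the same lock here for simplicity; a real implementation might use a separate connection per thread instead.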
Since the sqlite address-space is lazy loading, the old approach with incremental identifiers is very slow, as it requires loading the complete address space in memory before the highest identifier is found. Generating a random uid and checking if it already exists in the address space is much faster, especially for hundreds of nodes.
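The random-identifier approach can be sketched as follows. This is not the code from the PR: `generate_identifier` and the `exists` callback are hypothetical, with `exists` standing in for a single indexed lookup in the sqlite address space.

```python
import random


def generate_identifier(exists, rng=random, bits=32):
    """Draw random identifiers until one is free.

    `exists` is a callback that reports whether an identifier is taken;
    in a database-backed address space it would be one indexed lookup.
    """
    while True:
        candidate = rng.getrandbits(bits)
        if candidate != 0 and not exists(candidate):
            return candidate


# Stand-in for the address space: a set of identifiers already in use.
known = {1, 2, 3}
new_id = generate_identifier(lambda ident: ident in known)
known.add(new_id)
print(new_id)
```

Each attempt costs one existence check instead of a scan for the highest identifier, and with a 32-bit space and a few thousand nodes a retry is rarely needed.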
d9f21b4 to 6007393
Rebased PR to master. |
Refactoring the shelffile from a monolithic blob to sqlite3 db.
Advantages over pickle:
Built-in integrity check on generate_address_space.py dump.
Usage: shelffile=True when starting server.