change PYON format to be JSON #64
Conversation
This replaces the non-JSON serde with JSON-compliant wrappers. The wrapping is fundamentally compatible with the class hinting of JSON-RPC v1 and looks like `{"__jsonclass__": ["CONSTRUCTOR", ARGUMENTS]}`; JSON-RPC v1 properties are ignored. Types that are native to JSON (None, bool, float, int, str, list, Dict[str, Any]) are encoded without hints. This also gets rid of `eval()` and its associated risks, although the intrinsic DoS attacks remain unmitigated (cf. the Python JSON warnings). It changes the errors reported on parsing invalid PYON. Support for several more numpy scalar data types has been added.
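The class-hinting scheme described above can be sketched with the stdlib `json` module. This is an illustrative sketch only, not sipyco's implementation: the constructor names (`"complex"`, `"set"`) and argument layouts are assumptions for demonstration, while JSON-native types pass through without hints as described.

```python
import json

def encode(obj):
    """Encode, wrapping non-JSON-native types as {"__jsonclass__": [...]}."""
    def default(o):
        # Hypothetical constructor names; the real PYON v2 hints may differ.
        if isinstance(o, complex):
            return {"__jsonclass__": ["complex", [o.real, o.imag]]}
        if isinstance(o, (set, frozenset)):
            return {"__jsonclass__": ["set", [sorted(o)]]}
        raise TypeError(f"cannot encode {type(o).__name__}")
    return json.dumps(obj, default=default)

def decode(s):
    """Decode, reconstructing hinted objects via their constructor entry."""
    def object_hook(d):
        if "__jsonclass__" in d:
            ctor, args = d["__jsonclass__"]
            if ctor == "complex":
                return complex(*args)
            if ctor == "set":
                return set(*args)
            raise ValueError(f"unknown constructor {ctor!r}")
        return d
    return json.loads(s, object_hook=object_hook)
```

Note that `json.loads` never executes arbitrary code; unknown constructors simply raise, which is where the safety win over `eval()` comes from.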
Ouch, the lack of backwards compatibility across RPC would make this the single most widely breaking change in the last ~8 years for us. Of course, the big advantage of having an RPC boundary between components is that it decouples their versioning and implementation, but that goes out of the window if the format changes, so any such change should be carefully considered. To illustrate, we have many shared pieces of infrastructure between experiments/labs. As you say, "one could" implement transparent fallback for RPC as well – in my opinion, this is the only sensible option, and this change should be rolled back until then.
I agree with @dnadlinger that transparent fallback to the original protocol is important.
To set a couple of things straight: Python 3.5 has been EOL for five years, and sipyco already broke compatibility with Python 3.6 three years ago. Merging this doesn't impact anyone yet, as no downstream user has switched. It's unreasonable to demand continued support for pieces of postulated, untestable, and proprietary abandonware from eight years ago (kudos for the euphemism "well-matured infrastructure"). It's disproportionately unfair to all other current and future users who need the performance, safety, interoperability, and flexibility benefits this brings and have invested in an agile and maintainable infrastructure for everyone's benefit. It shifts the burden of your own local infrastructural idiosyncrasies and inflexibility upstream to future development, which would need to maintain v1 interfaces and v1-v2 proxies. Conversely, it seems plausible that a lab that has chosen to set up and maintain such an infrastructure also has educated and resourceful expert users, well versed in maintaining their own forks and able to choose between (a) reducing their code baggage and organizing an upgrade of their legacy applications, (b) adding to their ARTIQ patch set to stick with PYON v1, or (c) implementing and maintaining a v1-v2 proxy. I'll think about a proxy or keeping a v1 client option, but I have zero love for it. You need to be aware that you are likely the only one truly desiring, testing, maintaining, and depending on it in such a way; I suspect almost everyone else demanding it doesn't have a case. And I can see no reason why even you would need a transparent fallback: what you describe is served by running a pc_rpc proxy or setting a pc_rpc client option. I see no need to revert this PR. The transparent fallback I described is already in the ARTIQ PR.
As the former maintainer of an infrastructure-type open-source project – probably considerably more widely used than ARTIQ – I do believe I understand the tension between agility/avoidance of technical debt on the developer side and backwards compatibility on the user side quite well. Accordingly, for mostly self-contained, even somewhat gratuitous changes in storage formats, APIs, etc., I had zero complaints, even if they caused code churn for our local tree – for which I insist on the diff being kept minimal for exactly that reason – or if they needed special precautions in libraries like ndscan that need to work across a range of ARTIQ versions. However, this change is uniquely cross-cutting in that it "virally" requires every single component in the distributed system to be upgraded at the same time, in a way that's neither forward- nor backward-compatible. You will be aware that one can't load two versions of a given package (sipyco) into the same Python process. Contrary to what you are insinuating, it is widely recognised that such "all-or-nothing upgrades" can be an infeasible proposition even in the most professionally run distributed systems; see e.g. the support in many messaging libraries (Protobuf/Thrift/…) for schema-evolution features such as optional fields. Or take most mature, widely used library/infrastructure-type projects out there as an example (such as Python itself): backwards-incompatible changes go through a carefully considered phase of deprecation before eventual removal. In situations where a change absolutely can't be made in a way that allows a graceful overlap period of coexistence (unavoidable name collisions, etc.), people tend to go to great lengths to provide a smooth migration path (even making such undesirable compromises as deprecating A in favour of B, only to then rename B back to A with another deprecation period).
In contrast, this attempt doesn't seem particularly well thought through, beyond "that is quite a bit of work" and "I have zero love for it". Welcome to the fun world of working on software with a non-trivial user base that interacts with it on a source-code level. I do understand the benefits of infrastructure agility. Most things here are on reproducible, easily modified environments through Nix or Poetry (-> uv), precisely so we can thoroughly test and roll out new patches quickly. But this makes an all-at-once migration only marginally more feasible. Changing the RPC protocol while deliberately avoiding any chance of compatibility forces a complete split of the universe into a before and an after (with all the planning issues in terms of lab schedules and, even discounting those, annoyances with external dependencies, bisecting issues across the change, and momentarily downgrading for one experiment but not another), for no reason other than momentary convenience. Keep supporting both seamlessly for a major release, deprecate old PYON in the next, and remove it in the one after.
Because users might end up wanting to mix old and new code in the same Python process/environment – they might be consuming drivers/libraries/… from various sources, some of which are aware of the change and some of which are not. Requiring an extra option to be passed, or a renamed function to be called, to achieve backwards compatibility is the worst of both worlds from this perspective. Having the new implementations in
I think it is clear that this PR falls short, and all but blocks anybody from tracking and testing ARTIQ master outside the smallest, most monolithic environments. Just backing it out until you've found a way to address the compatibility issue is the obvious way to go, just like when someone accidentally broke master in the good old days before automated pre-commit checks. I get that you are excited about https://github.com/quartiq/vscode-artiq et al. (and moving from almost-JSON to actually-JSON is a welcome change in the abstract!), but the wider ARTIQ community shouldn't be left to pick up the pieces.
This addresses several long-standing issues:
- It gets rid of `eval()` in `decode()`, which is a step towards making the stack safer.
- Performance improves: the relative time is … for `encode()` and 0.25 for `decode()` (lower than one is faster than before). For the large `twitter.json` dataset, which is pure JSON, the relative time is 0.3 for `encode()` and 0.07 for `decode()`. This has a significant impact since the ratio of `decode()` to `encode()` calls tends to be larger than one (in ARTIQ). `encode()` performance could probably be improved further with `orjson`.
- The format is a bit more verbose, especially with pretty printing.
- A PYON v1 -> v2 converter is included.
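The core of such a v1 -> v2 converter can be sketched as follows. This is not the converter included in the PR, only an illustration of the idea under a narrowing assumption: it handles the plain-Python-literal subset of PYON v1 via `ast.literal_eval`, whereas the real converter must also cover nan/inf, numpy scalars and arrays, and the other PYON extensions.

```python
import ast
import json

def convert_v1_to_v2(text):
    """Convert a PYON v1 string (Python literal syntax) to JSON.

    Sketch only: ast.literal_eval safely parses the literal subset
    (no eval()); the result is then re-serialized as plain JSON.
    """
    obj = ast.literal_eval(text)
    return json.dumps(obj)
```

Using `ast.literal_eval` instead of `eval()` means even the migration path avoids arbitrary code execution on untrusted v1 input.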
A data-loss scenario in `store_file()` that may have been involved in some corruption cases of the UI state files has been fixed.
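A common pattern for avoiding that kind of state-file corruption is an atomic write: serialize to a temporary file in the same directory, then `os.replace()` it over the target, so a crash mid-write can never leave a truncated file behind. The sketch below illustrates the pattern; it is not necessarily the exact fix applied in sipyco's `store_file()`.

```python
import json
import os
import tempfile

def store_file(filename, obj):
    """Write obj as JSON atomically: a crash mid-write leaves either the
    old file or the new one, never a half-written state file.
    (Illustrative pattern, not necessarily the exact sipyco fix.)"""
    dirname = os.path.dirname(os.path.abspath(filename))
    # Temp file must be on the same filesystem for os.replace() to be atomic.
    fd, tmp = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(obj, f)
            f.flush()
            os.fsync(f.fileno())  # ensure data hits disk before the rename
        os.replace(tmp, filename)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise
```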
Several more unittests have been added.
Testing in ARTIQ is ongoing but apart from the migration and documentation I don't see a technical hurdle there. The API is backwards compatible, but since the wire format changes, it'll need some coordination.
Obviously this breaks using existing data (ARTIQ: dataset db, ui state files, devargs overrides, PYON arguments, PYON literals in experiments/CLI tools/GUI fields, HDF5 file contents, ...) or talking to sipyco interfaces running PYON v1.
Auto-migration would only be one-way, works only for files, doesn't work on the two-way sipyco interfaces, and rollback is tricky since the user doesn't really track what is being migrated.
A transparent v1 fallback is also only possible/helpful for `decode()`.
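Such a decode-side fallback could look like the sketch below: try JSON first, and only on a parse failure fall back to the v1 path. This is an assumption-laden illustration, using `ast.literal_eval` as a stand-in for the v1 parser (the real PYON v1 grammar is richer, covering nan/inf, numpy types, etc.); it also shows why the fallback is decode-only, since an encoder must commit to one wire format.

```python
import ast
import json

def decode_with_fallback(s):
    """Decode PYON v2 (JSON), falling back to the v1 literal subset.

    Sketch only: real PYON v1 needs a fuller parser than
    ast.literal_eval provides.
    """
    try:
        return json.loads(s)
    except json.JSONDecodeError:
        return ast.literal_eval(s)
```

Note the asymmetry: a v1 string like `{'a': 1}` is never valid JSON, so the fallback is unambiguous on decode, but `encode()` has no way to know which format the peer expects.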
One could add shadow or proxy interfaces for sipyco rpc/sync_struct etc. to ease the migration in distributed/heterogeneous deployments, but that is quite a bit of work to implement, and ultimately the coherent upgrade will require the same amount of user work.