Skip to content

Conversation

jordens
Copy link
Member

@jordens jordens commented Jun 27, 2025

This addresses several long-standing issues:

  • No more eval() in decode() which is a step towards making the stack safer
  • PYON v2 now is JSON. This makes it easier for external non-Python code (e.g. our JS/TS or Rust code) to talk to sipyco/ARTIQ interfaces. The format is compatible with JSON-RPC v1 class hinting but that fact is not used, guaranteed, or tested.
  • Support for custom types (superseding PYON: add plugin support #22 and allowing pyon: Add xarray object encoding and decoding. #27 to be implemented by the user)
  • Performance improvements: for the mid-size unittest data which is very heavy on low performing special cases, the relative time is 0.9 for encode() and 0.25 for decode() (lower than one is faster than before). For the large twitter.json dataset which is pure JSON the relative time is 0.3 for encode() and 0.07 for decode(). This has a significant impact since the ratio of decode() per encode() tends to be larger than one (in ARTIQ). encode() performance could probably be improved further with orjson.

The format is a bit more verbose, especially with pretty printing.

A PYON v1 -> v2 converter is included.

A data loss scenario in store_file() has been fixed that may have been involved in some corruption cases of the UI state files.

Several more unittests have been added.

Testing in ARTIQ is ongoing but apart from the migration and documentation I don't see a technical hurdle there. The API is backwards compatible, but since the wire format changes, it'll need some coordination.

Obviously this breaks using existing data (ARTIQ: dataset db, ui state files, devargs overrides, PYON arguments, PYON literals in experiments/CLI tools/GUI fields, HDF5 file contents, ...) or talking to sipyco interfaces running PYON v1.
Auto-migration would only be one-way, only works for files, doesn't work on the two-way sipyco interfaces, and roll-back is tricky since the user doesn't really track what's being migrated.
A transparent v1 fall-back is also only possible/helpful for decode().
One could do shadow or proxy interfaces for sipyco rpc/sync_struct etc. to ease the migration in distributed/heterogeneous deployments but that is quite a bit of work to implement and ultimately the coherent upgrade will require the same amount of user work.

jordens added 5 commits June 26, 2025 18:01
This replaces non-JSON serde with JSON-compliant wrappers.
The wrapping is fundamentally compatible with the class hinting of
JSON-RPC v1 and looks like `{"__jsonclass__": ["CONSTRUCTOR", ARGUMENTS]}`.
JSON-RPC v1 properties are ignored.
Types that are native to JSON (None, bool, float, int, str, list, Dict[str, Any])
are encoded without hints.

Also gets rid of the `eval()` and associated risks. The
intrinsic DOS attacks remain unmitigated (c.f. Python JSON warnings).

This changes the errors reported on parsing invalid PYON.

Support for several more numpy scalar data types has been added.
@jordens jordens changed the title change PYON format to be JSON compatible change PYON format to be JSON Jun 27, 2025
@dnadlinger
Copy link
Collaborator

dnadlinger commented Jul 7, 2025

Ouch, the lack of backwards-compatibility across RPC would make this the single most widely breaking change in the last ~8 years for us. Of course, the big advantage of having an RPC boundary between components is that it decouples their versioning and implementation, but that's out of the window if the format changes, so any such change should be carefully considered.

To illustrate, we have many shared pieces of infrastructure between experiments/labs that use sipyco.pc_rpc for communication – some of them embedded devices still running Python 3.5 and pyon from artiq.protocols –, for which a concerted upgrade is challenging as it would involve multiple labs with their own timelines, and modifying well-matured infrastructure that hasn't been touched in a long time.

As you say, "one could" implement transparent fallback for RPC as well – in my opinion, this is the only sensible option, and this change should be rolled back until then.

@jbqubit
Copy link

jbqubit commented Jul 7, 2025

I agree with @dnadlinger that transparent fallback to the original protocol is important.

@jordens
Copy link
Member Author

jordens commented Jul 7, 2025

To set a couple things straight: Python 3.5 has been EOL for five years. Sipyco broke compat with Python 3.6 three years ago already. Merging this doesn't impact anyone yet as no downstream user has switched.

It's unreasonable to demand continued support for pieces of postulated, untestable, and proprietary abandonware from eight years ago (kudos for the euphemism "well-matured infrastructure"). It's disproportionately unfair to all other current and future users who need the performance, safety, interoperability, and flexibility benefits this brings and have invested into an agile and maintainable infrastructure for everyone's benefit. It shifts the burden of your own local infrastructural idiosyncracies and inflexibility upstream to future development that would need to maintain v1 interfaces/v1-v2 proxies.

Conversely it seems plausible that a lab that has chosen to set up and maintain such an infrastructure also has educated and resourceful expert users well versed in maintaining their own forks and able to choose between any of the other options of (a) reducing their code baggage and organizing an upgrade of their legacy applications (b) adding to their ARTIQ patch set to stick with PYON v1, or (c) implement and maintain a v1-v2 proxy.
If you are unable to touch, wrap, or maintain your own infrastructure you are guaranteed to become terribly stuck eventually and unable to follow.

I'll think about a proxy or keeping a v1 client option but I have zero love for it. You need to be aware that you are likely the only one truly desiring, testing, maintaining, and depending on it in such a way. I suspect almost everyone else demanding it doesn't have a case. And I can see no reason why even you would need a transparent fallback. What you describe is served with running a pc_rpc proxy or setting a pc_rpc client option. And I see no need to revert this PR. The transparent fallback I described is already in the ARTIQ PR.

@dnadlinger
Copy link
Collaborator

dnadlinger commented Jul 8, 2025

As the former maintainer of an infrastructure-type open-source project – probably considerably more widely used than ARTIQ –, I do believe I understand the tension between agility/avoidance of technical debt on the developer side and backwards-compatibility on the user side the quite well. Accordingly, for mostly self-contained, even somewhat gratuitous changes in storage formats, APIs, etc., I had zero complaints even if they caused code churn for our local tree – for which I insist on the diff being kept minimal for exactly that reason –, or if they needed special precaution in libraries like ndscan that need to work across a range of ARTIQ versions.

However, this change is uniquely cross-cutting in that it "virally" requires every single component in the distributed system to be upgraded at the same time, in a way that's neither forward or backward-compatible. You will be aware that one can't load two versions of a given package (sipyco) into the same Python process.

Contrary to what you are insinuating, it is widely recognised that such "all-or-nothing upgrades" can be an infeasible proposition even in the most professionally-run distributed systems; see e.g. the support of many messaging libraries (Protobuf/Thrift/…) for schema evolution features such as optional fields. Or take most mature, widely-used library/infrastructure-type projects out there as an example (such as Python itself): backwards-incompatible changes go through a carefully considered phase of deprecations before eventual removal. In situations where a change absolutely can't be made in a way that allows a graceful overlap period of coexistence (unavoidable name collisions, etc.), people tend to go to great lengths to provide a smooth migration path (even making such undesirable compromises like deprecating A in favour of B only to then rename B back to A with another deprecation period). In contrast, this attempt doesn’t seem particularly thought through, beyond “that is quite a bit of work” and “I have zero love for it”. Welcome to the fun world of working on software that has a non-trivial user base who interact with it on a source-code level.

I do understand the benefits of infrastructure agility. Most things here are on reproducible, easily modified environments through Nix or Poetry (-> uv), precisely so we can thoroughly test and roll out new patches quickly. But this makes an all-at-once migration only marginally more feasible. Changing the RPC protocol while deliberately avoiding any chance for compatibility forces a complete split of the universe into a before and after (with all the planning issues in terms of lab schedules, and even discounting that, annoyances with external dependencies, when bisecting issues across the change, momentarily downgrading for one experiment but not another), for no reason other than momentary convenience. Keep supporting both seamlessly for a major release, deprecate old PYON in the next, and remove it in the one after.

And I can see no reason why even you would need a transparent fallback.

Because users might end up wanting to mix old and new code in the same Python process/environment – they might be consuming drivers/libraries/… from various sources, some of which are aware of the change and some of which are not. Requiring an extra option to be passed or renamed function to be called to achieve backwards compatibility is the worst of both worlds from this perspective. Having the new implementations in sipyco.{pyon,pc_rpc}_v2, or behind a protocol=2 flag or similar would be the preferred "polarity". This way, only code that is aware of the new protocol version needs to, and can appropriately, deal with potential incompatibilities, such that client code can be incrementally migrated over even if you wanted to avoid having a transparent fallback.

And I see no need to revert this PR. The transparent fallback I described is already in the ARTIQ PR.

I think it is clear that this PR falls short, and all but blocks anybody from tracking and testing ARTIQ master outside the smallest, most monolithic environments. Just backing it out until you’ve found a way to address the compatibility issue is the obvious way to go, just like when breaking accidentally breaking master in the good old days before automated pre-commit-checks. I get that you are excited about https://github.com/quartiq/vscode-artiq et al. (and moving from almost-JSON to actually-JSON is a welcome change in the abstract!), but the wider ARTIQ community shouldn’t be left to pick up the pieces.

@jordens jordens deleted the json branch July 8, 2025 12:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants