Current status
Currently a pyconvert rule consists of:
- the source python type t, which the rule can convert from;
- the target julia type T, which the rule can convert to;
- the priority of the rule; and
- the function func implementing the rule.
When pyconvert(R, x) runs, it first filters the list of rules according to t and T (roughly pyisinstance(x, t) and typeintersect(R, T) != Union{}). The rules are then ordered first by priority, then by the specificity of t, then by the order the rules were defined.
The priorities are:
1. jlwrap: for wrapped julia objects, by just unwrapping them;
2. array: for array-like objects (buffers, numpy arrays, ...);
3. canonical: for the canonical conversion for a type, e.g. float to Float64;
4. normal: for all other reasonable conversions.
The priorities are a bit of a hack, to work around the fact that ordering by specificity of t isn't quite right. For example, we always want to convert julia objects by unwrapping them first, so their rules need to come first: even if the object also happens to be a Mapping and we are converting to Dict, we don't want to use the generic Mapping to Dict rule. And if the object is array-like, we want to convert by getting at the underlying memory instead of using the generic Sequence to Array rule.
The proposal
So my proposal is to remove priority and add:
- the scope julia type S, which must be a supertype of T.
We further filter rules by S (R <: S, except if R isa Union then just one component has to match).
For ordering rules, we no longer order by priority, just by specificity of t and insertion order.
We also ignore type(x).mro() and only use strict specificity (issubclass(t1, t2)). That is, rules form a DAG with this partial ordering, which we flatten using insertion order to break ties.
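The ordering step can be sketched in plain Python. This is only an illustration of "strict specificity, flattened by insertion order", not PythonCall code: the function order_rules, the classes Base/A/B/C and the rule labels are all hypothetical, and rules are reduced to (t, label) pairs that have already passed the filters.

```python
def order_rules(rules):
    """Order (t, label) pairs so that a strictly more specific t comes first.

    Ties (equal or unrelated t) are broken by insertion order, i.e. the
    position of the rule in the input list.
    """
    remaining = list(rules)
    ordered = []
    while remaining:
        for rule in remaining:
            t = rule[0]
            # A rule must wait while some remaining rule has a strict subclass of t.
            dominated = any(
                other[0] is not t and issubclass(other[0], t)
                for other in remaining
            )
            if not dominated:
                ordered.append(rule)
                remaining.remove(rule)
                break
        else:
            raise ValueError("specificity relation is not a partial order")
    return ordered


# Hypothetical class hierarchy: A and B are unrelated, C derives from both.
class Base: ...
class A(Base): ...
class B(Base): ...
class C(A, B): ...

# Rules in insertion order; all are assumed applicable to an instance of C.
rules = [(Base, "base"), (B, "b"), (A, "a"), (C, "c"), (A, "a2")]
labels = [label for _, label in order_rules(rules)]
print(labels)
```

Here the rule for B is tried before the rules for A only because it was inserted first; A and B are unrelated, so specificity does not distinguish them.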
You are only allowed to create rules where you "own" either t or S.
Discussion
This means you can only have S=Any if you own t. You can think of S=Any as being canonical priority or higher. PythonCall will continue to "own" the Python standard library, and most rules in PythonCall will have S=Any. The exception is for some things currently in the normal priority. For example, we convert None to Nothing canonically but can also go to Missing. In the new system, the rules will have T=Nothing, S=Any and T=S=Missing, so you generically get Nothing but can get Missing if you ask for it. Similarly, tuple canonically converts to Tuple but can also go to Array, the rules for which will become T=Tuple, S=Any and T=Array, S=AbstractArray, so you will get an Array if you specify Array or AbstractArray.
If you don't own t, then you must own S. This lets you define e.g. a generic conversion rule for list to some new MyArray you invented. But you can only use the rule if you specify pyconvert(MyArray, x). Doing pyconvert(AbstractArray, x) or pyconvert(Any, x) will not use the rule. Hence we have well-scoped rules, avoid piracy, and avoid cases where the conversion rules applied depend on which packages are loaded.
In particular, since passing Python objects to Julia in JuliaCall normally uses pyconvert(Any, x), only rules created by the "owner" of pytype(x) are applied. This makes passing Python values around predictable - some third-party package defining their list to MyArray rule will not affect how list gets passed to Julia by default.
By ignoring the MRO of the passed Python object, we ignore issues with the arbitrary ordering of types in the MRO. Our proposal guarantees that if you have an applicable rule with t=t1 then it can only be overridden by a later rule with t=t2 if t2 is a strict subclass of t1. Currently it can be overridden if t2 is completely unrelated but just happens to be higher up the MRO.
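A small Python illustration of the MRO issue, using hypothetical classes. It assumes, as described above, that the current scheme ranks rules by where t appears in type(x).mro(); the helper mro_rank is made up for this sketch.

```python
class Base: ...
class A(Base): ...
class B(Base): ...

# Two classes that differ only in the order of their bases.
class C(A, B): ...
class D(B, A): ...


def mro_rank(cls, t):
    """Position of t in the MRO of cls (smaller means earlier)."""
    return cls.__mro__.index(t)


print([k.__name__ for k in C.__mro__])  # ['C', 'A', 'B', 'Base', 'object']
print([k.__name__ for k in D.__mro__])  # ['D', 'B', 'A', 'Base', 'object']

# A and B are unrelated, yet their relative MRO rank flips between C and D.
unrelated = not issubclass(A, B) and not issubclass(B, A)
```

Under MRO ordering a rule with t=A would beat a rule with t=B for a C but lose for a D. Under strict specificity neither rule beats the other, so insertion order decides in both cases.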
I think this scheme is sufficiently general to encode rules in the priority order users will want. When adding a rule, you must own t or S. If you own t then it will be more specific than anything else anyway. If you own S then you have to opt in to using the rule like pyconvert(S, x), in which case only your rules pass the filter. Or if you do pyconvert(Union{Foo,S}, x) then whether you get a Foo or an S depends on insertion order, but if Foo came from a parent package, then you should rightly get a Foo, which will be the case because its rules were defined first. So basically insertion order prevents overwriting rules from earlier-loaded packages. This does mean the output type can be import-order-dependent, but only where there are unions, and this case is inherently ambiguous so we have to pick something arbitrarily anyway.
What about jlwrap and array?
We will have rules like t=juliacall.AnyValue, T=S=Any and t=<buffer>, T=PyArray, S=Any. Provided we define these first, they will be applied first unless a rule for a more specific t is defined.
Worked examples
Here are some rules for t=list:
1. T=PyArray, S=Any: canonical conversion to a PyArray, used if you specify converting to PyArray or AbstractArray or Any.
2. T=Array, S=DenseArray: used if you specify converting to Array or DenseArray, but AbstractArray gets you a PyArray.
3. T=Set, S=AbstractSet: used if you specify converting to Set or AbstractSet.
4. T=Tuple, S=Tuple: used if you specify converting to Tuple.
Some rules for t=None:
1. T=Nothing, S=Any: canonical
2. T=Missing, S=Missing: specify Missing (or Union{Missing, Foo})
Some rules for t=float:
1. T=Float64, S=Any: canonical
2. T=Float32, S=Float32: specify another float type
3. T=Number, S=Number: specify another non-float number type such as Integer
4. T=Missing, S=Missing (for NaN)
5. T=Nothing, S=Nothing (for NaN)
Some examples for converting a float:
- to Any: only rule 1 applies (filtering on S).
- to Float32: rules 2 and 3 apply (rule 1 ignored due to T, others due to S), so rule 2 is tried first.
- to Integer: only rule 3 applies (rule 1 ignored due to T, others due to S).
- to Union{Integer, Missing}: rules 3 and 4 apply (rule 1 ignored due to T, others due to S), so rule 3 is tried first.
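The filtering in these examples can be reproduced with a toy Python model. Everything here is an assumption made for illustration: Julia types are represented by strings in a hand-written single-parent tree (PARENT), a Union is a tuple of strings, and typeintersect is approximated by "one type is a subtype of the other", which only holds for such a tree. It is not how PythonCall implements anything.

```python
# Toy fragment of the Julia type tree: child -> parent.
PARENT = {
    "Number": "Any",
    "Real": "Number",
    "AbstractFloat": "Real",
    "Integer": "Real",
    "Float64": "AbstractFloat",
    "Float32": "AbstractFloat",
    "Missing": "Any",
    "Nothing": "Any",
}


def is_subtype(a, b):
    """Toy version of a <: b: walk up the tree from a looking for b."""
    while a is not None:
        if a == b:
            return True
        a = PARENT.get(a)
    return False


def intersects(a, b):
    """Toy version of typeintersect(a, b) != Union{} for a tree of types."""
    return is_subtype(a, b) or is_subtype(b, a)


# The five rules for t=float, as (T, S) pairs in insertion order.
FLOAT_RULES = [
    ("Float64", "Any"),
    ("Float32", "Float32"),
    ("Number", "Number"),
    ("Missing", "Missing"),
    ("Nothing", "Nothing"),
]


def applicable(R, rules=FLOAT_RULES):
    """Return the 1-based numbers of rules that pass both the S and T filters.

    R is a type name, or a tuple of names standing for a Union; for a Union
    only one component has to match.
    """
    comps = R if isinstance(R, tuple) else (R,)
    passing = []
    for number, (T, S) in enumerate(rules, start=1):
        s_ok = any(is_subtype(c, S) for c in comps)
        t_ok = any(intersects(c, T) for c in comps)
        if s_ok and t_ok:
            passing.append(number)
    return passing


for R in ["Any", "Float32", "Integer", ("Integer", "Missing")]:
    print(R, applicable(R))
```

Since all five rules share t=float, the surviving rules are then tried in insertion order, which is why rule 2 comes before rule 3 for Float32.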
Say some package defines myfloat <: float and adds a rule for it:
T=BigFloat, S=Any
Examples converting a myfloat:
- to Any: rule 1 and the new rule apply. The new rule is more specific in t, so use the new rule.
- to AbstractFloat or Number: pretty much the same.
- to BigFloat: only the new rule applies.
- to Float32: new rule doesn't apply, so as above rule 2 is first.
Pros and cons
Pros:
- Strict ownership of rules - avoids piracy.
- Return type of pyconvert more predictable.
- Clearer semantics/rule ordering than currently.
- The number of applicable rules is massively cut down by filtering on S (usually to 1).
- Where there are more than 1, ordering by t should then be mostly unique. Using insertion order is mainly to disambiguate unions, plus special rules like for buffers and jlwrap.
- Easy to "opt in" to a conversion rule by being more specific about what you are converting to (see the MyArray example above).
Cons:
- People might still pirate (i.e. make rules with S=Any for which they don't own t).
- pyconvert(Union{AbstractArray,MyArray}, x) does not do what you might expect (use the generic AbstractArray rules plus the special MyArray rule) because the union gets normalised down to AbstractArray first, so the MyArray rule is never considered. You need to take more specific unions like Union{PyArray,Array,MyArray} which is annoying. We could make a helper function to create such a union for you.
Rejected ideas
- I considered ordering also based on specificity of T (more specific wins) and S (less specific, i.e. more canonical, wins).

  If we use S then in the float example we prefer the generic Number rule over the specific Float32 rule. But if we use T then a juliacall.DictValue has rules t=juliacall.AnyValue, T=Any, S=Any and t=Mapping, T=Dict, S=Any and the latter rule will be preferred, which isn't what we want.

  The current proposal only alters ordering a little - namely by removing priority and ignoring MRO - and relies on more aggressive filtering from S.

- We could allow add_rule to not always add at the end of the list. It could specify one or more existing rules that it must appear above. Rejected because, as explained in the discussion section, the existing proposal is sufficient for any sensible rule definitions.