automatic type stub (pyi) generation for java classes #714
Comments
Interesting. I look forward to some additional information. Can this be connected to the existing annotation API that we are using? We currently support Jedi auto-completion, which allows linting of return types if the overload is not mixed. I haven't tried with Kite. Here is my recommendation as an implementation. Just like the Java docs, we can pick up Java information through the resource API. Therefore, if you run a preprocessor that converts the Java jar file ( |
Hi, Thanks a lot for the quick response! Indeed, it goes in the same direction as the annotation API (and Also, this supports overloads, even when the signatures are completely different and return types depend on the signatures, through the overload decorator: https://docs.python.org/3/library/typing.html#typing.overload - also it should be possible to represent generics and type arguments, although my first version does not handle that. I frankly don't know if the annotation API supports all of these. As for distribution, I'm afraid it can't be (just) a JAR file, as the whole point behind this is that Python IDEs (and possibly linting tools) can read the type stubs in a purely static way, without having to run any of our code. Therefore, the type stub "shadow packages" have to be packaged and installed in a way common Python IDEs support, typically as an egg/wheel. The structure has to match the Java class hierarchy, for instance (after generating "java.*" and some private APIs): |
Okay that makes sense. Please keep me posted on any hooks that you require in terms of support. https://www.jetbrains.com/help/pycharm/stubs.html I guess the correct solution would be to make a jpype-java-v8, jpype-java-v11, etc. package that gets posted to pypi then? We can then have those packages depend on JPype1 to reduce the need to install both separately. Is that what you envision? This is very similar to what I am working on for the reverse bridge in which Python generates a package with nothing more than interfaces for each Python class. That allows Java to have an interface with each required method exposed in Python. |
Sure, thanks a lot!
That would indeed be an option to ease the working with java standard library classes (like https://github.com/python/typeshed does for python standard libraries). One potential (minor) issue with static stub generation is that |
From JPype 0.8 on, the default will be that string conversion is off. The "on" mode is a bit of a misfeature, as there are a few functions in Java where chaining on a string is needed. (It is non-obvious when you call the Java string constructor and get back a Python string.) Though we may be able to splice some aliases into the We should definitely ship the stub generator as part of JPype so that someone can run it as needed in the field. |
I reviewed the contract for Python string and Java string. There is one strong conflict, two weak conflicts and two near conflicts. The strong conflict is Java and Python format. They are similar in design, but one is a static method and the other operates on a string. Split is very similar, but Java uses a regexp rather than simple text for the split. Replace is almost the same, but Python adds an extra parameter. endswith and endsWith are a near miss. startswith and startsWith are a near miss. I could submit a PR with Java String completing almost the same contract as Python. The return type will always be Java strings for consistency. Given that we are switching to no longer converting strings, this would in principle save a few people, but overloading strings completely is not possible (though maybe I should review this, as it was last reviewed in the Python 2 age). The other thing that will not pass is of course isinstance. Hard to dodge that one without deriving the type. @marscher would this be worth the effort to put in as a PR? |
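To make the split conflict above concrete, here is a small sketch in plain Python. It uses Python's own `re` module as a stand-in for regex-based splitting, so it only illustrates the literal-versus-pattern difference; Java's `String.split` differs in further details (for example, it drops trailing empty strings), which this sketch does not reproduce.

```python
import re

text = "a.b.c"

# Python's str.split treats the separator as literal text.
literal_parts = text.split(".")

# A regex-based split treats "." as "any character", so every
# character is a separator and only empty strings remain.
regex_parts = re.split(r".", text)

# Escaping the separator restores the literal behaviour.
escaped_parts = re.split(re.escape("."), text)

print(literal_parts)
print(regex_parts)
print(escaped_parts)
```

A Java String that exposes a Python-style `split` would therefore have to decide which of the two interpretations it follows.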
Thanks a lot for checking. I agree that conceptually this feature is not very nice, although in many common use cases it saves a lot of explicit type conversion clutter. Checking how this is done in PySide (Qt), they do the same for QString ... For the return types, yes, we may get away with making java.lang.String "look like" python str, and I agree it could be a nice feature, although as you noticed it will not cover all use cases. On the other hand, what is the plan for str to java.lang.String conversion? This one seems "harmless" to me. With convertStrings=False, will you have to pass explicitly constructed java.lang.String objects to Java methods which take strings as arguments? Anyway, I'm going to make the stub generator honor the flag at generation time, so it is up to the user to decide. "Official" stubs would be generated assuming convertStrings=False, but those using the feature can build their own stub tree with convertStrings=True. In the end it's a simple type replacement at the level of the stubs. Edit: is there any foreseen way to detect from Python if JPype was started with convertStrings=True? Currently I'm using this, but it is somewhat artificial...

    def convert_strings() -> bool:
        # Cache the result on the function object so the probe runs only once.
        if convert_strings.jpype_flag is None:
            from java.lang import String  # noqa
            # With convertStrings=True, Java strings come back as Python str.
            convert_strings.jpype_flag = isinstance(String().trim(), str)
        return convert_strings.jpype_flag

    convert_strings.jpype_flag = None
Convert strings only affects the return path. We always implicitly convert strings as arguments. The argument path for stubs likely needs some extra hooks. We have quite a few automatic forward conversions like string, path, date, and list. I don't think there is currently a way to get a list of implicit conversions. Some of these are hard types, and others are duck types. What level of detail do you need for stubs? Do you need a method to extract the conversion rule by Java class? |
Yes, then indeed I would need a way to extract a list of the types accepted by the implicit conversion for a particular Java class in a method argument. Another question that goes in the same direction: is there a hook to get the "mangling" JPype does to identifiers when they match a Python keyword (adding '_')? |
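For illustration, the mangling described in this question can be approximated in plain Python. This is only a sketch: `mangle_identifier` is a hypothetical helper, and it assumes the rule is exactly "append '_' to Python keywords"; JPype's real rule may cover more names than `keyword.iskeyword` reports.

```python
import keyword


def mangle_identifier(name: str) -> str:
    """Append '_' when a Java identifier collides with a Python keyword.

    Hypothetical helper for illustration; not JPype's own implementation.
    """
    return name + "_" if keyword.iskeyword(name) else name


print(mangle_identifier("import"))
print(mangle_identifier("size"))
```

A stub generator needs the same rule as the runtime, otherwise the names in the stubs will not match the attributes that actually exist on the proxied classes.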
The implicit rules currently recognize three types of conversions. The first is exact in which a conversion is only applied if it exactly matches, but this one is only there to trigger fast logic. The others are in the form of a list of types that will be taken or a list of attributes that will be probed. I suppose the best way to do this is to get you an API call that returns those two lists as a tuple so that we can construct Protocol types for each. Name mangling is handled by the |
Okay looking it over there is one special exception in the type system. Python strings are Sequences but JPype flatly refuses to recognize them. There is almost no case in which the user wants to pass a string to a list object and make it break it into chars. |
Ok, I can't see how this special exception could be easily represented in the type annotation system, as if we e.g. allow |
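A short sketch of why this exception is awkward to express statically. Both function names are hypothetical; the second one only mimics the rule described above (sequences are accepted, `str` is refused) and is not JPype's actual conversion code.

```python
from collections.abc import Sequence


def annotation_would_accept(value: object) -> bool:
    # What a plain Sequence[...] annotation expresses: str qualifies,
    # because a Python string is itself a sequence of characters.
    return isinstance(value, Sequence)


def conversion_would_accept(value: object) -> bool:
    # Assumed runtime rule: any sequence is fine, except str.
    return isinstance(value, Sequence) and not isinstance(value, str)


for candidate in (["a", "b"], ("a", "b"), "ab"):
    print(type(candidate).__name__,
          annotation_would_accept(candidate),
          conversion_would_accept(candidate))
```

The two predicates disagree only for `str`, which is the one case a `Sequence`-based annotation cannot exclude.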
"Never let perfect be the enemy of good." There will also be an edge case or two. But nothing here gives me any pause. I will work on the probe API. Hopefully completed by the end of the weekend. |
Perfect, thanks a lot, that would allow me to get rid of this hardcoded mess: https://gist.github.com/michi42/2110a8615a0e917a13ec6748c6168735#file-stubgenj-py-L212 ;-) For objects returned by java to python, are there any implicit conversions (apart from String -> str if convertStrings = True)? |
There are a few implicit conversions
|
Progress report: I did some work on this until the wee hours of the morning. I think that I settled on a solution. Each converter in JPype will have a
There may be repeats on the list (order is based on the conversions applied to a type). (For example it may try PyLong_CheckExact followed by PyLong_Check which would put two copies of long into the list). This will be added to by any additional type information (flags, array component, etc). I may end up merging this the current Will this be enough for the stubbing system? |
Thanks a lot, looks good to me. Just for my understanding, how is Another thingy - How can one collect a list of all active converters in JPype? As a heads-up from my side, I've implemented a first version to support Java Generics, and transform them into their python TypeVar counterparts. This is still not perfect and it has its limitations, some of which I still hope to overcome (as far as the python type hint system allows). e.g. for

    _Collection__E = _py_TypeVar('_Collection__E')  # <E>

    class Collection(java.lang.Iterable[_Collection__E], _py_Generic[_Collection__E], _py_Collection[_Collection__E]):
        def add(self, e: _Collection__E) -> bool: ...
        def addAll(self, collection: 'Collection'[_Collection__E]) -> bool: ...
        def clear(self) -> None: ...
        def contains(self, object: _py_Any) -> bool: ...
        def containsAll(self, collection: 'Collection'[_py_Any]) -> bool: ...
        def equals(self, object: _py_Any) -> bool: ...
        def hashCode(self) -> int: ...
        def isEmpty(self) -> bool: ...
        def iterator(self) -> 'Iterator'[_Collection__E]: ...
        # ..
Attributes are all methods taking just self. ( Suppose that you ask for the types that work with a java.lang.Number. Calling info would give
The attributes are used to pick up numpy number types (np.int32, np.int16, np.float64, etc.). The intent is to make a _java_lang_Number 'protocol' that can be used to make:
I am not sure how to make the number protocol accept anything with that specification (which is why I never got far on this path). But assuming you can make it happen, we can have very good stubs. The user-defined lists (like date, pathlib.Path, sql types) are available in the |
If you can tell me how to turn the above into a protocol then perhaps I can put a As for where to go, let's get it working first so we have a clear picture of all the requirements, then convert it to a PR as |
Let me take a guess at the example:
Is this anywhere close? |
Yes, this is exactly how you would define the Protocol classes. Just note that for some common cases, there are actually pre-defined Protocols in For turning stubgenj into a runnable module, sure, I will do it. However, it will only work for "basic" cases where you do not establish a classpath programmatically, add custom import domains or the like. For more complex cases, it's probably most user-friendly to still expose a public API that allows the stub generation to be triggered from a Python script, after setting up JPype and starting the JVM with the necessary options. |
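As a hedged illustration of the Protocol approach discussed here (the examples originally posted in the thread are not reproduced), a number-like protocol might look as follows. `SupportsJavaNumber`, `Celsius` and `to_double` are hypothetical names, and the choice of `__int__` and `__float__` as the probed attributes is an assumption. Python's `typing` module also ships ready-made protocols such as `typing.SupportsInt` and `typing.SupportsFloat`.

```python
from typing import Protocol, Union, runtime_checkable


@runtime_checkable
class SupportsJavaNumber(Protocol):
    """Hypothetical protocol: anything exposing __int__ and __float__."""

    def __int__(self) -> int: ...
    def __float__(self) -> float: ...


class Celsius:
    """A user type that is not a number but can be converted to one."""

    def __init__(self, degrees: float) -> None:
        self._degrees = degrees

    def __int__(self) -> int:
        return int(self._degrees)

    def __float__(self) -> float:
        return float(self._degrees)


def to_double(value: Union[int, float, SupportsJavaNumber]) -> float:
    # Stand-in for a Java method that takes a java.lang.Number argument.
    return float(value)


print(to_double(Celsius(21.5)))
print(isinstance("21.5", SupportsJavaNumber))
```

A static checker accepts `Celsius` for the `value` parameter purely because of its structure; no inheritance from the protocol is needed.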
I just found a funny "feature" in the forward conversion of arguments ...

    public static SomeRequest SomeRequestBuilder.byNames(java.util.Collection<String>)

If I invoke it like the following, I get a "no matching overloads found" message:

    from cern.bla.factory import SomeRequestBuilder
    print(SomeRequestBuilder.byNames(["a", "b"]))

However, if I add from java.util import ArrayList before, the very same statement suddenly starts working (but the IDE thinks this import is unused). Shall I open another issue for this? |
It isn't supposed to be on demand loading. I think it warrants an issue if you can replicate it. I am guessing there is a bug in the customizer that is failing to load ArrayList so that conversion can be completed. |
Hmm maybe I don't need to support attribute style conversions if I can cast them all into Protocol checks. I need to ponder that. |
The new JPackage may make it a bit easier. You can now walk the whole package tree.
Drat.... I see an exception. Well another thing to go fix. |
FYI, I just updated https://gist.github.com/michi42/2110a8615a0e917a13ec6748c6168735 with a basic |
I will probably drop the snake case. By the Python principle there can only be one, and as 95% of the library is interacting with CamelCase (due to Java), we have preferred camel case throughout. So is there anything else that you require at this point? |
Well I have overloaded the classes to accept '[:]' and '[#]'. It would be a small matter to make them accept |
Actually the question is where should the dynamic stub be generated from. Is it better to make it |
Sorry, I think I don't fully understand your proposal.

    _List__E = typing.TypeVar('_List__E')  # <E>

    class List(Collection[_List__E], typing.Generic[_List__E], typing.List[_List__E]):
        @typing.overload
        def add(self, e: _List__E) -> bool: ...

Note the extra supertype If I look at

    _List__E = typing.TypeVar('_List__E')  # <E>

    class List(Collection[_List__E], typing.Generic[_List__E], jpype._jcollection._JList[_List__E]):
        @typing.overload
        def add(self, e: _List__E) -> bool: ...

since The easiest way to allow this would be to declare _JList (and other customizers) as follows in _jcollection:

    E = typing.TypeVar('E')

    class _JList(typing.List[E]):
        ...  # customizer body unchanged

See https://docs.python.org/3/library/typing.html#user-defined-generic-types for more examples. This will not have any impact on the run-time behavior, but it allows for static type checking (by formalizing that this object is implementing the |
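The claim that such a generic declaration does not affect run-time behavior can be checked with a small stand-alone sketch. `PlainList` and `GenericList` are hypothetical stand-ins for a customizer class, not JPype's `_JList`.

```python
import typing

E = typing.TypeVar("E")


class PlainList(list):
    """Non-generic variant: works, but carries no element type."""

    def first(self):
        return self[0]


class GenericList(typing.List[E]):
    """Generic variant: same behaviour, plus static element typing."""

    def first(self) -> E:
        return self[0]


plain = PlainList([1, 2, 3])
generic = GenericList([1, 2, 3])

# Subscripting is only meaningful to static checkers; nothing is
# validated at run time, so this "wrong" parameterisation still works.
unchecked = GenericList[str]([1, 2, 3])

print(plain.first(), generic.first(), unchecked.first())
print(isinstance(generic, list))
```

The type parameter is only seen by static tools, which is why adding it to a customizer should be safe for existing code.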
I am still confused by what you are trying to achieve with What we can do is either add new stubs in |
Sorry, let me explain again. When generating the stubs for collections (and other types that have customizers), I somehow need to take into account the fact that a customizer exists - and what it does.

    def extraSuperTypes(className: str, classTypeVars: List[TypeVarStr]) -> List[str]:
        if className == 'java.util.Map':
            return ['typing.Mapping[%s, %s]' % (classTypeVars[0].pythonName, classTypeVars[1].pythonName)]
        elif className == 'java.util.Collection':
            return ['typing.Collection[%s]' % classTypeVars[0].pythonName]
        elif className == 'java.util.Set':
            return ['typing.Set[%s]' % classTypeVars[0].pythonName]
        elif className == 'java.util.List':
            return ['typing.List[%s]' % classTypeVars[0].pythonName]
        return []

For List this leads to (the bold thing is added by this functionality):
But this is obviously not scalable (it will never work for user-defined customizers), and it duplicates knowledge of JPype into the stub generator, where it does not belong. Now my idea was to use In this case, could we foresee a way to define (and retrieve through JClass._hints) a Protocol which the customizer implements? For instance, the For Java classes, there is nothing to do to make them generic - this is purely a thing at static type-checking time, the runtime behavior is fine. For Java classes, the generated stubs define them as generic (using typing.Generic) in python if they are generic in Java. However, when it comes to python customizers on top of Java classes, things are not so straightforward anymore ... |
I would say the correct hook mechanism would be to define the generics that you want in a protocol, and then add a hook into the JClass structure so that when the jcustomizer is loaded it can register the protocol in its outgoing implicit list. So let's walk through the changes. The hint structure composes a list by walking through each of the type conversions, calling get info. As this is just a fake piece of info, there is no reason that it needs to be added to the actual conversion list. Instead it should simply extend the list with the user-supplied type info. So I would guess the modifications would be
This is just a rough sketch of how I would do it. There may be other details, like how to add the generic type parameters or how many need to be added, that would need to be worked out. Does that make sense? You can use a similar mechanism to add all types of meta information to the hints class. As a separate matter, we should have the list type converter get the generic type arguments so that it properly forces each of the elements of a List into a String rather than the current object. |
Hi, Sounds good, but I'm not sure if I understand the "outgoing implicit list" part correctly. To make it clear - I'm not talking about method arguments (there the So in general, we need a way to formally specify what a particular customizer does - e.g. via a supertype or protocol in

    @_jcustomizer.JImplementationFor('java.util.List', implements=jpype.protocol.List)  # or typing.List
    class _JList(object):
        """ Customizer for ``java.util.List``

        This customizer adds the Python list operator to function on classes
        that implement the Java List interface.
        """
        # ...

This should then end up in a separate attribute in the hints structure, e.g. In the cases of collections, this will usually be a generic protocol using one or more type variables (e.g. List, Mapping, Set, ...). If the supertype/protocol is generic, and the java class is generic, the stub generator will make sure that type arguments of the java class are forwarded, so if

    _List__E = typing.TypeVar('_List__E')  # <E>

    class List(Collection[_List__E], typing.Generic[_List__E], jpype.protocol.List[_List__E]):
        # ...

Note that all collection types imported from the However, even though the immediate problem was that the customizer classes for collections were not generic, I would not call the attribute I see some other (non generic) classes also have customizers, e.g. |
Adding the protocol to the customizer as a keyword seems like a reasonable solution, and you can place that info into the hints pretty easily from there in _jcustomizer. As you are the primary consumer of this particular API, I would recommend you take the first shot at the implementation. It should be a pretty easy modification as all the expected entry points are located. I am not sure what you mean about non-generic classes with customizers. Shouldn't the stub generator just grab the extra methods into the stubs and forget entirely about the customizer? Or is there some issue that I am missing? The point of the customizers is to give extra methods so that those methods appear to be native to the Java class. Calling out the customizers in any way seems like a bad idea. Though those that are marked "base type" likely still have to have a presence. The change that removes the customizers from the users' view is now in the incoming PR (#828), though I have not moved them yet. To make it so that we can scale the customizer system, the customizers are going to be located in the Java jar itself. So the _JList customizer will be located in "java.util.__init__.py" which will be loaded from the org.jpype. That way, if someone provides 3rd party customizers, they just have to insert them in the jar file or a companion jar file. (At last, I can make my gov.llnl.math library a real boy!) I will likely make it support both pyc and py files (if I want it to be Python version independent). You could technically refer to the revised ones by the "java.util._JList" but only if the import system is installed. My long term goal here is to hollow out jpype to just be decorators, basic types, and the start routine, and have everything else live under the Java space. (I would really like to make imports a required package at some point, but that is another issue.) |
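The keyword-on-the-customizer idea discussed above could look roughly like this in plain Python. Everything here is hypothetical: `implementation_for`, `protocols_for` and the module-level registry are illustrative stand-ins, not JPype's `JImplementationFor` or its hints structure.

```python
import typing

# Hypothetical registry: Java class name -> protocols its customizers declare.
_CUSTOMIZER_PROTOCOLS: typing.Dict[str, typing.List[typing.Any]] = {}


def implementation_for(class_name: str, implements: typing.Any = None):
    """Decorator sketch: record the declared protocol, leave the class as is."""

    def decorator(cls):
        if implements is not None:
            _CUSTOMIZER_PROTOCOLS.setdefault(class_name, []).append(implements)
        return cls

    return decorator


def protocols_for(class_name: str) -> typing.List[typing.Any]:
    """What a stub generator would query to find extra supertypes."""
    return list(_CUSTOMIZER_PROTOCOLS.get(class_name, []))


@implementation_for("java.util.List", implements=typing.List)
class ListCustomizer:
    """Stand-in for a customizer that adds list behaviour."""


print(protocols_for("java.util.List"))
print(protocols_for("java.util.Map"))
```

With such a lookup, the generator no longer needs a hard-coded table of collection classes, and third-party customizers are covered by the same mechanism.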
That's a very good question - the reason is that I am relying on Java reflection (e.g. As shown above, my initial approach to this was to treat customizers as an extra superclass of the customized class. This works fine as long as methods are only added but not overridden. However, to get code completion and type checking right, this needs the customizer code or some protocol/stub describing the added methods of a customizer to be available to the IDE. Now understanding a bit better what the plan is - thanks a lot for the explanations! - I see two options that I think would work well with the planned packing of customizers into JARs:
Both solutions should work equally for generic and non-generic customizers. I think I would prefer solution 1 for simplicity and separation of concerns (stubgenj should not need to generate stubs for python code). So once the #828 is merged, as time allows, I will take a first shot at this. |
Alright, this is much clearer. So my recommendation is that after the stub generator goes through the Java class, it asks the _hints for the list of class customizers. Basically start with option 2. You can pretty much discard anything that overloads a Java method. Those won't have any type information and are generally being phased out anyway in favor of the converters. They were only used to insert some code in the process, with a few minor exceptions. As the functionality should be the same in most cases, it won't cause an issue. There are some places where the name was conflicting that perhaps we should flag. The JOverride accepts keyword arguments, so we can likely place a tag there. After that you just need to copy over stubs for the extra methods, so the stub generator for Python will be pretty limited. We can either add stub information to Python or manually tweak the info on the method using a decorator. We can then work to allow option 1. I can also pretty easily put the stub files in the jar as you recommend, and place the stub file in the hints structure as well. We can get to the stubs already using Java methods or add a special hook into Perhaps I should just go hog wild and start adding stuff onto the totally useless Java class "Package" and move the hints stuff to "Class" so this sort of stuff is available in the public API rather than hidden in the depths. But let's finish this first round and then start promoting to a public API. The reason for me to finally start into the scaling problem with the jar file solution (other than that I am stuck inside all the time and can't go to the gym thanks to COVID-19) is that the Android port hit a snag because they somewhat foolishly named their Python packages the same as the Java packages, so they have a real mess to deal with in terms of compatibility. Making it so that Java packages are Python modules and you can insert arbitrary Python code into the Java package as |
I gave it a try (Gist updated). After spending some brain cycles on generating stubs in stubgenj for python customizer classes by inspection, I gave up - this is not straightforward, in particular under the presence of overloads or type parameters, and then it is not quite the scope of stubgenj. So what I do for now:
In any case, I write the stubs for all customizer modules of a particular java package into a " For this to work fully with the existing customizers for collections, we need to add type annotations or stubs for them. I will submit a PR on this soon. |
Sounds like a plan. Thanks for all your hard work on this. I solicited some users on features, some of which may need some stub support. Some of the features they were interested in were name mangling to Python names, removing methods from the API, and renaming or replacing methods. It seems like for your use I should go about finishing my long planned upgrade to _JMethod by renaming it to _JMethodDispatch and then exposing an actual _JMethod front for individual overloads. Thus, rather than having to scan the whole java.lang.Class, you can instead ask an individual dispatch what methods are under it and their typing information. That way, if a dispatch were to be renamed, deleted, or have its return types altered, that information would be available to the stub generator directly rather than going all the way back to the Java class definition and trying to work forward. I also added the initial support for generics in #835. It allows things like 'java.util.List[String]' to exist. The complexity is that I really don't get some things about Java reflection of generics, namely how do I tell if a generic parameter is bound to the class or to the method.
In the A method the argument should be bound to the class; in the second it is free. I checked the assembly and the information is clearly being stored, but I can't see how I can discern that from the reflection API.
|
Indeed, not being able to enumerate all overloads from JClass was one of the reasons why I used the Java reflection API. It is not the only reason though - not all typing information is available currently from Of course we could aim at putting some or all of the functionality stubgenj currently does to map these to the python typing system directly into JPype. However, this is probably a good piece of work and may not be straightforward in all cases, in particular if old python versions need to be supported. The big advantage I see in doing this mapping independently from Java Reflection is that it does not affect runtime behavior of JPype, so it can be a bit more sloppy in some edge cases. For the generics, the types should inherit from |
A little update - together with @pelson we've further improved the stubgenj and turned it into a standalone installable package (for the time being): https://gitlab.cern.ch/scripting-tools/stubgenj I'm also working on a little test suite for this. |
@michi42 How can I open a PR to stubgenj? I have a patch that adds class Javadoc via |
@Christopher-Chianelli Good question - I did not consider this when I created the repository on the CERN GitLab. Indeed write access is restricted to CERN account holders. To get this out as quickly as possible, could you just send me a patch for now? Also, I just pushed the latest version to pypi for your convenience: https://pypi.org/project/stubgenj/0.2.2/ |
@michi42 patch is attached |
@Christopher-Chianelli Thanks a lot for the patch. It looks good to me in general, but I think the specification does not allow adding Javadoc for empty class stubs (at least PyCharm reports it as an error) - therefore I have removed it. I have released https://pypi.org/project/stubgenj/0.2.3/ with your changes. |
@michi42 I managed to find a way to get method Javadoc in the majority of cases. Sometimes method Javadoc is missing, which appears to be the case when JPype cannot find the Javadoc for a class (for me, it happens when an interface extends another interface). (Basically, when |
@Christopher-Chianelli Thanks for the patch, I tried to tune it a bit - basically in the end I call into Also I had to change your RegExps a little as they were matching too broadly, which led to the wrong JavaDoc being attached to overloads. Here is the MR, I will let my colleagues have a look ... |
@Christopher-Chianelli I have just released the updated version as 0.2.4: https://pypi.org/project/stubgenj/0.2.4/ Thanks again for the patches! |
@michi42 FYI JPype1 1.2.1 (which stubgenj depends on (and since |
Done: https://pypi.org/project/stubgenj/0.2.5/ - now asking for This update also includes a fix for a misbehavior that directories in Javadoc JARs on the classpath were seen as "java packages", even if they contained no classes or subpackages in the real class JARs. |
News? Can't this be merged into JPype directly? |
@michi42 I have been using stubgenj and it has worked great so far. Thank you for providing it! The only issue I've encountered is that it isn't generating stubs for abstract classes such as this I attempted to post an issue on gitlab but am unable to do so from a guest account. I apologize if this is the wrong avenue. |
@MrFizzyBubbs apologies for the delay, I was quite busy with other work. Just had a look, it appears the problem comes from the fact that JPype's Looking a bit further, it appears the |
I'm working on making a rather big, domain specific java API available through JPype. One of our main issues is the complete lack of type information during static analysis - and therefore no linting or IDE auto-completion.
PEP 484 and 561 support providing type information in separate .pyi "stub" files. The mypy project provides stubgen, a tool to automatically create these from python or C modules.
Since Java is strongly typed, all the information necessary to generate this kind of stub file is in principle available in the class files and can be obtained e.g. through reflection.
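As a rough illustration of the idea (not the code from the gist), a generator only needs a mapping from Java type names to Python annotations plus a renderer for stub lines. The mapping table and both function names below are assumptions made for this sketch; the method descriptions are hard-coded here, whereas a real generator would obtain them through reflection.

```python
from typing import Dict, List, Tuple

# Assumed, deliberately tiny mapping from Java type names to annotations.
JAVA_TO_PYTHON: Dict[str, str] = {
    "void": "None",
    "boolean": "bool",
    "int": "int",
    "long": "int",
    "double": "float",
    "java.lang.Object": "typing.Any",
}


def python_type(java_type: str) -> str:
    # Unknown types are kept as-is; they refer to other generated stubs.
    return JAVA_TO_PYTHON.get(java_type, java_type)


def render_method(name: str, params: List[Tuple[str, str]], returns: str) -> str:
    """Render one .pyi method line from (parameter name, Java type) pairs."""
    args = ["self"] + [f"{pname}: {python_type(jtype)}" for pname, jtype in params]
    return f"def {name}({', '.join(args)}) -> {python_type(returns)}: ..."


print(render_method("contains", [("object", "java.lang.Object")], "boolean"))
print(render_method("hashCode", [], "int"))
```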
I implemented a first version: https://gist.github.com/michi42/2110a8615a0e917a13ec6748c6168735
EDIT: Now a standalone repo: https://gitlab.cern.ch/scripting-tools/stubgenj
The module is supposed to be called from a simple script that sets up the JVM, e.g.
This is still very limited, just a starting point that handles the "most common" cases. In particular, it does not yet support Generics (should be translated to Python type vars) or Constructors (should be translated to __init__ methods), and the types of constants and enum constants are not yet detected.