Support for Java class extension from within Python #420
Comments
This still has several pieces missing. We need to remove the bootstrap loader and add an altered copy of the asm library before we can make this work. Therefore, I am pushing this to 0.9. |
Hi, This feature is quite necessary in some use cases. One way to do it is like it is done in JCC: have specific Java classes that are used for the classes to be subclassed, and then take care of selecting which Python routine to call within Java. It is far from ideal but could maybe be used as an intermediate stage. An example of how this can be used is one of the wrapping classes of orekit: It isn't possible to do something like this today in JPype I understand? Regards |
This item is on the roadmap to 0.9. I have a basic prototype which uses the Java asm library to rewrite a Java class into an interface with an extension hook for Python. I am currently juggling a number of development items and have 3 items ahead of this on the schedule. I can try to bump it up in the schedule if there is a strong interest, though I doubt I can push it in front of the 0.8 release which is item 1. The task list for this is
The items with the stars are the long poles on the tent. |
Extending class is an important feature for inheritance. Looking forward to seeing this feature! Do we have any updates? |
I completed a prototype of it a few months back as a proof of concept, but it has a lot of issues that need to be resolved. In particular, the prototype requires that asm take the class and create two new classes. The first class is an extension of the existing class which holds a proxy object. The proxy object is the same with all of the methods converted into an interface.
So the hard part is how to decide if a method is implemented by the Python class. One solution is to have the proxy invoker check for the existence of the Python method in the dict and, if it isn't there, throw an exception. Unfortunately, exception stacks are pretty expensive so that is not a great solution. We can use a flag to verify that it was actually called. This isn't thread safe because if two methods get called the flag may not reflect the actual value of this call. So that leaves the two part invocation model. In this we have one JNI call that fetches the Python method if present and, if not, calls the base Java method. Otherwise it launches the Python method using a second JNI method. We didn't have the infrastructure for proxies to directly load complex JNI methods but I think that is finally resolved.
The Python implementation is straightforward. We need the meta class to recognize the attempt to extend a concrete Java class and call the invoker hooks which generate the two classes. We then scan for the JOverride methods which will create a Python proxy which implements the methods and installs the proxy in the object instance.
There is also a memory loop issue. The Python instance points to a Java object which has a reference back to the Python object. Thus for this to function at least one of these needs to be a weak reference type or every instance will live forever.
The last issue is properly cloning the methods that need to be overridden. Simple argument lists can be done with the ASM visitor pattern, but in some cases we also need to copy the exception list. I think I have mastered this pattern in my last attempt. But there may be edge cases.
Either way the newly created classes need to be loaded into memory to work. This means we have to call a custom class loader. The new dynamic classloader should be extendable for this task pretty simply. The alternative that I did was to make a custom class loader which is tied to the ASM directly. That is a fairly common pattern in which the loader/generator are tied together as one class. I did also run into some security problems as the loader that creates the class is different than the one that creates the application classes. In some cases
The last wrinkle is how to make this work on Android. The Android "JVM" is not actually Java bytecode but DEX. I can likely get the new code to simply not work on DEX by using a patch to remove the code, but if we did want to work there it would require a completely different solution than ASM. There is also the concept of the security model that we face there. Either way we need to modularize so that the with and without options don't require a massive amount of patching.
I have put in a fair amount of thought and prototyping effort, but I will likely need to make a hard push to actually get something which is usable. It is not terrible, but thus far I haven't had much in the way of compelling use cases within my local group to motivate me so it keeps getting put on the back burner. |
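The two part invocation model and the weak back-reference described in this comment can be sketched in plain Python, without JPype or JNI. This is only an illustration under stated assumptions: `JavaBase`, `ExtensionStub`, `PyExtension`, `lookup` and `call` are hypothetical names, and ordinary Python method calls stand in for the two JNI calls. The lookup also checks only the immediate class dictionary, not the full inheritance chain.

```python
import weakref


class JavaBase:
    """Hypothetical stand-in for the concrete Java base class."""

    def greet(self):
        return "base greet"

    def name(self):
        return "base name"


class ExtensionStub(JavaBase):
    """Hypothetical stand-in for the generated extension class.

    It holds only a weak reference to the Python object, so the
    Python -> Java -> Python loop cannot keep both sides alive forever.
    """

    def __init__(self, py_object):
        self._py_ref = weakref.ref(py_object)

    def lookup(self, method_name):
        # Part 1: fetch the Python method if present, without raising.
        target = self._py_ref()
        if target is None:
            return None
        func = vars(type(target)).get(method_name)
        if func is None:
            return None
        # The bound method keeps the target alive for the duration of the call.
        return func.__get__(target)

    def call(self, method_name, *args):
        bound = self.lookup(method_name)
        if bound is None:
            # No Python implementation: fall through to the base method.
            return getattr(JavaBase, method_name)(self, *args)
        # Part 2: launch the Python implementation.
        return bound(*args)


class PyExtension:
    """Python side of the pair; it overrides greet only."""

    def __init__(self):
        self.java = ExtensionStub(self)

    def greet(self):
        return "python greet"


ext = PyExtension()
stub = ext.java
results = [stub.call("greet"), stub.call("name")]
```

Because the lookup returns either a callable or nothing, no exception is raised for a missing override and no shared flag is needed, which is the property the comment is after.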
I took a shot at this one over the weekend. The hardest part is how to properly allow the user to call "super" during the construction stage and potentially for members. In Java it is possible to call the base constructor only at specific times during the initialization process. It seems like I need a special object type when a proxy method gets called which will contain a special I think if possible it should look like this...
Here when the proxy call gets back it would give you a special copy of "self" which has reserved access to the get/set and I still need some more consideration on this topic before I can start implementing. Does this look reasonable? |
I am not familiar with low-level logic at the moment, but your reasoning looks good to me. |
The current logic is that any keyword conflict should add a trailing underscore, the most obvious being class_. Someone is unlikely to need to use super unless they are replicating logic from Java. Thus I don’t think we need to require a special word like java to be added.
However there are several options we can consider.
class MyObject extends MyBase
{
    public MyObject(String name)
    {
        super(name);
        …
    }
}
Could become something like this…. (Perhaps I can make it a bit closer to Java)
class MyObject(MyBase):
    @jpublic(String)
    def __init__(this, name):
        this.super_(name)
        …
I guess the other way would be to make it more C++-like and drop the __init__, or perhaps force the method name explicitly rather than super.
class MyObject(MyBase):
    @jpublic(String)
    def MyObject(this, name):
        MyBase(this, name) # or perhaps this.MyBase(name) would also be a candidate as it has a better lookup?
        …
This would avoid the need for a super_, but then I need some magic to make MyBase recognize a call with this to direct to the <init> method.
Which of these looks more natural? The key here is to avoid something which is ambiguous with something we already have defined and at the same time looks intuitive from both the Python and Java perspectives.
The issue with class extension is that we can do interfaces pretty easily because they don’t have a base implementation. Currently, we hide all the <init>, private fields, and private methods. But when you extend a class you are supposed to have access to those things. This means I would need the table lookups to have privates available but hidden except when they are needed. This is going to mess up the Python dictionary system pretty badly unless I do some pretty radical things behind the scenes.
This is largely the reason that I haven’t bitten this one off. There are a lot of edge cases and a lot of machinery that is needed.
I have completed part of it by taking the JClassBase and having it redirect the __new__ method back to a hook in Python where I can start the process of building the class. But a lot of questions regarding the syntax remain before I can really start cutting code. How do we define the argument types (decorator or Python style type declarations), how do we get to private methods/fields, and how do we define the constructors and get to special methods like <init>? Once this is defined the implementation is pretty easy. The call redirects to Python, which reads through all the method annotations and creates a class description record, which calls a Java method which builds the class from ASM. The ASM class is loaded and then we call JClass() on the resulting class so that we produce a new Java wrapper class which has Python methods embedded in it.
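As a rough illustration of the "reads through all the method annotations and creates a class description record" step, here is a minimal pure-Python sketch. It assumes a decorator-based syntax; `jpublic` here is a hypothetical stand-in that takes type names as strings (so it runs without a JVM), and `describe_class` and the record layout are invented for the example rather than taken from JPype.

```python
import inspect


def jpublic(*arg_types):
    """Hypothetical decorator: records the declared Java argument types."""
    def mark(func):
        func._java_signature = tuple(arg_types)
        return func
    return mark


def describe_class(cls):
    """Collect the annotated methods of cls into a plain description record."""
    methods = []
    for name, member in vars(cls).items():
        signature = getattr(member, "_java_signature", None)
        if signature is None:
            continue  # not annotated, so not exported to Java
        # Drop the receiver ("this") before comparing counts.
        params = list(inspect.signature(member).parameters)[1:]
        if len(params) != len(signature):
            raise TypeError(
                f"{name}: {len(params)} parameters but "
                f"{len(signature)} declared types"
            )
        methods.append({"name": name, "args": signature, "params": params})
    return {
        "name": cls.__name__,
        "bases": [base.__name__ for base in cls.__bases__],
        "methods": methods,
    }


class MyBase:
    pass


class MyObject(MyBase):
    @jpublic("java.lang.String")
    def __init__(this, name):
        this.name = name

    @jpublic("int", "int")
    def add(this, a, b):
        return a + b

    def helper(this):  # no annotation: stays Python-only
        return None


record = describe_class(MyObject)
```

In the flow described above, a record like this would be what gets handed to the Java side that builds the class; how that hand-off actually looks is not specified in the thread.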
|
So let's start with the basics. To support Java, we need the Python class dictionary to do some things that Python doesn't allow. Python does not allow a method definition to get ahold of the class that it is being defined in, nor allow for overloading, nor does it support the concept of name mangling. So we are going to have to do some evil hacking first. The usual way to do this is to add a decorator that will break into the process. The order of operations for creating a class is
Unfortunately, in the CPython implementation the name is taken through a different path than the actual Instead we have to get ahold of the outer scope which for the decorator would be the class scope. Python has three methods to get ahold of the scope Next we can then use inspect to look at the function and make sure it meets any other requirements that need to be set. This makes it so we can enforce those extra Java requirements like all methods must take Java class arguments only. The resulting code looks something like this. (Please note, I disavow any and all knowledge of Python. Any resemblance to actual Python is purely coincidental.)
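One way a decorator can reach the enclosing class scope and then validate the function with inspect is sketched below. This is not the code from the comment. It is an independent illustration that relies on `sys._getframe`, a CPython implementation detail, and the names `joverload` and `_overloads` are hypothetical.

```python
import inspect
import sys


def joverload(*arg_types):
    """Hypothetical decorator that files each definition under name + types."""
    def register(func):
        # Frame 1 is the caller of register(), i.e. the class body that is
        # currently executing; its f_locals is the class namespace.
        class_scope = sys._getframe(1).f_locals
        # Enforce an extra rule: one declared type per non-receiver parameter.
        params = list(inspect.signature(func).parameters)[1:]
        if len(params) != len(arg_types):
            raise TypeError(f"{func.__name__}: signature and types disagree")
        table = class_scope.setdefault("_overloads", {})
        table.setdefault(func.__name__, {})[tuple(arg_types)] = func
        return func
    return register


class Calculator:
    @joverload("int")
    def scale(this, x):
        return x * 2

    @joverload("int", "int")
    def scale(this, x, y):  # same name: plain Python would lose the first one
        return x * y


one_arg = Calculator._overloads["scale"][("int",)]
two_arg = Calculator._overloads["scale"][("int", "int")]
```

Both definitions of `scale` survive in the `_overloads` table even though the class attribute `scale` itself ends up bound to the second one, which is the kind of overloading support plain class dictionaries do not offer.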
|
Thank you for your detailed explanation! I have a grip on your planned approach after reading it. Java
Python
|
@enjoybeta with regard to dynamic loading we should likely take this to another thread. |
It appears that Jython uses the So my best guess of syntax would be something like the following.
@marscher Any thoughts? Unfortunately this is not very compatible with Jython as far as I can tell. We need a lot more annotation information as we need to support stuff from Java such as overloading. They pretty much skipped the overloading and dropped Java keyword support for stuff like super. The Pythonic method of declaring variable slots is really not usable. I tried Pythons |
@marscher I need an executive decision on something. In order to do extensions I will need to use the asm library. If I import the library then it will preclude a second copy of asm, such as the one from Kotlin, from being included at the same time, which may mess things up unless both are using the same version. The second approach would be to use JarJar or similar to rename all the symbols in the library to something else and then include it in the JPype jar. Other than the special magic to run JarJar and do the jar inclusion pattern this one is doable. Third would be to copy their source into the JPype source tree and compile it in with a new package name. This adds headaches to maintenance. As we have Scala and Kotlin users it seems like we should plan to avoid conflicts. The license for asm seems pretty liberal (BSD) and would allow direct linking. Their documentation seems to indicate that is an acceptable solution so long as their copyright statement is included. https://asm.ow2.io/license.html Which of these options seems acceptable?
|
If we can include the source tree of asm as a subtree, that would be preferable as we can easily update it from upstream. How stable is asm? Do you expect lots of future updates? If not we could also do the static approach and rename using jarjar. I would just want to exclude option one for now. |
JarJar was a no go. It is so out of date that it gets hung up on many modern jars, as the ASM version it includes is ancient. So I am going with include source for now. ASM has changed a bit in recent years to keep up with the latest JVM changes, but we are going to be making "old" (and I mean really old... like JDK 1.5) asm for our hooks, so we likely have no need to track the latest ASM. We could even strip out a lot of it, as I will be using only about 20 op codes for the majority of the work and I can skip a lot of fancy stuff. The plan is simple: create a set of native hooks that we will use to transfer control back to C, write a prototype class which exercises those hooks, decompile the prototype with javap, then write a class visitor which takes an existing class and replaces the methods with the redirection hooks. (Okay maybe that didn't sound as simple as it should, but don't worry. I got this.) I may have to bifurcate some things at the native directory level. Android uses a different machine so I will need to compile in ASMDEX rather than ASM when building hooks for that platform. But first things first, let's get the basic extension module complete. |
Heya, it's been two years now (roughly), has there been any progress or new decisions regarding this? |
Status remains largely unchanged. I completed a prototype two years ago for a reverse bridge allowing Java to call Python and using ASM to convert directions into Java classes, which is the first step to full integration such as extensions written in Python. But in order for it to enter production it would require additional programmer help, as we would need to test all of the different aspects of the reverse bridge capabilities. At the same time my employer decided that they would not sign the Python community contribution agreement. As I am only allowed to work on projects that do not require a signed agreement, this left me at an impasse on how to proceed, as a number of actions required being able to contribute the hooks to better support language integration. If I can get together some interested users that are willing to help write the tests then we can complete JPype 2.0, which would include this feature. But as it stands my employer's prohibitions leave me with very little motivation to proceed by myself, which could potentially jeopardize my employment. The cost of working in a bureaucracy is that silly and ill-informed decisions often have unintended and harmful consequences. |
The main technical issue is the ability to call private functions from within Python-defined extensions. JPype currently can only call public methods, and unsafe access has been largely eliminated by the module system after Java 9. So while I can allow someone to write a trivial extension class, it would only be for SAM or interfaces for which there are no private members to access. Still useful, but not much use over what we have now. You can always write a short piece of Java and include it in a package as a jar or class file, so just allowing extensions for limited use is not necessarily a big advancement. Of course if we have the full reverse bridge that would be an advancement, as we would no longer be converting containers but instead passing them to Java. But this change will certainly break some code, as all code currently assumes that Java classes receive native Java types rather than Python wrappers. There is also the memory issue: once you pass a Python wrapper that is held in Java, you have memory cycles. Neither Python nor Java was designed to handle external memory management, and they lack a protocol to communicate with another memory management system. I left that as a "hard problem", as unless I rewrote portions of Java or Python it would be difficult to resolve. I considered trying to exploit the Java RMI, which does have edge memory management, but it did not have hooks for local use. |
I took a shot at extending Java objects using Python. In order to do so, we would need to construct two classes from within Python. One is an extension class which overrides each of the targeted methods and points them to an interface holding the methods to implement. The annotations for the task are pretty easy and the rest of the mechanics are well supported by current JPype. We can even call super class methods by virtue of JNI. The only limitation appears to be difficulties with overriding the constructor. Thus the question is how do we get the two required pieces compiled on demand.
The need for this development is pretty compelling. There are a lot of Java classes that have abstract methods which must be overridden to implement. Though few new APIs use this style, there are plenty of older ones which still require it. Thus long term it is a requirement for JPype to be considered a complete solution.
I took the first shot at this using the MemoryCompiler. The compiling from within Python is a fairly tall order, but is possible. But I found a lot of downsides. The java compiler is huge, incredibly slow, and spawns into native C code. Further, it is only available when running the jdk copy. Thus I am looking for other solutions.
I have evaluated java assembler solutions such as Jasmin and Lilac. Lilac is certainly the most advanced and capable with exterior goals that cover our needs. And the author seems to have done homework on what was needed for assembler/disassembler capable of "a perfect round trip." Thus I started by studying their inner workings. It is certainly plausible to get what we want from them. However, thus far they are all far from ideal. The assemblers are general purpose and suffer from the typical maven philosophy of pull whatever you need. The result is for one or two functions you are holding on to 4 dependent jars which each must be gotten to work. For a large thing like an assembler this leads to 20 or more dependencies. It is not that maven doesn't make this possible, but as a programmer from the 80s, I don't like to depend on that much 3rd party stuff.
The second problem I have had with each of them is that the assemblers all have poor separation between the data pieces and the process. The data classes in the assemblers do the job of packing (and unpacking in the case of Lilac) and rule checking the code. This is way too heavyweight for our purpose. For our usage, we just need a Java class that gets a class and a list of method references, constructs our two classes within the data model directly, and then calls the assembler to pack the data into a class file in the class loader. I don't really need to go through creating a big text file, just to have a lexer/parser chop it back into a tree and then decode it back to a file, when constructing the tree directly is trivial and I want the class to go straight to the loader. Therefore, I wanted to strip them down to just a few K worth of functions that can carry that task out. But because of all the functionality built into the current assembler classes, it is like a big Jenga tower. I can't pull out the pieces I need because there is too much stuff above it.
Studying the assemblers shows it isn't that hard after all the boilerplate is complete. There are a bunch of table encoders needed to put the file back together, and some decoders to test that the process is working. Java has a few support classes to make it possible to do some of the work without 3rd party libraries (DataInput, DataOutput, JarFile, etc). But because the Java compiler is not actually written in Java, it is just pieces.
Since Lilac can't be cut to pieces and I haven't got a response from the author, it seems like I am going to craft the pieces I need. I can still test the back end encoding using a converter from the existing assembler to my mini one, so I can likely cut the work in half compared to a full tool suite. Studying jas, jasmin, lilac, and the Java docs makes the required path pretty clear, especially when I can peek over into well-trodden ground to see how others interpreted the spec even if I can't copy from it directly. After about 4 hours of effort, I pulled together most of the class file data structure. But I anticipate that at least 30 hours will be needed before it is complete, which puts it pretty far down the road before I have something to submit considering other priorities.
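To give a feel for what the table encoders involve, here is a minimal sketch that packs the smallest possible class file (a public class with no fields or methods) using Python's struct module. It follows the layout in the JVM specification (magic number, version, constant pool, access flags, then the counted tables). It is not the mini-jasm code mentioned above, and `encode_empty_class` and `demo/Empty` are hypothetical names. Major version 49 corresponds to Java 5.

```python
import struct

CONSTANT_UTF8 = 1   # constant pool tag for a UTF-8 string
CONSTANT_CLASS = 7  # constant pool tag for a class reference
ACC_PUBLIC = 0x0001
ACC_SUPER = 0x0020


def utf8_entry(text):
    # ASCII names only; the class file "modified UTF-8" differs for
    # some characters, which this sketch does not handle.
    data = text.encode("ascii")
    return struct.pack(">BH", CONSTANT_UTF8, len(data)) + data


def class_entry(name_index):
    return struct.pack(">BH", CONSTANT_CLASS, name_index)


def encode_empty_class(name, super_name="java/lang/Object", major=49):
    """Pack a class with no interfaces, fields, methods or attributes."""
    pool = [
        utf8_entry(name),        # index 1
        class_entry(1),          # index 2 -> this_class
        utf8_entry(super_name),  # index 3
        class_entry(3),          # index 4 -> super_class
    ]
    out = struct.pack(">IHH", 0xCAFEBABE, 0, major)  # magic, minor, major
    out += struct.pack(">H", len(pool) + 1)          # pool count is entries + 1
    out += b"".join(pool)
    out += struct.pack(">HHH", ACC_PUBLIC | ACC_SUPER, 2, 4)
    out += struct.pack(">HHHH", 0, 0, 0, 0)  # interfaces, fields, methods, attributes
    return out


blob = encode_empty_class("demo/Empty")
```

Everything is big-endian, and each table is a count followed by its entries, so the method and attribute encoders that a real generator needs follow the same pattern as the constant pool here.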
If anyone is compelled to assist in the effort I can post the code for the mini-jasm to github.