Feature idea: Extraction of docstrings from javadoc #702

petrushy · 2020-04-24T09:08:53Z

Hi,

This is likely a far in the future enhancement, but just to write it down.

It would be interesting to have the possibility of generating docstrings from javadoc, so that the documentation string shown in automatic popup info carries more detail than it does now.

One would then of course need access to the source code. Maybe it could be parsed into some kind of database.

One tool that may be useful is qdox, a Java tool that can parse source for javadoc.
https://github.com/paul-hammant/qdox

petrushy · 2020-04-24T15:42:59Z

I was trying some things, and it is not possible to monkeypatch the __doc__ property of JObject in the same way as __repr__, is it?

Thrameos · 2020-04-24T15:48:43Z

The answer is no and yes. Doc strings are supposed to be fixed immutable strings so you can't patch them directly. But if you look over _jclass you will find the redirect that converts them into properties and redirects them into the method _jclassDoc. You can apply the same procedure to redirect the doc routine to whatever function you need.

Also notable: if you compiled with -g:source, you can get the source location in both _jclassDoc and _jmethodGetDoc, which lets you extract the javadoc from the source or from the HTML doc package. I recommend installing your own handler rather than changing the ones in the code, as private names may change.
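
As a rough illustration of the redirect idea described above, here is a minimal plain-Python sketch. It is not JPype's code: `DocMeta`, `set_doc_handler`, `default_doc`, and the stand-in `AbsoluteDate` class are all hypothetical names, and the only assumption is the standard Python rule that a property on a metaclass takes priority over the fixed `__doc__` entry of the class.

```python
# Hypothetical sketch; none of these names come from JPype.

def default_doc(cls):
    """Stand-in for an automatically generated docstring."""
    return "Java class '%s'" % cls.__name__

# The active handler lives in a dict so it can be replaced at runtime.
_handlers = {"doc": default_doc}

def set_doc_handler(func):
    """Install a user-supplied function that builds the docstring."""
    _handlers["doc"] = func

class DocMeta(type):
    # A property on the metaclass is a data descriptor, so it wins over
    # the plain '__doc__' entry stored in each class's own __dict__.
    @property
    def __doc__(cls):
        return _handlers["doc"](cls)

class AbsoluteDate(metaclass=DocMeta):
    pass

print(AbsoluteDate.__doc__)
set_doc_handler(lambda cls: "Docs for %s pulled from javadoc" % cls.__name__)
print(AbsoluteDate.__doc__)
```

Because the handler is looked up on every access, installing a new one changes the docstring of classes that already exist, which is the point of going through a property instead of a fixed string.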

Thrameos · 2020-04-25T14:33:56Z

I see two possible solutions here.

First, we can look for the javadoc jar resources. That will return the same rather old and crusty HTML page that javadoc produces. If you know the name mangling, you can jump down into the method or class section. Then you would just have to convert the blob of HTML to rst. Obviously we can't get every little detail of the HTML right, but it would get a lot of documentation included.

Second, if we can't get the preformed docs, we would look for the class source. Parsing Java is much harder, especially if the line number for the method was not compiled in.

In both cases the user just has to add the source or javadoc jar to the classpath. We then use Class.getResource() to fetch the section needed. If it isn't found we just fall back to the usual autodoc.
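
The lookup-with-fallback flow can be sketched in plain Python. This is only an illustration under stated assumptions: it reads the jar with `zipfile` instead of going through `Class.getResource()`, and `javadoc_entry_name`, `find_javadoc`, and `class_doc` are hypothetical helpers. The one fact it relies on is that standard javadoc output stores a nested class `Outer$Inner` as `Outer.Inner.html` under the package directory.

```python
import io
import zipfile

def javadoc_entry_name(binary_name):
    """Map a Java binary class name to its page inside a javadoc jar."""
    package, _, simple = binary_name.rpartition(".")
    page = simple.replace("$", ".") + ".html"
    if not package:
        return page
    return package.replace(".", "/") + "/" + page

def find_javadoc(jar_files, binary_name):
    """Return the HTML page from the first jar that has it, else None."""
    entry = javadoc_entry_name(binary_name)
    for jar in jar_files:
        with zipfile.ZipFile(jar) as zf:
            try:
                return zf.read(entry).decode("utf-8", errors="replace")
            except KeyError:
                continue  # this jar has no page for the class
    return None

def class_doc(jar_files, binary_name, autodoc):
    """Prefer javadoc; fall back to the automatically generated doc."""
    html = find_javadoc(jar_files, binary_name)
    return html if html is not None else autodoc(binary_name)

# Demo with an in-memory jar so the sketch runs without any files.
jar = io.BytesIO()
with zipfile.ZipFile(jar, "w") as zf:
    zf.writestr("org/orekit/time/AbsoluteDate.html", "<html>docs</html>")

fallback = lambda name: "Java class '%s'" % name
print(class_doc([jar], "org.orekit.time.AbsoluteDate", fallback))
print(class_doc([jar], "org.orekit.time.Missing", fallback))
```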

I took a shot at the parsing, but concluded that it would be at least 2 nights of work to get the javadoc out, which is unfortunately a lower priority than work for the 0.8 release.

petrushy · 2020-04-25T16:18:42Z

Thanks for the update and the intense work with jpype!

I did some tests with qdox, attaching it at the variables above. Qdox parses the source tree directly to find javadoc (and other parts). However, I am not sure that is the right way: some javadoc uses references and tags (like inheritDoc), which are then not processed, so the result does not look optimal. Still, it is nice to be able to plug things like this into the library. I think your first option is likely the best, using the HTML extract.

Thrameos · 2020-05-20T04:19:12Z

@petrushy Progress update. I succeeded in integrating an HTML parser that can extract each of the html sections from the javadoc files and a Zip file system that allows the user to open the base Java API documentation. There are two parts remaining to this task.

Convert the HTML to rst. (I tried a few packages that are supposed to perform this task, but the javadocs have a style sheet that makes them hard to convert with anything generic, so we are going to need a custom one.) This one is not so hard. Simple pattern matches should be able to do a lot of the task. There will be edge cases like subscripts and other weird HTML, but we should be able to get a 90% solution pretty quickly.
Integrate the resulting doc into the class and method files. I am not sure how to present fields and inner classes.
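
To give a feel for the "simple pattern matches" approach to the HTML-to-rst step, here is a minimal sketch built on Python's standard `html.parser`. It is not the converter discussed in this thread; `MiniRstRenderer` and `to_rst` are hypothetical names, and only a handful of tags are covered.

```python
import re
from html.parser import HTMLParser

class MiniRstRenderer(HTMLParser):
    """Convert a tiny subset of javadoc HTML into reStructuredText."""

    # Inline HTML tags and the rst marker that wraps their content.
    INLINE = {"code": "``", "tt": "``", "i": "*", "em": "*",
              "b": "**", "strong": "**"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag == "p":
            self.parts.append("\n\n")   # paragraph break
        elif tag == "li":
            self.parts.append("\n- ")   # bullet item

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.parts.append(self.INLINE[tag])

    def handle_data(self, data):
        # Collapse whitespace runs but keep a single space at the edges.
        self.parts.append(re.sub(r"\s+", " ", data))

def to_rst(html):
    renderer = MiniRstRenderer()
    renderer.feed(html)
    renderer.close()  # flush any buffered text
    text = "".join(renderer.parts)
    return "\n".join(line.strip() for line in text.splitlines()).strip()

print(to_rst("Returns the <code>epoch</code> as a <i>new</i> date."
             "<p>Never <b>null</b>."))
```

Empty or nested inline elements are not handled here and can produce invalid rst, and tables, links, and subscripts are ignored entirely, so this is at best a starting point for the edge cases mentioned above.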

Estimated remaining time on this task less than a week. I should be able to get it into the JPype 0.8 release assuming no major hangups.

Thrameos · 2020-05-24T02:23:41Z

@petrushy Progress update. I have now successfully rendered the entire JDK 8 javadoc into rst. It isn't perfect, but it is a start. I have one remaining task: linking it up to methods and classes. Once that is complete, it should be ready to test. Speed is not so good, as my parser is pretty crude.

You may want to contribute by improving the renderer as it could use some additional work. Sometimes the combination of html elements generates invalid rst (like "``````"). References and linkage to external documents don't always work. Tables are not rendered at all.

There are three major support classes.

JavadocExtractor - pulls all the sections out of html document
JavadocTransformer - converts the dom sections into a markup usable by renderer with custom tags. This may be possible to replace with a good xslt, but I am not too good with that tool.
JavadocRenderer - Converts the marked up sections into restructured text.

Thrameos · 2020-05-25T01:41:35Z

@petrushy The requested enhancement is complete. Please test, add a review, and comment so it can be included in JPype 0.8.

petrushy · 2020-05-28T16:15:56Z

Hi @Thrameos! Many thanks, will start testing.

petrushy · 2020-05-28T17:36:35Z

WIP: Hi, I did some initial tests and will spend more time later. Some things seem to be extracted, but others are not, even though they have javadoc. I assume the javadoc should be UTF-8 encoded; there are quite a few settings in the project I'm wrapping.

and in pom.xml
maven-javadoc-plugin
${orekit.maven-javadoc-plugin.version}

${basedir}/src/main/java/org/orekit/overview.html

--allow-script-in-comments
-header
'${orekit.mathjax.config} ${orekit.mathjax.enable}'
-extdirs
${tools.jar.dir}

CS Group. All rights reserved.]]>

https://docs.oracle.com/javase/8/docs/api/
https://www.hipparchus.org/apidocs/

${orekit.compiler.source}
none

I will investigate and try to generate a cleaner javadoc. But it seems that some of the classes that are not detected are rather plain. WIP.

Thrameos · 2020-05-28T17:49:47Z

Is there a Javadoc jar for the package that I can try pulling docs from? I currently have it set to ignore docs that it is having problems with, so that could be causing it to skip. So, stuff that is missing:

* Tables (no renderer)
* Properties (no place to put them currently)
* Math and any fancy markup (no renderer)
* Anything with HTML errors that I haven't already handled

I haven't dealt with encoding, so there may be issues there.

petrushy · 2020-05-28T20:01:25Z

Yes, thanks, it's the orekit library I'm working with, artifacts at:
https://repo1.maven.org/maven2/org/orekit/orekit/10.1/

For example org.orekit.time.AbsoluteDate is one that does not seem to work.
https://www.orekit.org/static/apidocs/org/orekit/time/AbsoluteDate.html

While
org.orekit.time.TimeScalesFactory works
https://www.orekit.org/static/apidocs/org/orekit/time/TimeScalesFactory.html

Thrameos · 2020-05-28T21:52:38Z

Okay I will investigate this evening. (I may need to add a diagnostics mode that one can trigger to get a translation and rendering report.)

Thrameos · 2020-05-29T02:51:05Z

It looks rendered just fine for me. Can you be more specific about what issue you are seeing?

Here is what I see and the script that generated it.

doc.txt
testDoc3.txt

petrushy · 2020-05-29T09:52:13Z

Weird. I simplified your script a bit and tried it in Python 3.6 and 3.7 (conda versions), but I get:

Description

Failed to extract javadoc for class org.orekit.time.AbsoluteDate
Java class 'org.orekit.time.AbsoluteDate'

Extends:
    java.lang.Object

Interfaces:
    org.orekit.time.TimeStamped, org.orekit.time.TimeShiftable,
    java.lang.Comparable, java.io.Serializable

...

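I have all the orekit and hipparchus jars (but not the javadoc for hipparchus) and the orekit javadoc in the same dir as the script:

import jpype
from jpype.types import *
import jpype.imports  # enables "import org" for Java packages

# Put every jar in the current directory on the classpath,
# including the orekit javadoc jar.
jpype.startJVM(classpath=['./*'])
import org

p = org.orekit.time.AbsoluteDate

print("Description")
print("-----------")
print(p.__doc__)  # the docstring is exposed as __doc__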

Tested with a new environment also in conda.

I am using openjdk 8 from conda, cannot test with a newer at the moment.

petrushy · 2020-05-29T09:53:53Z

source is from the Thrameos/javadoc branch

Thrameos · 2020-05-29T14:57:50Z

Okay, I can confirm this one. It appears to work on Linux with all versions of Python and JDK 8-11, but fails on Python 3.5 with JDK 11. I will investigate.

petrushy · 2020-05-29T15:01:49Z

I'm on windows currently, have tried with same results on Python 3.6 & 3.7. Can test later on mac / linux.

Thrameos · 2020-05-29T15:16:28Z

Okay I corrected a few issues that I located in that example. You can use

jde = JClass("org.jpype.javadoc.JavadocExtractor")
jde.failures = True

to get the source of the problem. Some of the hyperlinks appear busted (in different ways on linux and windows) but these are mostly just rendering issues that we can track down later. Overall I think this can be included with some followup to address rendering issues.

You may want to do a full doc extraction run to see what other problems need to be addressed. For now I have to move on to 0.8 bug hunt so I can finally finish the release.

petrushy · 2020-05-29T15:35:40Z

Ok, will experiment with it.

Yes, it is still very usable, and I am looking forward to the 0.8 release! Thank you for your efforts in this development!

petrushy · 2020-05-29T20:05:30Z

WIP: I removed my comment about it not working under Linux, as it is somehow working now. Could be user error.

I tried different versions of OpenJDK under Windows (8, 11), and the example above does not work in any of them.

Thrameos · 2020-06-11T01:19:07Z

So any conclusion on how well it is working?

petrushy · 2020-06-11T08:06:29Z

Now it is working on Windows as well, and really well in some practical tests, now using JDK 8. Many thanks for implementing this; it is especially useful for end users of "wrapped" Java libraries.

Some minor personal preferences would be a user-settable line length and the possibility to filter away the meta tags, like :class: and :meth: (I would prefer them just removed). They look really nice in a tool that supports rst rendering of javadocs, like Spyder, but look a bit noisy in some other environments like JupyterLab, which is a common one. I may have a try at this later; it could be user-settable.

Many many thanks for implementing this, and the overall improvement of jpype, lots of work.

BR

Thrameos · 2020-06-11T15:03:58Z

Hmm. Okay I suppose that we can find a way to check what the environment is and select the appropriate render properties. The rendering properties are not that hard to control though I am hesitant to make them public symbols as they are.

Perhaps we should just make them pull the values from System.getProperty. Then you would be able to just set the property to the desired value, which leaves the implementation free to change in the future rather than having people poke at private symbols.

Say something like

org.jpype.javadoc.TextWidth - set the column width for wrapping paragraphs. (Default "120")
org.jpype.javadoc.EnableDomains - use :class: and :meth: when linking. (Default "True")
org.jpype.javadoc.EnableExternal - add links to external documents. (Default "True")
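
A minimal sketch of how such string properties could be turned into typed settings with defaults follows. It assumes the three property names proposed above, which this thread does not confirm were implemented; a plain dict stands in for the values that would come from System.getProperty on the Java side, and `settings_from_properties` is a hypothetical helper.

```python
PREFIX = "org.jpype.javadoc."

def _as_bool(value):
    """Interpret a property string such as "True" or "false"."""
    return value.strip().lower() == "true"

def settings_from_properties(props):
    """Build render settings from a mapping of property names to strings.

    Missing properties fall back to the defaults proposed in the thread.
    """
    return {
        "text_width": int(props.get(PREFIX + "TextWidth", "120")),
        "enable_domains": _as_bool(props.get(PREFIX + "EnableDomains", "True")),
        "enable_external": _as_bool(props.get(PREFIX + "EnableExternal", "True")),
    }

# Only the overridden values need to be supplied.
print(settings_from_properties({PREFIX + "TextWidth": "80",
                                PREFIX + "EnableDomains": "False"}))
```

Keeping the values as strings at the boundary mirrors how system properties work, and the renderer internals stay private.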

Do you have any additional properties you would like to see controllable such as sections to include or exclude? If you have preferences I will see if I can squeeze it in prior to the release candidate.

petrushy · 2020-06-11T15:13:33Z

Hi, yes sounds good - I don't think it is necessary to be widely exposed, this is likely more for people who are tuning python wrappers of java libraries. I don't have any additional, one needs to find the quirky cases I guess to see what more may be needed to tune, but this can be done in future versions.

Thanks!

Thrameos · 2020-06-11T15:49:06Z

I looked into it further. The module doing the rendering for help is pydoc. Its support for sphinx domains and such is really underwhelming (read non-existent). I am a bit shocked that the integration between these isn't tighter.

Given that, it seems like I should just have a master style switch for sphinx or pydoc rendering as org.jpype.javadoc.Style so that the user doesn't have a bunch of settings to play with.
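
A master switch of this kind could look roughly like the sketch below. The style values "sphinx" and "pydoc" and the `render_reference` helper are assumptions made for illustration; the thread only names the property org.jpype.javadoc.Style.

```python
def render_reference(kind, target, style="sphinx"):
    """Render a cross reference to a class or method for the given style.

    kind is a Sphinx role name such as "class" or "meth".
    """
    if style == "sphinx":
        # Sphinx-aware viewers can turn the role into a link.
        return ":%s:`%s`" % (kind, target)
    if style == "pydoc":
        # pydoc shows the text verbatim, so drop the role markup.
        return target
    raise ValueError("unknown style: %r" % style)

print(render_reference("class", "java.lang.String"))
print(render_reference("class", "java.lang.String", style="pydoc"))
```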

petrushy · 2020-06-11T16:50:27Z

Yep, I saw some requests for that in Jupyter, but it does not seem to be coming soon. A master switch would work well.

Thrameos added the "enhancement" label (Improvement in capability planned for future release) on Apr 30, 2020

michi42 mentioned this issue May 8, 2020

automatic type stub (pyi) generation for java classes #714

Open
