Feature idea: Extraction of docstrings from javadoc #702

Open
petrushy opened this issue Apr 24, 2020 · 26 comments
Labels
enhancement Improvement in capability planned for future release

Comments

@petrushy

Hi,

This is likely a far-future enhancement, but I want to write it down.

It would be interesting to be able to generate docstrings from javadoc, so that automatic popup info shows the documentation string with more detail than is available now.

This of course requires access to the source code, which could perhaps be parsed into some kind of database.

One tool that may be useful is qdox, a Java tool that can parse source for javadoc.
https://github.com/paul-hammant/qdox

@petrushy
Author

petrushy commented Apr 24, 2020

I was trying some things, and it is not possible to monkeypatch the __doc__ property of JObject in the same way as the __repr__, is it?

@Thrameos
Contributor

Thrameos commented Apr 24, 2020

The answer is no and yes. Docstrings are supposed to be fixed, immutable strings, so you can't patch them directly. But if you look over _jclass you will find the redirect that converts them into properties and routes them to the method _jclassDoc. You can apply the same procedure to redirect the doc routine to whatever function you need.

Also notable: if you compiled with -g:source you can get the source location in both _jclassDoc and _jmethodGetDoc, which lets you extract the javadoc from the source or from the html doc package. I recommend installing your own handler rather than changing the ones in the code, as private names may change.
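For readers unfamiliar with the trick described above, here is a minimal pure-Python sketch of how a class docstring can be redirected through a property on the metaclass. It does not use JPype; the names `DocRedirectMeta`, `set_doc_handler` and `Wrapped` are hypothetical and only illustrate the pattern, not the actual private hooks in _jclass.

```python
from typing import Callable, Optional

# Hypothetical handler slot. JPype's real hook names are private and may differ.
_doc_handler: Optional[Callable[[type], str]] = None


def set_doc_handler(handler: Callable[[type], str]) -> None:
    """Install the function that produces the docstring for a class."""
    global _doc_handler
    _doc_handler = handler


class DocRedirectMeta(type):
    """Metaclass whose __doc__ property is computed on every access."""

    @property
    def __doc__(cls):
        # A property on the metaclass is a data descriptor, so it takes
        # precedence over the plain __doc__ entry stored in the class dict.
        if _doc_handler is None:
            return "No documentation handler installed for %s" % cls.__name__
        return _doc_handler(cls)


class Wrapped(metaclass=DocRedirectMeta):
    pass


set_doc_handler(lambda cls: "Generated docs for " + cls.__name__)
print(Wrapped.__doc__)
```

Because the handler is looked up at access time, it can be swapped later without touching the class, which is the point of installing your own handler instead of editing the library.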

@Thrameos
Contributor

I see two possible solutions here.

First, we can look for the javadoc jar resources. That returns the same rather old and crusty html page that javadoc generates. If you know the name mangling you can jump down into the method or class section. Then you would just have to convert the blob of html to rst. Obviously we can't get every little detail of html right, but it would get a lot of documentation included.

Second, if we can't get the preformed html, we would look for the class source. Parsing Java is much harder, especially if the line number for the method was not compiled in.

In both cases the user just has to add the source or javadoc jar to the classpath. We then use Class.getResource() to fetch the section needed. If it isn't found we just fall back to the usual autodoc.
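As a rough illustration of the lookup step, the sketch below maps a Java binary class name to the page a javadoc jar would normally contain and reads it with Python's zipfile module. This is a stand-in for the Class.getResource() approach, not the JPype implementation. The helper names are hypothetical, and the layout (one html page per class, nested classes written as Outer.Inner.html, no module directory prefix) is an assumption that matches JDK 8 style javadoc.

```python
import io
import zipfile


def javadoc_entry_name(binary_name):
    """Map 'pkg.Outer$Inner' to the assumed jar entry 'pkg/Outer.Inner.html'."""
    package, _, simple = binary_name.rpartition(".")
    page = simple.replace("$", ".") + ".html"
    return package.replace(".", "/") + "/" + page if package else page


def read_javadoc_page(jar, binary_name):
    """Return the html page for a class, or None so callers can fall back."""
    name = javadoc_entry_name(binary_name)
    with zipfile.ZipFile(jar) as zf:
        try:
            return zf.read(name).decode("utf-8")
        except KeyError:
            # Entry not present in this jar: signal "use the usual autodoc".
            return None


# Demo with an in-memory jar so the sketch runs without any real artifacts.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("org/example/Demo.html", "<html>demo</html>")
print(read_javadoc_page(demo, "org.example.Demo"))
```

Returning None on a miss mirrors the fallback described above: if the page isn't found, the caller keeps the automatically generated documentation.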

I took a shot at the parsing, but concluded that it would be at least 2 nights of work to get the javadoc out, which is unfortunately a lower priority than work for the 0.8 release.

@petrushy
Author

Thanks for the update and the intense work with jpype!

I did some tests with qdox, attaching it at the variables mentioned above. Qdox parses the source tree directly to find javadoc (and other parts). However, I'm not sure that is the right way: some javadoc uses references and tags (like inheritDoc) which are then not processed, so the result does not look optimal. Still, it is nice to be able to plug things like this into the library. I think your first option is likely the best, using the html extract.

@Thrameos Thrameos added the enhancement Improvement in capability planned for future release label Apr 30, 2020
@Thrameos
Contributor

Thrameos commented May 20, 2020

@petrushy Progress update. I succeeded in integrating an HTML parser that can extract each of the html sections from the javadoc files and a Zip file system that allows the user to open the base Java API documentation. There are two parts remaining to this task.

  • Convert the html to rst. (I tried a few packages that are supposed to perform this task, but the javadocs have a style sheet that makes them hard to convert with anything generic, so we are going to need a custom one.) This one is not so hard. Simple pattern matches should be able to do a lot of the task. There will be edge cases like subscripts and other weird html, but we should be able to get a 90% solution pretty quickly.
  • Integrate the resulting doc into the class and method files. I am not sure how to present fields and inner classes.
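To make the "simple pattern matches" idea in the first item concrete, here is a small sketch in Python using only the standard library. It is not the converter that ended up in JPype. The rule table and the function name `html_to_rst` are illustrative assumptions, and it only covers a handful of inline tags.

```python
import html
import re

# Ordered rewrite rules: inline markup first, then paragraph breaks,
# and finally any tag that is still left over is dropped.
_RULES = [
    (re.compile(r"<code>(.*?)</code>", re.S), r"``\1``"),
    (re.compile(r"<(?:b|strong)>(.*?)</(?:b|strong)>", re.S), r"**\1**"),
    (re.compile(r"<(?:i|em)>(.*?)</(?:i|em)>", re.S), r"*\1*"),
    (re.compile(r"<p\s*/?>|</p>"), "\n\n"),
    (re.compile(r"<[^>]+>"), ""),
]


def html_to_rst(fragment):
    """Convert a small html fragment to rst using plain pattern matches."""
    text = fragment
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    # Decode entities only after tags are gone, so '&lt;' is never
    # mistaken for the start of a tag.
    text = html.unescape(text)
    # Collapse runs of blank lines left behind by paragraph tags.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


print(html_to_rst("<p>Returns the <code>size</code> of <b>this</b> list.</p>"))
```

A rule table like this gets the common cases quickly, but nested or empty elements can still produce invalid rst, which is where the remaining effort would go.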

Estimated remaining time on this task is less than a week. I should be able to get it into the JPype 0.8 release, assuming no major hangups.

@Thrameos
Contributor

@petrushy Progress update. I have now successfully rendered the entire JDK 8 javadoc into rst. It isn't perfect but it is a start. I have one remaining task, to link it up to methods and classes. Once that is complete it should be ready to test. Speed is not so good as my parser is pretty crude.

You may want to contribute by improving the renderer as it could use some additional work. Sometimes the combination of html elements generates invalid rst (like "``````"). References and linkage to external documents don't always work. Tables are not rendered at all.

There are three major support classes.

  • JavadocExtractor - pulls all the sections out of the html document.
  • JavadocTransformer - converts the dom sections into a markup usable by the renderer, with custom tags. This may be possible to replace with a good xslt, but I am not too good with that tool.
  • JavadocRenderer - converts the marked up sections into reStructuredText.
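To give a feel for what the extraction stage does, here is a hedged Python sketch that splits a javadoc-like page into sections keyed by anchor name. It is not the Java JavadocExtractor class. The class and function names are hypothetical, and the anchor form (`<a name="toString--">`) is an assumption based on JDK 8 style output.

```python
from html.parser import HTMLParser


class SectionExtractor(HTMLParser):
    """Collect the text that follows each <a name="..."> anchor."""

    def __init__(self):
        super().__init__()
        self.sections = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            name = dict(attrs).get("name")
            if name:
                # A named anchor starts a new section.
                self._current = name
                self.sections[name] = []

    def handle_data(self, data):
        # Text before the first anchor is ignored.
        if self._current is not None and data.strip():
            self.sections[self._current].append(data.strip())


def extract_sections(page):
    """Return a mapping of anchor name to the plain text in that section."""
    parser = SectionExtractor()
    parser.feed(page)
    parser.close()
    return {name: " ".join(parts) for name, parts in parser.sections.items()}


page = (
    '<a name="toString--"></a><h4>toString</h4>'
    '<div class="block">Returns a string.</div>'
    '<a name="shiftedBy-double-"></a><h4>shiftedBy</h4>'
    '<div class="block">Get a shifted date.</div>'
)
for anchor, text in extract_sections(page).items():
    print(anchor, "->", text)
```

Keeping extraction separate from transformation and rendering, as in the three classes above, means each stage can be improved or replaced on its own.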

@Thrameos
Contributor

@petrushy The requested enhancement is complete. Please test, add a review, and comment so it can be included in JPype 0.8.

@petrushy
Author

Hi @Thrameos! Many thanks, will start testing.

@petrushy
Author

WIP: Hi, I did some initial tests and will spend more time later. Some things seem to be extracted, but others are not (they have javadoc, but the property is still there). I assume the javadoc should be UTF-8 encoded; there are quite a few settings in the project I'm wrapping..

and in pom.xml
maven-javadoc-plugin
${orekit.maven-javadoc-plugin.version}

${basedir}/src/main/java/org/orekit/overview.html

--allow-script-in-comments
-header
'${orekit.mathjax.config} ${orekit.mathjax.enable}'
-extdirs
${tools.jar.dir}

CS Group. All rights reserved.]]>

https://docs.oracle.com/javase/8/docs/api/
https://www.hipparchus.org/apidocs/

${orekit.compiler.source}
none

Will investigate and try to generate a cleaner javadoc. But it seems that some of the classes that are not detected are rather plain. WIP.

@Thrameos
Contributor

Thrameos commented May 28, 2020 via email

@petrushy
Author

Yes, thanks, it's the orekit library I'm working with, artifacts at:
https://repo1.maven.org/maven2/org/orekit/orekit/10.1/

For example org.orekit.time.AbsoluteDate is one that does not seem to work.
https://www.orekit.org/static/apidocs/org/orekit/time/AbsoluteDate.html

While
org.orekit.time.TimeScalesFactory works
https://www.orekit.org/static/apidocs/org/orekit/time/TimeScalesFactory.html

@Thrameos
Contributor

Okay I will investigate this evening. (I may need to add a diagnostics mode that one can trigger to get a translation and rendering report.)

@Thrameos
Contributor

It looks rendered just fine for me. Can you be more specific about what issue you are seeing?

Here is what I see and the script that generated it.

doc.txt
testDoc3.txt

@petrushy
Author

Weird. I simplified your script a bit and tried it in Python 3.6 & 3.7 (conda versions), but get:

Description

Failed to extract javadoc for class org.orekit.time.AbsoluteDate
Java class 'org.orekit.time.AbsoluteDate'

Extends:
    java.lang.Object

Interfaces:
    org.orekit.time.TimeStamped, org.orekit.time.TimeShiftable,
    java.lang.Comparable, java.io.Serializable

...

I have all the orekit and hipparchus jars (but not the javadoc for hipparchus) and the orekit javadoc in the same dir as the script:

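import jpype
from jpype.types import *
import jpype.imports

# Start the JVM with every jar in the script directory on the classpath
jpype.startJVM(classpath=['./*'])
import org  # resolved through jpype.imports once the JVM is running

p = org.orekit.time.AbsoluteDate

print("Description")
print("-----------")
print(p.__doc__)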

Tested with a new environment also in conda.

I am using openjdk 8 from conda, cannot test with a newer at the moment.

@petrushy
Author

source is from the Thrameos/javadoc branch

@Thrameos
Contributor

Okay, I can confirm this one. It appears to work on Linux with all versions of Python and JDK 8-11 but fails on Python 3.5 with JDK 11. I will investigate.

@petrushy
Author

I'm on Windows currently and get the same results on Python 3.6 & 3.7. I can test later on Mac / Linux.

@Thrameos
Contributor

Okay I corrected a few issues that I located in that example. You can use

jde = JClass("org.jpype.javadoc.JavadocExtractor")
jde.failures = True

to get the source of the problem. Some of the hyperlinks appear busted (in different ways on Linux and Windows), but these are mostly just rendering issues that we can track down later. Overall I think this can be included, with some follow-up to address rendering issues.

You may want to do a full doc extraction run to see what other problems need to be addressed. For now I have to move on to the 0.8 bug hunt so I can finally finish the release.

@petrushy
Author

Ok, will experiment with it.

Yes, it is still very usable and I'm looking forward to the 0.8 release! Thank you for your efforts in this development!

@petrushy
Author

petrushy commented May 29, 2020

WIP: Removed my comment about it not working under Linux, as it somehow is working now. Could be user error.

Tried with different versions of openjdk under Windows (8, 11) and the example above does not work in any of them.

@Thrameos
Contributor

So any conclusion on how well it is working?

@petrushy
Author

Now it is working on Windows as well, and really well in some practical tests, now using JDK 8. Many thanks for implementing this; it is especially useful for end users of "wrapped" Java libraries.

Some minor personal preferences would be a user-settable line length and the possibility to filter away the meta tags, like :class: and :meth: (I would prefer them just removed). They look really nice in a tool that supports rst rendering of javadocs, like Spyder, but look a bit noisy in some other environments like JupyterLab, which is a common one. I may have a try at this later; it could be user settable.

Many many thanks for implementing this, and the overall improvement of jpype, lots of work.

BR

@Thrameos
Contributor

Hmm. Okay I suppose that we can find a way to check what the environment is and select the appropriate render properties. The rendering properties are not that hard to control though I am hesitant to make them public symbols as they are.

Perhaps we should just make them pull the values from System.getProperty. Then you would be able to just set the property to the desired value, which leaves the implementation free to change if needed in the future, rather than having people poke at private symbols.

Say something like

  • org.jpype.javadoc.TextWidth - set the column width for wrapping paragraphs. (Default "120")
  • org.jpype.javadoc.EnableDomains - use :class: and :meth: when linking. (Default "True")
  • org.jpype.javadoc.EnableExternal - add links to external document (Default "True")
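As a sketch of how string-valued properties like these could drive a renderer, the Python below parses them from a plain dict standing in for Java system properties. The property names and defaults are the ones proposed above and were only a suggestion at this point in the thread. `RenderSettings`, `settings_from_properties` and `wrap_paragraph` are hypothetical names.

```python
import textwrap
from dataclasses import dataclass


@dataclass
class RenderSettings:
    text_width: int = 120
    enable_domains: bool = True
    enable_external: bool = True


def settings_from_properties(props):
    """Build settings from string properties, using defaults for missing keys."""

    def flag(key, default):
        # Property values are strings, so compare case-insensitively to "true".
        return props.get(key, str(default)).strip().lower() == "true"

    return RenderSettings(
        text_width=int(props.get("org.jpype.javadoc.TextWidth", "120")),
        enable_domains=flag("org.jpype.javadoc.EnableDomains", True),
        enable_external=flag("org.jpype.javadoc.EnableExternal", True),
    )


def wrap_paragraph(text, settings):
    """Wrap a paragraph to the configured column width."""
    return textwrap.fill(text, width=settings.text_width)


narrow = settings_from_properties({"org.jpype.javadoc.TextWidth": "40"})
print(wrap_paragraph("Returns the date shifted by the given duration. " * 3, narrow))
```

Reading settings by name keeps callers away from private symbols, which is the motivation given above for going through properties.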

Do you have any additional properties you would like to see controllable such as sections to include or exclude? If you have preferences I will see if I can squeeze it in prior to the release candidate.

@petrushy
Author

Hi, yes sounds good - I don't think it is necessary to be widely exposed, this is likely more for people who are tuning python wrappers of java libraries. I don't have any additional, one needs to find the quirky cases I guess to see what more may be needed to tune, but this can be done in future versions.

Thanks!

@Thrameos
Contributor

I looked into it further. The module doing the rendering for help is pydoc. Its support for sphinx domains and such is really underwhelming (read non-existent). I am a bit shocked that the integration between these isn't tighter.

Given that, it seems like I should just have a master style switch for sphinx or pydoc rendering as org.jpype.javadoc.Style so that the user doesn't have a bunch of settings to play with.
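A master switch of this kind could look roughly like the Python sketch below, which leaves sphinx roles alone for one style and strips them down to their targets for the other. The style names "sphinx" and "pydoc" and the function `apply_style` are assumptions for illustration. Only the property name org.jpype.javadoc.Style comes from the proposal above.

```python
import re

# Matches roles such as :class:`X`, :meth:`X` and domain-qualified :py:class:`X`.
_ROLE = re.compile(r":(?:[a-z]+:)?[a-z]+:`([^`]+)`")


def apply_style(text, style="sphinx"):
    """Keep sphinx roles for 'sphinx'; reduce them to plain names for 'pydoc'."""
    if style == "sphinx":
        return text
    if style == "pydoc":
        # pydoc shows docstrings verbatim, so role markup is only noise there.
        return _ROLE.sub(r"\1", text)
    raise ValueError("unknown style: %r" % style)


doc = "See :class:`java.lang.String` and :meth:`toString`."
print(apply_style(doc, "pydoc"))
```

A single switch like this avoids exposing several independent settings while still covering the two rendering environments discussed in the thread.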

@petrushy
Author

Yep, I saw some requests for that for Jupyter, but it does not seem to be near. A master switch would work well.
