Enable basic interlinking of function names -> reference page #1

Closed
machow opened this issue Nov 7, 2022 · 7 comments

Comments

@machow
Owner

machow commented Nov 7, 2022

Interlinking lets us take a reference to a function in a docstring (or qmd), and render it as a link to the function's documentation. This could be a link either within the current documentation (e.g. quartodoc docs) or across documentation sites (e.g. to python.org/docs).

Key pieces:

  • generate a sphinx inventory file (or something similar)
  • figure out a quarto strategy for interlinking (e.g. a filter?)

Examples:

Note the "See Also" section, and the links from annotations in these pages.

image

Sphinx inventory files

A sphinx inventory file maps function names (or class members, etc.) to their internal pages on a documentation site. It is a partly compressed file (a short plain-text header followed by a zlib-compressed body), and its structure is essentially:

  • "<object_type>": { <name>: <internal_url> }

See the sphinx inventory case study for how to read and use these files.
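
To make the structure above concrete, here is a minimal sketch of reading a version 2 inventory with only the standard library. The sample bytes are hand-built for illustration (not a real `objects.inv`), and `parse_inventory` is a hypothetical helper name; the entry layout follows the reverse-engineered v2 syntax, so treat it as an approximation rather than an official spec.

```python
import re
import zlib

# Entry layout inside the decompressed body:
#   name domain:role priority uri dispname
ENTRY = re.compile(r"(.+?)\s+(\S+)\s+(-?\d+)\s+?(\S*)\s+(.*)")


def parse_inventory(raw: bytes) -> dict:
    """Return {"<domain>:<role>": {name: uri}} from v2 objects.inv bytes."""
    body = raw
    header = []
    for _ in range(4):  # four plain-text header lines precede the zlib data
        line, _sep, body = body.partition(b"\n")
        header.append(line.decode("utf-8"))
    if header[0] != "# Sphinx inventory version 2":
        raise ValueError("this sketch only handles version 2 inventories")

    result: dict = {}
    for line in zlib.decompress(body).decode("utf-8").splitlines():
        match = ENTRY.match(line.rstrip())
        if match is None:
            continue
        name, obj_type, _priority, uri, _dispname = match.groups()
        if uri.endswith("$"):  # a trailing "$" is shorthand for the object's name
            uri = uri[:-1] + name
        result.setdefault(obj_type, {})[name] = uri
    return result


# A tiny hand-built inventory (illustrative sample, not a real file).
sample = (
    b"# Sphinx inventory version 2\n"
    b"# Project: nbformat\n"
    b"# Version: 5.7\n"
    b"# The remainder of this file is compressed using zlib.\n"
) + zlib.compress(
    b"nbformat py:module 0 api.html#module-$ -\n"
    b"nbformat.read py:function 1 api.html#$ -\n"
)

inventory = parse_inventory(sample)
print(inventory)
```

The returned mapping is keyed by object type first, which matches the `"<object_type>": { <name>: <internal_url> }` shape described above.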

Quarto interlinking

From what I've gathered, there are two ways we could do interlinking:

  • when generating a qmd: quartodoc generates a reference page w/ links already generated. This would restrict us to interlinking only references in docstrings.
  • during qmd rendering: quarto maybe could do interlinking through filters?
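
As a rough sketch of the first option (linking at qmd generation time), the snippet below rewrites backticked names in a docstring into markdown links when they appear in an inventory. The function name `link_references`, the inventory contents, and the URL are all assumptions for illustration; this is not necessarily how quartodoc ends up doing it.

```python
import re


def link_references(text: str, inventory: dict) -> str:
    """Turn `name` into [`name`](url) when name is a key of the inventory."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        url = inventory.get(name)
        # Leave unknown names untouched so nothing is linked by guesswork.
        return f"[`{name}`]({url})" if url else match.group(0)

    return re.sub(r"`([\w.]+)`", replace, text)


# Hypothetical inventory: name -> page within the current docs site.
inventory = {"quartodoc.get_function": "reference/get_function.html"}

docstring = "See `quartodoc.get_function` and `unknown.name`."
linked = link_references(docstring, inventory)
print(linked)
```

The limitation mentioned above shows up directly: only text that passes through this function (i.e. docstrings) gets linked, whereas a render-time filter could also handle hand-written qmd files.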

Extra: how does pkgdown interlink?

pkgdown is an R package that can handle interlinking (note it does not use quarto).

It interlinks by the following strategy:

  • generate a _pkgdown.yml file with a reference section that lists out all doc API functions.
  • for crosslinking to other doc sites...
    • use _pkgdown.yml or the DESCRIPTION file to get the URL to the doc site
    • afaict assumes function docs have url of form: {SITE_URL}/reference/{function_name}. (e.g. see this downlit code)
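
The URL convention in the last bullet can be sketched as a one-liner. This only mirrors the form described above (`{SITE_URL}/reference/{function_name}`); the helper name and the example site URL are made up, and the real downlit logic may differ in details.

```python
def pkgdown_style_url(site_url: str, function_name: str) -> str:
    """Build a link following the {SITE_URL}/reference/{function_name} form."""
    # Strip a trailing slash so the result never contains "//reference".
    return f"{site_url.rstrip('/')}/reference/{function_name}"


# Hypothetical site URL, as it might be read from _pkgdown.yml or DESCRIPTION.
url = pkgdown_style_url("https://example.org/mypkg/", "my_function")
print(url)
```

Note that no inventory file is involved here: the link is derived purely from the site URL and the function name.
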
@machow
Owner Author

machow commented Nov 8, 2022

It sounds like quarto project level config (_quarto.yml) gets merged with document config, so we should be able to create a quarto filter that can be configured the same as intersphinx.

E.g. using plotnine conf.py as an example, a quarto config might look like...

interlink:
  sources:
    'python':
      url: 'https://docs.python.org/3/'
      inventory: null

    # probably also fine to use this as a shortcut for url field, with inventory null
    'matplotlib': 'https://matplotlib.org/stable'

    'numpy': 'https://numpy.org/doc/stable/'
    'scipy': 'https://docs.scipy.org/doc/scipy'
    'statsmodels': 'https://www.statsmodels.org/stable/'
    'pandas': 'https://pandas.pydata.org/pandas-docs/stable/'
    'sklearn': 'https://scikit-learn.org/stable/'
    'skmisc': 'https://has2k1.github.io/scikit-misc/stable/'
    'adjustText': 'https://adjusttext.readthedocs.io/en/latest/'
    'patsy': 'https://patsy.readthedocs.io/en/stable'

  # numpydoc interlinking options
  aliases:
    'Series': 'pandas.Series'
    'boolean': 'bool'
    'element_line': 'plotnine.themes.element_line'    

    # TODO: what's the equivalent to this?
    'position': ':term:`position`'
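
Here is a small sketch of how a filter (or any tool) might normalize that config once it has been parsed into a dict. The helper names are hypothetical. The one behavior borrowed from intersphinx is that a null inventory means "look for `objects.inv` under the source url".

```python
def normalize_sources(sources: dict) -> dict:
    """Expand the string shortcut into the full {url, inventory} form."""
    normalized = {}
    for name, value in sources.items():
        if isinstance(value, str):
            # Shortcut form: 'matplotlib': 'https://matplotlib.org/stable'
            value = {"url": value, "inventory": None}
        normalized[name] = {"url": value["url"], "inventory": value.get("inventory")}
    return normalized


def inventory_location(source: dict) -> str:
    """Where to fetch the inventory from; null falls back to <url>/objects.inv."""
    if source["inventory"] is not None:
        return source["inventory"]
    return source["url"].rstrip("/") + "/objects.inv"


def resolve_alias(name: str, aliases: dict) -> str:
    """Map a short name like 'Series' to its full name, if an alias exists."""
    return aliases.get(name, name)


# The same values as in the config above, already parsed from YAML.
config = {
    "sources": {
        "python": {"url": "https://docs.python.org/3/", "inventory": None},
        "matplotlib": "https://matplotlib.org/stable",
    },
    "aliases": {"Series": "pandas.Series", "boolean": "bool"},
}

sources = normalize_sources(config["sources"])
print(inventory_location(sources["matplotlib"]))
print(resolve_alias("Series", config["aliases"]))
```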

@machow
Owner Author

machow commented Nov 17, 2022

See these jupyterbook docs:

  • intersphinx linking using [my link](content:references:labels) (docs page)
  • jupyter book config has an intersphinx_mapping field (see this config)

Essentially if you have this in your config...

intersphinx_mapping:
      myst-nb:
        - "https://myst-nb.readthedocs.io/en/latest/"
        - null

Then you can link like this to the myst-nb docs:

[check out myst-nb stdout](myst-nb:render/output/stdout-stderr)

The example above has these parts:

  • content - myst-nb
  • references - render/output/stdout-stderr
  • no labels specified
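
A sketch of how a link target like that could be split up, assuming the part before the first colon is looked up as a key in `intersphinx_mapping`. The function name is made up, and this only does the splitting; turning the reference into a final URL would still need the inventory for that site.

```python
def split_interlink(target: str, mapping: dict) -> tuple:
    """Split "key:reference" when key is a configured intersphinx source."""
    key, sep, reference = target.partition(":")
    if not sep or key not in mapping:
        # Not an interlink (e.g. a regular URL); hand it back unchanged.
        return None, target
    return key, reference


# Same shape as the jupyter book config above: name -> [url, inventory].
intersphinx_mapping = {
    "myst-nb": ["https://myst-nb.readthedocs.io/en/latest/", None],
}

parts = split_interlink("myst-nb:render/output/stdout-stderr", intersphinx_mapping)
print(parts)
```

Checking the key against the mapping is what keeps ordinary links such as `https://...` from being mistaken for interlinks, since they also contain a colon.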

@machow
Owner Author

machow commented Nov 18, 2022

AFAICT the Sphinx inventory file format is not officially documented -- here is an attempt to reverse engineer / describe it:

https://sphobjinv.readthedocs.io/en/stable/syntax.html

@bskinn

bskinn commented Dec 14, 2022

👋 sphobjinv author here. I've spent a lot of time poking at objects.invs; happy to try to answer any questions you might have.

If you decide you want to craft an objects.inv, that's one of the specific intended functions of the sphobjinv API. (It's been pointed out to me recently that the docs do a poor job of advertising what the package does; I'm working to improve them: bskinn/sphobjinv#270.)

As well, if you discover that my format spec is wrong somewhere, please let me know so I can fix the spec and the package.

Good luck!

@machow
Owner Author

machow commented Jul 6, 2023

Hey @bskinn! I'm so sorry for missing this message 😬. sphobjinv has been an absolute lifesaver for understanding what's going on in inventory files. If nothing else, this page reverse engineering the v2 syntax was so helpful for knowing what is even valid!

For quartodoc, I'm just using sphobjinv in two ways:

  • to dump .inv files to json
  • to construct inventories to dump to json

One kind of funny thing I noticed is that the json format sphobjinv uses puts the inventory items directly in the top-level object, with numbers as keys.

For example, from the nbformat.inv:

{
  "project": "nbformat",
  "version": "5.7",
  "count": 51,
  "0": {"name": "jsonschema.exceptions.ValidationError", "domain": "py", "role": "class", "priority": "-1", "uri": "api.html#nbformat.ValidationError", "dispname": "-"}, 
  "1": {"name": "nbformat", "domain": "py", "role": "module", "priority": "0", "uri": "api.html#module-$", "dispname": "-"}, 
  "2": ...
}

Right now, quartodoc tweaks the json a bit so that the items are in a list:

{
  "project": "quartodoc",
  "version": "0.0.9999",
  "count": 106,
  "items": [{"name": "quartodoc.get_function", ...
}
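
For reference, the tweak can be sketched in a few lines of plain Python. This is an illustration of the idea, not necessarily quartodoc's actual code; `items_as_list` is a hypothetical name and the sample is cut down to the two entries shown above (so its count is 2, not 51).

```python
def items_as_list(flat: dict) -> dict:
    """Move the number-keyed entries of a flat inventory dict into "items"."""
    # Everything with a non-numeric key (project, version, count) is metadata.
    result = {key: value for key, value in flat.items() if not key.isdigit()}
    # Sort the numeric keys as integers so "10" comes after "9", not after "1".
    numbered = sorted((key for key in flat if key.isdigit()), key=int)
    result["items"] = [flat[key] for key in numbered]
    return result


flat = {
    "project": "nbformat",
    "version": "5.7",
    "count": 2,
    "0": {"name": "jsonschema.exceptions.ValidationError", "domain": "py",
          "role": "class", "priority": "-1",
          "uri": "api.html#nbformat.ValidationError", "dispname": "-"},
    "1": {"name": "nbformat", "domain": "py", "role": "module",
          "priority": "0", "uri": "api.html#module-$", "dispname": "-"},
}

converted = items_as_list(flat)
print([item["name"] for item in converted["items"]])
```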

I'd love to get your perspective on sphobjinv's choice of json format, and whether it's being used in any projects. It seems like having a json inventory format would be pretty convenient (and quartodoc will likely need to use it, in case we need to cross reference with docs for other languages, like R :)

@bskinn

bskinn commented Jul 6, 2023

No worries, @machow -- very glad to hear it's been valuable for you!

You raise a great question about the JSON format. Off the top of my head, I can't think of a specific reason why that format is necessary... and, indeed, it carries a bunch of useless information.

I dug back into the git blame on that code and I think this format is an artifact of (a) the path I took to a structured data format and (b) my poor knowledge of JSON at the time, six years ago.

Originally, I was building this data structure in a nested way, struct_dict[domain][role][name], or similar. Then, I realized that this was going to be annoying to work with for what I wanted to do with the data—and likely annoying for others to use, too—so I reduced it down to the current int-as-string-keyed dict. I wanted the inventory entries to retain the same sequence as they appeared in the objects.inv, and at the time dicts didn't guarantee retention of insertion order, so indexing on sortable keys was how I implemented it. I think it just didn't occur to me I could've used a list there.

Only afterwards did I realize that what I had put together there could be output as JSON, at which point I exposed that JSON without putting any thought into improving the layout. Even if I had, I might not've known that JSON supported list objects... I might've thought it only permitted pure mappings.

I would absolutely be willing to add an alternative JSON format that matches yours, to go into a v2.x release, whenever next I can work on it. Then, I think I would be inclined to make the list-of-dicts layout the default in a v3, where the breaking API/schema change would be viable.

Would you create a feature request issue for this?

@machow
Owner Author

machow commented Jul 6, 2023

Thanks for taking the time to dig into this! It's pretty easy to work with the current format! I've opened a request here:

bskinn/sphobjinv#283

(going to close this issue, since quartodoc can do interlinks :)

@machow machow closed this as completed Jul 6, 2023
@github-project-automation github-project-automation bot moved this to Done in quartodoc Jul 6, 2023