Kf/visualizer #72

Merged
merged 21 commits into from
May 28, 2024

KasperFyhn
Contributor

There are three main contributions from this pull request:

  1. A more properly implemented corpus processing step in the generic pipeline, which outputs triplets, triplet stats, entity and predicate mappings, graph data, and a static graph.
  2. A revamp of the clustering approach for entities and predicates. It is not that different from Stine's work; the main difference is in how a main label is chosen. The overall strategy is:
    1. Create embedding for each entity/predicate.
    2. Cluster those embeddings with HDBSCAN.
    3. Combine clusters that have labels in common.
    4. Select the most prototypical member of the cluster as its main label, i.e. the member whose embedding is closest to the cluster's centroid embedding (the mean of all member embeddings).
  3. A visualization tool. I had hoped for something much more capable where you could inspect the entities' locations in text pieces etc., but for now this will have to do.
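
Steps 3 and 4 of the clustering strategy in item 2 can be sketched roughly as follows. This is only an illustration, not the code in this pull request: the function names `merge_clusters_sharing_labels` and `most_prototypical` are hypothetical, Euclidean distance is an assumed choice of "closest", and the embeddings and HDBSCAN cluster assignments (steps 1 and 2) are taken as given.

```python
from math import dist
from statistics import fmean


def merge_clusters_sharing_labels(clusters):
    """Merge clusters (collections of labels) that share at least one label.

    Hypothetical helper for step 3; not taken from the PR's code.
    """
    merged = []
    for cluster in clusters:
        cluster = set(cluster)
        # Find the already-merged clusters that overlap with this one.
        overlapping = [m for m in merged if m & cluster]
        # Keep only the clusters that do not overlap ...
        merged = [m for m in merged if not (m & cluster)]
        # ... and absorb the overlapping ones into the current cluster.
        for m in overlapping:
            cluster |= m
        merged.append(cluster)
    return merged


def most_prototypical(labels, embeddings):
    """Return the label whose embedding is closest to the cluster centroid.

    Hypothetical helper for step 4; Euclidean distance is an assumption.
    """
    # Centroid = per-dimension mean over all member embeddings.
    centroid = [fmean(dim) for dim in zip(*embeddings)]
    # Index of the member with the smallest distance to the centroid.
    best = min(range(len(labels)), key=lambda i: dist(embeddings[i], centroid))
    return labels[best]


# Toy data, made up for illustration only.
labels = ["USA", "United States", "the US"]
embeddings = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]]
print(most_prototypical(labels, embeddings))  # United States
```

With real embeddings, cosine distance may be a better fit than Euclidean distance; the selection logic stays the same.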

I am focusing on something else, so I am opening this to avoid it just going stale and being forgotten. Some parts of the code probably show this.

@KasperFyhn KasperFyhn marked this pull request as ready for review May 23, 2024 13:47
@KasperFyhn
Contributor Author

The failing documentation seems to be something in/with Sphinx and not the changes here. Any suggestions on how to fix it @KennethEnevoldsen ?

@KennethEnevoldsen
Contributor

Can you run it locally?

you might try:
set sphinx==5.3.0 in the pyproject.toml (though I don't think it should work)

The error seems to stem from:

File "/home/runner/.local/lib/python3.10/site-packages/IPython/core/guarded_eval.py", line 40, in <module>
      from typing_extensions import TypeAliasType

which does not seem to be a sphinx issue

@KasperFyhn
Contributor Author

Nope, does not work locally, not even on main.

@KennethEnevoldsen
Contributor

> Nope, does not work locally, not even on main.

Will take a look!

@KasperFyhn
Contributor Author

Maybe with IPython. From the error log on my machine:

# Sphinx version: 5.3.0
# Python version: 3.10.12 (CPython)
# Docutils version: 0.19 
# Jinja2 version: 3.1.3
# Last messages:

# Loaded extensions:
Traceback (most recent call last):
  File "~/PycharmProjects/conspiracies/venv/lib/python3.10/site-packages/sphinx/cmd/build.py", line 276, in build_main
    app = Sphinx(args.sourcedir, args.confdir, args.outputdir,
  File "~/PycharmProjects/conspiracies/venv/lib/python3.10/site-packages/sphinx/application.py", line 223, in __init__
    self.setup_extension(extension)
  File "~/PycharmProjects/conspiracies/venv/lib/python3.10/site-packages/sphinx/application.py", line 398, in setup_extension
    self.registry.load_extension(self, extname)
  File "~/PycharmProjects/conspiracies/venv/lib/python3.10/site-packages/sphinx/registry.py", line 472, in load_extension
    metadata = setup(app)
  File "~/PycharmProjects/conspiracies/venv/lib/python3.10/site-packages/myst_nb/__init__.py", line 8, in setup
    from .sphinx_ext import sphinx_setup
  File "~/PycharmProjects/conspiracies/venv/lib/python3.10/site-packages/myst_nb/sphinx_ext.py", line 22, in <module>
    from myst_nb.ext.download import NbDownloadRole
  File "~/PycharmProjects/conspiracies/venv/lib/python3.10/site-packages/myst_nb/ext/download.py", line 8, in <module>
    from myst_nb.sphinx_ import SphinxEnvType
  File "~/PycharmProjects/conspiracies/venv/lib/python3.10/site-packages/myst_nb/sphinx_.py", line 29, in <module>
    from myst_nb.core.execute import ExecutionResult, create_client
  File "~/PycharmProjects/conspiracies/venv/lib/python3.10/site-packages/myst_nb/core/execute/__init__.py", line 6, in <module>
    from .base import ExecutionError, ExecutionResult, NotebookClientBase  # noqa: F401
  File "~/PycharmProjects/conspiracies/venv/lib/python3.10/site-packages/myst_nb/core/execute/base.py", line 13, in <module>
    from myst_nb.ext.glue import extract_glue_data
  File "~/PycharmProjects/conspiracies/venv/lib/python3.10/site-packages/myst_nb/ext/glue/__init__.py", line 8, in <module>
    import IPython
  File "~/PycharmProjects/conspiracies/venv/lib/python3.10/site-packages/IPython/__init__.py", line 55, in <module>
    from .terminal.embed import embed
  File "~/PycharmProjects/conspiracies/venv/lib/python3.10/site-packages/IPython/terminal/embed.py", line 16, in <module>
    from IPython.terminal.interactiveshell import TerminalInteractiveShell
  File "~/PycharmProjects/conspiracies/venv/lib/python3.10/site-packages/IPython/terminal/interactiveshell.py", line 48, in <module>
    from .debugger import TerminalPdb, Pdb
  File "~/PycharmProjects/conspiracies/venv/lib/python3.10/site-packages/IPython/terminal/debugger.py", line 6, in <module>
    from IPython.core.completer import IPCompleter
  File "~/PycharmProjects/conspiracies/venv/lib/python3.10/site-packages/IPython/core/completer.py", line 219, in <module>
    from IPython.core.guarded_eval import guarded_eval, EvaluationContext
  File "~/PycharmProjects/conspiracies/venv/lib/python3.10/site-packages/IPython/core/guarded_eval.py", line 40, in <module>
    from typing_extensions import TypeAliasType
ImportError: cannot import name 'TypeAliasType' from 'typing_extensions' (~/PycharmProjects/conspiracies/venv/lib/python3.10/site-packages/typing_extensions.py)

@KennethEnevoldsen
Contributor

Hmm I don't get that one (will just recreate my env). I can get it to work if I do:

nb_execution_mode = "off"  # "auto", "force", "off"

In the docs/conf.py

If that works, then it seems to be an issue with the notebook execution.

@KasperFyhn
Contributor Author

Hmm, interesting. That was also my first thought, and that the error message just obscured the actual reason. I'll take a closer look tomorrow. It is surprising that it does not work on main either, but maybe it has gone under the radar somehow.

@KasperFyhn
Contributor Author

Without having the complete overview, I think what happened was some issue in peer dependencies. When I opened IPython on my own machine, I would see the same error. I cannot reproduce it now, though; but maybe the latest version of IPython needed a newer version of typing_extensions than what some of the other dependencies (which are kept back on older versions) would allow for.
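
A quick way to check for this kind of mismatch in a given environment is to look at the installed typing_extensions version and try the import that failed in the traceback. The sketch below is only a diagnostic aid; the function name `typing_extensions_status` is made up for illustration. As far as I know, `TypeAliasType` was added in typing_extensions 4.6.0, so an older pinned version would explain the `ImportError`.

```python
from importlib import metadata


def typing_extensions_status():
    """Return (installed version or None, whether TypeAliasType can be imported).

    Hypothetical diagnostic helper; not part of this repository.
    """
    try:
        version = metadata.version("typing_extensions")
    except metadata.PackageNotFoundError:
        # typing_extensions is not installed in this environment at all.
        return None, False
    try:
        # The same import that fails in IPython/core/guarded_eval.py.
        from typing_extensions import TypeAliasType  # noqa: F401
    except ImportError:
        return version, False
    return version, True


version, has_type_alias_type = typing_extensions_status()
print(f"typing_extensions: {version}, TypeAliasType available: {has_type_alias_type}")
```

If the second value is `False`, upgrading typing_extensions (or finding which dependency holds it back) would be the next thing to look at.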

@KasperFyhn KasperFyhn merged commit 6cc2b53 into main May 28, 2024
5 checks passed
@KasperFyhn KasperFyhn deleted the kf/visualizer branch May 28, 2024 06:51