Kf/generic pipeline #59

Merged
merged 22 commits into from
Feb 14, 2024

Contributor

@KasperFyhn KasperFyhn commented Jan 12, 2024

Introducing the "generic pipeline". The idea is as follows:

You bring your data, e.g. text files, tweets in JSON Lines, or some other format. Different preprocessors handle the different formats; each one implements the method Preprocessor._do_preprocessing(). If your data is special, you or someone else can implement a new preprocessor for that format. The rest works more or less out of the box, since all data has the same shape after preprocessing.
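
To illustrate the idea, here is a minimal sketch of what such a preprocessor hierarchy could look like. Only the names Preprocessor and _do_preprocessing() come from the description above; the preprocess() entry point, the two subclasses, the method signature, and the {"id", "text"} record shape are assumptions made for illustration and may differ from the code in this PR.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Iterable, Iterator


class Preprocessor(ABC):
    """Hypothetical base class; the real one in the repo may look different."""

    def preprocess(self, paths: Iterable[Path]) -> Iterator[dict]:
        # Assumed public entry point: hand each file to the format-specific hook.
        for path in paths:
            yield from self._do_preprocessing(path)

    @abstractmethod
    def _do_preprocessing(self, path: Path) -> Iterator[dict]:
        """Turn one input file into uniform {"id", "text"} records (assumed shape)."""


class PlainTextPreprocessor(Preprocessor):
    """Hypothetical: one text file becomes one record."""

    def _do_preprocessing(self, path: Path) -> Iterator[dict]:
        yield {"id": path.stem, "text": path.read_text(encoding="utf-8").strip()}


class JsonLinesPreprocessor(Preprocessor):
    """Hypothetical: one JSON object per line, e.g. tweets."""

    def __init__(self, text_field: str = "text", id_field: str = "id"):
        self.text_field = text_field
        self.id_field = id_field

    def _do_preprocessing(self, path: Path) -> Iterator[dict]:
        with path.open(encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines between records
                record = json.loads(line)
                yield {
                    "id": str(record[self.id_field]),
                    "text": record[self.text_field],
                }
```

Under these assumptions, supporting a new data format only requires a new subclass that overrides _do_preprocessing(); everything downstream consumes the same record shape.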

Let's say that you have a bunch of text files in a folder under input/my_input. The whole thing is run with the run.py script as follows:

python3 run.py my_project_name "input/my_input/*.txt" [-c config/my_config.toml]

You'll get this output:

output/my_project_name/
    annotations.ndjson
    graph.json
    graph.png
    nodes_edges.json
    preprocessed.ndjson
    triplets.csv
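
For readers who want to see how the command line above maps onto inputs and the output folder, here is a rough sketch using argparse. It is not the actual run.py: the function names, the --config long option, and the way the glob is expanded are assumptions; only the two positional arguments, the -c flag, and the output/<project_name>/ layout are taken from the description.

```python
import argparse
from glob import glob
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Mirrors: python3 run.py my_project_name "input/my_input/*.txt" [-c config/my_config.toml]
    parser = argparse.ArgumentParser(prog="run.py")
    parser.add_argument("project_name", help="name of the folder created under output/")
    parser.add_argument("input_glob", help='quoted glob pattern, e.g. "input/my_input/*.txt"')
    # The long form --config is an assumption; only -c appears in the PR description.
    parser.add_argument("-c", "--config", type=Path, default=None,
                        help="optional TOML configuration file")
    return parser


def resolve(argv):
    """Parse arguments and derive the input file list and output directory."""
    args = build_parser().parse_args(argv)
    # The pattern is quoted on the command line, so it is expanded here, not by the shell.
    input_files = sorted(Path(p) for p in glob(args.input_glob))
    output_dir = Path("output") / args.project_name
    return args, input_files, output_dir


if __name__ == "__main__":
    args, input_files, output_dir = resolve(None)  # None means: read sys.argv
    print(f"{len(input_files)} input file(s) -> {output_dir}/")
```

Quoting the glob pattern means the script receives the pattern itself rather than a shell-expanded file list, which keeps the second positional argument a single value.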

It can all be refined, but I figured this was a good time for review and maybe merging it in since it basically works. The rest can come down the road.

Next up is finding English components for the pipeline so that one can set "en" as the language in the configuration (or perhaps as a CLI argument).


github-actions bot commented Jan 12, 2024

Coverage

Coverage Report
File     Stmts  Miss  Cover  Missing
run.py      13    13     0%  1–39
TOTAL       13    13     0%

@KasperFyhn
Contributor Author

@KennethEnevoldsen

There is a lot of new code, though much of it is essentially copy-pasted from individual scripts in paper/. Do let me know if you need a quick walk-through. 🙌

@KasperFyhn KasperFyhn marked this pull request as ready for review February 6, 2024 07:22
@KasperFyhn KasperFyhn merged commit 13c514e into main Feb 14, 2024
5 checks passed
@KasperFyhn KasperFyhn deleted the kf/generic-pipeline branch February 14, 2024 12:00
@KennethEnevoldsen
Contributor

Hi @KasperFyhn, sorry I didn't notice this. Feel free to let me know on Slack if I haven't been responsive (I should get a notification).

@KasperFyhn
Contributor Author

Will do. I figured you were preoccupied and decided to go ahead and merge.

I wonder if you didn't get notified because it started as a draft PR which I then opened for review later on.

@KennethEnevoldsen
Contributor

Might be the case. It might also have come in while I was at the conference in Norway, in which case I might have missed it in the pile of git-related stuff I returned to.

@KasperFyhn KasperFyhn mentioned this pull request Feb 15, 2024

2 participants