Filter specific graph in nquads #12

Open
ktk opened this issue Feb 15, 2018 · 6 comments
@ktk

ktk commented Feb 15, 2018

First, thanks for serdi, very nice & fast library!

I often use it in RDF creation pipelines, and one job I do on a regular basis is converting quads to triples. Sometimes the graph does not matter, but in other cases it would be nice to be able to specify which graph I want in the output and throw away the rest (or vice versa).

I currently do this with pipe filters, but I would feel more comfortable if I could pass that as a parameter to serdi.
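For readers who want to see what such a pipe filter might look like, here is a minimal sketch in Python. It is not part of serdi: the function name `quads_to_triples` and the regular expression are illustrative assumptions, and the tokenizer only covers the common N-Quads term forms (IRIs, blank nodes, and literals with an optional datatype or language tag).

```python
import re
from typing import Iterable, Iterator

# One N-Quads term: IRI, blank node label, or literal with optional
# datatype / language tag. A simplification of the real grammar.
TERM = re.compile(
    r'<[^>]*>'
    r'|_:\S*[^\s.]'
    r'|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?'
)


def quads_to_triples(lines: Iterable[str], graph: str) -> Iterator[str]:
    """Yield N-Triples lines for the statements in the named graph `graph`.

    Statements in other graphs, and statements with no graph, are dropped.
    Input is consumed one line at a time, so this works on a stream.
    """
    wanted = f"<{graph}>"
    for line in lines:
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        terms = TERM.findall(line)
        # A quad has four terms; the fourth is the graph label.
        if len(terms) == 4 and terms[3] == wanted:
            yield " ".join(terms[:3]) + " ."
```

Fed with `sys.stdin` and printed line by line, this can sit in a shell pipeline between other tools. Because it never holds more than one statement in memory, it keeps the streaming behaviour the pipeline depends on.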

@drobilla
Owner

drobilla commented Feb 15, 2018

Thanks / you're welcome :)

Good idea. I've had the same thought, actually, though I was thinking of taking it a bit further and allowing general patterns (or at least subject and predicate as well). Not sure about blanks... maybe it could just not support them, or do a simple string match, which would still be handy.

The idea of bloating the still very "do one thing and do it well" serdi concerns me a bit, but a separate rdf_grep sort of thing would mostly be the same program with some filtering stuff added anyway, so I guess that doesn't make sense.

Might need to break the API to do this well, though I'm not sure, and I think it's time to break it and clean some things up for the next major version anyway.

@ktk
Author

ktk commented Feb 15, 2018

The rdf_grep (or tgrep for triple grep?) idea is tempting, I like the idea of not bloating serdi. Streaming is essential for the datasets I use it for.

For grepping I use '<...> <' patterns but that does not work for everything and it's not very nice to write.

@drobilla
Owner

Added this in the serd1 branch with 116c73a if you want to give it a shot.

Could still use a bit of polish (and command line flags are starting to run out), but seems to do the job.

@drobilla
Owner

drobilla commented Dec 20, 2019

It borrows a shred of SPARQL so you can write an NQuads statement with ?variable syntax and use it with the -g option (for "grep"), e.g.

serdi -g '?s ?p ?o <http://example/g> .' tests/NQuadsTests/nq-syntax-uri-01.nq
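To illustrate the idea of a statement pattern with `?variable` wildcards, here is a rough sketch in Python. It is not how serd implements `-g`; the names `tokenize` and `matches` are hypothetical, and the sketch makes two simplifying assumptions: a variable matches any single term, and a variable that appears twice is not required to bind to the same term both times.

```python
import re
from typing import List

# Terms of a pattern or statement: ?variable, IRI, blank node, or literal.
TOKEN = re.compile(
    r'\?\w+'
    r'|<[^>]*>'
    r'|_:\S*[^\s.]'
    r'|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?'
)


def tokenize(statement: str) -> List[str]:
    """Split one N-Quads-like line into its terms, ignoring the final dot."""
    return TOKEN.findall(statement)


def matches(pattern: str, line: str) -> bool:
    """Return True if `line` matches `pattern` term by term.

    A pattern term starting with '?' matches anything; any other term
    must be equal to the corresponding term of the statement.
    """
    want = tokenize(pattern)
    have = tokenize(line)
    if len(want) != len(have):
        return False
    return all(w.startswith("?") or w == h for w, h in zip(want, have))
```

With the pattern from the command above, `matches('?s ?p ?o <http://example/g> .', line)` is true only for quads whose graph is `<http://example/g>`. In this sketch a four-term pattern never matches a statement without a graph, since the term counts differ.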

@ktk
Author

ktk commented Dec 22, 2019

Excellent, I will give it a try, thanks! The SPARQL approach makes a lot of sense to me.

@drobilla
Owner

drobilla commented Apr 8, 2021

It seemed weird to only have "inclusive" or "exclusive" (like grep -v) filtering, so I changed this to two separate flags: -F and -G (roughly for "filter" and "grep", respectively) in the latest version (on branch serd1-meson) that should hopefully see release soon.
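The difference between the two kinds of filtering can be sketched as a single selection function with an inversion switch, much like `grep` and `grep -v`. This is only an illustration in Python: the name `select` and the `invert` parameter are assumptions, and nothing here says which of the two serdi flags corresponds to which mode.

```python
from typing import Callable, Iterable, Iterator, Tuple

# A statement as a tuple of term strings (3 for a triple, 4 for a quad).
Statement = Tuple[str, ...]


def select(
    statements: Iterable[Statement],
    predicate: Callable[[Statement], bool],
    invert: bool = False,
) -> Iterator[Statement]:
    """Yield statements that satisfy `predicate`.

    With invert=True, yield the statements that do NOT satisfy it instead.
    """
    for statement in statements:
        if predicate(statement) != invert:
            yield statement
```

Keeping the predicate separate from the include/exclude decision means the same pattern can be used to keep one graph or to drop it, which is the "or vice versa" case from the original request.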

As it happens, the (also new) validation checks had the same problem, and I'm increasingly worried about the rapidly disappearing flag characters, but it seemed questionable to invent some kind of odd command line syntax (like a universal negation flag that negates the thing after it), so I guess this will have to do until some even fancier future version forces the issue.


2 participants