feat (core): Graph Commons JSON Converter
vorburger committed Dec 31, 2024
1 parent ac1e64b commit be6c75c
Showing 37 changed files with 890 additions and 36 deletions.
2 changes: 2 additions & 0 deletions docs/use/docgen/index.md
@@ -22,3 +22,5 @@ We can generate nice Markdown documentation as seen in [the tutorial](../../mode
or for [our example Library model](../library/index.md), including a Graph in either Mermaid.JS or
[Graphviz](../rosetta/index.md#graphviz) or [GEXF](../rosetta/index.md#gexf) format, and
[a Timeline](../../models/example.org/timeline.md).

`docgen` (old) will later be integrated into (new) [generic `gen markdown`](../gen/index.md#markdown).
21 changes: 20 additions & 1 deletion docs/use/fetch/index.md
@@ -73,9 +73,28 @@ $ ./enola fetch "data:application/json;charset=UTF-8,%7B%22key%22%3A+%22value%22
...
```

### File Descriptor

`fd:` is a (non-standard) URL scheme in Enola for reading from or writing to [file descriptors](https://en.wikipedia.org/wiki/File_descriptor), for:

* `fd:0` [STDIN](https://en.wikipedia.org/wiki/Stdin)
* `fd:1` [STDOUT](https://en.wikipedia.org/wiki/Stdout)
* `fd:2` [STDERR](https://en.wikipedia.org/wiki/Stderr)

The _Media Type_ of this special resource will be `application/octet-stream` (**not** `application/binary`),
unless there is [a `?mediaType=` parameter](#media-type).

The _Charset_ will be the default of the JVM,
unless there is (checked first) [a `?charset=` parameter](#charset) (e.g. `fd:0?charset=UTF-16BE`),
or the `?mediaType=` parameter includes a charset (e.g. `fd:1?mediaType=application/yaml;charset=utf-16be`).

<!-- If updating ^^^ then also update JavaDoc of dev.enola.common.io.resource.FileDescriptorResource -->

<!-- TODO Support '-' as special URI shortcut for fd:0 STDIN? -->
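
The charset precedence described above could be sketched roughly as follows. This is a minimal illustration, not the actual `FileDescriptorResource` code: the class `FdCharsetSketch` and its methods are hypothetical names, and the sketch assumes the query parameters need no percent-decoding. Note that `fd:0?charset=...` is an opaque URI in Java, so `URI.getQuery()` returns `null` and the query is cut out of the string form instead.

```java
import java.net.URI;
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; NOT Enola's implementation.
final class FdCharsetSketch {

    /** Splits the text after '?' into key/value pairs (no percent-decoding). */
    static Map<String, String> queryParameters(URI uri) {
        Map<String, String> parameters = new LinkedHashMap<>();
        String text = uri.toString();
        int questionMark = text.indexOf('?');
        if (questionMark < 0) return parameters;
        String query = text.substring(questionMark + 1);
        int hash = query.indexOf('#');
        if (hash >= 0) query = query.substring(0, hash);
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) continue;
            int equals = pair.indexOf('=');
            String key = equals < 0 ? pair : pair.substring(0, equals);
            String value = equals < 0 ? "" : pair.substring(equals + 1);
            parameters.put(key, value);
        }
        return parameters;
    }

    /** Resolves: ?charset= first, then charset inside ?mediaType=, else JVM default. */
    static Charset resolve(URI uri) {
        Map<String, String> parameters = queryParameters(uri);

        // 1. An explicit ?charset= parameter is checked first.
        String explicit = parameters.get("charset");
        if (explicit != null && !explicit.isEmpty()) return Charset.forName(explicit);

        // 2. Otherwise, look for a charset parameter inside ?mediaType=.
        String mediaType = parameters.get("mediaType");
        if (mediaType != null) {
            String[] parts = mediaType.split(";");
            for (int i = 1; i < parts.length; i++) {
                String part = parts[i].trim();
                if (part.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    return Charset.forName(part.substring("charset=".length()).trim());
                }
            }
        }

        // 3. Fall back to the default charset of the JVM.
        return Charset.defaultCharset();
    }
}
```

With this sketch, `FdCharsetSketch.resolve(URI.create("fd:2"))` falls through to `Charset.defaultCharset()`, while both `fd:` examples from the text above resolve to UTF-16BE.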

### Empty

`empty:` is a (non-standard) URL scheme in Enola for "no content" (as an alternative to `data:,`):
`empty:` is another (non-standard) URL scheme in Enola for "no content" (as an alternative to `data:,`):

```bash cd ../.././..
$ ./enola fetch empty:/
59 changes: 59 additions & 0 deletions docs/use/gen/index.md
@@ -0,0 +1,59 @@
<!--
SPDX-License-Identifier: Apache-2.0
Copyright 2023-2024 The Enola <https://enola.dev> Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Generate

Generates [various outputs](../help/index.md#gen) from (loaded) _Things._

Many of these formats "mirror" the [respective support in Rosetta](../rosetta/index.md).
The difference is that Rosetta transforms one resource into another one,
whereas this generates (possibly several) output/s from possibly several input/s.

## Sigma (TODO)

We have plans to support https://www.sigmajs.org.

## Graphviz

Based on [Rosetta's Graphviz support](../rosetta/index.md#graphviz):

```bash cd ../.././..
$ ./enola gen graphviz --no-file-loader --load=enola:TikaMediaTypes --output /tmp/
...
```

Produces a (rather huge...) `graphviz.gv` which can then be rendered to an (ugly!) SVG e.g. using:

dot -Ksfdp -Tsvg -O /tmp/graphviz.gv

## GEXF

See [Rosetta's GEXF support](../rosetta/index.md#gexf).

<!--
## Graph Commons (TODO)
See [Rosetta's Graph Commons support](../rosetta/index.md#graph-commons).
./enola rosetta --in enola:TikaMediaTypes --out /tmp/TikaMediaTypes.graphcommons.json
-->

## Markdown (TODO)

[The (old) `docgen`](../docgen/index.md) will later be integrated into a (new) generic `gen markdown`.
9 changes: 9 additions & 0 deletions docs/use/help/index.md
@@ -42,6 +42,15 @@ $ ./enola help docgen
...
```

## Gen

[Generation](../gen/index.md) has the following options:

```bash cd ../.././..
$ ./enola help gen
...
```

## Get

[Get Entity](../get/index.md) has the following options:
15 changes: 15 additions & 0 deletions docs/use/rosetta/index.md
@@ -36,6 +36,11 @@ e.g. between:

Specifying the `--schema` flag is optional for YAML <=> JSON conversion, but required for TextProto.

Rosetta transforms a (single) input resource into one output resource of another format.
Alternatively,
[generate](../gen/index.md) can load (possibly several) resources
(or "logical IRIs") which contain _Things_ and transform them into (some of) these formats.

## Graph Diagrams

Enola can generate [Graph Diagrams like this](../../models/example.org/graph.md), through [DocGen](../docgen/index.md) (see
@@ -63,6 +68,16 @@ $ ./enola rosetta --no-file-loader --in test/picasso.ttl --out "docs/BUILT/picas

![Smaller Graph of Painters](../../BUILT/picasso-small.gv.svg)

<!--
### Graph Commons
```bash cd ../.././..
$ ./enola rosetta --in test/picasso.ttl --out /tmp/picasso.graphcommons.json
...
```
produces a JSON which can be imported into [GraphCommons.com](https://graphcommons.com/).
-->
### GEXF

```bash cd ../.././..
1 change: 1 addition & 0 deletions java/dev/enola/BUILD
@@ -36,6 +36,7 @@ java_export(
# TODO Fix JavaDoc generation which somehow breaks the build; see https://github.com/enola-dev/enola/issues/491
tags = ["no-javadocs"],
visibility = ["//:__subpackages__"],
# TODO runtime_deps or exports = [ ?!
runtime_deps = [
"//java/dev/enola/common",
"//java/dev/enola/common/canonicalize",
8 changes: 8 additions & 0 deletions java/dev/enola/cli/CommandWithIRI.java
@@ -30,6 +30,7 @@
import dev.enola.core.view.EnolaMessages;
import dev.enola.rdf.io.RdfWriterConverter;
import dev.enola.rdf.proto.ProtoThingRdfConverter;
import dev.enola.thing.gen.graphcommons.GraphCommonsJsonGenerator;
import dev.enola.thing.gen.graphviz.GraphvizGenerator;
import dev.enola.thing.message.ProtoThings;
import dev.enola.thing.metadata.ThingMetadataProvider;
@@ -108,6 +109,13 @@ protected void write(Message thing) throws IOException {
return;
}

if (Format.GraphCommons.equals(format) && thing instanceof Things protoThings) {
var javaThings = ProtoThings.proto2java(protoThings.getThingsList());
new GraphCommonsJsonGenerator(thingMetadataProvider)
.convertIntoOrThrow(javaThings, resource);
return;
}

// Otherwise
new ProtoIO(typeRegistryWrapper.get()).write(thing, resource);
}
2 changes: 2 additions & 0 deletions java/dev/enola/cli/Configuration.java
@@ -29,6 +29,7 @@
import dev.enola.rdf.io.RdfMediaTypeYamlLd;
import dev.enola.rdf.io.RdfMediaTypes;
import dev.enola.thing.gen.gexf.GexfMediaType;
import dev.enola.thing.gen.graphcommons.GraphCommonsMediaType;
import dev.enola.thing.gen.graphviz.GraphvizMediaType;
import dev.enola.thing.io.ThingMediaTypes;

@@ -46,6 +47,7 @@ class Configuration {
new MarkdownMediaTypes(),
new GraphvizMediaType(),
new GexfMediaType(),
new GraphCommonsMediaType(),
new DatalogMediaTypes(),
new StandardMediaTypes(),
new YamlMediaType(),
4 changes: 4 additions & 0 deletions java/dev/enola/cli/Format.java
@@ -21,6 +21,7 @@

import dev.enola.common.protobuf.ProtobufMediaTypes;
import dev.enola.rdf.io.RdfMediaTypes;
import dev.enola.thing.gen.graphcommons.GraphCommonsMediaType;
import dev.enola.thing.gen.graphviz.GraphvizMediaType;

public enum Format {
@@ -30,6 +31,8 @@ public enum Format {

Graphviz,

GraphCommons,

TextProto,

ProtoYAML,
@@ -43,6 +46,7 @@ MediaType toMediaType() {
case Turtle -> RdfMediaTypes.TURTLE;
case JSONLD -> RdfMediaTypes.JSON_LD;
case Graphviz -> GraphvizMediaType.GV;
case GraphCommons -> GraphCommonsMediaType.GCJSON;

case TextProto -> ProtobufMediaTypes.PROTOBUF_TEXTPROTO_UTF_8;
case ProtoYAML -> ProtobufMediaTypes.PROTOBUF_YAML_UTF_8;
4 changes: 3 additions & 1 deletion java/dev/enola/common/context/Singleton.java
@@ -56,7 +56,9 @@ public Singleton<T> set(T value) {

@Override
public T get() {
if (value == null) throw new IllegalStateException();
if (value == null)
throw new IllegalStateException(
getClass() + " was never set(); use SingletonRule in tests");
else return value;
}

3 changes: 1 addition & 2 deletions java/dev/enola/common/io/mediatype/MediaTypeProvider.java
@@ -66,8 +66,7 @@ default MediaType detect(String uri, ByteSource byteSource, MediaType original)
// TODO It's kinda wrong that this uses MediaTypeProviders.SINGLETON; it would be clearer if
// it only ever used itself. But that requires moving normalize() from MediaTypeProviders
// to... where? Another ABC?! Urgh.
var normalized = MediaTypeProviders.SINGLETON.get().normalize(original);
if (!normalized.equals(original)) return normalized;
original = MediaTypeProviders.SINGLETON.get().normalize(original);

// NB: This looks inefficient, and you could be tempted to do this "the other way around"
// (instead of checking EACH map entry with uri.endsWith(), the URI extension should be
2 changes: 2 additions & 0 deletions java/dev/enola/common/io/resource/FileDescriptorResource.java
@@ -45,6 +45,8 @@
*/
public class FileDescriptorResource extends BaseResource implements Resource {

// NB: If updating ^^^ then also update docs/use/fetch/index.md

public static final String STDOUT = "fd:1?charset=UTF-8";

public static final URI STDOUT_URI = URI.create(STDOUT);
10 changes: 7 additions & 3 deletions java/dev/enola/common/io/resource/MediaTypeDetector.java
@@ -58,7 +58,12 @@ class MediaTypeDetector {
private static final Set<MediaType> TRY_FIXING =
ImmutableSet.of(
// raw.githubusercontent.com returns "text/plain" e.g. for *.yaml
MediaType.parse("text/plain"));
MediaType.parse("text/plain"),
// URLConnection assumes JSON for all *.json but we want "longest
// match" e.g. for ".graphcommons.json"
MediaType.parse("application/json"),
// URLConnection assumes XML for .gexf instead of GexfMediaType (with +xml)
MediaType.parse("application/xml"));

private static boolean isSpecial(MediaType mediaType) {
var mediaTypeWithoutParameters = mediaType.withoutParameters();
@@ -167,8 +172,7 @@ private MediaType detect(@Nullable String contentType, @Nullable String contentE
MediaType mediaType = null;
if (contentType != null) {
mediaType = MediaTypes.parse(contentType);
if (TRY_FIXING.contains(mediaType.withoutParameters())
|| IGNORE.contains(mediaType.withoutParameters())) {
if (isSpecial(mediaType)) {
mediaType = null;
}
}
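
The "longest match" idea mentioned in the `TRY_FIXING` comment above (so that `.graphcommons.json` wins over plain `.json`) could look roughly like the following. This is only a sketch under stated assumptions: `LongestExtensionMatchSketch`, its `detect` method, and the `application/x.example-graphcommons+json` value are hypothetical placeholders, not Enola's `MediaTypeProvider` API or the real Graph Commons media type. Media types are plain strings here to avoid dependencies, and stripping the query and fragment before matching is an assumption of this sketch.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; NOT Enola's MediaTypeDetector or MediaTypeProvider.
final class LongestExtensionMatchSketch {

    // Placeholder mapping; the Graph Commons value is invented for this example.
    static final Map<String, String> EXTENSIONS =
            Map.of(
                    ".json", "application/json",
                    ".graphcommons.json", "application/x.example-graphcommons+json");

    /** Returns the media type of the longest extension that the URI's path ends with. */
    static Optional<String> detect(String uri, Map<String, String> extensionsToMediaTypes) {
        // Assumption of this sketch: ignore any ?query and #fragment when matching.
        int cut = uri.length();
        int questionMark = uri.indexOf('?');
        if (questionMark >= 0) cut = Math.min(cut, questionMark);
        int hash = uri.indexOf('#');
        if (hash >= 0) cut = Math.min(cut, hash);
        String path = uri.substring(0, cut);

        // Check EACH entry with endsWith(), remembering the longest matching extension.
        String best = null;
        for (String extension : extensionsToMediaTypes.keySet()) {
            if (path.endsWith(extension)
                    && (best == null || extension.length() > best.length())) {
                best = extension;
            }
        }
        return Optional.ofNullable(best).map(extensionsToMediaTypes::get);
    }
}
```

Checking every entry with `endsWith()` is what makes multi-part extensions possible; looking up only the text after the last dot would always find `.json` first.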
21 changes: 15 additions & 6 deletions java/dev/enola/common/io/resource/MemoryResourceTest.java
@@ -26,6 +26,7 @@

import dev.enola.common.context.testlib.SingletonRule;
import dev.enola.common.io.mediatype.MediaTypeProviders;
import dev.enola.common.io.mediatype.YamlMediaType;

import org.junit.Rule;
import org.junit.Test;
Expand All @@ -35,14 +36,15 @@

public class MemoryResourceTest {

public @Rule SingletonRule r = $(MediaTypeProviders.set());
// The new YamlMediaType() is required for non-regression of the funkyYamlURL below
public @Rule SingletonRule r = $(MediaTypeProviders.set(new YamlMediaType()));

private static final byte[] BYTES = new byte[] {1, 2, 3};
private static final String TEXT = "hello, world";

@Test
public void testBinaryMemoryResource() throws IOException {
MemoryResource resource = new MemoryResource(OCTET_STREAM);
var resource = new MemoryResource(OCTET_STREAM);
resource.byteSink().write(BYTES);
assertThat(resource.byteSource().read()).isEqualTo(BYTES);

@@ -52,16 +54,23 @@ public void testBinaryMemoryResource() throws IOException {

@Test
public void testTextMemoryResource() throws IOException {
MemoryResource resource = new MemoryResource(PLAIN_TEXT_UTF_8);
var resource = new MemoryResource(PLAIN_TEXT_UTF_8);
resource.charSink().write(TEXT);
assertThat(resource.charSource().read()).isEqualTo(TEXT);
}

@Test
public void testMediaTypePrecedence() throws IOException {
// This does not work for PLAIN_TEXT_UTF_8, because that's "special"
public void testMediaTypePrecedenceHTML_GZIP() {
// TODO Fix to also make this work for PLAIN_TEXT_UTF_8, which is "special"
// (It's one of a few MediaTypes which MediaTypeDetector always overrides)
MemoryResource resource = new MemoryResource(URI.create("test.html"), GZIP);
var resource = new MemoryResource(URI.create("test.html"), GZIP);
assertThat(resource.mediaType()).isEqualTo(GZIP);
}

@Test
public void testMediaTypePrecedenceYAML_JSON() {
var funkyYamlURL = "classpath:/picasso.yaml?context=classpath:/picasso-context.jsonld";
var resource = new MemoryResource(URI.create(funkyYamlURL), JSON_UTF_8);
assertThat(resource.mediaType()).isEqualTo(JSON_UTF_8);
}
}
2 changes: 2 additions & 0 deletions java/dev/enola/common/io/resource/ResourceProvider.java
@@ -36,6 +36,8 @@
*/
public interface ResourceProvider extends ProviderFromIRI<Resource> {

// TODO Rename all parameters from iri or uri to url - because that's what these are!

// TODO Change all @Nullable Resource to Optional<Resource>... or, better, throw exception for
// unknown schema

2 changes: 1 addition & 1 deletion java/dev/enola/common/io/testlib/ResourceSubject.java
@@ -73,7 +73,7 @@ private String canonicalize(@Nullable ReadableResource resource) throws IOExcept
public void hasCharsEqualTo(@Nullable ReadableResource expected) throws IOException {
var actualChars = canonicalize(actual);
var expectedChars = canonicalize(expected);
check("charSource").that(actualChars).isEqualTo(expectedChars);
check(actualChars).that(actualChars).isEqualTo(expectedChars);
}

// TODO Improve confusing output for multiline diff
4 changes: 4 additions & 0 deletions java/dev/enola/core/rosetta/Rosetta.java
@@ -38,6 +38,8 @@
import dev.enola.rdf.io.RdfResourceConverter;
import dev.enola.thing.gen.gexf.GexfGenerator;
import dev.enola.thing.gen.gexf.GexfResourceConverter;
import dev.enola.thing.gen.graphcommons.GraphCommonsJsonGenerator;
import dev.enola.thing.gen.graphcommons.GraphCommonsResourceConverter;
import dev.enola.thing.gen.graphviz.GraphvizGenerator;
import dev.enola.thing.gen.graphviz.GraphvizResourceConverter;
import dev.enola.thing.io.Loader;
@@ -103,6 +105,8 @@ public Rosetta(ResourceProvider rp, Loader loader) {
new YamlJsonResourceConverter(),
new GraphvizResourceConverter(loader, new GraphvizGenerator(tmp)),
new GexfResourceConverter(loader, new GexfGenerator(tmp)),
new GraphCommonsResourceConverter(
loader, new GraphCommonsJsonGenerator(tmp)),
new XmlResourceConverter(rp),
new CharResourceConverter()));
// NOT new IdempotentCopyingResourceNonConverter()