Reference Implementation of the Common Provenance Model

This library, originally developed as part of a master's thesis (available here), is a reference implementation of the Common Provenance Model (CPM) from the ISO 23494 standard. It reflects the model's state as of Spring 2025 (reference) and is implemented as an extension of the ProvToolbox library.

Status

Requirements

Java 23
Maven 3.9.9

Installation

Since this library is not yet published to Maven Central, download the latest .jar files from Releases and add them to your project manually.

cpm-core

Download the cpm-core-1.0.0.jar from Releases.
Place the cpm-core-1.0.0.jar in a directory inside your project (e.g., src/main/resources).
Add the following plugin to your pom.xml:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-install-plugin</artifactId>
    <version>3.1.4</version>
    <executions>
        <execution>
            <id>install-core</id>
            <phase>clean</phase>
            <configuration>
                <file>${project.basedir}/src/main/resources/cpm-core-1.0.0.jar</file>
                <groupId>cz.muni.fi.cpm</groupId>
                <artifactId>cpm-core</artifactId>
                <version>1.0.0</version>
                <packaging>jar</packaging>
                <generatePom>true</generatePom>
            </configuration>
            <goals>
                <goal>install-file</goal>
            </goals>
        </execution>
    </executions>
</plugin>

Run mvn clean
Add the following dependencies to your pom.xml:

<dependency>
    <groupId>org.openprovenance.prov</groupId>
    <artifactId>prov-model</artifactId>
    <version>2.2.1</version>
</dependency>
<dependency>
    <groupId>cz.muni.fi.cpm</groupId>
    <artifactId>cpm-core</artifactId>
    <version>1.0.0</version>
</dependency>

cpm-template

Important: cpm-template uses cpm-core as its dependency!

Download the cpm-template-1.0.0.jar from Releases.
Place the cpm-template-1.0.0.jar in the same directory inside your project.
Add the following execution to the maven-install-plugin plugin to your pom.xml:

<execution>
    <id>install-template</id>
    <phase>clean</phase>
    <configuration>
        <file>${project.basedir}/src/main/resources/cpm-template-1.0.0.jar</file>
        <groupId>cz.muni.fi.cpm</groupId>
        <artifactId>cpm-template</artifactId>
        <version>1.0.0</version>
        <packaging>jar</packaging>
        <generatePom>true</generatePom>
    </configuration>
    <goals>
        <goal>install-file</goal>
    </goals>
</execution>

Run mvn clean
Add the following dependencies to your pom.xml:

<dependency>
    <groupId>org.openprovenance.prov</groupId>
    <artifactId>prov-interop</artifactId>
    <version>2.2.1</version>
</dependency>
<dependency>
    <groupId>org.openprovenance.prov</groupId>
    <artifactId>prov-nf</artifactId>
    <version>2.2.1</version>
</dependency>
<dependency>
    <groupId>cz.muni.fi.cpm</groupId>
    <artifactId>cpm-template</artifactId>
    <version>1.0.0</version>
</dependency>

`cpm-core` Module Overview

The cpm-core module centers around the CpmDocument class, which enables graph-based traversal and querying of provenance documents. Internally, the document is represented as a traversable graph composed of nodes and edges.

Initialization of `CpmDocument`

The CpmDocument class can be initialized in multiple ways. The two most commonly used approaches are:

Initialization from a ProvToolBox `Document`

This method is suitable when working with a ProvToolBox Document containing exactly one bundle:

ProvFactory pF = new ProvFactory();
ICpmFactory cF = new CpmMergedFactory(pF);
ICpmProvFactory cPF = new CpmProvFactory(pF);

Document document = pF.newDocument();
document.setNamespace(cPF.newCpmNamespace());

QualifiedName id = pF.newQualifiedName("uri", "bundle", "ex");
Bundle bundle = pF.newNamedBundle(id, new ArrayList<>());
document.getStatementOrBundle().add(bundle);

QualifiedName id1 = cPF.newCpmQualifiedName("qN1");
Entity entity = cPF.getProvFactory().newEntity(id1);

QualifiedName id2 = cPF.newCpmQualifiedName("qN2");
Agent agent = cPF.getProvFactory().newAgent(id2);

Relation relation = cPF.getProvFactory().newWasAttributedTo(cPF.newCpmQualifiedName("attr"), id1, id2);

bundle.getStatement().add(entity, agent, relation);

CpmDocument doc = new CpmDocument(document, pF, cPF, cF);

Initialization from a List of Statements

This alternative approach allows for initialization using a list of statements and an explicit bundle identifier:

ProvFactory pF = new ProvFactory();
ICpmFactory cF = new CpmMergedFactory(pF);
ICpmProvFactory cPF = new CpmProvFactory(pF);

QualifiedName id1 = cPF.newCpmQualifiedName("qN1");
Entity entity = cPF.getProvFactory().newEntity(id1);

QualifiedName id2 = cPF.newCpmQualifiedName("qN2");
Agent agent = cPF.getProvFactory().newAgent(id2);

Relation relation = cPF.getProvFactory().newWasAttributedTo(cPF.newCpmQualifiedName("attr"), id1, id2);

QualifiedName bundleId = pF.newQualifiedName("uri", "bundle", "ex");

CpmDocument doc = new CpmDocument(List.of(entity, agent, relation), bundleId, pF, cPF, cF);

`ICpmFactory` Interface

The ICpmFactory interface defines how statements with identical identifiers are processed within the graph structure. The module provides the following core implementations:

Merged Implementation

CpmMergedFactory Merges statements with the same identifier using custom algorithms provided by ProvUtilities2.

Divided Implementations

These implementations retain all statements sharing the same identifier, differing in how they handle statement ordering:

CpmUnorderedFactory Does not preserve the original order of statements during conversions between CpmDocument and ProvToolBox Document.
CpmOrderedFactory Preserves the original statement order from the source ProvToolBox Document.

Traversing the Graph

The CpmDocument graph structure allows standard traversal algorithms. The following example demonstrates breadth-first search to extract a connected subgraph:

public List<INode> getConnectedSubgraph(CpmDocument cpmDoc, QualifiedName startNodeIdentifier) {
    List<INode> result = new ArrayList<>();
    Queue<INode> toProcess = new LinkedList<>();
    
    INode startNode = cpmDoc.getNode(startNodeIdentifier);
    
    toProcess.add(startNode);
    result.add(startNode);

    while (!toProcess.isEmpty()) {
        INode current = toProcess.poll();

        for (IEdge edge : current.getCauseEdges()) {
            toProcess.add(edge.getEffect());
            result.add(edge.getEffect());
        }
    }

    return result;
}

Functionalities Provided by `CpmDocument`

The CpmDocument class supports a wide range of operations, including:

Retrieving entities, agents, or activities by identifier.
Retrieving relations by identifier, or based on source and target identifiers.
Identifying the main activity.
Accessing forward and backward connectors.
Navigating preceding or successive connectors by identifier.
Extracting the traversal information subgraph of the bundle.
Extracting domain-specific provenance as a subgraph.
Identifying relations between traversal and domain-specific provenance parts.
Reconstructing a full document from traversal and domain-specific subgraphs, and cross-part relations.

Document Modification

The CpmDocument supports mutation through a set of defined operations:

Addition Use doAction methods to add statements.
Removal Use remove-prefixed methods to delete statements or nodes.
Modification Use methods prefixed with setNew or setCollectionMembers to update identifiers and collection memberships.

Customizing Traversal Strategy

The classification of nodes into traversal or domain-specific components is governed by the ITIStrategy interface. The default implementation relies on attributes of the underlying PROV elements.

To apply a custom strategy, implement ITIStrategy and register it with the document:

cpmDoc.setTIStrategy(customTiStrategy);

`cpm-template` Module Overview

The cpm-template module is designed to define and instantiate the template for creating a Document that encapsulates traversal information within the CPM framework.

Traversal information can be instantiated in two ways: from a JSON file or directly in-memory.

JSON-Based Instantiation

The structure of the required JSON file is defined by the template schema. This schema outlines all necessary properties and expected formats.

Example JSON template:

{
  "prefixes": {
    "ex": "www.example.org/"
  },
  "mainActivity": {
    "id": "ex:activity1",
    "startTime": "2011-11-16T16:05:00",
    "endTime": "2011-11-16T18:05:00",
    "used": [
      {
        "bcId": "ex:backConnector1"
      }
    ],
    "generated": [
      "ex:forwardConnector1"
    ]
  },
  "bundleName": "ex:bundle1",
  "backwardConnectors": [
    {
      "id": "ex:backConnector1"
    }
  ],
  "forwardConnectors": [
    {
      "id": "ex:forwardConnector1",
      "derivedFrom": [
        "ex:backConnector1"
      ]
    }
  ]
}

To construct a Document from a JSON InputStream, use the following code:

ITraversalInformationDeserializer deserializer = new TraversalInformationDeserializer();
Document doc = deserializer.deserializeDocument(inputStream);

In-Memory Instantiation

Alternatively, traversal information can be created programmatically and mapped to a Document directly in memory.

The following code reproduces the structure of the JSON example above:

DatatypeFactory datatypeFactory = DatatypeFactory.newInstance();
ProvFactory pF = new org.openprovenance.prov.vanilla.ProvFactory();
TraversalInformation ti = new TraversalInformation();

ti.setPrefixes(Map.of("ex", "www.example.com/"));
ti.setBundleName(ti.getNamespace().qualifiedName("ex", "bundle1", pF));

MainActivity mA = new MainActivity(ti.getNamespace().qualifiedName("ex", "activity1", pF));
mA.setStartTime(datatypeFactory.newXMLGregorianCalendar("2011-11-16T16:05:00"));
mA.setEndTime(datatypeFactory.newXMLGregorianCalendar("2011-11-16T18:05:00"));
ti.setMainActivity(mA);

QualifiedName bcID = ti.getNamespace().qualifiedName("ex", "backConnector1", pF);
BackwardConnector bC = new BackwardConnector(bcID);
ti.getBackwardConnectors().add(bC);

MainActivityUsed used = new MainActivityUsed(bcID);
mA.setUsed(List.of(used));

QualifiedName fcID = ti.getNamespace().qualifiedName("ex", "forwardConnector1", pF);
mA.setGenerated(List.of(fcID));

ForwardConnector fC = new ForwardConnector(fcID);
fC.setDerivedFrom(List.of(bC.getId()));
ti.getForwardConnectors().add(fC);

ITemplateProvMapper mapper = new TemplateProvMapper(new CpmProvFactory(pF));
Document doc = mapper.map(ti);

Agent Merging

By default, sender and receiver agents with the same identifier are treated as distinct. To enable automatic merging into a single agent with both types, configure the mapper as follows:

Using the constructor:

ITemplateProvMapper mapper = new TemplateProvMapper(new CpmProvFactory(pF), true);

Or using the setter:

mapper.setMergeAgents(true);

The mapper can be passed to the TraversalInformationDeserializer to merge agents during JSON deserialisation as well.

MMCI Dataset

The CPM files derived from the MMCI XML test data are available in PROV-N format here. For each sample in the original XML data, two bundles are generated—acquisition and storage—and are stored in their respective directories.

To obtain the acquisition bundle in a different serialization format, rerun the CpmMouTest with the OUTPUT_FORMAT_EXTENSION variable set to the desired format.

Note: These files are not directly compatible with Provenance Storage (reference). To enable compatibility, they must be transformed in a manner similar to the EMBRC dataset. For guidance, refer to the transformation logic implemented in these classes and the test methods transformCpmToProvStorageFormatV0 and transformCpmToProvStorageFormatV1 in the CpmEmbrcTest class.

EMBRC Dataset

The CPM representations of the four datasets developed by EMBRC (source) are available in multiple formats here. Each subdirectory corresponds to one dataset and includes the following:

Original provenance files (suffixed with _ProvenanceMetadata)
Transformed PROV JSON-LD files (suffixed with _transformed)
CPM format files (suffixed with _cpm)
Provenance Storage-compatible files (suffixed with _storage_v0 and _storage_v1) (see reference)

Uploading to a Custom Provenance Storage Deployment

To upload these files to a Provenance Storage instance:

Upload all _storage_v0 files in order (1 to 4).
Then upload all _storage_v1 files in order.

Make sure the host component and organization identifier in the namespace IRIs matches your target Provenance Storage instance. If you're targeting a custom instance, update the IRIs in the ProvStorageNamespaceConstants class and rerun the CpmEmbrcTest to regenerate the files accordingly.

When uploading files via a custom script, ensure the files are not altered in any way, as this will affect their content hash and prevent successful upload. For example, when reading the files in Python, use the following to preserve encoding and newline consistency:

open(filename, encoding="utf-8", newline="")

Contributing

Thank you for your interest in contributing!

Setup

Fork the repository.
Clone your fork and create a feature branch:
```
git checkout -b feat/your-feature
```
Build and test:
```
mvn clean verify
```

Commit Guidelines

Use Conventional Commits. Examples:

feat(core): add support for X
fix(template): correct behavior of Y
docs: README commit guideline

Pull Requests

Target the main branch.
Ensure all tests pass.
Include relevant tests and documentation.

By contributing, you agree that your contributions will be licensed under the Apache 2.0 License.

Name		Name	Last commit message	Last commit date
Latest commit History 141 Commits
.github/workflows		.github/workflows
.idea		.idea
cpm-core		cpm-core
cpm-template		cpm-template
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Reference Implementation of the Common Provenance Model

Status

Requirements

Installation

cpm-core

cpm-template

`cpm-core` Module Overview

Initialization of `CpmDocument`

Initialization from a ProvToolBox `Document`

Initialization from a List of Statements

`ICpmFactory` Interface

Merged Implementation

Divided Implementations

Traversing the Graph

Functionalities Provided by `CpmDocument`

Document Modification

Customizing Traversal Strategy

`cpm-template` Module Overview

JSON-Based Instantiation

In-Memory Instantiation

Agent Merging

MMCI Dataset

EMBRC Dataset

Uploading to a Custom Provenance Storage Deployment

Contributing

Setup

Commit Guidelines

Pull Requests

About

Uh oh!

Releases 1

Uh oh!

Languages

License

dwwop/cpm

Folders and files

Latest commit

History

Repository files navigation

Reference Implementation of the Common Provenance Model

Status

Requirements

Installation

cpm-core

cpm-template

cpm-core Module Overview

Initialization of CpmDocument

Initialization from a ProvToolBox Document

Initialization from a List of Statements

ICpmFactory Interface

Merged Implementation

Divided Implementations

Traversing the Graph

Functionalities Provided by CpmDocument

Document Modification

Customizing Traversal Strategy

cpm-template Module Overview

JSON-Based Instantiation

In-Memory Instantiation

Agent Merging

MMCI Dataset

EMBRC Dataset

Uploading to a Custom Provenance Storage Deployment

Contributing

Setup

Commit Guidelines

Pull Requests

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Uh oh!

Languages

`cpm-core` Module Overview

Initialization of `CpmDocument`

Initialization from a ProvToolBox `Document`

`ICpmFactory` Interface

Functionalities Provided by `CpmDocument`

`cpm-template` Module Overview