This library, originally developed as part of a master's thesis (available here), is a reference implementation of the Common Provenance Model (CPM) from the ISO 23494 standard. It reflects the model's state as of Spring 2025 (reference) and is implemented as an extension of the ProvToolbox library.
- Java 23
- Maven 3.9.9
Since this library is not yet published to Maven Central, download the latest `.jar` files from Releases and add them to your project manually.
- Download the `cpm-core-1.0.0.jar` from Releases.
- Place the `cpm-core-1.0.0.jar` in a directory inside your project (e.g., `src/main/resources`).
- Add the following plugin to your `pom.xml`:
```xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-install-plugin</artifactId>
    <version>3.1.4</version>
    <executions>
        <execution>
            <id>install-core</id>
            <phase>clean</phase>
            <configuration>
                <file>${project.basedir}/src/main/resources/cpm-core-1.0.0.jar</file>
                <groupId>cz.muni.fi.cpm</groupId>
                <artifactId>cpm-core</artifactId>
                <version>1.0.0</version>
                <packaging>jar</packaging>
                <generatePom>true</generatePom>
            </configuration>
            <goals>
                <goal>install-file</goal>
            </goals>
        </execution>
    </executions>
</plugin>
```
- Run `mvn clean`.
- Add the following dependencies to your `pom.xml`:
```xml
<dependency>
    <groupId>org.openprovenance.prov</groupId>
    <artifactId>prov-model</artifactId>
    <version>2.2.1</version>
</dependency>
<dependency>
    <groupId>cz.muni.fi.cpm</groupId>
    <artifactId>cpm-core</artifactId>
    <version>1.0.0</version>
</dependency>
```
Important: `cpm-template` depends on `cpm-core`, so `cpm-core` must be installed as well!
- Download the `cpm-template-1.0.0.jar` from Releases.
- Place the `cpm-template-1.0.0.jar` in the same directory inside your project.
- Add the following execution to the `maven-install-plugin` plugin in your `pom.xml`:
```xml
<execution>
    <id>install-template</id>
    <phase>clean</phase>
    <configuration>
        <file>${project.basedir}/src/main/resources/cpm-template-1.0.0.jar</file>
        <groupId>cz.muni.fi.cpm</groupId>
        <artifactId>cpm-template</artifactId>
        <version>1.0.0</version>
        <packaging>jar</packaging>
        <generatePom>true</generatePom>
    </configuration>
    <goals>
        <goal>install-file</goal>
    </goals>
</execution>
```
- Run `mvn clean`.
- Add the following dependencies to your `pom.xml`:
```xml
<dependency>
    <groupId>org.openprovenance.prov</groupId>
    <artifactId>prov-interop</artifactId>
    <version>2.2.1</version>
</dependency>
<dependency>
    <groupId>org.openprovenance.prov</groupId>
    <artifactId>prov-nf</artifactId>
    <version>2.2.1</version>
</dependency>
<dependency>
    <groupId>cz.muni.fi.cpm</groupId>
    <artifactId>cpm-template</artifactId>
    <version>1.0.0</version>
</dependency>
```
The `cpm-core` module centers around the `CpmDocument` class, which enables graph-based traversal and querying of provenance documents. Internally, the document is represented as a traversable graph composed of nodes and edges.

The `CpmDocument` class can be initialized in multiple ways; the two most commonly used approaches are described below.
The first approach is suitable when working with a ProvToolbox `Document` containing exactly one bundle:
```java
ProvFactory pF = new ProvFactory();
ICpmFactory cF = new CpmMergedFactory(pF);
ICpmProvFactory cPF = new CpmProvFactory(pF);

Document document = pF.newDocument();
document.setNamespace(cPF.newCpmNamespace());

QualifiedName id = pF.newQualifiedName("uri", "bundle", "ex");
Bundle bundle = pF.newNamedBundle(id, new ArrayList<>());
document.getStatementOrBundle().add(bundle);

QualifiedName id1 = cPF.newCpmQualifiedName("qN1");
Entity entity = cPF.getProvFactory().newEntity(id1);

QualifiedName id2 = cPF.newCpmQualifiedName("qN2");
Agent agent = cPF.getProvFactory().newAgent(id2);

Relation relation = cPF.getProvFactory().newWasAttributedTo(cPF.newCpmQualifiedName("attr"), id1, id2);

// getStatement() returns a plain List, so the statements are added via addAll
bundle.getStatement().addAll(List.of(entity, agent, relation));

CpmDocument doc = new CpmDocument(document, pF, cPF, cF);
```
The second approach allows initialization using a list of statements and an explicit bundle identifier:
```java
ProvFactory pF = new ProvFactory();
ICpmFactory cF = new CpmMergedFactory(pF);
ICpmProvFactory cPF = new CpmProvFactory(pF);

QualifiedName id1 = cPF.newCpmQualifiedName("qN1");
Entity entity = cPF.getProvFactory().newEntity(id1);

QualifiedName id2 = cPF.newCpmQualifiedName("qN2");
Agent agent = cPF.getProvFactory().newAgent(id2);

Relation relation = cPF.getProvFactory().newWasAttributedTo(cPF.newCpmQualifiedName("attr"), id1, id2);

QualifiedName bundleId = pF.newQualifiedName("uri", "bundle", "ex");
CpmDocument doc = new CpmDocument(List.of(entity, agent, relation), bundleId, pF, cPF, cF);
```
The `ICpmFactory` interface defines how statements with identical identifiers are processed within the graph structure. The module provides the following core implementations:

- `CpmMergedFactory`: merges statements with the same identifier using custom algorithms provided by `ProvUtilities2`.

The remaining implementations retain all statements sharing the same identifier and differ only in how they handle statement ordering:

- `CpmUnorderedFactory`: does not preserve the original order of statements during conversions between `CpmDocument` and the ProvToolbox `Document`.
- `CpmOrderedFactory`: preserves the original statement order from the source ProvToolbox `Document`.
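For example, to keep every statement and preserve its original order, the factory passed to `CpmDocument` can simply be swapped. The sketch below reuses the `document` built in the first initialization example and assumes that `CpmOrderedFactory` offers the same single-argument constructor as `CpmMergedFactory`:

```java
// Assumption: CpmOrderedFactory has a ProvFactory-based constructor analogous to CpmMergedFactory(pF).
ProvFactory pF = new ProvFactory();
ICpmFactory orderedFactory = new CpmOrderedFactory(pF);
CpmDocument orderedDoc = new CpmDocument(document, pF, new CpmProvFactory(pF), orderedFactory);
```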
The `CpmDocument` graph structure supports standard graph-traversal algorithms. The following example demonstrates a breadth-first search that extracts a connected subgraph:
```java
public List<INode> getConnectedSubgraph(CpmDocument cpmDoc, QualifiedName startNodeIdentifier) {
    List<INode> result = new ArrayList<>();
    Set<INode> visited = new HashSet<>();
    Queue<INode> toProcess = new LinkedList<>();

    INode startNode = cpmDoc.getNode(startNodeIdentifier);
    toProcess.add(startNode);
    visited.add(startNode);
    result.add(startNode);

    while (!toProcess.isEmpty()) {
        INode current = toProcess.poll();
        for (IEdge edge : current.getCauseEdges()) {
            INode effect = edge.getEffect();
            // skip nodes that were already reached via another path
            if (visited.add(effect)) {
                toProcess.add(effect);
                result.add(effect);
            }
        }
    }
    return result;
}
```
The `CpmDocument` class supports a wide range of operations, including the following (a brief usage sketch follows the list):
- Retrieving entities, agents, or activities by identifier.
- Retrieving relations by identifier, or based on source and target identifiers.
- Identifying the main activity.
- Accessing forward and backward connectors.
- Navigating preceding or successive connectors by identifier.
- Extracting the traversal information subgraph of the bundle.
- Extracting domain-specific provenance as a subgraph.
- Identifying relations between traversal and domain-specific provenance parts.
- Reconstructing a full document from traversal and domain-specific subgraphs, and cross-part relations.
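As a rough illustration, the sketch below combines a few of these lookups. Only `getNode` is taken from the traversal example above; the other method names are assumptions made for this sketch and should be checked against the actual `CpmDocument` API.

```java
// Illustrative sketch; method names other than getNode are assumed, not verified against the API.
public void inspect(CpmDocument doc, QualifiedName entityId) {
    INode entity = doc.getNode(entityId);                        // node lookup by identifier
    INode mainActivity = doc.getMainActivity();                  // assumed accessor for the main activity
    List<INode> forwardConnectors = doc.getForwardConnectors();  // assumed accessor for forward connectors
}
```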
The `CpmDocument` supports mutation through a set of defined operations:

- Addition: use the `doAction` methods to add statements.
- Removal: use the `remove`-prefixed methods to delete statements or nodes.
- Modification: use the methods prefixed with `setNew` or `setCollectionMembers` to update identifiers and collection memberships.
The classification of nodes into traversal or domain-specific components is governed by the `ITIStrategy` interface. The default implementation relies on attributes of the underlying PROV elements.

To apply a custom strategy, implement `ITIStrategy` and register it with the document:

```java
cpmDoc.setTIStrategy(customTiStrategy);
```
The `cpm-template` module is designed to define and instantiate the template for creating a `Document` that encapsulates traversal information within the CPM framework.

Traversal information can be instantiated in two ways: from a JSON file or directly in memory.
The structure of the required JSON file is defined by the template schema. This schema outlines all necessary properties and expected formats.
Example JSON template:

```json
{
"prefixes": {
"ex": "www.example.org/"
},
"mainActivity": {
"id": "ex:activity1",
"startTime": "2011-11-16T16:05:00",
"endTime": "2011-11-16T18:05:00",
"used": [
{
"bcId": "ex:backConnector1"
}
],
"generated": [
"ex:forwardConnector1"
]
},
"bundleName": "ex:bundle1",
"backwardConnectors": [
{
"id": "ex:backConnector1"
}
],
"forwardConnectors": [
{
"id": "ex:forwardConnector1",
"derivedFrom": [
"ex:backConnector1"
]
}
]
}
```
To construct a `Document` from a JSON `InputStream`, use the following code:
```java
ITraversalInformationDeserializer deserializer = new TraversalInformationDeserializer();
Document doc = deserializer.deserializeDocument(inputStream);
```
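For instance, the `InputStream` could come from a template file on disk (the file name below is only a placeholder):

```java
// Hypothetical file name; any InputStream containing the JSON template can be used.
try (InputStream inputStream = Files.newInputStream(Path.of("traversal-template.json"))) {
    ITraversalInformationDeserializer deserializer = new TraversalInformationDeserializer();
    Document doc = deserializer.deserializeDocument(inputStream);
}
```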
Alternatively, traversal information can be created programmatically and mapped to a `Document` directly in memory.

The following code reproduces the structure of the JSON example above:
```java
DatatypeFactory datatypeFactory = DatatypeFactory.newInstance();
ProvFactory pF = new org.openprovenance.prov.vanilla.ProvFactory();

TraversalInformation ti = new TraversalInformation();
ti.setPrefixes(Map.of("ex", "www.example.org/"));
ti.setBundleName(ti.getNamespace().qualifiedName("ex", "bundle1", pF));

MainActivity mA = new MainActivity(ti.getNamespace().qualifiedName("ex", "activity1", pF));
mA.setStartTime(datatypeFactory.newXMLGregorianCalendar("2011-11-16T16:05:00"));
mA.setEndTime(datatypeFactory.newXMLGregorianCalendar("2011-11-16T18:05:00"));
ti.setMainActivity(mA);

QualifiedName bcID = ti.getNamespace().qualifiedName("ex", "backConnector1", pF);
BackwardConnector bC = new BackwardConnector(bcID);
ti.getBackwardConnectors().add(bC);

MainActivityUsed used = new MainActivityUsed(bcID);
mA.setUsed(List.of(used));

QualifiedName fcID = ti.getNamespace().qualifiedName("ex", "forwardConnector1", pF);
mA.setGenerated(List.of(fcID));

ForwardConnector fC = new ForwardConnector(fcID);
fC.setDerivedFrom(List.of(bC.getId()));
ti.getForwardConnectors().add(fC);

ITemplateProvMapper mapper = new TemplateProvMapper(new CpmProvFactory(pF));
Document doc = mapper.map(ti);
```
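Once mapped, the resulting `Document` can be serialized with ProvToolbox. A minimal sketch, assuming the `InteropFramework` from the `prov-interop` dependency infers the output format from the file extension (check the ProvToolbox 2.2.1 API for the exact overloads):

```java
// Assumption: writeDocument derives the serialization format from the ".provn" extension.
InteropFramework interop = new InteropFramework();
interop.writeDocument("traversal-information.provn", doc);
```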
By default, sender and receiver agents with the same identifier are treated as distinct. To enable automatic merging into a single agent with both types, configure the mapper as follows.

Using the constructor:

```java
ITemplateProvMapper mapper = new TemplateProvMapper(new CpmProvFactory(pF), true);
```

Or using the setter:

```java
mapper.setMergeAgents(true);
```
The mapper can also be passed to the `TraversalInformationDeserializer` to merge agents during JSON deserialization.
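A minimal sketch of this, assuming the deserializer accepts the mapper via a constructor argument (the exact constructor is not shown in this README and should be checked against the API):

```java
// Assumed constructor; verify against the actual TraversalInformationDeserializer API.
ITemplateProvMapper mergingMapper = new TemplateProvMapper(new CpmProvFactory(pF), true);
ITraversalInformationDeserializer deserializer = new TraversalInformationDeserializer(mergingMapper);
Document doc = deserializer.deserializeDocument(inputStream);
```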
The CPM files derived from the MMCI XML test data are available in PROV-N format here. For each sample in the original XML data, two bundles are generated—acquisition and storage—and are stored in their respective directories.
To obtain the acquisition bundle in a different serialization format, rerun the `CpmMouTest` with the `OUTPUT_FORMAT_EXTENSION` variable set to the desired format.
Note: These files are not directly compatible with Provenance Storage (reference). To enable compatibility, they must be transformed in a manner similar to the EMBRC dataset. For guidance, refer to the transformation logic implemented in these classes and the test methods `transformCpmToProvStorageFormatV0` and `transformCpmToProvStorageFormatV1` in the `CpmEmbrcTest` class.
The CPM representations of the four datasets developed by EMBRC (source) are available in multiple formats here. Each subdirectory corresponds to one dataset and includes the following:
- Original provenance files (suffixed with `_ProvenanceMetadata`)
- Transformed PROV JSON-LD files (suffixed with `_transformed`)
- CPM format files (suffixed with `_cpm`)
- Provenance Storage-compatible files (suffixed with `_storage_v0` and `_storage_v1`) (see reference)
To upload these files to a Provenance Storage instance:
- Upload all `_storage_v0` files in order (1 to 4).
- Then upload all `_storage_v1` files in order.
Make sure the host component and organization identifier in the namespace IRIs match your target Provenance Storage instance. If you are targeting a custom instance, update the IRIs in the `ProvStorageNamespaceConstants` class and rerun the `CpmEmbrcTest` to regenerate the files accordingly.
When uploading files via a custom script, ensure the files are not altered in any way, as this will affect their content hash and prevent successful upload. For example, when reading the files in Python, use the following to preserve encoding and newline consistency:
```python
open(filename, encoding="utf-8", newline="")
```
Thank you for your interest in contributing!
- Fork the repository.
- Clone your fork and create a feature branch:

  ```bash
  git checkout -b feat/your-feature
  ```

- Build and test:

  ```bash
  mvn clean verify
  ```
Use Conventional Commits. Examples:

```
feat(core): add support for X
fix(template): correct behavior of Y
docs: update README commit guidelines
```
- Target the `main` branch.
- Ensure all tests pass.
- Include relevant tests and documentation.
By contributing, you agree that your contributions will be licensed under the Apache 2.0 License.