55 changes: 54 additions & 1 deletion tree/ntuple/doc/BinaryFormatSpecification.md
@@ -1,4 +1,4 @@
-# RNTuple Binary Format Specification 1.0.0.2
+# RNTuple Binary Format Specification 1.0.1.0

## Versioning Notes

@@ -627,6 +627,7 @@ The footer envelope has the following structure:
- Header checksum (XxHash-3 64bit)
- Schema extension record frame
- List frame of cluster group record frames
- List frame of linked attribute set record frames

The header checksum can be used to cross-check that header and footer belong together.
The meaning of the feature flags is the same as for the header.
@@ -799,6 +800,58 @@ In every cluster, every field has exactly one primary column representation.
All other representations must be suppressed.
Note that the primary column representation can change from cluster to cluster.

## Linked Attribute Sets

An RNTuple may have zero or more linked Attribute Sets, containing metadata.
Contributor (suggested change):
- An RNTuple may have zero or more linked Attribute Sets, containing metadata.
+ An RNTuple may have zero or more linked Attribute Sets, containing user-defined metadata.

Contributor Author:
I don't think this is necessarily true, as some metadata may be automatically defined (e.g. ROOT's internal attributes). In any case, it is not relevant for the purposes of the specification.

Each Attribute Set is stored on disk as an RNTuple and the anchor of each RNTuple is linked to by the main
RNTuple's footer.

An Attribute Set RNTuple has a number of restrictions compared to a regular RNTuple:

1. it cannot have linked Attribute RNTuples itself;
2. the alias columns sections, both in its header and footer, must be empty (i.e. none of the Attribute Set RNTuple's
fields can be projected fields);
3. none of its fields may have a structural role of 0x04 (i.e. it must not contain a ROOT streamer object).
Member:
Is it a choice or a technical limitation?

Contributor Author (@silverweed, Sep 24, 2025):
It's a choice (see other answer)


An attribute set record frame has the following contents:
```
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Schema Version Major | Schema Version Minor |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Attribute Anchor Uncompressed Size |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```

- The first 32 bits contain the _Attribute Schema Version_. This is split into a _Major_ (16 LSB) and a
_Minor_ (16 MSB) version. The Schema Version is described below;
- a 32-bit unsigned integer follows, containing the uncompressed size of the Attribute Anchor.

These fields are followed by:

- a locator pointing to the Attribute RNTuple's anchor;
- a string containing the Attribute Set's name. All linked Attribute Sets must have a non-empty, distinct name.
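
As a rough illustration of the fixed 8-byte prefix described above, the following sketch packs and unpacks the schema version and the anchor size. It assumes little-endian byte order, as used elsewhere in the RNTuple format. The names `AttrSetPrefix`, `PackPrefix` and `UnpackPrefix` are hypothetical and not part of ROOT; the locator and the name string that follow the prefix are not covered.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: the fixed 8-byte prefix of an attribute set record frame.
struct AttrSetPrefix {
   std::uint16_t versionMajor;
   std::uint16_t versionMinor;
   std::uint32_t anchorSize; // uncompressed size of the Attribute Anchor
};

// Serialize the prefix. Major occupies the 16 LSB of the first 32-bit word,
// Minor the 16 MSB; both words are written in little-endian byte order.
std::array<std::uint8_t, 8> PackPrefix(const AttrSetPrefix &p)
{
   const std::uint32_t version =
      static_cast<std::uint32_t>(p.versionMajor) | (static_cast<std::uint32_t>(p.versionMinor) << 16);
   std::array<std::uint8_t, 8> out{};
   for (std::size_t i = 0; i < 4; ++i) {
      out[i] = static_cast<std::uint8_t>((version >> (8 * i)) & 0xff);
      out[4 + i] = static_cast<std::uint8_t>((p.anchorSize >> (8 * i)) & 0xff);
   }
   return out;
}

// Inverse of PackPrefix: rebuild both 32-bit words, then split the version word.
AttrSetPrefix UnpackPrefix(const std::array<std::uint8_t, 8> &in)
{
   std::uint32_t version = 0;
   std::uint32_t size = 0;
   for (std::size_t i = 0; i < 4; ++i) {
      version |= static_cast<std::uint32_t>(in[i]) << (8 * i);
      size |= static_cast<std::uint32_t>(in[4 + i]) << (8 * i);
   }
   return {static_cast<std::uint16_t>(version & 0xffff), static_cast<std::uint16_t>(version >> 16), size};
}
```

With this layout, the Major version lands in bytes 0 and 1 and the Minor version in bytes 2 and 3, which matches the bit diagram above.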

### Attribute Schema Version
Each Attribute Set is created with a user-defined model. This model is not used directly by the underlying Attribute
Set RNTuple, but it is augmented with internal fields used to store additional data that serve to associate each
Member:
When inspecting the Attribute Set RNTuple, will the additional fields be exposed or hidden? (And if they are exposed, would a user be able to distinguish the implicit from the explicit part of the model?)

Contributor Author:
They will be hidden: the user doesn't have access to the inner model but only to the fields they defined on the user model.

entry in the Attribute Set with a range of entries in the main RNTuple.

The Attribute Schema Version describes the internal schema of the linked Attribute Set RNTuple.
A change in Major version number indicates a breaking, non-forward-compatible change in the schema: readers should
refuse reading an Attribute Set whose Major Schema Version is unknown.
A change in Minor version number indicates the presence of optional additional fields in the schema: readers should
still be able to read the Attribute Set as before, ignoring any new field.
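
The compatibility rule above could be sketched as follows. This is only an illustration: it assumes a reader that understands exactly one Major version, and the constants, the enum and the function name are hypothetical rather than taken from ROOT.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants: the newest Attribute Schema Version this reader understands.
constexpr std::uint16_t kKnownSchemaMajor = 1;
constexpr std::uint16_t kKnownSchemaMinor = 0;

enum class ESchemaSupport {
   kSupported,                  // version fully understood
   kSupportedIgnoringNewFields, // newer Minor: read as before, skip unknown fields
   kUnsupported                 // unknown Major: refuse to read
};

ESchemaSupport CheckAttrSchemaVersion(std::uint16_t fileMajor, std::uint16_t fileMinor)
{
   // A different Major version signals a breaking change in the internal schema.
   if (fileMajor != kKnownSchemaMajor)
      return ESchemaSupport::kUnsupported;
   // A newer Minor version only adds optional fields that can be ignored.
   if (fileMinor > kKnownSchemaMinor)
      return ESchemaSupport::kSupportedIgnoringNewFields;
   return ESchemaSupport::kSupported;
}
```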

The current Attribute Schema Version is **1.0**. It has the following fields (in the following order):
1. `_rangeStart` (type `std::uint64_t`): the start of the range that each Attribute Entry refers to;
2. `_rangeLen` (type `std::uint64_t`): the length of the range that each Attribute Entry refers to.
Note that `_rangeLen == 0` is valid and refers to an empty range;
3. `_userModel` (untyped record): a record-type field that serves as the root field to the user-provided RNTupleModel
used by the Attribute Set RNTuple. Each user-defined field that was attached to the Field Zero in the user-provided
Model will be attached to this field in the Attribute Set RNTuple.
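
To make the role of `_rangeStart` and `_rangeLen` concrete, here is a small sketch that tests whether a main-RNTuple entry index is covered by an Attribute Entry. It assumes that the range is the half-open interval `[_rangeStart, _rangeStart + _rangeLen)` over entry indices of the main RNTuple; the text above does not spell this out. `AttrEntryRange` and `Covers` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the two internal range fields of an Attribute Entry.
struct AttrEntryRange {
   std::uint64_t rangeStart; // corresponds to _rangeStart
   std::uint64_t rangeLen;   // corresponds to _rangeLen; 0 means an empty range
};

// Returns true if mainEntry lies in [rangeStart, rangeStart + rangeLen).
// The subtraction form avoids overflow of rangeStart + rangeLen.
bool Covers(const AttrEntryRange &r, std::uint64_t mainEntry)
{
   return mainEntry >= r.rangeStart && (mainEntry - r.rangeStart) < r.rangeLen;
}
```

Under this interpretation, an entry with `_rangeLen == 0` covers no main entry at all, which agrees with the note that an empty range is valid.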

## Mapping of C++ Types to Fields and Columns

This section is a comprehensive list of the C++ types with RNTuple I/O support.
142 changes: 142 additions & 0 deletions tree/ntuple/inc/ROOT/RNTupleDescriptor.hxx
@@ -67,6 +67,53 @@ struct RNTupleClusterBoundaries {
std::vector<ROOT::Internal::RNTupleClusterBoundaries> GetClusterBoundaries(const RNTupleDescriptor &desc);
} // namespace Internal

namespace Experimental {
namespace Internal {
class RNTupleAttrSetDescriptorBuilder;
}

// clang-format off
/**
\class ROOT::Experimental::RNTupleAttrSetDescriptor
\ingroup NTuple
\brief Metadata stored for every Attribute Set linked to an RNTuple.
*/
// clang-format on
class RNTupleAttrSetDescriptor final {
friend class Experimental::Internal::RNTupleAttrSetDescriptorBuilder;

std::uint16_t fSchemaVersionMajor = 0;
std::uint16_t fSchemaVersionMinor = 0;
std::uint32_t fAnchorLength = 0; ///< uncompressed size of the linked anchor
// The locator of the AttributeSet anchor.
// In case of kTypeFile, it points to the beginning of the Anchor's payload.
// NOTE: Only kTypeFile is supported at the moment.
RNTupleLocator fAnchorLocator;
std::string fName;

public:
RNTupleAttrSetDescriptor() = default;
RNTupleAttrSetDescriptor(const RNTupleAttrSetDescriptor &other) = delete;
RNTupleAttrSetDescriptor &operator=(const RNTupleAttrSetDescriptor &other) = delete;
RNTupleAttrSetDescriptor(RNTupleAttrSetDescriptor &&other) = default;
RNTupleAttrSetDescriptor &operator=(RNTupleAttrSetDescriptor &&other) = default;

bool operator==(const RNTupleAttrSetDescriptor &other) const;
bool operator!=(const RNTupleAttrSetDescriptor &other) const { return !(*this == other); }

const std::string &GetName() const { return fName; }
std::uint16_t GetSchemaVersionMajor() const { return fSchemaVersionMajor; }
std::uint16_t GetSchemaVersionMinor() const { return fSchemaVersionMinor; }
std::uint32_t GetAnchorLength() const { return fAnchorLength; }
const RNTupleLocator &GetAnchorLocator() const { return fAnchorLocator; }

RNTupleAttrSetDescriptor Clone() const;
};

class RNTupleAttrSetDescriptorIterable;

} // namespace Experimental

// clang-format off
/**
\class ROOT::RFieldDescriptor
@@ -697,6 +744,8 @@ private:
std::vector<ROOT::DescriptorId_t> fSortedClusterGroupIds;
/// Potentially a subset of all the available clusters
std::unordered_map<ROOT::DescriptorId_t, RClusterDescriptor> fClusterDescriptors;
/// List of AttributeSets linked to this RNTuple
std::vector<Experimental::RNTupleAttrSetDescriptor> fAttributeSets;

// We don't expose this publicly because when we add sharded clusters, this interface does not make sense anymore
ROOT::DescriptorId_t FindClusterId(ROOT::NTupleSize_t entryIdx) const;
@@ -714,6 +763,7 @@ public:
class RClusterGroupDescriptorIterable;
class RClusterDescriptorIterable;
class RExtraTypeInfoDescriptorIterable;
friend class Experimental::RNTupleAttrSetDescriptorIterable;

/// Modifiers passed to CreateModel()
struct RCreateModelOptions {
@@ -802,6 +852,8 @@ public:

RExtraTypeInfoDescriptorIterable GetExtraTypeInfoIterable() const;

ROOT::Experimental::RNTupleAttrSetDescriptorIterable GetAttrSetIterable() const;

const std::string &GetName() const { return fName; }
const std::string &GetDescription() const { return fDescription; }

@@ -812,6 +864,7 @@ public:
std::size_t GetNClusters() const { return fNClusters; }
std::size_t GetNActiveClusters() const { return fClusterDescriptors.size(); }
std::size_t GetNExtraTypeInfos() const { return fExtraTypeInfoDescriptors.size(); }
std::size_t GetNAttributeSets() const { return fAttributeSets.size(); }

/// We know the number of entries from adding the cluster summaries
ROOT::NTupleSize_t GetNEntries() const { return fNEntries; }
@@ -1141,6 +1194,59 @@ public:
RIterator end() { return RIterator(fNTuple.fExtraTypeInfoDescriptors.cend()); }
};

namespace Experimental {
// clang-format off
/**
\class ROOT::Experimental::RNTupleAttrSetDescriptorIterable
\ingroup NTuple
\brief Used to loop over all the Attribute Sets linked to an RNTuple
*/
// clang-format on
// TODO: move this to RNTupleDescriptor::RNTupleAttrSetDescriptorIterable when it moves out of Experimental.
class RNTupleAttrSetDescriptorIterable final {
private:
/// The associated RNTuple for this range.
const RNTupleDescriptor &fNTuple;

public:
class RIterator final {
private:
using Iter_t = std::vector<RNTupleAttrSetDescriptor>::const_iterator;
/// The wrapped vector iterator
Iter_t fIter;

public:
using iterator_category = std::forward_iterator_tag;
using iterator = RIterator;
using value_type = RNTupleAttrSetDescriptor;
using difference_type = std::ptrdiff_t;
using pointer = const value_type *;
using reference = const value_type &;

RIterator(Iter_t iter) : fIter(iter) {}
iterator &operator++() /* prefix */
{
++fIter;
return *this;
}
iterator operator++(int) /* postfix */
{
auto old = *this;
operator++();
return old;
}
reference operator*() const { return *fIter; }
pointer operator->() const { return &*fIter; }
bool operator!=(const iterator &rh) const { return fIter != rh.fIter; }
bool operator==(const iterator &rh) const { return fIter == rh.fIter; }
};

RNTupleAttrSetDescriptorIterable(const RNTupleDescriptor &ntuple) : fNTuple(ntuple) {}
RIterator begin() { return RIterator(fNTuple.fAttributeSets.cbegin()); }
RIterator end() { return RIterator(fNTuple.fAttributeSets.cend()); }
};
} // namespace Experimental
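
The iterable above is meant to be used in a range-based `for` loop obtained from `GetAttrSetIterable()`. The following sketch shows that usage pattern with simplified stand-in types so that it compiles without ROOT; `AttrSetDesc`, `Descriptor`, `Add` and `ListAttrSetNames` are hypothetical and only mimic the shape of the real classes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Stand-in for RNTupleAttrSetDescriptor (only the fields needed here).
struct AttrSetDesc {
   std::string name;
   std::uint16_t schemaMajor;
   std::uint16_t schemaMinor;
};

// Stand-in for RNTupleDescriptor, owning the list of linked attribute sets.
class Descriptor {
   std::vector<AttrSetDesc> fAttributeSets;

public:
   // Thin view over the private vector, analogous to RNTupleAttrSetDescriptorIterable.
   class AttrSetIterable {
      const Descriptor &fDesc;

   public:
      explicit AttrSetIterable(const Descriptor &desc) : fDesc(desc) {}
      std::vector<AttrSetDesc>::const_iterator begin() const { return fDesc.fAttributeSets.cbegin(); }
      std::vector<AttrSetDesc>::const_iterator end() const { return fDesc.fAttributeSets.cend(); }
   };

   void Add(AttrSetDesc desc) { fAttributeSets.push_back(std::move(desc)); }
   AttrSetIterable GetAttrSetIterable() const { return AttrSetIterable(*this); }
};

// Collect the names of all linked attribute sets, in storage order.
std::vector<std::string> ListAttrSetNames(const Descriptor &desc)
{
   std::vector<std::string> names;
   for (const auto &attrSet : desc.GetAttrSetIterable())
      names.push_back(attrSet.name);
   return names;
}
```

The iterable only holds a reference to the descriptor, so the descriptor has to outlive the loop.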

// clang-format off
/**
\class ROOT::RNTupleDescriptor::RHeaderExtension
@@ -1214,6 +1320,39 @@ public:
}
};

namespace Experimental::Internal {
class RNTupleAttrSetDescriptorBuilder final {
ROOT::Experimental::RNTupleAttrSetDescriptor fDesc;

public:
RNTupleAttrSetDescriptorBuilder &Name(std::string_view name)
{
fDesc.fName = name;
return *this;
}
RNTupleAttrSetDescriptorBuilder &SchemaVersion(std::uint16_t major, std::uint16_t minor)
{
fDesc.fSchemaVersionMajor = major;
fDesc.fSchemaVersionMinor = minor;
return *this;
}
RNTupleAttrSetDescriptorBuilder &AnchorLocator(const RNTupleLocator &loc)
{
fDesc.fAnchorLocator = loc;
return *this;
}
RNTupleAttrSetDescriptorBuilder &AnchorLength(std::uint32_t length)
{
fDesc.fAnchorLength = length;
return *this;
}

/// Attempt to make an AttributeSet descriptor. This may fail if the builder
/// was not given enough information to make a proper descriptor.
RResult<ROOT::Experimental::RNTupleAttrSetDescriptor> MoveDescriptor();
};
} // namespace Experimental::Internal

namespace Internal {

// clang-format off
@@ -1597,6 +1736,8 @@ public:
RResult<void> AddExtraTypeInfo(RExtraTypeInfoDescriptor &&extraTypeInfoDesc);
void ReplaceExtraTypeInfo(RExtraTypeInfoDescriptor &&extraTypeInfoDesc);

RResult<void> AddAttributeSet(Experimental::RNTupleAttrSetDescriptor &&attrSetDesc);

/// Mark the beginning of the header extension; any fields and columns added after a call to this function are
/// annotated as being part of the header extension.
void BeginHeaderExtension();
@@ -1630,6 +1771,7 @@ inline RNTupleDescriptor CloneDescriptorSchema(const RNTupleDescriptor &desc)
}

} // namespace Internal

} // namespace ROOT

#endif // ROOT_RNTupleDescriptor
13 changes: 13 additions & 0 deletions tree/ntuple/inc/ROOT/RNTupleSerialize.hxx
@@ -36,6 +36,13 @@ class RNTupleDescriptor;
class RClusterDescriptor;
enum class EExtraTypeInfoIds;

namespace Experimental {
class RNTupleAttrSetDescriptor;
namespace Internal {
class RNTupleAttrSetDescriptorBuilder;
} // namespace Internal
} // namespace Experimental

namespace Internal {

class RClusterDescriptorBuilder;
@@ -271,6 +278,12 @@ public:
static RResult<std::uint32_t> DeserializeSchemaDescription(const void *buffer, std::uint64_t bufSize,
ROOT::Internal::RNTupleDescriptorBuilder &descBuilder);

static RResult<std::uint32_t>
SerializeAttributeSet(const Experimental::RNTupleAttrSetDescriptor &attrSetDesc, void *buffer);
static RResult<std::uint32_t>
DeserializeAttributeSet(const void *buffer, std::uint64_t bufSize,
Experimental::Internal::RNTupleAttrSetDescriptorBuilder &attrSetDescBld);

static RResult<RContext> SerializeHeader(void *buffer, const RNTupleDescriptor &desc);
static RResult<std::uint32_t> SerializePageList(void *buffer, const RNTupleDescriptor &desc,
std::span<ROOT::DescriptorId_t> physClusterIDs,
51 changes: 51 additions & 0 deletions tree/ntuple/src/RNTupleDescriptor.cxx
@@ -785,6 +785,8 @@ ROOT::RNTupleDescriptor ROOT::RNTupleDescriptor::Clone() const
clone.fSortedClusterGroupIds = fSortedClusterGroupIds;
for (const auto &d : fClusterDescriptors)
clone.fClusterDescriptors.emplace(d.first, d.second.Clone());
for (const auto &d : fAttributeSets)
clone.fAttributeSets.emplace_back(d.Clone());
return clone;
}

@@ -1105,6 +1107,19 @@ void ROOT::Internal::RNTupleDescriptorBuilder::SetFeature(unsigned int flag)
fDescriptor.fFeatureFlags.insert(flag);
}

ROOT::RResult<ROOT::Experimental::RNTupleAttrSetDescriptor>
ROOT::Experimental::Internal::RNTupleAttrSetDescriptorBuilder::MoveDescriptor()
{
if (fDesc.fName.empty())
return R__FAIL("attribute set name cannot be empty");
if (fDesc.fAnchorLength == 0)
return R__FAIL("invalid anchor length");
if (fDesc.fAnchorLocator.GetType() == RNTupleLocator::kTypeUnknown)
return R__FAIL("invalid locator type");

return std::move(fDesc);
}
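
The validate-then-move pattern of `MoveDescriptor()` can be tried out in isolation with the sketch below. It replaces `RResult` and `RNTupleLocator` with simplified stand-ins, so every type and member name here (`AttrSetDesc`, `BuildResult`, `AttrSetDescBuilder`, `hasLocator`, `anchorOffset`) is hypothetical; only the three checks and their error messages follow the code above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Stand-in for RNTupleAttrSetDescriptor.
struct AttrSetDesc {
   std::string name;
   std::uint32_t anchorLength = 0;
   bool hasLocator = false; // stands in for "locator type is not kTypeUnknown"
   std::uint64_t anchorOffset = 0;
};

// Stand-in for RResult<AttrSetDesc>: either a descriptor or an error message.
struct BuildResult {
   bool ok;
   std::string error;
   AttrSetDesc desc;
};

class AttrSetDescBuilder {
   AttrSetDesc fDesc;

public:
   AttrSetDescBuilder &Name(std::string name)
   {
      fDesc.name = std::move(name);
      return *this;
   }
   AttrSetDescBuilder &AnchorLength(std::uint32_t length)
   {
      fDesc.anchorLength = length;
      return *this;
   }
   AttrSetDescBuilder &AnchorLocator(std::uint64_t offset)
   {
      fDesc.hasLocator = true;
      fDesc.anchorOffset = offset;
      return *this;
   }

   // Validate the accumulated state, then hand the descriptor over to the caller.
   BuildResult MoveDescriptor()
   {
      if (fDesc.name.empty())
         return {false, "attribute set name cannot be empty", {}};
      if (fDesc.anchorLength == 0)
         return {false, "invalid anchor length", {}};
      if (!fDesc.hasLocator)
         return {false, "invalid locator type", {}};
      return {true, "", std::move(fDesc)};
   }
};
```

Because the descriptor is moved out on success, a builder instance in this sketch should not be reused after a successful `MoveDescriptor()` call.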

ROOT::RResult<ROOT::RColumnDescriptor> ROOT::Internal::RColumnDescriptorBuilder::MakeDescriptor() const
{
if (fColumn.GetLogicalId() == ROOT::kInvalidDescriptorId)
@@ -1359,6 +1374,19 @@ void ROOT::Internal::RNTupleDescriptorBuilder::ReplaceExtraTypeInfo(RExtraTypeIn
fDescriptor.fExtraTypeInfoDescriptors.emplace_back(std::move(extraTypeInfoDesc));
}

ROOT::RResult<void>
ROOT::Internal::RNTupleDescriptorBuilder::AddAttributeSet(Experimental::RNTupleAttrSetDescriptor &&attrSetDesc)
{
auto &attrSets = fDescriptor.fAttributeSets;
if (std::find_if(attrSets.begin(), attrSets.end(), [&name = attrSetDesc.GetName()](const auto &desc) {
return desc.GetName() == name;
}) != attrSets.end()) {
return R__FAIL("attribute sets with duplicate names");
}
attrSets.push_back(std::move(attrSetDesc));
return RResult<void>::Success();
}
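
The duplicate-name check in `AddAttributeSet()` enforces the rule that all linked Attribute Sets have distinct names. A minimal standalone sketch of the same idea, using plain strings instead of descriptors and a hypothetical helper `AddUniqueName`, might look like this:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Append name to names unless an equal name is already present.
// Returns false (and leaves names untouched) on a duplicate.
bool AddUniqueName(std::vector<std::string> &names, std::string name)
{
   const bool duplicate =
      std::any_of(names.begin(), names.end(), [&name](const std::string &existing) { return existing == name; });
   if (duplicate)
      return false;
   names.push_back(std::move(name));
   return true;
}
```

The linear scan makes each insertion O(n), which is presumably acceptable as long as an RNTuple links only a handful of Attribute Sets.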

RNTupleSerializer::StreamerInfoMap_t ROOT::Internal::RNTupleDescriptorBuilder::BuildStreamerInfos() const
{
RNTupleSerializer::StreamerInfoMap_t streamerInfoMap;
@@ -1474,3 +1502,26 @@ ROOT::RNTupleDescriptor::RExtraTypeInfoDescriptorIterable ROOT::RNTupleDescripto
{
return RExtraTypeInfoDescriptorIterable(*this);
}

ROOT::Experimental::RNTupleAttrSetDescriptorIterable ROOT::RNTupleDescriptor::GetAttrSetIterable() const
{
return Experimental::RNTupleAttrSetDescriptorIterable(*this);
}

bool ROOT::Experimental::RNTupleAttrSetDescriptor::operator==(const RNTupleAttrSetDescriptor &other) const
{
return fAnchorLength == other.fAnchorLength && fSchemaVersionMajor == other.fSchemaVersionMajor &&
fSchemaVersionMinor == other.fSchemaVersionMinor && fAnchorLocator == other.fAnchorLocator &&
fName == other.fName;
}

ROOT::Experimental::RNTupleAttrSetDescriptor ROOT::Experimental::RNTupleAttrSetDescriptor::Clone() const
{
RNTupleAttrSetDescriptor desc;
desc.fAnchorLength = fAnchorLength;
desc.fSchemaVersionMajor = fSchemaVersionMajor;
desc.fSchemaVersionMinor = fSchemaVersionMinor;
desc.fAnchorLocator = fAnchorLocator;
desc.fName = fName;
return desc;
}