This code was produced as part of a Fermilab LDRD project studying the use of HDF5 for HEP event data.
This repository includes utilities for resource and error management of HDF5 entities, and a facility (Ntuple) for persisting simple tabular data in HDF5 files with row-wise fill semantics and column-wise reading.
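The Ntuple facility itself is C++, but its storage model — one resizable dataset per column, filled a row at a time and read back a column at a time — can be sketched with h5py. The file, group, and column names below are illustrative only, not the layout Ntuple actually produces:

```python
import os
import tempfile

import h5py

# Hypothetical file and column names; the real Ntuple layout is defined in C++.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
rows = [(1.5, 2), (2.5, 3), (3.5, 4)]

with h5py.File(path, "w") as f:
    g = f.create_group("ntuple")
    # One chunked, unlimited-length dataset per column.
    g.create_dataset("x", shape=(0,), maxshape=(None,), dtype="f8", chunks=(128,))
    g.create_dataset("n", shape=(0,), maxshape=(None,), dtype="i4", chunks=(128,))
    for x, n in rows:                 # row-wise fill
        for name, val in (("x", x), ("n", n)):
            ds = g[name]
            ds.resize((ds.shape[0] + 1,))
            ds[-1] = val

with h5py.File(path, "r") as f:       # column-wise read
    xs = f["ntuple/x"][:]

print(xs.tolist())  # [1.5, 2.5, 3.5]
```

A real implementation buffers rows and extends datasets in larger batches; the per-row resize here is for clarity only.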
- Linux (tested on RHEL6/7-derived distributions; should work on Ubuntu 16.04) or macOS (tested on Sierra).
- CMake.
- C++17.
- A modern HDF5 distribution (1.10.0+).
- Optionally, an MPI distribution (tested with MPICH; likely to be problematic with OpenMPI).
- For reading the data so produced, there are Python and R packages which should be very straightforward to use (e.g. h5py, pandas, h5). See notes below with regard to package compatibility, however.
N.B. Due to ABI compatibility restrictions affecting both C++ and Fortran >= 90, all dependencies using either of these languages (e.g. HDF5 and MPI in most common configurations) must have been compiled with the same compiler. Pure C and/or Fortran 77 dependencies do not suffer this restriction. Packages used for data reading will have no link dependency on the compiled code herein, so you are not constrained to use the same compiler for them. However, your data-reading packages must be mutually consistent in their use of (e.g.) compiler, HDF5, and MPI.
The system has been tested with a native installation on Sierra. The following describes getting everything up through h5py, including the optional MPI for parallel I/O with HDF5.
- Python from Homebrew.
- MPI from Homebrew, specifically mpich. OpenMPI causes failures in some uses of parallel I/O in h5py.
- HDF5 1.10+, see below.
- mpi4py, numpy, and six, installed using pip.
- h5py installed from the installation tarball, after running

  python setup.py configure --mpi

  to force creation of the MPI-aware version of h5py. Obtain installation tarballs from the h5py project on PyPI.
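To confirm which h5py build you ended up with, h5py's configuration API reports whether MPI support was compiled in. This quick sanity check is my suggestion, not part of the package's own tooling:

```python
import h5py

# h5py.get_config().mpi is True only for a build configured with --mpi.
cfg = h5py.get_config()
print("h5py version:", h5py.__version__)
print("MPI-aware:", cfg.mpi)
```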
If you have open-mpi installed, remove it and build mpich instead:

  brew rm open-mpi
  brew cleanup
  brew install mpich
  pip uninstall mpi4py
  pip install --no-binary :all: mpi4py
  brew uninstall hdf5

Then re-install HDF5 as described below.
In order to obtain a reasonable build of HDF5 1.10, one should:
brew install --build-from-source hdf5 --with-mpi --without-cxx
We (Fermilab) use a system called UPS (Unix Product System) for environment-based setup and use of interdependent packages of known version and variant. The following packages are available for SLF6 (RHEL6-based), SL7 (RHEL7-based) and Ubuntu 16.04 on https://scisoft.fnal.gov/:
- hep_hpc (this package)
- GCC 6.3.0
- HDF5 1.10.1 (with and without MPI)
- Python 2.7.13
- mpi4py 2.0.0, numpy 1.12.1, six 1.10.0, and h5py 2.7.0
If you are interested in details, please contact us (see below). We are also in a position to provide binaries for SLF6 and Ubuntu 16.04 on request using this system.
Use your OS' package manager wherever possible.
- One time per local repository only:

  git submodule init

- After synchronizing with upstream (including after the first submodule init):

  git submodule update gtest

- Make a build directory and cd into it.

- Invoke CMake to configure the code:

  CC=<c-compiler> CXX=<c++-compiler> FC=<Fortran-compiler> \
  cmake -DCMAKE_BUILD_TYPE=<Debug|Release|RelWithDebInfo> \
        -DCMAKE_INSTALL_PREFIX=<install-area> \
        [-DCMAKE_CXX_STANDARD=17] \
        [-DWANT_MPI=TRUE [-DMPIEXEC_PREFLAGS=...]] \
        [-DWANT_UPS=TRUE] \
        [-DWANT_H5PY=TRUE] \
        <path-to-repository-top-dir>

  The CMakeLists.txt file includes a safeguard against invoking CMake from within the source directory, but you may still have to remove some debris if you do this unintentionally. Define WANT_MPI appropriately to activate MPI, if it is available and desired. Note that your own code may still use MPI even if WANT_MPI is not set, but the (as yet, very basic) MPI facilities of this package will not be available. If the executable for running MPI jobs on your system is "srun," then MPI tests will be disabled unless you define MPIEXEC_PREFLAGS (e.g. to set the hardware type, time limits, etc.). Define WANT_UPS if you wish to build a UPS-capable package. Defining WANT_H5PY turns on the testing of the concat-h5py utility (assuming the h5py Python package is available and compatible with the current compiler, etc.).

- Build the code:

  make [-j #]

- Run the tests:

  ctest [-j #]

- Install the code for use:

  make install
N.B. If you wish to update gtest with respect to Google, use a modern git to do the following:
git submodule update --remote --merge gtest
This will cause the index representing the gtest "head" to be
updated in your local repository. This can be committed if you wish.
You will need to instruct your build system to use the installed headers and to link against the installed libraries. You should also configure your build system to find the HDF5 and/or MPI headers and libraries, if necessary.
The main user-facing classes are Ntuple and Column -- see the
documentation in each header for details, and test/hdf5/Ntuple_t.cpp
for an example of use (installed as example/Ntuple_t.cpp). After
running the tests, you may run h5dump on test/hdf5/test-ntuple.h5 to
examine the structure of the data saved.
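If you prefer Python to h5dump, a short h5py walk will print every group and dataset in a file. The file written below is a throwaway stand-in, since test/hdf5/test-ntuple.h5 only exists after the test suite has run:

```python
import os
import tempfile

import h5py

def describe(path):
    """Collect and print each object in an HDF5 file with its shape and dtype."""
    lines = []
    with h5py.File(path, "r") as f:
        def visit(name, obj):
            if isinstance(obj, h5py.Dataset):
                lines.append(f"{name}: shape={obj.shape} dtype={obj.dtype}")
            else:
                lines.append(f"{name}/ (group)")
        f.visititems(visit)
    for ln in lines:
        print(ln)
    return lines

# Demonstrate on a small file standing in for test/hdf5/test-ntuple.h5.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("g/x", data=[1.0, 2.0, 3.0])

info = describe(path)
```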
The column specification system will be extended to allow user specification of dataset properties such as chunking and compression.
Please fork the repository and send pull requests.
If you believe you have found a bug, or wish to ask a question, please use the issue tracker for this repository.