Currently, `dxtbx.image_average` supports (1) multiple single-frame images (e.g. many CBF files) and (2) a single multi-frame container (e.g. one HDF5 file), but not (3) multiple multi-frame containers (e.g. many HDF5 files). This is inconvenient because most modern detectors generate multi-frame containers, and we often want to average frames across multiple runs/batches. This affects not only SFX and SMX applications but also MicroED, where multi-frame MRC and TIFF images are often used.
@phyy-nx I guess you implemented `multi_image_worker` and `single_image_worker` separately to avoid scanning all input files in the master process (MPI rank 0) when tens of thousands of single-frame images are given. Is it acceptable to assume that if the first input file is multi-frame, all other files are also multi-frame? The number of multi-frame containers should be much smaller than the total number of frames to average, so it should be fine to scan all multi-frame containers in the master process to find the number of frames in each and calculate the work distribution.
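
To make the proposed work distribution concrete, here is a rough sketch. It assumes rank 0 has already scanned every container and knows the number of frames in each; how those counts are obtained from dxtbx is deliberately left out. `split_frames` is a hypothetical helper for illustration, not existing dxtbx code, and the contiguous, near-equal split is just one possible policy.

```python
from bisect import bisect_right
from itertools import accumulate


def split_frames(frame_counts, n_ranks):
    """Split all frames into n_ranks contiguous, near-equal chunks.

    frame_counts[i] is the number of frames in container i (assumed to be
    known on rank 0 after scanning the containers).
    Returns one list per rank of (file_index, first_frame, end_frame)
    tuples, where end_frame is exclusive.
    """
    total = sum(frame_counts)
    # offsets[i] is the global index of the first frame of container i.
    offsets = [0] + list(accumulate(frame_counts))
    plan = []
    for rank in range(n_ranks):
        start = rank * total // n_ranks
        stop = (rank + 1) * total // n_ranks
        pieces = []
        # Locate the container that holds global frame `start`.
        file_idx = bisect_right(offsets, start) - 1
        while start < stop:
            end = min(stop, offsets[file_idx + 1])
            if end > start:  # skips empty containers
                first = start - offsets[file_idx]
                pieces.append((file_idx, first, first + (end - start)))
            start = end
            file_idx += 1
        plan.append(pieces)
    return plan


if __name__ == "__main__":
    # Three hypothetical containers with 5, 3 and 4 frames, on 3 ranks.
    for rank, pieces in enumerate(split_frames([5, 3, 4], 3)):
        print(rank, pieces)
```

Each rank would then open only the containers listed in its own entry and read the given frame ranges, so the master only pays the cost of one scan per container rather than one per frame.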