Proposal for higher dimensional data and user defined spacers #148
For the second part, I was under the impression we already allowed the user to specify a separator between datasets, which could be multiple newlines if you want. Why do we need anything additional? Line 218 in 642acc7
For the first part, I am not quite sure what the motivation is. I thought the main benefit of the text format was a human-readable format that could also easily be ingested by common programs like Excel or Origin or numpy, without modification? I can see making a format that gnuplot can natively read, but that already seems like a niche use case and if a bespoke reader library is required (in order to read the custom |
There are several things to unpick here.

time based datasets
Time-sliced data should ideally be put into separate datasets. There's no guarantee that each time slice would have the same number of data points or identical Q-values.

multidimensional datasets
<TLDR: can additional columns after Q/R/dR/dQ be multidimensional, and be stored in different HDF datasets?>
There's a high chance multidimensional datasets such as GISANS are not rectangular, e.g. whilst a detector image has a pixel grid, the Qz/Qy values probably aren't linearly spaced. For datasets like that, if you're saving in an ORT file it might be better to add columns, i.e. the regular 4, and then the extra axes: Qz, R, dR, dQ [, Qy]. You just have to have as many rows as you do pixels.

It might be a better path to transition to ORB, rather than ORT, at this point. I believe, @bmaranville correct me if I'm wrong, that each column doesn't have to be 1-dimensional, i.e. columns can be multidimensional? If not, then I'd like for that to be worked on. For higher dimensions, binary files are probably better than a text-based format. It's much easier to load/save multidimensional arrays to HDF (npy) than it is to come up with a list of rules for how they should be stored in text.

Multidimensional arrays are what is needed for sophisticated resolution smearing kernels. Here each Q point has an associated probability distribution, i.e. This means that for a dataset with N points each of the columns has shape:
I reckon it's easier to do this in ORB than ORT.
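To make the "as many rows as you do pixels" idea for ORT concrete, here is a minimal sketch in plain Python. The helper `pixels_to_rows` and the toy values are hypothetical and are not part of orsopy; the sketch only assumes that every quantity is available per detector pixel and that the column order is Qz, R, dR, dQ, Qy as suggested above.

```python
# Hypothetical helper, not part of orsopy: turn per-pixel 2D data into
# one text row per pixel with the column order Qz, R, dR, dQ, Qy.
def pixels_to_rows(qz, qy, r, dr, dq):
    """Each argument is a nested list with the same (ny, nx) layout."""
    rows = []
    for j, qz_line in enumerate(qz):
        for i, qz_value in enumerate(qz_line):
            rows.append((qz_value, r[j][i], dr[j][i], dq[j][i], qy[j][i]))
    return rows


# 2 x 3 toy detector; the Q values are deliberately not on a regular grid.
qz = [[0.010, 0.011, 0.013], [0.020, 0.022, 0.025]]
qy = [[-0.002, 0.000, 0.002], [-0.003, 0.000, 0.003]]
r = [[1.0, 0.9, 0.8], [0.5, 0.4, 0.3]]
dr = [[0.1, 0.1, 0.1], [0.05, 0.05, 0.05]]
dq = [[0.001] * 3, [0.002] * 3]

rows = pixels_to_rows(qz, qy, r, dr, dq)
for row in rows:
    # One whitespace-separated line per pixel, as in a text data block.
    print(" ".join(f"{value: .6e}" for value in row))
```

Because Qz and Qy are stored explicitly for every pixel, nothing in this layout relies on the grid being rectangular or linearly spaced. The cost is that the original (ny, nx) layout has to be recorded somewhere else if a reader wants it back.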
It might also be nice if every column in the ORB file could be multidimensional.
<TLDR: Why not add a convenience capability that does not reduce generality/compatibility?> Just a bit of my background thoughts on the two points.
As said before, for me this is an improvement without negative side effects. What we could discuss is whether we want to flatten the columns only on export and keep them in their original shape in the Orso object. This has the advantage that the shape is always clear, but it would break existing program integrations that expect a column to be 1D.
@jochenstahn asked me about an option to add extra newlines after certain datasets to facilitate 2D plotting of blocks of datasets with Gnuplot (e.g. a time series stored as separate files that contain spin-up and spin-down data at the same time). This made me think about a possible generic solution for higher dimensional data (off-specular / GISANS).
I've come up with a proposal of how we could handle such a situation:
In addition to this treatment, I've added some convenience functions to the OrsoDataset class that allow iterating over columns and indexing the dataset for data; in both cases the data is automatically reconstructed to its original shape. I haven't tested it yet, but I think the NeXus version should work automatically.
The second part is an optional spacer that can be added when writing text files. The class just inserts a string between datasets, but could be subclassed for more custom behavior. Typical applications would be additional newlines or a nice separator comment that makes the file more readable. It is only relevant for the text file export and is lost on read.
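A rough sketch of the spacer idea, under stated assumptions: the names `Spacer`, `BlockSpacer`, `between` and `write_datasets` are invented for illustration and are not the classes added in this PR. The sketch only shows the pattern described above, a base class that returns a fixed string between datasets and a subclass that customizes it.

```python
import io


class Spacer:
    """Hypothetical spacer: provides the text written between two datasets."""

    def __init__(self, text="\n"):
        self.text = text

    def between(self, index):
        """Text written after dataset number `index` (0-based)."""
        return self.text


class BlockSpacer(Spacer):
    """Hypothetical subclass: a larger gap after every `block` datasets."""

    def __init__(self, block, text="\n", block_text="\n\n"):
        super().__init__(text)
        self.block = block
        self.block_text = block_text

    def between(self, index):
        if (index + 1) % self.block == 0:
            return self.block_text
        return self.text


def write_datasets(stream, datasets, spacer):
    """Write rows of numbers per dataset, with spacer text in between."""
    for index, rows in enumerate(datasets):
        if index > 0:
            stream.write(spacer.between(index - 1))
        for row in rows:
            stream.write(" ".join(f"{value:.4e}" for value in row) + "\n")


buffer = io.StringIO()
toy_datasets = [[(1.0, 2.0)], [(3.0, 4.0)], [(5.0, 6.0)]]
write_datasets(buffer, toy_datasets, BlockSpacer(block=2))
print(buffer.getvalue())
```

Here the spacer is consulted only at write time and leaves no trace in the parsed data, which matches the statement that it is lost on read. The choice of one versus two blank lines in `BlockSpacer` is only an example of grouping datasets into blocks.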
I'll add some examples and extra tests, but wanted to get some opinions first.
Example File: test.ort