
Conversation

@lucabaldini
Collaborator

This pull request adds the machinery for writing hit data into binary format for AstroPix4. It includes:

  • core/fmt.py, a Python module with all the relevant classes (this is a replacement for decode.py);
  • tests/test_fmt.py with all the unit tests for the new code;
  • apx4_read.py, a stripped-down version of beam_test.py using the new facilities.

The new data structures include a class for an event readout and one for the hits within it. Within the data-taking loop, the two are intended to interoperate like

readout_data = astro.get_readout()
if readout_data:
    trigger_id += 1
    # This unpacks the binary data from the NEXYS board into a proper data structure and decodes the hits...
    readout = AstroPixReadout(readout_data, trigger_id, time.time_ns())
    # ...after which you can write the hit into a binary file
    for hit in readout.hits:
        hit.write(output_file)

Note that, compared to the old code, we are now assigning a timestamp to the readout based on the time on the host machine.

The basic format of the output file looks like:

  • a magic word (%APXDF);
  • a single integer encoding the number of bytes in the header;
  • the actual header, coming from the serialization of an arbitrary Python data structure supporting JSON output;
  • a list of events in binary format.
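As a rough illustration of this layout, the sketch below writes and reads back such a preamble. This is not the actual `FileHeader`/`AstroPixBinaryFile` implementation: the 4-byte little-endian length field and the UTF-8 JSON encoding are my own assumptions.

```python
import json
import struct

MAGIC_WORD = b'%APXDF'

def write_file_header(output_file, header) -> None:
    """Write the magic word, the header length, and the JSON-encoded header.

    Sketch only: the integer width (4 bytes, little-endian) is assumed.
    """
    payload = json.dumps(header).encode('utf-8')
    output_file.write(MAGIC_WORD)
    # A single integer encoding the number of bytes in the header.
    output_file.write(struct.pack('<I', len(payload)))
    output_file.write(payload)

def read_file_header(input_file):
    """Read back and de-serialize the header, checking the magic word."""
    assert input_file.read(len(MAGIC_WORD)) == MAGIC_WORD
    (length,) = struct.unpack('<I', input_file.read(4))
    return json.loads(input_file.read(length).decode('utf-8'))
```

The round trip gives back a proper Python object rather than a string, which is the point of serializing the header as JSON in the first place.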

Facilities for reading and converting binary files are provided, e.g.,

# Open a binary file containing ``AstroPix4Hit`` objects
with AstroPixBinaryFile(AstroPix4Hit).open(file_path) as input_file:
    # Note the header is not a string, but a properly de-serialized Python object.
    print(f'File header: {input_file.header}')
    # The file object implements the iterator protocol, so we can just loop over the hits. 
    for hit in input_file:
        print(hit)

The easiest way to look at the new stuff is probably through the unit tests. A sample data file, along with the corresponding .csv version, is included in tests/data.

I'd like to start the bikeshedding on this :-)

@lucabaldini
Collaborator Author

I forgot to mention that this change should not affect anyone who is not deliberately opting in: the only change to existing code is a new function in astropix.py that intercepts the header data before they are turned into a string, so that we can properly serialize them (and de-serialize them at read time).

@lucabaldini
Collaborator Author

Hi Dan, Grant,
thanks for the detailed reply, I'll be happy to continue the discussion here on GitHub. (I think there is no rush whatsoever to have any of this merged in, and I am willing to rewrite the thing from scratch as many times as necessary, provided that we converge on something that makes sense and is maintainable in the long run.)

Saving to file the full readout as it comes out of the NEXYS board is surely a sensible thing to do, and I pushed a small modification to the development branch
660c383
that implements a write() method for the AstroPixReadout class. (I assume we want to strip off the final 0xff padding, and so we're talking about a variable-sized structure here.) I'll move on and modify all the necessary data structures, and write the docs and the unit tests.
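For reference, stripping the trailing padding can be as simple as the sketch below (taking 0xff as the padding byte, per the above; the function name is made up for illustration):

```python
def strip_trailing_padding(readout_data: bytes, pad: bytes = b'\xff') -> bytes:
    """Strip the trailing 0xff padding, yielding a variable-sized frame.

    Caveat: this naively assumes the actual payload does not itself end
    with the padding byte; a robust implementation would rely on framing
    information instead.
    """
    return readout_data.rstrip(pad)
```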

The rest of your comment points to a more fundamental discussion that we should have had before we started working on this, namely what we expect the DAQ to do. I guess it is never too late, so I'll try and get the ball rolling. In my experience there are at least three things that DAQs typically do:

  1. read data from the HW and write it to disk---this is the bare minimum, and I gather you would like to do just that. Fine. Carmelo typically leans toward this, too;

  2. send the data to a network UDP socket (at around the same time they are written to disk) so that a monitoring application can intercept them and do something useful with them. I don't think we have ever discussed this, but Melissa, Carmelo and I like this a lot, mostly because it is the way we have always done business;

  3. unpack the data (at around the same time they are written to disk) and make something with it.

I think we all agree about 1).

  2. is something that we have found very handy in both the Fermi LAT and IXPE development. It comes with minimal overhead and, to me, it is a no-brainer that we should have it at some point. (It doesn't need to be right now.)

  3. is also something that we found extremely useful during the integration of the Fermi LAT. We used to have a long series of electrical tests, and in most cases the analysis happened online (as opposed to the orthogonal model: acquire data and write them to disk; read the data from disk, decode them and convert them to something useful; analyze the data and produce a test report). I distinctly remember one test measuring the noise and gain for each single channel in a silicon layer that took O(1 hour) to complete, and seeing the plots show up in near real time was so useful, as the operator could spot any possible issue immediately, stop the test and diagnose the problem. Imagine having to wait an hour (plus processing time) to realize you had a connector not properly plugged in...

An additional benefit of being able to do some decoding on the DAQ side is that it gives us a way to verify that the data are not corrupted, which, if I understand correctly, we have no other way of doing at the moment. Finally: if we go for 2) at some point in the future, it is in our interest to have the packets that we multicast over the network be as small as possible, so again, this would push in the direction of decoding and serving hits instead of full frames.
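To make point 2) concrete, a minimal sketch of the serving side could look like the following. The multicast group, port, and function names are all made up for illustration; a real setup would presumably take these from a configuration file.

```python
import socket

# Hypothetical monitoring endpoint (group and port are made up).
MONITOR_ADDRESS = ('239.0.0.1', 50007)

def make_monitor_socket() -> socket.socket:
    """Create a UDP socket suitable for multicasting on the local network."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Keep the multicast traffic on the local subnet.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    return sock

def send_hit(sock: socket.socket, hit_data: bytes, address=MONITOR_ADDRESS) -> None:
    """Fire-and-forget a small UDP packet carrying a single decoded hit."""
    sock.sendto(hit_data, address)
```

Since UDP is connectionless, the DAQ pays essentially no cost if nobody is listening, which is what makes this kind of monitoring tap cheap to keep on at all times.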

Personal summary: I would be in favor of keeping the decoding code in the astropix-python repository so that we have the maximum flexibility in terms of writing HW tests and we have a way to make some sanity check on a readout by readout basis as we acquire data. Even in this scheme of things we might have situations in which we might want to acquire a readout and write it to file without even looking at it. But if we move the stuff to a different repo, then that is all we can do.

It goes without saying, I can be convinced otherwise.

There is another thing that I'd like to throw in, namely how we keep track of what the files contain and what exactly we put in the header. At this point I just grabbed whatever you guys were writing in the log files, but I think we should make sure we have everything we need in it. Do we have a way of making a snapshot of the HW just before the test begins? How do we keep track of the versions of the data formats that, I am sure, will evolve with time?

Finally (switching gears): comment 3. points to a topic that we were very naive about, namely the subtleties of the data decoding 😄 We will look in detail at the file that Grant provided, but in the meantime could you point us to any specific resource clarifying what we should expect, and (if at all possible) separating the features of the protocol you guys have put in place from the idiosyncrasies of the hardware? (I.e., for the edge cases that Grant pointed out, are we really talking about complexities of the data flow that are there by design, or glitches of the HW and/or communication problems?)

@DanViolette

Hi Luca!

Thanks for the thoughtful response. I think the aspect I am missing, which is the crux of any disagreement that may arise, is why having the decoding code in a separate repository would preclude us from being flexible with decoding data as we read out.

I was thinking of treating your decoding as a git submodule of astropix-python, which (should?) still give us the ability to write HW tests etc., but allows other projects, such as ComPair and A-STEP, to also check out the decoding tool. From my limited experience with submodules on BurstCube, it was incredibly valuable for developing the instrument FSW in a compartmentalized way from Goddard's larger Central Flight Software (cFS). Maybe you have some good arguments against this that I am missing though!

Returning back to your earlier points:

AstroPixReadout write() function: I think stripping the trailing 0xff padding makes sense. Keeping it would blow up the file size too much, I believe.

Point 1: From a debugging perspective, always being able to write the raw data out is useful, but as you point out for 2) and 3), being able to write somewhat decoded data helps with live testing and evaluation. Your new write functionality gives us the raw-write option, while your current system gives us a path forward towards 2) and 3) (and is likely what we will use the majority of the time). I like the raw data because processing data too early can sometimes mask issues such as the ones Grant pointed out, but from what I recall reading through, your method already catches decoding errors, so that shouldn't be a problem.

Point 2: We absolutely want this at some point, especially for A-STEP. The way we will be getting data is through an Ethernet Via Telemetry (EVTM) link with the sounding rocket's transceiver system, and a UDP connection is exactly what we want (and hence why having as much of this as possible be set-up agnostic is very attractive to me, as the astep-fw branch has diverged greatly from the astropix-python branch and its limited single-chip running capabilities).

Point 3: Completely agree with this as well!

Header Info: This is a good thing to start thinking about. Right now I believe the file takes the settings that are supposedly included in the configuration write to the chip and includes them in the header, and I think all of those settings are likely worth keeping. We may also want to include a list of activated pixels, which will make it much easier to determine whether the data we see make sense at all. Other candidates are things such as the firmware version loaded on the FPGA running the test (this is a supported FPGA register command in astep-fw, which I think is also in astropix-python?). Maybe overkill, but a git branch/commit # might also be nice to include.

In terms of what is ACTUALLY running on the AstroPix chip, the only way to determine this is to use the shift register interface to read out the loaded configuration. This means we are only ever looking at what was LAST loaded onto the chip, not what is currently configured. Loading an identical configuration twice would give some peace of mind that what comes out of the shift register is what currently runs the chip, but in a noisy system it may not be certain. This is a capability I haven't spent much time using, but it is likely a very powerful debugging tool, both for AstroPix v4 and for A-STEP. We may need to write some code to get the SROUT to functionally work, as it is not something currently done in beam_test.py.

I'll let @grant-sommer tackle the final question about data decoding subtleties, as he knows more than I do! I think one issue we all share is that this has been a rapid development project and there is little documentation that fully explains what is supposed to be happening on the chips. I think I personally still don't understand how heartbeat bytes can be skipped, but if there is anything we don't fully understand Grant can hopefully work with Nicolas to discern more.

@nic-str
Member

nic-str commented Feb 10, 2025

Hi everyone,

First of all, @lucabaldini, I think this looks very nice.

Regarding the number of IDLE bytes ("BC"), you typically see two when you start reading from the chip. This happens because the chip's SPI interface is clocked by the SPI master in the DAQ. The data isn't immediately available because it has to be transferred from an asynchronous FIFO to an FSM and then to the SPI arbiter, which takes a few clock cycles.

While the chip is preparing the data, it sends out two IDLE bytes. After these 16 SPI clock cycles, the data is ready and the chip can start transmitting. As the data is being read, the chip uses the clock cycles to fetch new data, allowing multiple data words to be transmitted with at most one IDLE byte in between, or none at all.
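As a naive illustration of handling these on the decoding side (assuming the 0xBC value quoted above, and with the obvious caveat that the same value can in principle occur inside a data word, so a real decoder has to be frame-aware):

```python
IDLE_BYTE = 0xbc

def strip_idle_bytes(data: bytes) -> bytes:
    """Drop the interleaved IDLE bytes the chip emits while fetching data.

    Deliberately naive sketch: dropping every 0xbc byte would also mangle
    legitimate data bytes with that value, so this is only safe once the
    frame boundaries are known.
    """
    return bytes(b for b in data if b != IDLE_BYTE)
```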

The "line break" shown by @grant-sommer is due to the fact that the firmware reads a fixed number of bytes which is not synchronized to the beginning or end of the data frames coming from the chip, so it might just stop reading in the middle of a frame.

Let me know if anything needs further clarification!

@lucabaldini
Collaborator Author

lucabaldini commented May 22, 2025

Update after all the discussions that we have had---I now have a better understanding of how the data transfer works, and I propose not to make the decoding the main point of this pull request, so as not to interfere with the work that Grant is doing on that side of things.

The main thing that this is doing is really to implement something along the lines of

# Create the file header
header_data = {}
header_data['configuration'] = astro.get_header_data() # chip configuration
header_data['args'] = args.__dict__ # all command-line switches
header = FileHeader(header_data) # Create the actual header object

# Open the output file and write the header.
output_file = open(data_file_path, 'wb')
header.write(output_file)

# Start the event loop.
readout_data = astro.get_readout()
if readout_data:
    readout = AstroPix4Readout(num_readouts, time.time_ns(), readout_data)
    readout.write(output_file)

The basic structure for the file is

  • a magic word (%APXDF);
  • a single integer encoding the number of bytes in the header;
  • the header, coming from the serialization of an arbitrary Python data structure supporting JSON output;
  • a list of (raw) readout frames.

We provide data structures to write and read back all the relevant objects. Note that several things happen when writing buffers to disk, namely:

  • strip all the padding bytes;
  • write a constant header (currently b'fedbca');
  • write the trigger ID and the timestamp from the host machine;
  • write the length of the (variable-size) frame;
  • write the actual frame, with all the padding bytes stripped off.
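These steps might be sketched as follows. The field widths and byte order are my own guesses (not the authoritative layout), and the b'fedbca' marker is taken literally from the note above:

```python
import struct

FRAME_HEADER = b'fedbca'  # constant marker, taken verbatim from the note above
PADDING_BYTE = b'\xff'

def write_readout(output_file, frame: bytes, readout_id: int,
                  timestamp_ns: int) -> None:
    """Write a single readout: marker, ID, host timestamp, length, payload.

    Sketch only: the 4-byte ID, 8-byte timestamp and 4-byte length fields
    (all little-endian) are illustrative assumptions.
    """
    payload = frame.rstrip(PADDING_BYTE)      # strip all the padding bytes
    output_file.write(FRAME_HEADER)           # constant frame header
    output_file.write(struct.pack('<I', readout_id))
    output_file.write(struct.pack('<Q', timestamp_ns))
    output_file.write(struct.pack('<I', len(payload)))  # variable frame size
    output_file.write(payload)                # the frame, padding stripped
```

The explicit length field is what makes the variable-sized frames recoverable on read-back without having to scan for the next marker.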

Additionally, we provide facilities to convert a .apx file to .csv (this involves some decoding, but it can be improved upon and integrated with Grant's code later on.)

file_path = apxdf_to_csv(data_file_path, AstroPix4Hit)

This is all opt-in, i.e., none of the existing code should be affected (cross your fingers).

A few top-level questions:

  1. does this make sense?
  2. is the overall data structure sensible?
  3. where does this belong to (this repo, or another)?
  4. is there anything fundamental we should do before considering the PR for merge?

More detailed questions

a. do all the magic things (magic word, frame header, etc) make sense?
b. is there any clever choice for the frame header (e.g., a byte combination we know is never coming out of the NEXYS board)?
c. is there anything we can improve with the naming of the data structures (e.g., should AstropixReadout be called something different)?
d. what do we want to put in the header and how do we get that in a sensible way?
e. does it make sense to implement a conversion to HDF5 (so that we can incorporate the header)?
f. how do we go about integrating the docs and the unit tests in the repo?

@lucabaldini
Collaborator Author

Short summary of the discussion on May 22, 2025:

  • we're moving all the code to the astropix-analysis repo
  • we change trigger_id -> readout_id
  • we close this pull request
  • we open a new pull request on the other repo.

@lucabaldini
Collaborator Author

Ok, I think this is now ready to be merged.

I moved pretty much everything to the astropix-analysis repo, and all that is left is:

  • a new apx4_read.py script to run some data acquisition with an AstroPix4 setup (using all the new stuff in astropix-analysis)
  • a minor addition to astropix.py in order to be able to have something sensible to put in the header of the output file.

The changes should not affect anybody who is not deliberately trying to use the new stuff.
