56 changes: 56 additions & 0 deletions packs/proc/processing_utils_lecroy.py
@@ -0,0 +1,56 @@
import numpy as np
import pandas as pd
import os
from tqdm import tqdm
import csv
import re

"""
Processing utilities for the Lecroy oscilloscope

This file holds all the relevant functions for the processing of data from csv files to h5.
"""

def parse_lecroy_segmented(lines):
Member comment:

Best practice (that I want to implement somewhat retroactively) is to include type hints for all functions.

In your case that would look like:

def parse_lecroy_segmented(lines: str) -> Tuple[pd.DataFrame, pd.DataFrame]:

if lines is a string; otherwise use the correct type.

You'll have to import Tuple from typing:
from typing import Tuple
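A sketch of what that annotated signature could look like. The `List[List[str]]` annotation for `lines` is an assumption inferred from the `lines[1][1]` indexing in the function body (rows as produced by `csv.reader`), not something stated in the review:

```python
from typing import List, Tuple

import pandas as pd

# Sketch only: List[List[str]] is an assumption based on the lines[1][1]
# indexing in the function body (csv.reader-style rows), not the review text.
def parse_lecroy_segmented(lines: List[List[str]]) -> Tuple[pd.DataFrame, pd.DataFrame]:
    ...
```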

    # Line 1 has to have: Segments,1000,SegmentSize,5002
Member comment:

Add documentation explaining what the function does, the input parameters, and the expected output. An example can be seen here.
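For instance, a NumPy-style docstring along these lines (the wording is illustrative, not part of the review):

```python
def parse_lecroy_segmented(lines):
    """Parse a segmented LeCroy CSV export into amplitude and header tables.

    Parameters
    ----------
    lines : list of list of str
        Rows of the CSV file, e.g. as produced by csv.reader.

    Returns
    -------
    value_df : pandas.DataFrame
        One row per segment, one column per sample amplitude.
    header_df : pandas.DataFrame
        Columns Segment, TrigTime and TimeSinceSegment1, one row per segment.
    """
```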

    segments = int(lines[1][1])
    seg_size = int(lines[1][3])
Member comment (on lines +16 to +17):

Perhaps:

Suggested change
- segments = int(lines[1][1])
- seg_size = int(lines[1][3])
+ segments, seg_size = int(lines[1][1]), int(lines[1][3])

but this is picky, and perhaps a bit less readable. The choice is yours 🐱


    # Line 2 header is: Segment,TrigTime,TimeSinceSegment1
    # Lines 3 to 3 + segments - 1 are the per-segment header lines
    header_start = 3
    header_end = header_start + segments
    header_lines = lines[header_start:header_end]

    header_df = pd.DataFrame(header_lines, columns=["Segment", "TrigTime", "TimeSinceSegment1"])

    # Find the "Time,Ampl" line
    for i, line in enumerate(lines):
        if line[0].strip() == "Time":
            data_start = i + 1
            break
Member comment (on lines +27 to +31):

Is the "Time,Ampl" line inconsistent within the data you use? Like, does it sometimes occur 5 lines in, and other times 10 lines in?
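If the position does vary, the search in the diff already copes with it; an equivalent sketch using `next()` with a default in place of the `for`/`else` loop (the sample rows below are invented for illustration):

```python
# Invented sample rows standing in for the parsed CSV contents.
rows = [["Segments", "2"], ["Segment", "TrigTime"], ["Time", "Ampl"], ["0.0", "0.1"]]

# Find the index just past the "Time,Ampl" row, wherever it occurs.
data_start = next(
    (i + 1 for i, row in enumerate(rows) if row and row[0].strip() == "Time"),
    None,
)
if data_start is None:
    raise ValueError("Time,Ampl line not found")
```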

    else:
        raise ValueError("Time,Ampl line not found")

    # Read the data block (segments × segment size)
    raw_data = lines[data_start:]
    if len(raw_data) < segments * seg_size:
        print(f"Warning: expected {segments * seg_size} rows, got {len(raw_data)}")

    value_list = []
    for j in range(segments):
        segment_data = []
        for k in range(seg_size):
            x = j * seg_size + k
            if x >= len(raw_data):  # x = line in the file
                segment_data.append(None)
Member comment:

This means that a segment will be lost if it is a bit shorter than expected, right? Can you quantify how many events you lose this way? Either through a test or otherwise.
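One way to quantify it, assuming the padded `None` values survive into `value_df` (the toy frame below is invented for illustration):

```python
import pandas as pd

# Invented toy frame standing in for value_df; None marks padded samples.
value_df = pd.DataFrame([[0.1, 0.2, None], [0.3, None, None]])

# Count padded cells and the fraction of the expected samples they represent.
n_missing = int(value_df.isna().sum().sum())
frac_missing = n_missing / value_df.size
```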

            else:
                try:
                    value = float(raw_data[x][1])  # column 1 = amplitude of the signal
                    segment_data.append(value)
                except (ValueError, IndexError):
                    segment_data.append(None)
        value_list.append(segment_data)

    value_df = pd.DataFrame(value_list)
    return value_df, header_df
Member comment:

A sensible test for this function would be to take a small input file (you can save it within the repository) and ensure the output for said file is as expected. On the second revision I'll think of a nicer way to format the main work loop here.
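Such a test could be sketched as follows. The parser body is a condensed copy of the logic in this diff so the example is self-contained; in the repository you would import it from `packs.proc.processing_utils_lecroy` instead, and the fixture values (2 segments of 3 samples) are invented for illustration:

```python
import pandas as pd

def parse_lecroy_segmented(lines):
    # Condensed copy of the parser logic from this diff, inlined so the
    # test runs standalone; normally imported from the module under test.
    segments, seg_size = int(lines[1][1]), int(lines[1][3])
    header_df = pd.DataFrame(
        lines[3:3 + segments], columns=["Segment", "TrigTime", "TimeSinceSegment1"]
    )
    for i, line in enumerate(lines):
        if line[0].strip() == "Time":
            data_start = i + 1
            break
    else:
        raise ValueError("Time,Ampl line not found")
    raw_data = lines[data_start:]
    value_list = []
    for j in range(segments):
        row = []
        for k in range(seg_size):
            try:
                row.append(float(raw_data[j * seg_size + k][1]))
            except (ValueError, IndexError):
                row.append(None)  # pad short captures, as in the diff
        value_list.append(row)
    return pd.DataFrame(value_list), header_df

# Minimal fixture mimicking the layout described in the comments:
# a Segments/SegmentSize line, per-segment headers, then Time,Ampl data.
lines = [
    ["LECROY", "preamble"],
    ["Segments", "2", "SegmentSize", "3"],
    ["Segment", "TrigTime", "TimeSinceSegment1"],
    ["1", "10:00:00", "0"],
    ["2", "10:00:01", "1"],
    ["Time", "Ampl"],
    ["0.0", "0.1"], ["0.2", "0.2"], ["0.4", "0.3"],
    ["0.6", "0.4"], ["0.8", "0.5"], ["1.0", "0.6"],
]

value_df, header_df = parse_lecroy_segmented(lines)
assert value_df.shape == (2, 3)
assert header_df.shape == (2, 3)
```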