Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tailcut finder function and script #1325

Merged
merged 45 commits into from
Jan 27, 2025
Merged
Show file tree
Hide file tree
Changes from 19 commits
Commits
Show all changes
45 commits
Select commit Hold shift + click to select a range
9f698dd
Script to "compute" tailcuts and NSB level to be added to the MC
moralejo Dec 13, 2024
7a707e3
Minor fix
moralejo Dec 13, 2024
afd9ba6
mean => median
moralejo Dec 13, 2024
47115f0
cast to int
moralejo Dec 13, 2024
2336cdb
Changed arguments
moralejo Dec 16, 2024
be28773
Minor fix
moralejo Dec 16, 2024
1c8aa49
Path to str in glob argument
moralejo Dec 16, 2024
fcaad2f
Simplification on the selection of the subsample of subruns
moralejo Dec 16, 2024
c359bf6
Fix in subrun selection
moralejo Dec 16, 2024
0733892
Refactoring
moralejo Dec 16, 2024
ee6f29f
set logging.INFO
moralejo Dec 16, 2024
fb1ca4d
Additional NSB computation
moralejo Dec 16, 2024
ef812ce
Better log message
moralejo Dec 16, 2024
17a15c5
Outlier detection
moralejo Dec 16, 2024
2a677b4
Changed outlier detection
moralejo Dec 16, 2024
16bffee
Improved log message
moralejo Dec 16, 2024
c115c11
Better log message
moralejo Dec 16, 2024
9269f2c
Added the median pedestal charge among the returns of find_tailcuts
moralejo Dec 17, 2024
3fd7b3a
string formatting
moralejo Dec 17, 2024
cf54e81
New parameters for function to obtain (from pedestal charges) the add…
moralejo Dec 17, 2024
794f992
Removed from lstchain_dvr_pixselector.py the creation of a json file …
moralejo Dec 18, 2024
353fb18
Remove unused imports
moralejo Dec 18, 2024
b78758c
Check possible wrong input
moralejo Dec 18, 2024
9d88a86
Check for number of valid pixels
moralejo Dec 18, 2024
ac61081
Check for number of pedestals in subrun
moralejo Dec 18, 2024
c8c47de
missing continue
moralejo Dec 18, 2024
2913357
Avoid useless warning from rare empty pixels
moralejo Dec 18, 2024
7486271
Modified limits for cleaning level changes. The former ones referred …
moralejo Dec 19, 2024
c02bf8e
Improved comment
moralejo Dec 19, 2024
45a0cfa
Fixed buggy computation of outlier pixels. Previous one was not removing
moralejo Dec 20, 2024
8f80135
Updated parameters of additional_nsb vs. median_qped parametrization
moralejo Dec 20, 2024
3f0749e
Replaced median of pixel charge in pedestal events by the 95% quantil…
moralejo Dec 20, 2024
9f344d4
Automatic log file name if not provided
moralejo Dec 22, 2024
66add8b
Better treatment of pathological runs
moralejo Jan 15, 2025
57725a7
Update lstchain_find_tailcuts.py
moralejo Jan 15, 2025
684d2ea
Report fraction of unreliable pixels
moralejo Jan 15, 2025
9425c83
Remove spaces from logging message
moralejo Jan 15, 2025
9259b25
Include run number in message
moralejo Jan 15, 2025
69cef5c
Fix log message
moralejo Jan 15, 2025
0b0eebb
Remove unused "tailcut" settings
moralejo Jan 23, 2025
ea23283
Require that at least half of subruns are ok
moralejo Jan 23, 2025
fbc5cd1
Update cleaning.py
moralejo Jan 23, 2025
24aa61f
[skip-ci] Suggestions from code review (docstring cleanup, typos)
morcuended Jan 27, 2025
9f5df1d
Apply suggestions from code review (paths cleanup)
morcuended Jan 27, 2025
563bc26
Removed logging.SetLevel
moralejo Jan 27, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
198 changes: 197 additions & 1 deletion lstchain/image/cleaning.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,20 @@
import glob
import logging
import numpy as np
from pathlib import Path
from lstchain.paths import run_to_dl1_filename
from lstchain.io.io import dl1_params_lstcam_key, dl1_images_lstcam_key
from lstchain.io.io import dl1_params_tel_mon_cal_key
from lstchain.io.config import get_standard_config
from ctapipe.containers import EventType
from scipy.stats import median_abs_deviation

__all__ = ['apply_dynamic_cleaning']
from ctapipe.io import read_table
__all__ = ['apply_dynamic_cleaning',
'find_tailcuts',
'pic_th']

log = logging.getLogger(__name__)


def apply_dynamic_cleaning(image, signal_pixels, threshold, fraction):
Expand Down Expand Up @@ -36,3 +50,185 @@
mask_dynamic_cleaning = (image >= dynamic_threshold) & signal_pixels

return mask_dynamic_cleaning


def find_tailcuts(input_dir, run_number):
morcuended marked this conversation as resolved.
Show resolved Hide resolved
"""
This functions uses DL1 files to determine tailcuts which are adequate
for the bulk of the pixels in a given run. It does so simply based on the
median (for the whole camera) of the median pixel charge for pedestal
events.
For reasons of stability & simplicity of analysis, we cannot decide the
cleaning levels on a subrun-by-subrun basis. We select values which are
valid for the whole run.
The script also returns the suggested NSB adjustment needed in the
"dark-sky" MC to match the data.
moralejo marked this conversation as resolved.
Show resolved Hide resolved
The script will process a subset of the subruns (~10, hardcoded) of the run,
distributed uniformly through it.

Parameters
----------
input_dir: `Path`
directory where the DL1 files (subrun-wise, i,e, including
morcuended marked this conversation as resolved.
Show resolved Hide resolved
DL1a) are stored

run_number: ìnt`
morcuended marked this conversation as resolved.
Show resolved Hide resolved
run number to be processed

Returns
-------
additional_nsb_rate: p.e./ns rate of NSB to be added to "dark MC" to
match the noise in the data
newconfig (dict): cleaning configuration for running the DL1ab stage
morcuended marked this conversation as resolved.
Show resolved Hide resolved
"""

log.setLevel(logging.INFO)

Check warning on line 85 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L85

Added line #L85 was not covered by tests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log level should be defined in the main script not inside this function

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it is enough to remove that line, right? - if so, done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Unfortunately it was not... :-(
I just realized now that important info on unreliable pixels & outliers is now missing from the logs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@morcuended Are the loggers of the script and of the called functions independent? Should I pass the logging level explicitly as an argument of find_tailcuts? I want it always set to "info", so the original code should do.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's continue in #1349


# subrun-wise dl1 file names:
dl1_filenames = Path(input_dir,

Check warning on line 88 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L88

Added line #L88 was not covered by tests
run_to_dl1_filename(1, run_number, 0).replace(
'.0000.h5', '.????.h5'))
all_dl1_files = glob.glob(str(dl1_filenames))
all_dl1_files.sort()

Check warning on line 92 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L91-L92

Added lines #L91 - L92 were not covered by tests

# Number of median absolute deviations (mad) away from the median that a
# value has to be to be considered an outlier:
mad_max = 5 # would exclude 8e-4 of the pdf for a gaussian

Check warning on line 96 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L96

Added line #L96 was not covered by tests

# Aprox. maximum number of subruns (uniformly distributed through the
morcuended marked this conversation as resolved.
Show resolved Hide resolved
# run) to be processed:
max_number_of_processed_subruns = 10

Check warning on line 100 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L100

Added line #L100 was not covered by tests
# Keep only ~max_number_of_processed_subruns subruns, distributed
# along the run:
dl1_files = all_dl1_files[::int(1+len(all_dl1_files) /

Check warning on line 103 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L103

Added line #L103 was not covered by tests
max_number_of_processed_subruns)]

number_of_pedestals = []
usable_pixels = []
median_ped_median_pix_charge = []

Check warning on line 108 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L106-L108

Added lines #L106 - L108 were not covered by tests

for dl1_file in dl1_files:
log.info('\nInput file: %s', dl1_file)

Check warning on line 111 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L110-L111

Added lines #L110 - L111 were not covered by tests

data_parameters = read_table(dl1_file, dl1_params_lstcam_key)
event_type_data = data_parameters['event_type'].data
pedestal_mask = event_type_data == EventType.SKY_PEDESTAL.value

Check warning on line 115 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L113-L115

Added lines #L113 - L115 were not covered by tests

number_of_pedestals.append(pedestal_mask.sum())
data_images = read_table(dl1_file, dl1_images_lstcam_key)
data_calib = read_table(dl1_file, dl1_params_tel_mon_cal_key)

Check warning on line 119 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L117-L119

Added lines #L117 - L119 were not covered by tests
# data_calib['unusable_pixels'] , indices: (Gain Calib_id Pixel)

# Get the "unusable" flags from the pedcal file:
unusable_hg = data_calib['unusable_pixels'][0][0]
unusable_lg = data_calib['unusable_pixels'][0][1]

Check warning on line 124 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L123-L124

Added lines #L123 - L124 were not covered by tests

reliable_pixels = ~(unusable_hg | unusable_lg)
usable_pixels.append(reliable_pixels)

Check warning on line 127 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L126-L127

Added lines #L126 - L127 were not covered by tests

charges_data = data_images['image']
charges_pedestals = charges_data[pedestal_mask]

Check warning on line 130 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L129-L130

Added lines #L129 - L130 were not covered by tests
# pixel-wise median charge through the subrun:
median_pix_charge = np.nanmedian(charges_pedestals, axis=0)
median_pix_charge_dev = median_abs_deviation(charges_pedestals,

Check warning on line 133 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L132-L133

Added lines #L132 - L133 were not covered by tests
axis=0,
nan_policy='omit')
# Just a cut to remove outliers (pixels):
outliers = (np.abs(charges_pedestals - median_pix_charge) /

Check warning on line 137 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L137

Added line #L137 was not covered by tests
median_pix_charge_dev) > mad_max
if outliers.sum() > 0:
removed_fraction = outliers.sum() / outliers.size
log.info(f' Removed {removed_fraction:.2%} of pixels (outliers) '

Check warning on line 141 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L139-L141

Added lines #L139 - L141 were not covered by tests
f'from pedestal median calculation')
# Replace outliers by nans:
charges_pedestals = np.where(outliers, np.nan, charges_pedestals)

Check warning on line 144 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L144

Added line #L144 was not covered by tests
# Recompute the pixel-wise medians ignoring the outliers:
median_pix_charge = np.nanmedian(charges_pedestals, axis=0)

Check warning on line 146 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L146

Added line #L146 was not covered by tests

# Now compute the median (for the whole camera) of the medians (for
# each pixel) of the charges in pedestal events. Use only reliable
# pixels for this:
median_ped_median_pix_charge.append(np.nanmedian(median_pix_charge[

Check warning on line 151 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L151

Added line #L151 was not covered by tests
reliable_pixels]))

# convert to ndarray:
median_ped_median_pix_charge = np.array(median_ped_median_pix_charge)
number_of_pedestals = np.array(number_of_pedestals)

Check warning on line 156 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L155-L156

Added lines #L155 - L156 were not covered by tests

# Now compute the median for all processed subruns, which is more robust
# against e.g. subruns affected by car flashes. We exclude subruns
# which have less than half of the median statistics per subrun.
good_stats = number_of_pedestals > 0.5 * np.median(number_of_pedestals)
qped = np.nanmedian(median_ped_median_pix_charge[good_stats])

Check warning on line 162 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L161-L162

Added lines #L161 - L162 were not covered by tests
# Now we also remove outliers (subruns) if any:
qped_dev = median_abs_deviation(median_ped_median_pix_charge[good_stats])
not_outlier = (np.abs(median_ped_median_pix_charge - qped) /

Check warning on line 165 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L164-L165

Added lines #L164 - L165 were not covered by tests
qped_dev) < mad_max

if (~good_stats).sum() > 0:
log.info(f'\nNumber of subruns with low statistics: '

Check warning on line 169 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L168-L169

Added lines #L168 - L169 were not covered by tests
f'{(~good_stats).sum()} - removed from pedestal median '
f'calculation')
if (~not_outlier).sum() > 0:
log.info(f'\nRemoved {(~not_outlier).sum()} outlier subruns '

Check warning on line 173 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L172-L173

Added lines #L172 - L173 were not covered by tests
f'(out of {not_outlier.size}) from pedestal median '
f'calculation')
# recompute without outliers:
qped = np.nanmedian(median_ped_median_pix_charge[good_stats & not_outlier])

Check warning on line 177 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L177

Added line #L177 was not covered by tests

picture_threshold = pic_th(qped)
boundary_threshold = picture_threshold / 2

Check warning on line 180 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L179-L180

Added lines #L179 - L180 were not covered by tests

# We now create a .json files with recommended image cleaning
# settings for lstchain_dl1ab.
newconfig = get_standard_config()['tailcuts_clean_with_pedestal_threshold']

Check warning on line 184 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L184

Added line #L184 was not covered by tests
# casts below are needed, json does not like numpy's int64:
newconfig['picture_thresh'] = int(picture_threshold)
newconfig['boundary_thresh'] = int(boundary_threshold)

Check warning on line 187 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L186-L187

Added lines #L186 - L187 were not covered by tests

additional_nsb_rate = get_nsb(qped)

Check warning on line 189 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L189

Added line #L189 was not covered by tests

return qped, additional_nsb_rate, newconfig

Check warning on line 191 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L191

Added line #L191 was not covered by tests


def pic_th(mean_ped):
"""
Parameters
----------
mean_ped: `float`
mean pixel charge in pedestal events (for the standard
LocalPeakWindowSearch algo & settings in lstchain)

Returns
-------
`int`
morcuended marked this conversation as resolved.
Show resolved Hide resolved
recommended picture threshold for image cleaning (from a table)

"""
mp_edges = np.array([2.4, 3.1, 3.8, 4.5, 5.2])
picture_threshold = np.array([8, 10, 12, 14, 16, 18])

Check warning on line 209 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L208-L209

Added lines #L208 - L209 were not covered by tests

if mean_ped >= mp_edges[-1]:
return picture_threshold[-1]
return picture_threshold[np.where(mp_edges > mean_ped)[0][0]]

Check warning on line 213 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L211-L213

Added lines #L211 - L213 were not covered by tests


def get_nsb(median_ped):
"""
Parameters
----------
median_ped: `float`
median pixel charge in pedestal events

Returns
-------
`float`
morcuended marked this conversation as resolved.
Show resolved Hide resolved
(from a parabolic parametrization) the recommended additional NSB
(in p.e. / ns) that has to be added to the "dark MC" waveforms in
order to match the data for which the median pedestal charge is
median_ped

"""
params = [1.63991237, 0.3130943, 0.07311759]
return (params[1] * (median_ped - params[0]) +

Check warning on line 233 in lstchain/image/cleaning.py

View check run for this annotation

Codecov / codecov/patch

lstchain/image/cleaning.py#L232-L233

Added lines #L232 - L233 were not covered by tests
params[2] * (median_ped - params[0]) ** 2)
76 changes: 76 additions & 0 deletions lstchain/scripts/lstchain_find_tailcuts.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
#!/usr/bin/env python

"""
This script uses DL1 files to determine tailcuts which are adequate for the
bulk of the pixels in a given run. It does so simply based on the median (for
the whole camera) of the median pixel charge for pedestal events.

For reasons of stability & simplicity of analysis, we cannot decide the
cleaning levels on a subrun-by-subrun basis. We select values which are ok
for the whole run.

The script writes out the cleaning settings to a json file,
e.g. dl1ab_Run13181.json
It also returns the suggested NSB adjustment needed in the "dark-sky" MC
morcuended marked this conversation as resolved.
Show resolved Hide resolved
to match the data, in units of p.e./ns.

lstchain_find_tailcuts -d "/..../DL1/YYYYMMDD/v0.10/tailcut84 -r 13181 --log
out.log "
morcuended marked this conversation as resolved.
Show resolved Hide resolved

"""

import argparse
import logging
from pathlib import Path
import sys

from lstchain.io.config import get_standard_config, dump_config
from lstchain.image.cleaning import find_tailcuts

parser = argparse.ArgumentParser(description="Tailcut finder")

parser.add_argument('-d', '--input-dir', dest='input_dir',
morcuended marked this conversation as resolved.
Show resolved Hide resolved
type=Path, default='./',
help='Input DL1 directory')
parser.add_argument('-r', '--run', dest='run_number',
type=int, help='Run number')
parser.add_argument('-o', '--output-dir', dest='output_dir',
morcuended marked this conversation as resolved.
Show resolved Hide resolved
type=Path, default='./',
help='Path to the output directory (default: %(default)s)')
parser.add_argument('--log', dest='log_file',
type=str, default=None,
help='Log file name')

log = logging.getLogger(__name__)

def main():
args = parser.parse_args()
log.setLevel(logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(sys.stdout))

Check warning on line 49 in lstchain/scripts/lstchain_find_tailcuts.py

View check run for this annotation

Codecov / codecov/patch

lstchain/scripts/lstchain_find_tailcuts.py#L48-L49

Added lines #L48 - L49 were not covered by tests

log_file = args.log_file
if log_file is not None:
handler = logging.FileHandler(log_file, mode='w')
logging.getLogger().addHandler(handler)

Check warning on line 54 in lstchain/scripts/lstchain_find_tailcuts.py

View check run for this annotation

Codecov / codecov/patch

lstchain/scripts/lstchain_find_tailcuts.py#L51-L54

Added lines #L51 - L54 were not covered by tests

output_dir = args.output_dir.absolute()
output_dir.mkdir(exist_ok=True, parents=True)
input_dir = args.input_dir.absolute()

Check warning on line 58 in lstchain/scripts/lstchain_find_tailcuts.py

View check run for this annotation

Codecov / codecov/patch

lstchain/scripts/lstchain_find_tailcuts.py#L56-L58

Added lines #L56 - L58 were not covered by tests

run_id = args.run_number
median_qped, additional_nsb_rate, newconfig = find_tailcuts(input_dir,

Check warning on line 61 in lstchain/scripts/lstchain_find_tailcuts.py

View check run for this annotation

Codecov / codecov/patch

lstchain/scripts/lstchain_find_tailcuts.py#L60-L61

Added lines #L60 - L61 were not covered by tests
run_id)

json_filename = Path(output_dir, f'dl1ab_Run{run_id:05d}.json')
morcuended marked this conversation as resolved.
Show resolved Hide resolved
dump_config({'tailcuts_clean_with_pedestal_threshold': newconfig,

Check warning on line 65 in lstchain/scripts/lstchain_find_tailcuts.py

View check run for this annotation

Codecov / codecov/patch

lstchain/scripts/lstchain_find_tailcuts.py#L64-L65

Added lines #L64 - L65 were not covered by tests
'dynamic_cleaning': get_standard_config()['dynamic_cleaning']},
json_filename, overwrite=True)
log.info(f'\nMedian pedestal charge: {median_qped:.3f} p.e.')
log.info('\nCleaning settings:')
log.info(newconfig)
log.info('\nWritten to:')
log.info(json_filename)
log.info(f'\nAdditional NSB rate (over dark MC): {additional_nsb_rate:.4f} '

Check warning on line 73 in lstchain/scripts/lstchain_find_tailcuts.py

View check run for this annotation

Codecov / codecov/patch

lstchain/scripts/lstchain_find_tailcuts.py#L68-L73

Added lines #L68 - L73 were not covered by tests
f'p.e./ns')

log.info('lstchain_find_tailcuts finished successfully!')

Check warning on line 76 in lstchain/scripts/lstchain_find_tailcuts.py

View check run for this annotation

Codecov / codecov/patch

lstchain/scripts/lstchain_find_tailcuts.py#L76

Added line #L76 was not covered by tests
Loading