Conversation
769ae36 to
8fb0e36
Compare
|
should we merge this? |
|
I was planning on implementing parallelization of the script + potential integration in We can merge this PR as it is 👍 |
There was a problem hiding this comment.
Pull request overview
This pull request adds a comprehensive cutflow plotting capability to PocketCoffea, enabling users to visualize event selection steps from processor output files. The implementation provides a CLI tool, utility functions, and documentation for creating cutflow and sum-of-weights plots with CMS styling.
Key Changes:
- Adds
plot-cutflowCLI command integrated via pyproject.toml - Implements plotting utilities in
cutflow_utils.pywith functions for aggregating data by sample, creating plots with ratio panels, and printing summaries - Provides detailed documentation with usage examples and technical details
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 18 comments.
Show a summary per file
| File | Description |
|---|---|
| pyproject.toml | Adds CLI entry point for plot-cutflow command (contains a typo in package name) |
| pocket_coffea/utils/cutflow_utils.py | Core plotting utilities including data aggregation, plot generation with CMS styling, and summary printing functionality |
| pocket_coffea/scripts/plot/plot_cutflow.py | Command-line interface script using Click for option parsing and calling utility functions |
| docs/cutflow_plots.md | Comprehensive documentation covering usage, API, examples, and technical details |
| docs/index.md | Updates table of contents to include cutflow plotting documentation |
Comments suppressed due to low confidence (1)
pocket_coffea/scripts/plot/plot_cutflow.py:32
- Except block directly handles BaseException.
except:
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pocket-coffea="pocket_coffea.__main__:cli" | ||
| runner="pocket_coffea.scripts.runner:run" | ||
| make-plots="pocket_coffea.scripts.plot.make_plots:make_plots" | ||
| plot-cutflow="pocket_coffea.scripts.plot.plot_cutflow:plot_cutflow" |
There was a problem hiding this comment.
Typo in package name: 'pockect_coffea' should be 'pocket_coffea'
| if year in by_year: | ||
| # The format is like "${pico_to_femto:${lumi.picobarns.2017.tot}}" | ||
| # We need to extract the actual luminosity value | ||
| return f"$\mathcal{{L}}$ = {by_year[year]:.2f}/fb" |
There was a problem hiding this comment.
The luminosity formatting is incorrect. The return format f"$\mathcal{{L}}$ = {by_year[year]:.2f}/fb" will produce something like "$\mathcal{L}$ = 41.5/fb", but the correct LaTeX formatting should be f"$\mathcal{{L}}$ = {by_year[year]:.2f} fb$^{{-1}}$" to properly render the inverse femtobarn unit.
| return f"$\mathcal{{L}}$ = {by_year[year]:.2f}/fb" | |
| return f"$\mathcal{{L}}$ = {by_year[year]:.2f} fb$^{{-1}}$" |
| Similar to what's done in plot_utils.py for the PlotManager class. | ||
|
|
||
| Parameters: | ||
| ----------- | ||
| year : str | ||
| The year string (e.g., '2016_PreVFP', '2017', '2018', etc.) | ||
|
|
||
| Returns: | ||
| -------- | ||
| str | ||
| Formatted luminosity text (e.g., "41.5 fb^{-1}") | ||
| """ | ||
| # Get plotting style defaults like in plot_utils.py line 28 |
There was a problem hiding this comment.
[nitpick] The docstring states "Similar to what's done in plot_utils.py for the PlotManager class" and references "plot_utils.py line 28", but these comments are vague and may become outdated. Consider removing the specific line reference or making the comment more generic.
| exclude_categories: Optional[List[str]] = None, | ||
| only_samples: Optional[List[str]] = None, | ||
| figsize: Tuple[float, float] = (10, 6), | ||
| log_y: bool = False, format: str = 'png') -> Dict[str, List[str]]: |
There was a problem hiding this comment.
The format parameter shadows the built-in Python format() function. Consider renaming this parameter to output_format or file_format to avoid shadowing the built-in.
| @click.option('--log-y', is_flag=True, help='Use logarithmic y-axis') | ||
| @click.option('--figsize', type=str, default='10,6', help='Figure size as "width,height"') | ||
| @click.option('--summary-only', is_flag=True, help='Only print summary, do not create plots') | ||
| def plot_cutflow(input_file, output_dir, exclude_categories, only_samples, format, log_y, figsize, summary_only): |
There was a problem hiding this comment.
The format parameter in the CLI shadows the built-in Python format() function. Consider renaming this parameter to output_format or file_format to avoid shadowing the built-in.
| if (metadata.get('sample') == sample or | ||
| dataset_name.startswith(sample) or | ||
| sample in dataset_name): | ||
| is_mc = metadata.get('isMC', 'True') == 'True' |
There was a problem hiding this comment.
The logic for determining if a sample is MC has a potential issue. Line 177 compares metadata.get('isMC', 'True') == 'True' which assumes the value is a string. However, if 'isMC' is stored as a boolean True or False, this comparison will fail. Consider using str(metadata.get('isMC', True)) == 'True' or directly checking metadata.get('isMC', True) in [True, 'True'] to handle both boolean and string representations.
| is_mc = metadata.get('isMC', 'True') == 'True' | |
| is_mc = metadata.get('isMC', True) in [True, 'True'] |
|
|
||
| except Exception as e: | ||
| print(f"ERROR: {e}") | ||
| import traceback |
There was a problem hiding this comment.
The import traceback statement is inside the except block. For better code organization, move this import to the top of the file with other imports.
| gridspec_kw={'height_ratios': [3, 1], 'hspace': 0.2}) | ||
| else: | ||
| fig, ax_main = plt.subplots(figsize=figsize) | ||
| ax_ratio = None |
There was a problem hiding this comment.
Variable ax_ratio is not used.
| ax_ratio = None |
|
|
||
| # Create plot with ratio (only for cutflow plots) | ||
| if with_ratio and plot_type.lower().startswith('cutflow'): | ||
| fig = create_plot(include_ratio=True) |
There was a problem hiding this comment.
Variable fig is not used.
| return fig | ||
|
|
||
| # Create plot without ratio | ||
| fig = create_plot(include_ratio=False) |
|
Update: this PR is probably broken by this commit: 3d9038d Further testing is needed before merging. |
This pull request adds a new command-line script for plotting cutflow histograms from PocketCoffea output files. The script provides flexible options for customizing the plots and printing cutflow summaries, making it easier to visualize and analyze event selection steps.
New plotting and summary script:
plot_cutflow.pyscript inpocket_coffea/scripts/plot/, which usesclickfor command-line options and supports generating cutflow and sum-of-weights plots from.coffeafiles.plot_cutflow_from_outputandprint_cutflow_summaryutility functions for plotting and summary printing, improving usability and modularity.make_plots.pyscript to include by default the cutflow plots when plotting histograms.Example usage:
python pocket_coffea/scripts/plot/plot_cutflow.py -i output_all.coffea -o cutflow_plots --log-yExample result:
