
Cutflow plot #369

Open
mmarchegiani wants to merge 6 commits into PocketCoffea:main from mmarchegiani:cutflow

Conversation

@mmarchegiani
Contributor

This pull request adds a new command-line script for plotting cutflow histograms from PocketCoffea output files. The script provides flexible options for customizing the plots and printing cutflow summaries, making it easier to visualize and analyze event selection steps.

New plotting and summary script:

  • Added the plot_cutflow.py script in pocket_coffea/scripts/plot/, which uses click for command-line options and supports generating cutflow and sum-of-weights plots from .coffea files.
  • The script includes options to filter samples and categories, customize figure size and output format, and choose between summary-only or full plotting modes.
  • For each sample and year, two plots are created: the cutflow in terms of the absolute number of events (for data and MC), and the sum of weights (for MC only). In addition, for each sample, a plot combining all data-taking years is produced.
  • Integrates with plot_cutflow_from_output and print_cutflow_summary utility functions for plotting and summary printing, improving usability and modularity.
  • In the future, this functionality could be integrated into the make_plots.py script so that cutflow plots are produced by default when plotting histograms.
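
The per-sample and per-year aggregation described above can be sketched in plain Python. This is a minimal sketch, not the PR's implementation: it assumes a hypothetical layout in which the cutflow is stored as {step: {dataset: value}} and each dataset name maps to a (sample, year) pair. The real .coffea output structure and the internals of plot_cutflow_from_output may differ.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical layout (assumption): {selection_step: {dataset_name: value}}
Cutflow = Dict[str, Dict[str, float]]


def aggregate_cutflow(
    cutflow: Cutflow,
    dataset_info: Dict[str, Tuple[str, str]],
) -> Dict[str, Dict[str, Dict[str, float]]]:
    """Sum a cutflow per (sample, year) and per sample over all years.

    dataset_info maps a dataset name to (sample, year). The result is
    {sample: {year or "all": {step: value}}}; steps keep their original
    order because dicts preserve insertion order.
    """
    result: Dict[str, Dict[str, Dict[str, float]]] = defaultdict(
        lambda: defaultdict(dict)
    )
    for step, per_dataset in cutflow.items():
        for dataset, value in per_dataset.items():
            sample, year = dataset_info[dataset]
            # Fill both the single-year bucket and the all-years bucket
            for key in (year, "all"):
                bucket = result[sample][key]
                bucket[step] = bucket.get(step, 0.0) + value
    # Convert the nested defaultdicts into plain dicts
    return {sample: dict(per_year) for sample, per_year in result.items()}
```

Each entry of the returned dictionary corresponds to one plot: one per (sample, year) and one "all" entry per sample for the combined data-taking years.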

Example usage:
python pocket_coffea/scripts/plot/plot_cutflow.py -i output_all.coffea -o cutflow_plots --log-y

Example result:
[Image: example cutflow plot]

mmarchegiani self-assigned this Aug 15, 2025
mmarchegiani added the enhancement (New feature or request) label Aug 15, 2025
mmarchegiani moved this to In progress in 1.0 release Sep 7, 2025
valsdav force-pushed the main branch 2 times, most recently from 769ae36 to 8fb0e36, November 4, 2025 09:39
@valsdav
Contributor

valsdav commented Dec 4, 2025

should we merge this?

@mmarchegiani
Contributor Author

I was planning on parallelizing the script and potentially integrating it into the make_plots.py script, but I think that can be done in a future PR.
Let's keep plot_cutflow.py as a standalone script for the moment.

We can merge this PR as it is 👍

mmarchegiani marked this pull request as ready for review December 5, 2025 18:10
Copilot AI review requested due to automatic review settings December 5, 2025 18:10

Copilot AI left a comment


Pull request overview

This pull request adds a comprehensive cutflow plotting capability to PocketCoffea, enabling users to visualize event selection steps from processor output files. The implementation provides a CLI tool, utility functions, and documentation for creating cutflow and sum-of-weights plots with CMS styling.

Key Changes:

  • Adds plot-cutflow CLI command integrated via pyproject.toml
  • Implements plotting utilities in cutflow_utils.py with functions for aggregating data by sample, creating plots with ratio panels, and printing summaries
  • Provides detailed documentation with usage examples and technical details

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 18 comments.

Summary per file:

  • pyproject.toml: Adds CLI entry point for plot-cutflow command (contains a typo in package name)
  • pocket_coffea/utils/cutflow_utils.py: Core plotting utilities including data aggregation, plot generation with CMS styling, and summary printing functionality
  • pocket_coffea/scripts/plot/plot_cutflow.py: Command-line interface script using Click for option parsing and calling utility functions
  • docs/cutflow_plots.md: Comprehensive documentation covering usage, API, examples, and technical details
  • docs/index.md: Updates table of contents to include cutflow plotting documentation
Comments suppressed due to low confidence (1)

pocket_coffea/scripts/plot/plot_cutflow.py:32

  • Except block directly handles BaseException.
    except:
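
A bare except: catches every BaseException, including KeyboardInterrupt and SystemExit, so a Ctrl+C during plotting could be silently swallowed. The self-contained demo below (not taken from the PR; function names are hypothetical) shows the difference with except Exception:.

```python
def caught_by_bare_except(exc: BaseException) -> bool:
    """Raise exc and report whether a bare 'except:' intercepted it."""
    try:
        raise exc
    except:  # noqa: E722  (deliberately bare, for demonstration only)
        return True


def caught_by_except_exception(exc: BaseException) -> bool:
    """Raise exc and report whether 'except Exception:' intercepted it.

    Exceptions outside the Exception hierarchy (KeyboardInterrupt,
    SystemExit, GeneratorExit) are not caught and propagate to the caller.
    """
    try:
        raise exc
    except Exception:
        return True
```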


pocket-coffea="pocket_coffea.__main__:cli"
runner="pocket_coffea.scripts.runner:run"
make-plots="pocket_coffea.scripts.plot.make_plots:make_plots"
plot-cutflow="pocket_coffea.scripts.plot.plot_cutflow:plot_cutflow"

Copilot AI Dec 5, 2025


Typo in package name: 'pockect_coffea' should be 'pocket_coffea'

Copilot uses AI. Check for mistakes.
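
A misspelled package name in an entry point only surfaces when the installed command is run. As a hedged sketch (not part of the PR; the helper name is hypothetical), each "module:attribute" target from pyproject.toml could be checked by importing it. Note that this imports, and therefore executes, the target module.

```python
import importlib


def entry_point_resolves(spec: str) -> bool:
    """Check that a 'module.path:attribute' entry-point target exists.

    Returns False if the module cannot be imported (for example because of
    a misspelled package name) or if the attribute path does not exist.
    """
    module_name, _, attr = spec.partition(":")
    try:
        module = importlib.import_module(module_name)
    except (ImportError, ValueError):  # ValueError for an empty module name
        return False
    obj = module
    # Entry-point attributes may be dotted, e.g. "module:Class.method"
    for part in attr.split("."):
        if not part or not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True
```

For example, entry_point_resolves("pocket_coffea.scripts.plot.plot_cutflow:plot_cutflow") should be True in an environment where the package is installed, while the misspelled "pockect_coffea..." variant returns False.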
if year in by_year:
    # The format is like "${pico_to_femto:${lumi.picobarns.2017.tot}}"
    # We need to extract the actual luminosity value
    return f"$\mathcal{{L}}$ = {by_year[year]:.2f}/fb"

Copilot AI Dec 5, 2025


The luminosity formatting is incorrect. The return format f"$\mathcal{{L}}$ = {by_year[year]:.2f}/fb" will produce something like "$\mathcal{L}$ = 41.50/fb", but the correct LaTeX formatting should be f"$\mathcal{{L}}$ = {by_year[year]:.2f} fb$^{{-1}}$" to properly render the inverse femtobarn unit.

Suggested change
return f"$\mathcal{{L}}$ = {by_year[year]:.2f}/fb"
return f"$\mathcal{{L}}$ = {by_year[year]:.2f} fb$^{{-1}}$"

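
A minimal sketch of the suggested label, assuming a hypothetical mapping from year to integrated luminosity already converted to fb^-1. Using a raw f-string (rf"...") has a second benefit: in a plain f-string, "\m" is an invalid escape sequence, which recent Python versions report with a SyntaxWarning. Doubled braces still produce literal braces in a raw f-string.

```python
from typing import Dict, Optional


def lumi_label(year: str, lumi_fb_by_year: Dict[str, float]) -> Optional[str]:
    """Return a matplotlib mathtext luminosity label, or None if unknown.

    lumi_fb_by_year is a hypothetical {year: luminosity in fb^-1} mapping.
    The raw f-string keeps the backslash of the LaTeX command literal.
    """
    if year not in lumi_fb_by_year:
        return None
    return rf"$\mathcal{{L}}$ = {lumi_fb_by_year[year]:.2f} fb$^{{-1}}$"


print(lumi_label("2017", {"2017": 41.48}))  # illustrative value only
```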
Comment on lines +27 to +39
Similar to what's done in plot_utils.py for the PlotManager class.

Parameters:
-----------
year : str
    The year string (e.g., '2016_PreVFP', '2017', '2018', etc.)

Returns:
--------
str
    Formatted luminosity text (e.g., "41.5 fb^{-1}")
"""
# Get plotting style defaults like in plot_utils.py line 28

Copilot AI Dec 5, 2025


[nitpick] The docstring states "Similar to what's done in plot_utils.py for the PlotManager class" and references "plot_utils.py line 28", but these comments are vague and may become outdated. Consider removing the specific line reference or making the comment more generic.

exclude_categories: Optional[List[str]] = None,
only_samples: Optional[List[str]] = None,
figsize: Tuple[float, float] = (10, 6),
log_y: bool = False, format: str = 'png') -> Dict[str, List[str]]:

Copilot AI Dec 5, 2025


The format parameter shadows the built-in Python format() function. Consider renaming this parameter to output_format or file_format to avoid shadowing the built-in.

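
A sketch of the suggested rename, with a hypothetical helper that builds the output file path. The parameter is called output_format so the built-in format() stays usable inside the function; the set of accepted extensions is an assumption, not taken from the PR.

```python
from pathlib import Path

SUPPORTED_FORMATS = ("png", "pdf", "svg")  # assumption: formats to accept


def output_path(output_dir: str, plot_name: str, output_format: str = "png") -> Path:
    """Build '<output_dir>/<plot_name>.<ext>' after validating the extension.

    Naming the parameter output_format (not format) avoids shadowing the
    built-in format() function.
    """
    ext = output_format.lower().lstrip(".")
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported output format: {output_format!r}")
    return Path(output_dir) / f"{plot_name}.{ext}"
```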
@click.option('--log-y', is_flag=True, help='Use logarithmic y-axis')
@click.option('--figsize', type=str, default='10,6', help='Figure size as "width,height"')
@click.option('--summary-only', is_flag=True, help='Only print summary, do not create plots')
def plot_cutflow(input_file, output_dir, exclude_categories, only_samples, format, log_y, figsize, summary_only):

Copilot AI Dec 5, 2025


The format parameter in the CLI shadows the built-in Python format() function. Consider renaming this parameter to output_format or file_format to avoid shadowing the built-in.

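
Click can bind an option to a differently named Python argument by passing a name without dashes, e.g. @click.option('--format', 'output_format', ...), which keeps the --format flag while avoiding the shadowing. The --figsize option takes a "width,height" string; the hypothetical helper below sketches how that string could be parsed and validated (it could be wired into the option through a Click callback, not shown here).

```python
from typing import Tuple


def parse_figsize(value: str) -> Tuple[float, float]:
    """Parse a 'width,height' string (e.g. '10,6') into a (float, float) tuple.

    Raises ValueError for the wrong number of fields, non-numeric fields,
    or non-positive dimensions.
    """
    parts = [p.strip() for p in value.split(",")]
    if len(parts) != 2:
        raise ValueError(f"Expected 'width,height', got {value!r}")
    width, height = (float(p) for p in parts)
    if width <= 0 or height <= 0:
        raise ValueError(f"Figure dimensions must be positive, got {value!r}")
    return width, height
```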
if (metadata.get('sample') == sample or
        dataset_name.startswith(sample) or
        sample in dataset_name):
    is_mc = metadata.get('isMC', 'True') == 'True'

Copilot AI Dec 5, 2025


The logic for determining if a sample is MC has a potential issue. Line 177 compares metadata.get('isMC', 'True') == 'True' which assumes the value is a string. However, if 'isMC' is stored as a boolean True or False, this comparison will fail. Consider using str(metadata.get('isMC', True)) == 'True' or directly checking metadata.get('isMC', True) in [True, 'True'] to handle both boolean and string representations.

Suggested change
is_mc = metadata.get('isMC', 'True') == 'True'
is_mc = metadata.get('isMC', True) in [True, 'True']

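
A slightly stricter variant of the suggested check, sketched as a hypothetical helper: it accepts isMC stored either as a bool or as a string, and raises on unexpected values instead of silently treating them as data. The accepted string spellings are an assumption.

```python
from typing import Any, Mapping


def is_mc_sample(metadata: Mapping[str, Any], default: bool = True) -> bool:
    """Interpret metadata['isMC'] stored either as a bool or as a string."""
    value = metadata.get("isMC", default)
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        normalized = value.strip().lower()
        if normalized in ("true", "1", "yes"):  # assumed spellings
            return True
        if normalized in ("false", "0", "no"):
            return False
    raise ValueError(f"Cannot interpret isMC={value!r}")
```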

except Exception as e:
    print(f"ERROR: {e}")
    import traceback

Copilot AI Dec 5, 2025


The import traceback statement is inside the except block. For better code organization, move this import to the top of the file with other imports.

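
A minimal sketch of the suggested structure, with traceback imported at module level. The wrapper name and the exit-code convention are assumptions, not taken from the PR; it catches only Exception subclasses and returns a non-zero code after printing the error and its traceback to stderr.

```python
import sys
import traceback  # module-level import, as suggested in the review
from typing import Callable


def run_safely(func: Callable[[], None]) -> int:
    """Run func and return a process exit code (0 on success, 1 on error)."""
    try:
        func()
    except Exception as e:
        print(f"ERROR: {e}", file=sys.stderr)
        traceback.print_exc()  # prints to sys.stderr by default
        return 1
    return 0
```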
gridspec_kw={'height_ratios': [3, 1], 'hspace': 0.2})
else:
    fig, ax_main = plt.subplots(figsize=figsize)
    ax_ratio = None

Copilot AI Dec 5, 2025


Variable ax_ratio is not used.

Suggested change
ax_ratio = None


# Create plot with ratio (only for cutflow plots)
if with_ratio and plot_type.lower().startswith('cutflow'):
    fig = create_plot(include_ratio=True)

Copilot AI Dec 5, 2025


Variable fig is not used.

return fig

# Create plot without ratio
fig = create_plot(include_ratio=False)

Copilot AI Dec 5, 2025


This assignment to 'fig' is unnecessary as it is redefined before this value is used.

Suggested change
fig = create_plot(include_ratio=False)
create_plot(include_ratio=False)

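
One way to address the two "fig" comments, assuming the ratio and no-ratio branches are meant to be mutually exclusive: compute the condition first and call create_plot exactly once. The wrapper below is a hypothetical sketch that receives create_plot as an argument so it runs without matplotlib.

```python
from typing import Any, Callable


def make_cutflow_figure(
    create_plot: Callable[..., Any],
    plot_type: str,
    with_ratio: bool = True,
) -> Any:
    """Call create_plot once, enabling the ratio panel only for cutflow plots.

    Deciding include_ratio up front removes both the unused 'fig'
    assignment and the duplicated create_plot call.
    """
    include_ratio = with_ratio and plot_type.lower().startswith("cutflow")
    return create_plot(include_ratio=include_ratio)
```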
@mmarchegiani
Contributor Author

Update: this PR is probably broken by this commit: 3d9038d

Further testing is needed before merging.


Labels

enhancement New feature or request

Projects

Status: In progress


2 participants