Add compression comparison functionality #386

BradleyBooth · 2025-07-18T15:53:07Z

Added

New autoencoder model in models.py - AE_float32
When Baler compressed float32 data using the default AE model, the compressed files came out larger than the originals, because every layer was hardcoded to float64. Using this model with float32 data avoids the issue.
- The new model inherits from the existing AE model class.
- Linear layers are modified to use dtype=torch.float32.
- To use the model, add c.float_dtype = "float32" to the project config file.
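The size problem described above comes down to bytes per value. Here is a back-of-the-envelope sketch, assuming the compressed file stores the latent values at the layers' dtype. The row count, feature count and latent size are made-up illustration values, not Baler defaults.

```python
from array import array

# Hypothetical shapes for illustration only (not Baler defaults).
n_rows = 1000
n_features = 16      # columns in the original float32 dataset
latent_size = 10     # columns in the compressed (latent) representation

bytes_f32 = array("f").itemsize  # bytes per single-precision value
bytes_f64 = array("d").itemsize  # bytes per double-precision value

original_bytes = n_rows * n_features * bytes_f32
latent_f64_bytes = n_rows * latent_size * bytes_f64
latent_f32_bytes = n_rows * latent_size * bytes_f32

print("original float32 data:", original_bytes)
print("latent stored as float64:", latent_f64_bytes)
print("latent stored as float32:", latent_f32_bytes)
```

With these numbers the float64 latent space is larger than the float32 input, even though it has fewer columns. Any latent size above half the feature count gives the same result.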
Lossy Compression Comparison functionality (compare.py)
New Baler operating mode, defined in baler.py, that benchmarks Baler's performance on the current project against a selection of lossy compression approaches.
- To access it, run Baler with --mode compare.
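To illustrate the kind of figures such a benchmark might report, here is a minimal sketch that measures compression ratio and maximum absolute error for one stand-in lossy method. The method chosen (rounding to float16, then zlib) and the function names are assumptions made for this example; they are not taken from compare.py.

```python
import math
import struct
import zlib


def to_bytes(values, fmt):
    """Pack a list of floats using the given struct format character."""
    return struct.pack(f"<{len(values)}{fmt}", *values)


def benchmark_half_precision(values):
    """Stand-in lossy baseline: store the data as float16, then deflate it."""
    original = to_bytes(values, "f")      # reference size: float32 storage
    lossy = to_bytes(values, "e")         # float16 rounding is the lossy step
    compressed = zlib.compress(lossy, 9)
    restored = struct.unpack(f"<{len(values)}e", zlib.decompress(compressed))
    max_abs_error = max(abs(a - b) for a, b in zip(values, restored))
    return {
        "compression_ratio": len(original) / len(compressed),
        "max_abs_error": max_abs_error,
    }


# Synthetic smooth signal in [-1, 1]; real benchmarks would use project data.
data = [math.sin(i / 50.0) for i in range(2000)]
result = benchmark_half_precision(data)
print(result)
```

Reporting size and reconstruction error side by side lets each method be judged on the same trade-off.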

jlsmith-hep · 2025-07-23T14:34:49Z

.gitignore

+/external/*
+
+# Exclude results tracking files
+green_code_tracking.txt


Suggested change

-green_code_tracking.txt
+*.txt
+*.npz
+*.dat
+*.png
+*.root
+*.jpg
+*.jpeg
+*.log

Are there any .txt files we would want to track? Maybe a catch-all for results/data/log files would be better.

jlsmith-hep · 2025-07-23T14:34:55Z

baler/baler.py

        helper.create_new_project(workspace_name, project_name, verbose)
    elif mode == "train":
-        perform_training(output_path=output_path, config=config, verbose=verbose)
+        perform_training(output_path, config, project_name, verbose)


Are we sure project_name is always provided? Do we have a default? I'm thinking this would break compatibility with older scripts if there is no default and project_name isn't defined.
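One way to address this concern is to give the new parameter a default, so existing keyword-style calls keep working. The function below is a hypothetical stand-in, not Baler's actual perform_training; deriving the name from the output directory is an assumption made for illustration.

```python
import os


def perform_training(output_path, config, project_name=None, verbose=False):
    """Hypothetical stand-in; only the signature is of interest here."""
    if project_name is None:
        # Assumed fallback: use the last component of the output directory.
        project_name = os.path.basename(os.path.normpath(output_path))
    return {
        "output_path": output_path,
        "project_name": project_name,
        "verbose": verbose,
    }


# Old-style keyword call, as in the removed line of the diff, still works.
old_call = perform_training(output_path="workspaces/demo/run1", config={}, verbose=True)

# New-style positional call, passing project_name explicitly.
new_call = perform_training("workspaces/demo/run1", {}, "run1", True)

print(old_call == new_call)
```

A default protects keyword callers only. A script that passed verbose positionally as the third argument would have it read as project_name.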

jlsmith-hep · 2025-07-23T14:35:13Z

baler/modules/compare.py

@@ -0,0 +1,380 @@
+# Copyright 2022 Baler Contributors


Suggested change

-# Copyright 2022 Baler Contributors
+# Copyright 2022-2025 Baler Contributors

And similarly for the other files.

jlsmith-hep · 2025-07-23T14:39:42Z

Hi @BradleyBooth, nice PR! I left a couple of small comments, but I have a bigger one: there's a lot of refactoring into new methods and classes. Are we sure this doesn't affect functionality? Is there any validation to look at, e.g. running a bundled example like CMS or CFD and checking that it gives the same output?
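One possible form of this validation is to run the same example before and after the refactor and compare the output directories file by file. The sketch below uses content hashes; the helper names and file names are invented for the example, and nothing here is part of Baler.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digests(directory):
    """Map each file's relative path to the SHA-256 of its contents."""
    root = Path(directory)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_outputs(before_dir, after_dir):
    """Return relative paths that are missing on one side or differ in content."""
    before, after = file_digests(before_dir), file_digests(after_dir)
    return sorted(
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )


# Tiny self-check with throwaway directories (file names are invented).
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    Path(a, "compressed.bin").write_bytes(b"\x00\x01")
    Path(b, "compressed.bin").write_bytes(b"\x00\x01")
    Path(b, "extra.log").write_bytes(b"new")
    changed = diff_outputs(a, b)

print(changed)
```

Byte-for-byte equality only makes sense if training is deterministic, for example with fixed random seeds. Otherwise a tolerance-based comparison of the reconstructed values would be the more suitable check.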

BradleyBooth added 10 commits July 16, 2025 15:43

Implement compression comparison

c32811e

Add float32 based AutoEncoder model

f31c512

Add 'baler\external' directory to .gitignore

87c56b8

Add green code and compression results outputs

1ee30b8

Prepare code for pull request

7c5b696

Remove allow_pickle=True from helper.py\process()

f648b85

Add comment to explain AE_float32 class in models.py

a9a9fc3

Update .gitignore

b1ba85d

Prevent syncing of results tracking files

41fb3ea

Remove unused imports and non-functional SZ3 code

9f3d715

jlsmith-hep reviewed Jul 23, 2025

Change plotting functions to improve output

322cd23

BradleyBooth marked this pull request as draft July 27, 2025 13:48

Update copyright dates

c1f9f81
