In data_generation.py, book_transformations is defined with duplicate keys, for example:
book_transformations = {
    'book2': {
        'ensure_300ppi': {'target_dpi': 150},
        'remove_bleed_dual_layer': {},
        'denoise_image': {'method': 'bilateral'},
        'denoise_image': {'method': 'nlm'},
    },
    ...
}
In a Python dict literal, a repeated key silently overwrites the earlier entry, so only the last denoise_image config ({'method': 'nlm'}) is kept. As a result, denoise_image cannot be applied multiple times with different methods, even though the config suggests it is.
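A minimal illustration of the overwrite, using only the two denoise_image entries from the config above (the variable names `config` and `remaining_keys` are made up for this example):

```python
# Duplicate keys in a dict literal: the last value wins and no error is raised.
config = {
    'denoise_image': {'method': 'bilateral'},
    'denoise_image': {'method': 'nlm'},
}

remaining_keys = list(config.keys())
print(remaining_keys)           # ['denoise_image']
print(config['denoise_image'])  # {'method': 'nlm'}
```

The bilateral entry is gone before any processing code sees the dict, so the problem cannot be fixed downstream.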
This dict is passed into process_multiple_books / process_book_with_transformations in data_utils.py, which build transform_order = list(transform_config.keys()) and look up parameters with transform_config[transform_name]. Because dict keys are unique, each transform runs at most once per page.
Suggested fix: change each book's entry in book_transformations to an ordered list of (name, params) tuples, for example:
book_transformations = {
    'book2': [
        ('ensure_300ppi', {'target_dpi': 150}),
        ('remove_bleed_dual_layer', {}),
        ('denoise_image', {'method': 'bilateral'}),
        ('denoise_image', {'method': 'nlm'}),
    ],
}
and update process_book_with_transformations to iterate over that list instead of transform_config.keys().
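A rough sketch of what iterating over the list could look like. I have not seen how data_utils.py dispatches transform names to functions, so `apply_steps`, `registry`, and `fake_denoise` are hypothetical stand-ins, and the assumption that each transform takes the page plus keyword parameters may not match the real signatures:

```python
from typing import Any, Callable, Dict, List, Tuple

# One pipeline step: (transform name, keyword parameters).
Step = Tuple[str, Dict[str, Any]]


def apply_steps(page: Any, steps: List[Step],
                registry: Dict[str, Callable[..., Any]]) -> Any:
    """Apply each step in list order; the same name may appear more than once."""
    for name, params in steps:
        page = registry[name](page, **params)
    return page


# Stand-in transform (hypothetical) that only records what was requested.
def fake_denoise(page: List[str], method: str) -> List[str]:
    return page + [f"denoise:{method}"]


registry = {'denoise_image': fake_denoise}
steps = [
    ('denoise_image', {'method': 'bilateral'}),
    ('denoise_image', {'method': 'nlm'}),
]

result = apply_steps([], steps, registry)
print(result)  # ['denoise:bilateral', 'denoise:nlm']
```

Both denoise passes run, in the order written, which is what the original config appeared to intend.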