book_transformations silently drops repeated transforms due to dict overwrites #78

@VibezCoder

In data_generation.py, book_transformations is defined with duplicate keys like:

book_transformations = {
    'book2': {
        'ensure_300ppi': {'target_dpi': 150},
        'remove_bleed_dual_layer': {},
        'denoise_image': {'method': 'bilateral'},
        'denoise_image': {'method': 'nlm'},
    },
    ...
}

In a Python dict literal, a repeated key silently overwrites the earlier value, so only the last denoise_image config ({'method': 'nlm'}) is kept and no error or warning is raised. This means you cannot actually apply denoise_image multiple times with different methods, even though the config suggests you can.
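
A minimal reproduction of the overwrite, using only the 'book2' entry quoted above (the other books are left out, and the variable names transform_config and transform_order mirror the ones mentioned for data_utils.py):

```python
# Same literal as in data_generation.py, reduced to the 'book2' entry.
book_transformations = {
    'book2': {
        'ensure_300ppi': {'target_dpi': 150},
        'remove_bleed_dual_layer': {},
        'denoise_image': {'method': 'bilateral'},
        'denoise_image': {'method': 'nlm'},  # replaces the value set on the line above
    },
}

transform_config = book_transformations['book2']
transform_order = list(transform_config.keys())

print(transform_order)
# ['ensure_300ppi', 'remove_bleed_dual_layer', 'denoise_image']
print(transform_config['denoise_image'])
# {'method': 'nlm'}
```

The 'bilateral' pass is gone before any processing code ever sees the config.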

This dict is passed into process_multiple_books / process_book_with_transformations in data_utils.py, which builds transform_order = list(transform_config.keys()) and looks up each transform's parameters with transform_config[transform_name]. Because dict keys are unique, each transform runs at most once per page.

Suggested fix: change book_transformations to an ordered list of operations, for example:

book_transformations = {
    'book2': [
        ('ensure_300ppi', {'target_dpi': 150}),
        ('remove_bleed_dual_layer', {}),
        ('denoise_image', {'method': 'bilateral'}),
        ('denoise_image', {'method': 'nlm'}),
    ],
}

and update process_book_with_transformations to iterate over that list instead of transform_config.keys().
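
A rough sketch of what the list-based loop could look like. The real signature of process_book_with_transformations is not shown in this issue, so apply_transform_steps, the registry dict, and the stand-in denoise_image below are all hypothetical names used only for illustration:

```python
from typing import Any, Callable, Dict, List, Tuple

# One step is (transform_name, keyword_arguments).
Step = Tuple[str, Dict[str, Any]]


def apply_transform_steps(page: Any, steps: List[Step],
                          registry: Dict[str, Callable[..., Any]]) -> Any:
    """Apply each (name, params) step in order; repeated names are allowed."""
    for name, params in steps:
        if name not in registry:
            raise KeyError(f"Unknown transform: {name!r}")
        page = registry[name](page, **params)
    return page


# Stand-in transform (assumption): records which method ran instead of
# touching real image data.
def denoise_image(page: List[str], method: str) -> List[str]:
    return page + [f"denoise:{method}"]


registry = {'denoise_image': denoise_image}
steps = [
    ('denoise_image', {'method': 'bilateral'}),
    ('denoise_image', {'method': 'nlm'}),
]

result = apply_transform_steps([], steps, registry)
print(result)
# ['denoise:bilateral', 'denoise:nlm']
```

Because the steps are a list, order is explicit and the same transform can appear as many times as needed.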
