fetchFileContent corrupts binary files by decoding base64 as UTF-8 #482

@pedropaulovc

File: action/src/capture/file-fetcher.ts lines 28-48

Summary: fetchFileContent base64-decodes the content of every file returned by repos.getContent and interprets the resulting bytes as UTF-8. GitHub, however, returns encoding='base64' for ALL files under 1MB, text and binary alike, so the encoding !== 'base64' guard only filters out files >=1MB, not binaries. Any PR touching a small binary (PNG, font, zip, sqlite, etc.) therefore ends up with mojibake (invalid byte sequences replaced by U+FFFD) stored in content_blobs and fed through computeLineDiff. The isBinaryFile() helper in the same file (lines 53-66) exists but is never called.
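To illustrate the corruption described in the summary, here is a minimal standalone sketch (not the code from file-fetcher.ts) that pushes the eight-byte PNG signature through the same base64 to UTF-8 path. The variable names are made up for the example, and it assumes a Node.js runtime where Buffer is available.

```typescript
// The first eight bytes of every PNG file.
const pngSignature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Roughly what the contents API would place in the `content` field.
const base64Content = pngSignature.toString("base64");

// The lossy path: base64 -> bytes -> UTF-8 string.
const decodedAsText = Buffer.from(base64Content, "base64").toString("utf-8");

// 0x89 cannot start a UTF-8 sequence, so the decoder substitutes U+FFFD.
const hasReplacementChar = decodedAsText.includes("\uFFFD");

// Re-encoding the string does not give the original bytes back.
const roundTripped = Buffer.from(decodedAsText, "utf-8");
const survivedRoundTrip = roundTripped.equals(pngSignature);

console.log({ hasReplacementChar, survivedRoundTrip });
```

Once the string is stored, the original bytes cannot be recovered, because every invalid sequence collapses to the same replacement character.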

Fix direction: call isBinaryFile(path) at the top of fetchFileContent and skip binary files before any decoding happens.
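A rough sketch of what that guard could look like, under several assumptions: the isBinaryFile() below is a hypothetical extension-based stand-in (the real helper's body is not shown in this issue), decodeTextContent and FileContentResponse are invented names that isolate the decode step, and returning null for skipped files is a guess at the desired behavior rather than what fetchFileContent does today.

```typescript
// Hypothetical stand-in for isBinaryFile() in file-fetcher.ts.
// The real helper may use a different list or a different strategy.
const BINARY_EXTENSIONS = new Set([
  ".png", ".jpg", ".gif", ".woff", ".woff2", ".ttf", ".zip", ".sqlite",
]);

function isBinaryFile(path: string): boolean {
  const dot = path.lastIndexOf(".");
  if (dot === -1) return false;
  return BINARY_EXTENSIONS.has(path.slice(dot).toLowerCase());
}

// Only the two fields this sketch reads from a repos.getContent file response.
interface FileContentResponse {
  encoding: string;
  content: string;
}

// Hypothetical decode step: null means "do not treat this file as text".
function decodeTextContent(path: string, data: FileContentResponse): string | null {
  if (isBinaryFile(path)) return null;         // proposed guard, checked first
  if (data.encoding !== "base64") return null; // existing guard for large files
  return Buffer.from(data.content, "base64").toString("utf-8");
}

const png = decodeTextContent("assets/logo.png", { encoding: "base64", content: "iVBORw0KGgo=" });
const readme = decodeTextContent("README.md", {
  encoding: "base64",
  content: Buffer.from("hello", "utf-8").toString("base64"),
});
console.log({ png, readme });
```

In the real function the check would presumably sit before the repos.getContent call, which also saves an API request for files that will be skipped anyway.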

Labels: bug
