@maraspr
Contributor

@maraspr maraspr commented Jan 7, 2026

By default, b64decode discards all characters that are not in the base64 alphabet before doing a padding check. If, by chance, the remaining characters pass this check, the input is processed without error. This may cause non-base64 input to be processed as base64 by mistake.
See documentation here.
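
A minimal sketch of the behaviour described above, using only the standard library. The sample string is made up for illustration and is not taken from parsedmarc:

```python
import binascii
from base64 import b64decode

# Illustrative input that is clearly not base64: the spaces, '!' and '?'
# are outside the base64 alphabet.
not_base64 = "abc! def? gh"

# Default (validate=False): non-alphabet characters are discarded first,
# leaving "abcdefgh" (8 characters, valid padding), which decodes silently.
lenient = b64decode(not_base64)

# validate=True: the same input raises binascii.Error instead.
try:
    b64decode(not_base64, validate=True)
    strict_rejected = False
except binascii.Error:
    strict_rejected = True
```

With the default settings the call succeeds and returns the same bytes as decoding "abcdefgh", while the strict call is rejected.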

The following XML snippet fails to parse if it is compressed to test.xml.gz and then parsed with parsedmarc test.xml.gz:

<?xml version="1.0" encoding="UTF-8" ?>
<feedback>
  <version>1.0</version>
  <report_metadata>
    <org_name>xxxxxx.xx</org_name>
    <email>[email protected]</email>
    <extra_contact_info>[email protected]</extra_contact_info>
    <report_id>[email protected]</report_id>
    <date_range>
      <begin>1111111111</begin>
      <end>1111111111</end>
    </date_range>
  </report_metadata>
  <policy_published>
    <domain>xxx.xxxxxx.xx</domain>
    <adkim>r</adkim>
    <aspf>r</aspf>
    <p>reject</p>
    <sp></sp>
    <pct>100</pct>
  </policy_published>
  <record>
    <row>
      <source_ip>10.100.10.1</source_ip>
      <count>1</count>
      <policy_evaluated>
        <disposition>none</disposition>
        <dkim>pass</dkim>
        <spf>pass</spf>
      </policy_evaluated>
    </row>
    <identifiers>
      <header_from>xxx.xxxxxx.xx</header_from>
      <envelope_from></envelope_from>
    </identifiers>
    <auth_results>
      <dkim>
        <domain>xxx.xxxxxx.xx</domain>
        <selector>x1</selector>
        <result>pass</result>
      </dkim>
      <spf>
        <domain>xx1.xx.xxx.xx</domain>
        <scope>helo</scope>
        <result>none</result>
      </spf>
    </auth_results>
  </record>
</feedback>

with the following error message:

ERROR:cli.py:1580:Failed to parse test.xml.gz - not a valid report.

This pull request should fix the issue by setting validate=True in the b64decode calls where applicable.

@seanthegeek seanthegeek merged commit 792079a into domainaware:master Jan 8, 2026
0 of 5 checks passed
@maraspr
Contributor Author

maraspr commented Jan 8, 2026

Sorry about not catching that test.

The problem seems to be newlines appearing in the content variable: with validate=True, b64decode rejects them as non-alphabet characters. It is not a super elegant solution, but removing the newlines before calling b64decode works. I committed the following fix to my fork. Should I create a new pull request?

diff --git a/parsedmarc/__init__.py b/parsedmarc/__init__.py
index cf8197c..60c4be7 100644
--- a/parsedmarc/__init__.py
+++ b/parsedmarc/__init__.py
@@ -892,7 +892,11 @@ def extract_report(content: Union[bytes, str, BinaryIO]) -> str:
     try:
         if isinstance(content, str):
             try:
-                file_object = BytesIO(b64decode(content, validate=True))
+                file_object = BytesIO(
+                    b64decode(
+                        content.replace("\n", "").replace("\r", ""), validate=True
+                    )
+                )
             except binascii.Error:
                 return content
             header = file_object.read(6)
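
For illustration, the approach in the diff above can be sketched as a standalone function. The name decode_if_base64 is hypothetical and does not exist in parsedmarc; the sample inputs are made up:

```python
import binascii
from base64 import b64decode, b64encode
from typing import Optional


def decode_if_base64(content: str) -> Optional[bytes]:
    """Hypothetical helper: return decoded bytes, or None if not base64."""
    # Wrapped base64 contains line breaks, but validate=True rejects them,
    # so strip CR and LF before decoding.
    stripped = content.replace("\n", "").replace("\r", "")
    try:
        return b64decode(stripped, validate=True)
    except binascii.Error:
        return None


# Base64 text with a line break inserted, as in a wrapped attachment body.
wrapped = b64encode(b"<feedback>...</feedback>").decode("ascii")
wrapped = wrapped[:8] + "\r\n" + wrapped[8:]

decoded = decode_if_base64(wrapped)
plain_xml = decode_if_base64('<?xml version="1.0" encoding="UTF-8" ?>')
```

Here the wrapped base64 string still decodes, while the plain XML string is rejected and can be returned to the caller unchanged, which is what the except branch in the diff does.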

@seanthegeek
Contributor

Please do

@maraspr
Contributor Author

maraspr commented Jan 8, 2026

#649

seanthegeek added a commit that referenced this pull request Jan 8, 2026
Validate that a string is base64-encoded before trying to base64 decode it. (PRs #648 and #649)
