Ability to handle multipart/related and other multipart trees & HTML if no text available #35

Open
foulkesj wants to merge 2 commits into asweigart:master from foulkesj:master

foulkesj commented Sep 3, 2021

This commit handles more complex MIME type structures.

When running the current code across a Gmail inbox (e.g. finding all messages and saving them to a file), there are cases where ezgmail fails to find body/original body, because the text is not in text/plain or in multipart/alternative:text/plain.
In those cases body returns the default empty list and original body is never created, which causes an error when outside programs access it.

Other mimeTypes I have seen used include multipart/related and multipart/mixed, although more complex structures may be present.
See here for descriptions:
https://techcommunity.microsoft.com/t5/exchange-team-blog/mixed-ing-it-up-multipart-mixed-messages-and-you/ba-p/585841
https://stackoverflow.com/questions/3902455/mail-multipart-alternative-vs-multipart-mixed

This version largely retains the original code for finding text/plain and multipart/alternative.
However, it now uses a while loop to travel down a multipart tree until multipart/alternative is found. (If "multipart" appears in the mimeType, it goes a level deeper.)
It will still not find a body if no multipart/alternative exists.

If no text/plain is found at the top level, or there is no text/plain in a multipart/alternative, it will return an HTML version.
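
To illustrate the traversal described above, here is a minimal sketch. It is not the code in this PR. It assumes the Gmail API payload shape (a dict with a mimeType, an optional parts list, and body data); find_body_part and the sample payload are hypothetical names made up for the example. Unlike the PR, which descends until it reaches multipart/alternative, this sketch accepts the first text/plain part anywhere in the tree and falls back to the first text/html part.

```python
def find_body_part(payload):
    """Return the first text/plain part in the tree, else the first
    text/html part, else None. (Hypothetical helper, not ezgmail API.)"""
    first_html = None
    stack = [payload]
    while stack:
        part = stack.pop()
        mime_type = part.get("mimeType", "")
        if mime_type == "text/plain":
            return part
        if mime_type == "text/html" and first_html is None:
            # Remember the HTML version as a fallback and keep looking.
            first_html = part
        elif mime_type.startswith("multipart/"):
            # Go a level deeper; reversed() keeps document order on the stack.
            stack.extend(reversed(part.get("parts", [])))
    return first_html


# Made-up payload: text/plain nested under mixed -> related -> alternative.
sample_payload = {
    "mimeType": "multipart/mixed",
    "parts": [
        {
            "mimeType": "multipart/related",
            "parts": [
                {
                    "mimeType": "multipart/alternative",
                    "parts": [
                        {"mimeType": "text/plain", "body": {"data": "SGVsbG8="}},
                        {"mimeType": "text/html", "body": {"data": ""}},
                    ],
                },
                {"mimeType": "image/png", "body": {"attachmentId": "example-id"}},
            ],
        }
    ],
}

found = find_body_part(sample_payload)
print(found["mimeType"])
```

One caveat with this broader search is that a text/plain attachment could be picked up as the body, which the PR's narrower rule avoids.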

Because body/original body can now be found in more locations, the code that determines the encoding and assigns the body has been moved into a separate function (getEncodingAndOriginalBody).
getEncodingAndOriginalBody reads the encoding (from the headers) and adds originalBody and body to self, replicating the actions of the original code.
[ from for header in multipartPart["headers"]:
to
self.body = removeQuotedParts(self.originalBody)]
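
As a rough illustration of that step, the sketch below reads the charset from a part's Content-Type header and decodes the base64url body data. It is not getEncodingAndOriginalBody itself: charset_from_headers, decode_part_body, the utf-8 default, and the sample part are all assumptions made for the example, and it returns the decoded text instead of setting attributes on self.

```python
import base64
from email.message import Message


def charset_from_headers(headers, default="utf-8"):
    """Return the charset named in the Content-Type header, or `default`.
    (Hypothetical helper; the utf-8 default is an assumption.)"""
    for header in headers:
        if header["name"].upper() == "CONTENT-TYPE":
            parsed = Message()
            parsed["Content-Type"] = header["value"]
            return parsed.get_content_charset() or default
    return default


def decode_part_body(part, default="utf-8"):
    """Decode a part's base64url body data using the charset from its headers."""
    data = part["body"]["data"]
    # Restore any stripped base64 padding before decoding.
    padded = data + "=" * (-len(data) % 4)
    raw = base64.urlsafe_b64decode(padded)
    return raw.decode(charset_from_headers(part.get("headers", []), default))


# Made-up part for illustration; "SGVsbG8=" is base64 for "Hello".
sample_part = {
    "mimeType": "text/plain",
    "headers": [{"name": "Content-Type", "value": 'text/plain; charset="UTF-8"'}],
    "body": {"data": "SGVsbG8="},
}

original_body = decode_part_body(sample_part)
print(original_body)  # Hello
```

In the PR the decoded text is stored as originalBody, and body is then derived from it with removeQuotedParts.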

At a future stage, this could collect both the HTML and plain-text versions, if present, and return them separately.

Commit addresses more complex MIME type structures.
The current code sometimes fails to find body/original body. This results in body returning the default [], and original body not being created, causing an error when it is accessed.

In particular, the original code fails to find the body where multipart/related or multipart/mixed is present in the mimetype tree.
See here for various descriptions: https://techcommunity.microsoft.com/t5/exchange-team-blog/mixed-ing-it-up-multipart-mixed-messages-and-you/ba-p/585841

The commit now finds text/plain inside a multipart/alternative in more places in the tree. If no text/plain or multipart part is found, it will return an HTML version.

An additional while loop is used to travel through the multipart tree until multipart/alternative is found.

Due to the increased number of locations where body/original body can be found, a separate function has been added to get the encoding (from the headers) and add the originalBody and Body to self.
Address Typos and location of function
@awebmekhilef

Hi @asweigart, any updates on this?
