Defer XML string cleaning to improve performance #1511

Open
wants to merge 7 commits into
base: master

eliasbenb

Description

This PR optimizes XML parsing by deferring the call to utils.cleanXMLString, which runs an expensive regex. Instead of always cleaning the string before parsing, we now try to parse the raw XML string first and fall back to cleaning only if a ParseError occurs.
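
For readers skimming the thread, the parse-first, clean-on-failure idea can be sketched roughly as below. This is an illustration only, not the code in this PR: `clean_xml_string`, `parse_xml`, and the regex pattern are hypothetical stand-ins, and the real utils.cleanXMLString may use a different pattern.

```python
import re
from xml.etree import ElementTree

# Hypothetical stand-in for utils.cleanXMLString: strips control characters
# that XML 1.0 does not allow. The real helper's pattern may differ.
_ILLEGAL_XML_CHARS = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f]')


def clean_xml_string(s):
    """Remove characters that would make the XML parser reject the input."""
    return _ILLEGAL_XML_CHARS.sub('', s)


def parse_xml(data):
    """Parse raw XML first; clean and retry only if the raw parse fails."""
    if not data.strip():
        return None
    try:
        # Fast path: most responses are already well-formed.
        return ElementTree.fromstring(data)
    except ElementTree.ParseError:
        # Slow path: pay for the regex only when the raw parse fails.
        # A second ParseError is allowed to propagate to the caller.
        return ElementTree.fromstring(clean_xml_string(data))
```

The trade-off is that malformed input is now parsed twice, which should be acceptable if such input is rare compared with well-formed responses.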

Originally suggested in #1510 (comment).

Type of change

None of the listed change types really applies; this is a performance enhancement.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated the docstring for new or existing methods
  • I have added tests when applicable

@JonnyWong16
Collaborator

Replace this with the helper function as well?

return ElementTree.fromstring(data) if data.strip() else None

Collaborator

@JonnyWong16 JonnyWong16 left a comment

Could you also change import xml to from xml.etree import ElementTree in media.py?

And update this exception:

except xml.etree.ElementTree.ParseError:
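
A rough sketch of what the requested change looks like in isolation. The surrounding code in media.py is not shown in this thread, so `try_parse` is a hypothetical function used only to give the import and the except clause some context.

```python
from xml.etree import ElementTree  # replaces the bare `import xml`


def try_parse(text):
    """Return the parsed root element, or None if the text is not well-formed."""
    try:
        return ElementTree.fromstring(text)
    except ElementTree.ParseError:  # previously xml.etree.ElementTree.ParseError
        return None
```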
