Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lxml guidance is not useful #767

Open
mwichmann opened this issue Jan 3, 2022 · 9 comments · May be fixed by #1212
Open

lxml guidance is not useful #767

mwichmann opened this issue Jan 3, 2022 · 9 comments · May be fixed by #1212
Labels
bug Something isn't working

Comments

@mwichmann
Copy link

mwichmann commented Jan 3, 2022

Describe the bug

Just so this is recorded somewhere, it's certainly not a show-stopper bug:

If your program uses lxml, it's going to get warnings like this:

>> Issue: [B410:blacklist] Using lxml to parse untrusted XML data is known to be vulnerable to XML attacks. Replace lxml with the equivalent defusedxml package.

Except that the defusedxml.lxml package was never really real, it was intended as an example, and to stop people from using it as-is it's now deprecated and planned for removal. Probably bandit should not be suggesting that? The problem is - what else would one suggest?

See: https://pypi.org/project/defusedxml/#defusedxml-lxml

Reproduction steps

Run bandit on something that uses lxml.

Expected behavior

Expect: "a useful suggestion". Possibly there's no good suggestion to make here?

Bandit version

1.7.0 (Default)

Python version

3.9 (Default)

Additional context

No response

@mwichmann mwichmann added the bug Something isn't working label Jan 3, 2022
@jpodivin
Copy link

I suppose lxml could have been brought into compliance in the meantime?
Have you asked the maintainer ? https://github.com/scoder

@djbrown
Copy link

djbrown commented Aug 20, 2022

#435 was falsely closed as invalid some time ago..
defusedxml.lxml is deprecated.
bandit should make smarter checks or at least stop suggesting defusedxml.lxml

AFAIK lxml actually provides secure parsing by now, users just have to apply it correctly.

@djbrown
Copy link

djbrown commented Aug 20, 2022

also #261 was closed as completed - but it isn't
and #716 seems to be a more specific duplicate of this one.
I suggest discussing in this issue, since it's about lxml as a whole..
@ericwb can you confirm this is a valid issue after all?

@djbrown
Copy link

djbrown commented Aug 20, 2022

Security assessment of lxml as of https://github.com/tiran/defusedxml#python-xml-libraries:

  • Lxml is protected against billion laughs attacks and doesn't do network lookups by default.
  • libxml2 and lxml are not directly vulnerable to gzip decompression bombs but they don't protect you against them either.
  • Library has (limited) XInclude support but requires an additional step to process inclusion.

Bandit could check for insecure use of lxml and suggest using lxml safely

rpatterson added a commit to rpatterson/feed-archiver that referenced this issue Feb 22, 2023
Bandit only pointed it out and [wasn't particularly
helpful](PyCQA/bandit#767 (comment)).
@djbrown
Copy link

djbrown commented Oct 30, 2023

Recently defusedxml got an update on safety information for lxml:

defusedxml.lxml

DEPRECATED The module is deprecated and will be removed in a future release.

lxml is safe against most attack scenarios. lxml uses libxml2 for parsing XML. The library has builtin mitigations against billion laughs and quadratic blowup attacks. The parser allows a limit amount of entity expansions, then fails. lxml also disables network access by default. libxml2 lxml FAQ lists additional recommendations for safe parsing, for example counter measures against compression bombs.

see tiran/defusedxml#38 and tiran/defusedxml#98
I'm thinking about creating a pull request, removing hints to defusedxml.lxml from bandit. @tiran would you endorse that?

@nicostubi
Copy link

Hello @djbrown,

Indeed, lxml has been deprecated within the package defusedxml. It would be useful to remove references to defusedxml.xml and give some guidance. There is still extra care to make when using lxml, isn't it?

Both lxml and defusedxml do a good job at describing precautionary steps to put in place, e.g.

So, Bandit could simply reference these documentations.

Your thoughts?

BR
Nicolas

@djbrown
Copy link

djbrown commented Dec 19, 2024

I was about to start working on a pull request for this, when I saw the cross posted astral-sh/ruff#13707 (removal of suspicious-xmle-tree-usage (S320)), so I dug a bit deeper on this topic 🕵🏾‍♀️

ruff already removed their bandit-inherited rule suspicious-lxml-import (S410) (see issue and pull request)

from that issue I found out about a controversial PSF discussion about wether to keep, replace or drop the defusedxml reference on the official python 1.12 xml docs. there were many further links and arguments, even to bandit itself, and others going in circles (defusedxml refers to lxml, lxml refers to defusedxml), and even ideas of extending the stlib. in the end they decided to keep it as it is, but there were strong arguments to replace the reference with lxml or at least remove it.

nonetheless, defusedxml.lxml is deprecated and people should switch to lxml directly as already statet in the defusedxml readme and in threads dating back to 2019 tiran/defusedxml#38 (comment) and 2018 tiran/defusedxml#25 (comment) and probably further.

I think the PSF should definitely update the docs to primarily refer to lxml and secondarily to defusedxml for different libs.
but that's not what this issue is about, but it may be the first step of cleaning up that mess of securely parsing xml with python

@djbrown djbrown linked a pull request Dec 19, 2024 that will close this issue
@djbrown
Copy link

djbrown commented Dec 19, 2024

I have created #1212 as the most straight forward way to resolve this issue

@djbrown
Copy link

djbrown commented Dec 19, 2024

@nicostubi you suggested simply changing the message to refer to the safe docs instead of just using defusedxml.lxml.
I think that would keep generating flase positives, as bandit currently does.
an even better solution would be to introduce a more sophisticated ast analysis, so actually vulnerable instances would be identified and reported.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants