docs: Expand CycloneDX parser documentation with detailed field mappings #13750

Open

skywalke34 wants to merge 5 commits into DefectDojo:dev from skywalke34:docs-update-cyclonedx-20251120

+311 −18

Contributor

skywalke34 commented Nov 20, 2025

Summary

Expanded CycloneDX parser documentation from basic overview to detailed reference
Added comprehensive field mapping tables for both JSON and XML formats
Documented special processing for dates, severity, CVSS, and status conversion
Included notes on component lookup, deduplication, and legacy format support
Authored by T. Walker - DefectDojo


          docs: Expand CycloneDX parser documentation with detailed field mappings

d402a6b

Added detailed documentation for CycloneDX parser including:
  - Field mapping tables for both JSON and XML formats
  - Special processing notes for dates, severity, status conversion
  - CVSS processing details and component lookup mechanics
  - Deduplication and vulnerability ID collection explanations
  - Legacy format support and namespace handling (XML)

  Authored by T. Walker - DefectDojo

skywalke34 requested review from Maffooch and mtesauro as code owners

November 20, 2025 23:02

github-actions bot added the docs label

mtesauro requested a review from paulOsinski

November 21, 2025 03:14

valentijnscholten added this to the 2.53.0 milestone

Member

valentijnscholten commented Nov 22, 2025

Thanks for the PR @skywalke34. It looks like it needs a bit of tuning to pass the unit test that ensures consistency across parser docs:

2025-11-21T03:25:07.9237968Z uwsgi-1  | ======================================================================
2025-11-21T03:25:07.9238314Z uwsgi-1  | FAIL: test_file_existence (unittests.test_parsers.TestParsers.test_file_existence) (parser='cyclonedx', category='docs')
2025-11-21T03:25:07.9238540Z uwsgi-1  | ----------------------------------------------------------------------
2025-11-21T03:25:07.9238670Z uwsgi-1  | Traceback (most recent call last):
2025-11-21T03:25:07.9238885Z uwsgi-1  |   File "/app/unittests/test_parsers.py", line 46, in test_file_existence
2025-11-21T03:25:07.9239054Z uwsgi-1  |     self.assertRegex(content, "### Sample Scan Data",
2025-11-21T03:25:07.9239184Z uwsgi-1  |     ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-11-21T03:25:07.9239389Z uwsgi-1  |                     f"Documentation file '{doc_file}' does not contain ### Sample Scan Data",
2025-11-21T03:25:07.9239523Z uwsgi-1  |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-11-21T03:25:07.9239622Z uwsgi-1  |                     )
2025-11-21T03:25:07.9239713Z uwsgi-1  |                     ^
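The check behind this failure can be reproduced outside the test suite with a minimal sketch like the one below. It assumes the test does nothing more than a regex search for the literal heading, as the traceback suggests; the helper name `has_sample_scan_heading` and the sample strings are hypothetical and not part of DefectDojo.

```python
import re

# Literal heading that the traceback shows the test searching for.
REQUIRED_HEADING = "### Sample Scan Data"


def has_sample_scan_heading(content: str) -> bool:
    """Return True if the doc text contains the required level-3 heading."""
    # '#' has no special meaning in a regex without re.VERBOSE,
    # so this is effectively a substring search.
    return re.search(REQUIRED_HEADING, content) is not None


# A level-2 heading lacks the three-hash literal, so the check fails.
print(has_sample_scan_heading("## Sample Scan Data\nSample scans are linked here."))   # False
print(has_sample_scan_heading("### Sample Scan Data\nSample scans are linked here."))  # True
```

Running this against the edited doc file before pushing gives a quick local signal without starting the full unit test container.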

valentijnscholten reviewed

docs/content/supported_tools/parsers/file/cyclonedx.md Outdated

    
              | serialNumber | - | N/A | BOM serial number, not used in findings |
              | version | - | N/A | BOM version number, not used in findings |
              | metadata.timestamp | date | 17-20 | Parsed and set as finding date if present |
              | components | - | 21, 148-156 | Flattened into dictionary for lookup by bom-ref |

Member

valentijnscholten Nov 22, 2025

I'm not sure what earlier discussions concluded about this, but my view is that adding (and maintaining) line numbers here is too fine-grained. It adds work for anyone making changes to the parser, and it adds work for us as reviewers to double-check any changes. I think the line numbers are only useful to people who can read source code, and once they can, they'll be able to find the code sections easily even if we don't specify line numbers in the docs.

Contributor

paulOsinski Nov 24, 2025

I agree with the above. This would create an ongoing docs maintenance requirement and would have limited relevance to a reader in this specific context. FYI @skywalke34

valentijnscholten reviewed

docs/content/supported_tools/parsers/file/cyclonedx.md Outdated

Comment on lines 197 to 209

    
              **Implementation:**

              ```python
              # JSON: json_parser.py lines 17-20
              report_date = None
              if "metadata" in data and "timestamp" in data["metadata"]:
                  report_date = dateutil.parser.parse(data["metadata"]["timestamp"])

              # XML: xml_parser.py lines 31-34
              report_date = tree.find("b:metadata/b:timestamp", ns)
              if report_date is not None:
                  report_date = dateutil.parser.parse(report_date.text)
              ```

Member

valentijnscholten Nov 22, 2025

My preference would be to leave this out.

valentijnscholten reviewed

docs/content/supported_tools/parsers/file/cyclonedx.md Outdated

    
              **JSON Format:** The parser sets `vuln_id_from_tool` (line 78) which is used by DefectDojo's deduplication logic.

              **XML Format:** The parser sets `vuln_id_from_tool` (lines 137, 224) which may be used by DefectDojo's deduplication algorithm.

Member

valentijnscholten Nov 22, 2025

Readers may wonder what "... may be used ..." means.

valentijnscholten reviewed

docs/content/supported_tools/parsers/file/cyclonedx.md Outdated

Comment on lines 54 to 58

    
              ### Total Fields in JSON

              - Total data fields: 45
              - Total data fields parsed: 20
              - Total data fields NOT parsed: 25

Member

valentijnscholten Nov 22, 2025

I know it looks nice, but does it provide real value? My first instinct is that we should leave it out.

paulOsinski reviewed

docs/content/supported_tools/parsers/file/cyclonedx.md Outdated

    
              CycloneDX is a lightweight software bill of materials (SBOM) standard designed for use in application security contexts and supply chain component analysis.

              From: https://www.cyclonedx.org/

              # CycloneDX Parser Documentation

Contributor

paulOsinski Nov 24, 2025

This line, or a title of any kind, is not required, as we already have the parser title in the front matter (see line 2).
FYI @skywalke34, I am going to take this out.

paulOsinski added 4 commits

November 24, 2025 19:43


          Update cyclonedx.md

7529a4b


          remove title

a3d07a3


          Merge branch 'dev' into docs-update-cyclonedx-20251120

d9c50c7


          Update cyclonedx.md

8ad6942

paulOsinski reviewed

docs/content/supported_tools/parsers/file/cyclonedx.md Outdated

    
              | vulnerabilities[].ratings[].source.name | - | N/A | Rating source name not mapped |
              | vulnerabilities[].ratings[].source.url | - | N/A | Rating source URL not mapped |
              | vulnerabilities[].ratings[].score | cvssv3_score | 91-99 | Extracted from CVSSv3 vector calculation |
              | vulnerabilities[].ratings[].severity | severity | 36-38, 95-97 | Fixed via fix_severity helper, overridden by CVSS calculation if available |

Contributor

paulOsinski Nov 25, 2025

This is not consistent with what the parser does. In the case of CycloneDX, we don't always override the severity with CVSS. Instead, CVSS is used as a fallback if severity is not present in the file.
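The fallback order described in this comment could be sketched roughly as follows. This is an illustration only, not the parser's code: `resolve_severity` and `severity_from_cvss_score` are hypothetical names, the score bands are the standard CVSS v3 qualitative ratings, and mapping a score of 0.0 (or a finding with no data at all) to `Info` is an assumption made for the example.

```python
from typing import Optional


def severity_from_cvss_score(score: float) -> str:
    """Map a CVSS v3 base score to a severity label (standard rating bands)."""
    if score >= 9.0:
        return "Critical"
    if score >= 7.0:
        return "High"
    if score >= 4.0:
        return "Medium"
    if score > 0.0:
        return "Low"
    # Assumption: CVSS "None" (0.0) is reported as Info.
    return "Info"


def resolve_severity(reported: Optional[str], cvss_score: Optional[float]) -> str:
    """Prefer the severity stated in the file; fall back to CVSS only if absent."""
    if reported:
        return reported.capitalize()
    if cvss_score is not None:
        return severity_from_cvss_score(cvss_score)
    # Assumption: with neither value available, default to Info.
    return "Info"


print(resolve_severity("high", 9.8))  # High: the reported severity wins
print(resolve_severity(None, 9.8))    # Critical: CVSS is used as the fallback
```

The point of the sketch is the ordering: a CVSS score never replaces a severity that the file already provides.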

Contributor

paulOsinski commented Nov 25, 2025

Here is my feedback if you want to apply more of these template changes:

- `### Sample Scan Data` needs to keep that exact heading level (this file changed the heading to `##`), as this is what our unit tests match for in new parser documentation.
- Please leave out the title (`# CycloneDX Parser Documentation`).
- I would avoid any explicit references to source code. Detailing the end result is fine with me, but line numbers, references to Python functions, and direct code blocks should be avoided.
- Deduplication behavior is already accounted for with default hash codes, so there is no need to include that information again.

There's at least one error in this PR where the text misrepresents our source code; see my comment for an example. I don't think we can merge this without checking each mapping detail.

paulOsinski requested changes

Contributor

paulOsinski left a comment

See above comment

Reviewers

valentijnscholten left review comments

paulOsinski requested changes

Maffooch: awaiting requested review (code owner)

mtesauro: awaiting requested review (code owner)

Requested changes must be addressed to merge this pull request.

Labels

docs