
@skywalke34
Contributor

Summary

  • Expanded CycloneDX parser documentation from basic overview to detailed reference
  • Added comprehensive field mapping tables for both JSON and XML formats
  • Documented special processing for dates, severity, CVSS, and status conversion
  • Included notes on component lookup, deduplication, and legacy format support
    Authored by T. Walker - DefectDojo

Added detailed documentation for the CycloneDX parser, including:
  - Field mapping tables for both JSON and XML formats
  - Special processing notes for dates, severity, and status conversion
  - CVSS processing details and component lookup mechanics
  - Deduplication and vulnerability ID collection explanations
  - Legacy format support and namespace handling (XML)

@github-actions github-actions bot added the docs label Nov 20, 2025
@mtesauro mtesauro requested a review from paulOsinski November 21, 2025 03:14
@valentijnscholten valentijnscholten added this to the 2.53.0 milestone Nov 21, 2025
@valentijnscholten
Member

Thanks for the PR @skywalke34, it looks like it needs a bit of tuning to pass the unit tests that ensure consistency across parser docs:

2025-11-21T03:25:07.9237968Z uwsgi-1  | ======================================================================
2025-11-21T03:25:07.9238314Z uwsgi-1  | FAIL: test_file_existence (unittests.test_parsers.TestParsers.test_file_existence) (parser='cyclonedx', category='docs')
2025-11-21T03:25:07.9238540Z uwsgi-1  | ----------------------------------------------------------------------
2025-11-21T03:25:07.9238670Z uwsgi-1  | Traceback (most recent call last):
2025-11-21T03:25:07.9238885Z uwsgi-1  |   File "/app/unittests/test_parsers.py", line 46, in test_file_existence
2025-11-21T03:25:07.9239054Z uwsgi-1  |     self.assertRegex(content, "### Sample Scan Data",
2025-11-21T03:25:07.9239184Z uwsgi-1  |     ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-11-21T03:25:07.9239389Z uwsgi-1  |                     f"Documentation file '{doc_file}' does not contain ### Sample Scan Data",
2025-11-21T03:25:07.9239523Z uwsgi-1  |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-11-21T03:25:07.9239622Z uwsgi-1  |                     )
2025-11-21T03:25:07.9239713Z uwsgi-1  |                     ^
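The failing assertion is an unanchored regex search for the literal text `### Sample Scan Data`. The sketch below mimics that check with `re.search` (the helper name `has_sample_scan_heading` is hypothetical, not taken from `test_parsers.py`) to show why a heading demoted to `##` fails while a level-3 heading passes:

```python
import re

# Literal pattern from the assertRegex call in the log above.
PATTERN = "### Sample Scan Data"


def has_sample_scan_heading(content: str) -> bool:
    """Hypothetical helper: True if `content` would satisfy the docs regex check."""
    return re.search(PATTERN, content) is not None


# A level-3 heading contains the pattern, so the check passes.
print(has_sample_scan_heading("### Sample Scan Data\n\nExample content."))  # True
# A level-2 heading has only two '#', so the three-'#' pattern never occurs.
print(has_sample_scan_heading("## Sample Scan Data\n"))  # False
```

Because the search is unanchored, a deeper heading such as `#### Sample Scan Data` would also pass; only removing a `#` breaks it.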

| serialNumber | - | N/A | BOM serial number, not used in findings |
| version | - | N/A | BOM version number, not used in findings |
| metadata.timestamp | date | 17-20 | Parsed and set as finding date if present |
| components | - | 21, 148-156 | Flattened into dictionary for lookup by bom-ref |
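The last table row says components are flattened into a dictionary keyed by `bom-ref`. A minimal sketch of that idea, assuming the JSON layout in which a component may carry its own nested `components` array; `flatten_components` is a hypothetical name and this is not DefectDojo's implementation:

```python
from typing import Any, Dict, List, Optional

Component = Dict[str, Any]


def flatten_components(
    components: List[Component], index: Optional[Dict[str, Component]] = None
) -> Dict[str, Component]:
    """Hypothetical helper: index every component, including nested ones, by bom-ref."""
    if index is None:
        index = {}
    for component in components:
        # Only components with a bom-ref can be referenced from elsewhere in the BOM.
        ref = component.get("bom-ref")
        if ref is not None:
            index[ref] = component
        # CycloneDX lets a component contain its own nested "components" list.
        flatten_components(component.get("components", []), index)
    return index


bom = {
    "components": [
        {
            "bom-ref": "pkg:npm/app@1.0.0",
            "name": "app",
            "components": [{"bom-ref": "pkg:npm/lib@2.0.0", "name": "lib"}],
        }
    ]
}
index = flatten_components(bom["components"])
print(sorted(index))  # ['pkg:npm/app@1.0.0', 'pkg:npm/lib@2.0.0']
print(index["pkg:npm/lib@2.0.0"]["name"])  # lib
```

A flat index turns each `vulnerabilities[].affects[].ref` lookup into a single dictionary access, regardless of how deeply the component is nested.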
Member

I'm not sure what the conclusion of earlier discussions about this was, but my view is that adding (and maintaining) line numbers here is too fine-grained. It adds work for anyone making changes to the parser, and it adds work for us as reviewers to double-check any changes. I think the line numbers are only useful to people who can read source code, and those people will be able to find the relevant code sections easily even if the docs don't specify line numbers.

Contributor

I agree with the above: it would create an ongoing docs maintenance requirement and would have limited relevance to a reader in this specific context. FYI @skywalke34

Comment on lines 197 to 209

**Implementation:**
```python
import dateutil.parser

# JSON: json_parser.py lines 17-20
# Use metadata.timestamp as the report date when present.
report_date = None
if "metadata" in data and "timestamp" in data["metadata"]:
    report_date = dateutil.parser.parse(data["metadata"]["timestamp"])

# XML: xml_parser.py lines 31-34
# `ns` maps the "b" prefix to the CycloneDX BOM namespace.
report_date = tree.find("b:metadata/b:timestamp", ns)
if report_date is not None:
    report_date = dateutil.parser.parse(report_date.text)
```
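The snippet above relies on `data`, `tree`, and `ns` being defined elsewhere. For a self-contained sketch of the same idea, the example below swaps `dateutil.parser.parse` for the standard library's `datetime.fromisoformat` and assumes the CycloneDX 1.4 XML namespace; both choices are illustrative, not necessarily what the parser does:

```python
import json
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Optional

# Assumption: CycloneDX 1.4 namespace; real BOMs may declare another schema version.
NS = {"b": "http://cyclonedx.org/schema/bom/1.4"}


def _parse_ts(text: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" on Python 3.11+,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def json_report_date(raw: str) -> Optional[datetime]:
    """Return metadata.timestamp from a JSON BOM, or None if it is absent."""
    data = json.loads(raw)
    timestamp = data.get("metadata", {}).get("timestamp")
    return _parse_ts(timestamp) if timestamp else None


def xml_report_date(raw: str) -> Optional[datetime]:
    """Return <metadata><timestamp> from an XML BOM, or None if it is absent."""
    root = ET.fromstring(raw)
    node = root.find("b:metadata/b:timestamp", NS)
    return _parse_ts(node.text) if node is not None and node.text else None


JSON_BOM = '{"bomFormat": "CycloneDX", "metadata": {"timestamp": "2021-01-01T12:00:00Z"}}'
XML_BOM = (
    '<bom xmlns="http://cyclonedx.org/schema/bom/1.4">'
    "<metadata><timestamp>2021-01-01T12:00:00Z</timestamp></metadata></bom>"
)
print(json_report_date(JSON_BOM))  # 2021-01-01 12:00:00+00:00
print(xml_report_date(XML_BOM))  # 2021-01-01 12:00:00+00:00
```

The `b:` prefix in the XPath only works because the `NS` mapping binds it to the BOM's default namespace; without that mapping, `find` would not match the namespaced elements.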
Member

My preference would be to leave this out.


**JSON Format:** The parser sets `vuln_id_from_tool` (line 78) which is used by DefectDojo's deduplication logic.

**XML Format:** The parser sets `vuln_id_from_tool` (lines 137, 224) which may be used by DefectDojo's deduplication algorithm.
Member

Readers may wonder what "... may be used ..." means.

Comment on lines 54 to 58
### Total Fields in JSON

- Total data fields: 45
- Total data fields parsed: 20
- Total data fields NOT parsed: 25
Member

I know it looks nice, but does it provide real value? My first instinct is that we should leave it out.

CycloneDX is a lightweight software bill of materials (SBOM) standard designed for use in application security contexts and supply chain component analysis.

From: https://www.cyclonedx.org/
# CycloneDX Parser Documentation
Contributor

This line, or a title of any kind, is not required, as the parser title is already in the front matter (see line 2).
FYI @skywalke34, I am going to take this out.

| vulnerabilities[].ratings[].source.name | - | N/A | Rating source name not mapped |
| vulnerabilities[].ratings[].source.url | - | N/A | Rating source URL not mapped |
| vulnerabilities[].ratings[].score | cvssv3_score | 91-99 | Extracted from CVSSv3 vector calculation |
| vulnerabilities[].ratings[].severity | severity | 36-38, 95-97 | Fixed via fix_severity helper, overridden by CVSS calculation if available |
Contributor

This is not consistent with what the parser does: in the case of CycloneDX we don't always override with CVSS. Instead, CVSS is used as a fallback if severity is not present in the file.
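To make the fallback concrete: a severity stated in the file wins, and a CVSS v3 score is consulted only when no usable severity is present. This sketch assumes the CycloneDX 1.4 `ratings` shape (`severity`, `method`, `score`), treats values such as `unknown` or `none` as absent, and maps scores with the qualitative scale from the CVSS v3.x specification. The documentation quoted above says the score comes from a CVSS vector calculation; this sketch simply reads the `score` field. `pick_severity` and `severity_from_cvss3` are hypothetical names:

```python
from typing import Any, Dict, List, Optional

# DefectDojo's severity levels; other CycloneDX values (e.g. "unknown") are
# treated as "not present" in this sketch.
KNOWN_SEVERITIES = {
    "critical": "Critical",
    "high": "High",
    "medium": "Medium",
    "low": "Low",
    "info": "Info",
}


def severity_from_cvss3(score: float) -> str:
    """Map a CVSS v3.x base score onto the spec's qualitative rating scale."""
    if score == 0.0:
        return "Info"  # CVSS calls this band "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def pick_severity(ratings: List[Dict[str, Any]]) -> Optional[str]:
    """Hypothetical helper: explicit severity first, CVSS v3 score only as fallback."""
    for rating in ratings:
        severity = KNOWN_SEVERITIES.get(str(rating.get("severity", "")).lower())
        if severity:
            return severity
    for rating in ratings:
        if rating.get("method") in ("CVSSv3", "CVSSv31") and rating.get("score") is not None:
            return severity_from_cvss3(float(rating["score"]))
    return None


print(pick_severity([{"severity": "high", "method": "CVSSv31", "score": 9.8}]))  # High
print(pick_severity([{"method": "CVSSv31", "score": 9.8}]))  # Critical
```

Note the first call: a file that states `high` stays High even though its CVSS score alone would map to Critical, which is the behavior described in the comment above.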

@paulOsinski
Contributor

Here is my feedback if you want to apply more of these template changes:

  1. `### Sample Scan Data` must keep exactly that heading level (this file changed the heading to `##`), because that is what our unit tests match for in new parser documentation.
  2. Please leave out the title (`# CycloneDX Parser Documentation`).
  3. I would avoid any explicit references to source code. Detailing the end result is fine with me, but line numbers, references to Python functions, and direct code blocks should be avoided.
  4. Deduplication behavior is already covered by the default hash codes, so there is no need to include that information again.

There's at least one error in this PR where the text misrepresents our source code; see my comment for an example. I don't think we can merge this without checking each mapping detail.

Contributor

@paulOsinski paulOsinski left a comment