feat: Add privacy specific taxonomy #84

jajanet · 2025-10-13T17:11:34Z

As part of #47, this PR helps ensure that P0 CUJ-1 (log data leak ID and removal) and P0 CUJ-2 (ID sensitive flow to 3P) are addressed in the `security:analyze` command.

This also helps cover more privacy-specific features by outputting a simple datamap of sources and sinks at the end of the analysis.

Pending more test cases, this is an example of what a run would look like with a small set of tests: https://screenshot.googleplex.com/8nuFzxWcS5V2X6b (computer settings won't let me paste or upload an image to GH for some reason)

In short, this PR mainly:

- adds a privacy taint analysis skill to make sure those issues are flagged (similar to security ones)
- edits the following analysis field:
  - Location --> Source Location, to make the privacy datamap clearer
- adds the following fields to the analysis:
  - vulnerability type (to differentiate between privacy and security issues)
  - sink (only for privacy issues, to complete the datamap)
  - data type (only for privacy issues, to flag the specific PII)

google-cla · 2025-10-13T17:11:40Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

heltonduarte · 2025-10-14T18:58:34Z

commands/security/analyze.toml

    *   **Action:** Read the entire `DRAFT_SECURITY_REPORT.md` file.
    *   **Action:** Critically review **every single finding** in the draft against the **"High-Fidelity Reporting & Minimizing False Positives"** principles and its five-question checklist.
    *   **Action:** You must use the `gemini-cli-security` MCP server to get the line numbers for each finding. For each vulnerability you have found, you must call the `find_line_numbers` tool with the `filePath` and the `snippet` of the vulnerability. You will then add the `startLine` and `endLine` to the final report.
+    *   **Action:** After reviewing the detailed findings, you will synthesize all identified privacy violations into a summary table. This table must be included at the top of the final report under a `## Privacy Data Map` heading.


I think this pollutes the output too much without bringing extra value compared to the "vulnerability" it already surfaces. One idea is just to add source and sink to the summary of the privacy violation when generating the report.

Got it! I wasn't sure of the best way to rectify this -- currently, I added fields to the Skillset: Reporting in GEMINI.md that are conditional on a vulnerability being privacy related, along with a vulnerability type field.

I guess the main question I have is: should the privacy and security issues commingle in the final report?

As of recent changes, they commingle -- for example, we could have a single report which lists a security issue, followed by a couple of privacy issues, which is followed by a security one: XSS, PII in Logs, PII to 3P, SSRF

Alternatively, we could have a separate Security section and Privacy section. For the same example, the Security section would have XSS and SSRF, and the Privacy section would have PII in Logs and PII to 3P.

Thoughts?
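
To make the two layouts concrete, here is a minimal sketch of how the same four findings would be ordered under each option. The dictionary keys (`title`, `vulnerability_type`) and both function names are hypothetical and are not taken from the extension's code; this only illustrates the question being asked.

```python
from collections import defaultdict

# Hypothetical findings, listed in the order the analysis discovered them.
FINDINGS = [
    {"title": "XSS", "vulnerability_type": "Security"},
    {"title": "PII in Logs", "vulnerability_type": "Privacy"},
    {"title": "PII to 3P", "vulnerability_type": "Privacy"},
    {"title": "SSRF", "vulnerability_type": "Security"},
]


def commingled_order(findings):
    """Option 1: keep discovery order; each finding carries its own type."""
    return [f["title"] for f in findings]


def sectioned_order(findings):
    """Option 2: group findings into one section per vulnerability type."""
    sections = defaultdict(list)
    for f in findings:
        sections[f["vulnerability_type"]].append(f["title"])
    return dict(sections)


print(commingled_order(FINDINGS))
print(sectioned_order(FINDINGS))
```

With the commingled layout, the vulnerability type field is the only thing that tells a reader which kind of issue they are looking at; with the sectioned layout, the heading does that job.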

heltonduarte · 2025-10-14T19:00:49Z

commands/security/analyze.toml

-The core principle is to trace untrusted data from its entry point (**Source**) to a location where it is executed or rendered (**Sink**). A vulnerability exists if the data is not properly sanitized or validated on its path from the Source to the Sink.
+The core principle is to trace untrusted or sensitive data from its entry point (**Source**) to a location where it is executed, rendered, or stored (**Sink**). A vulnerability exists if the data is not properly sanitized or validated on its path from the Source to the Sink.
+
+### Extended Skillset: Privacy Taint Analysis


Have you considered merging this "Privacy Taint Analysis" into the current taxonomy of "Logging of Sensitive Information" and "PII Handling Violations" in GEMINI.md?

Ah yes, that looks like a better spot to put it! Let me move it there!

heltonduarte · 2025-10-20T21:09:36Z

GEMINI.md

 *   **Severity:** Critical, High, Medium, or Low.
-*   **Location:** The file path where the vulnerability was introduced and the line numbers if that is available.
+*   **Source Location:** The file path where the vulnerability was introduced and the line numbers if that is available.
+*   **Sink Location:** If this is a privacy issue, include this location where sensitive data is exposed or leaves the application's trust boundary


Nit: add a final period here.
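
For illustration, the conditional fields discussed in this hunk could be modeled roughly as follows. This is a sketch under assumptions: the `Finding` class, the `render` helper, the `"Privacy"`/`"Security"` type values, and the exact output labels are hypothetical, since the real report is produced by the model following the prompt in GEMINI.md rather than by code like this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    vulnerability: str
    vulnerability_type: str  # assumed values: "Security" or "Privacy"
    severity: str  # Critical, High, Medium, or Low
    source_location: str
    sink_location: Optional[str] = None  # only meaningful for privacy issues
    data_type: Optional[str] = None  # only meaningful for privacy issues


def render(finding: Finding) -> str:
    """Render one finding, adding the privacy-only fields when relevant."""
    lines = [
        f"*   **Vulnerability:** {finding.vulnerability}",
        f"*   **Vulnerability Type:** {finding.vulnerability_type}",
        f"*   **Severity:** {finding.severity}",
        f"*   **Source Location:** {finding.source_location}",
    ]
    if finding.vulnerability_type == "Privacy":
        if finding.sink_location:
            lines.append(f"*   **Sink Location:** {finding.sink_location}")
        if finding.data_type:
            lines.append(f"*   **Data Type:** {finding.data_type}")
    return "\n".join(lines)


privacy = Finding("PII in Logs", "Privacy", "High",
                  "app/signup.py:12", "app/signup.py:20", "email")
security = Finding("XSS", "Security", "High", "web/view.js:40")
print(render(privacy))
print(render(security))
```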

shrishabh · 2025-10-23T21:59:50Z

GEMINI.md


 ---

+## Skillset: Privacy Taint Analysis


Since we are effectively expanding the taxonomy, would it be better to have this included as 1.7 in the section above? This is essentially an insecure data handling category, I think? cc: @heltonduarte @capachino

Looking at this, I agree -- keeping it under a new 1.7 section would be better because of that and it would keep the tool as a single unified workflow!

capachino · 2025-11-05T21:32:54Z

GEMINI.md

        - Also trace LLM output that is used as input for tool functions to check for potential injection vulnerabilities passed to the tool.

+### 1.7. Privacy Violations
+* **Action:** Identify where sensitive data (PII/SPI) is exposed or leaves the application's trust boundary.


nit: can you fix the spacing to be consistent with the other sections? Specifically, the other sections have two spaces after the bullet point marker.

Fixed the 3 lines, thank you!

sorry I meant this new section 1.7 should follow the existing indent spacing of the other sections. It seems to differ?

capachino · 2025-11-05T21:33:29Z

GEMINI.md

   2. **Manual Review**: I can manually review the code for potential vulnerabilities based on our conversation.
 ```
 *   Explicitly ask the user which they would prefer before proceeding. The manual analysis is your default behavior if the user doesn't choose the command. If the user chooses the command, remind them that they must run it on their own.
-*   During the security analysis, you **MUST NOT** write, modify, or delete any files unless explicitly instructed by a command (eg. `/security:analyze`). Artifacts created during security analysis should be stored in a `.gemini_security/` directory in the user's workspace.


was the removal of this sentence intentional?

It wasn't! Thanks for pointing it out!

capachino

Looks good overall, a few more things before merging:

- I believe you'll also need to update commands/security/analyze-github-pr.toml, so maybe do that after everyone LGTMs the changes to analyze.toml.
- The repo README.md should be updated to reflect these new capabilities.

capachino · 2025-11-12T01:55:51Z

GEMINI.md

+    * **Privacy Taint Analysis:** Trace data from "Privacy Sources" to "Privacy Sinks." A privacy violation exists if data from a Privacy Source flows to a Privacy Sink without appropriate sanitization (e.g., masking, redaction, tokenization). Key terms include:
+        * **Privacy Sources** Locations that can be both untrusted external input or any variable that is likely to contain Personally Identifiable Information (PII) or Sensitive Personal Information (SPI). Look for variable names and data structures containing terms like: `email`, `password`, `ssn`, `firstName`, `lastName`, `address`, `phone`, `dob`, `creditCard`, `apiKey`, `token`
+        * **Privacy Sinks** Locations where sensitive data is exposed or leaves the application's trust boundary. Key sinks to look for include:
+            * **Logging Functions:** Any function that write unmasked sensitive data to a log file or console (e.g., `console.log`, `logging.info`, `logger.debug`).


nit: write -> writes
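
As a rough illustration of the source-to-sink idea in this hunk, the sketch below flags lines where a logging call mentions one of the listed sensitive terms without an obvious masking step. It is a naive line-level heuristic, not real data-flow tracking, and it is not how the extension works (the extension instructs an LLM via GEMINI.md). The function name `find_privacy_log_leaks` and the `mask|redact|tokenize` check are assumptions; the term and sink lists are copied from the hunk above.

```python
import re

# Terms and sinks copied from the proposed GEMINI.md text.
SENSITIVE_TERMS = ["email", "password", "ssn", "firstName", "lastName",
                   "address", "phone", "dob", "creditCard", "apiKey", "token"]
LOG_SINKS = ["console.log", "logging.info", "logger.debug"]

SINK_RE = re.compile("|".join(re.escape(s) for s in LOG_SINKS))
TERM_RE = re.compile("|".join(re.escape(t) for t in SENSITIVE_TERMS),
                     re.IGNORECASE)
# Assumption: a call whose name hints at masking counts as sanitization.
MASK_RE = re.compile(r"mask|redact|tokenize", re.IGNORECASE)


def find_privacy_log_leaks(source: str):
    """Return (line number, sink, matched term) for suspicious log calls."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        sink = SINK_RE.search(line)
        if not sink:
            continue
        args = line[sink.end():]  # text following the sink name
        term = TERM_RE.search(args)
        if term and not MASK_RE.search(args):
            findings.append((line_no, sink.group(0), term.group(0)))
    return findings


SAMPLE = '''user_email = request.form["email"]
logging.info("signup from %s", user_email)
logging.info("signup from %s", mask(user_email))
print("done")
'''

print(find_privacy_log_leaks(SAMPLE))
```

Only the second line of the sample is reported: the first has no sink, and the third passes the value through `mask(...)`. A name-based check like this will produce false positives (for example, a log message that merely contains the word "token"), which is one reason the analysis is left to the model's judgment and the false-positive checklist.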

capachino · 2025-11-12T01:57:14Z

GEMINI.md

        - Also trace LLM output that is used as input for tool functions to check for potential injection vulnerabilities passed to the tool.

+### 1.7. Privacy Violations
+* **Action:** Identify where sensitive data (PII/SPI) is exposed or leaves the application's trust boundary.


sorry I meant this new section 1.7 should follow the existing indent spacing of the other sections. It seems to differ?

jajanet requested review from QuanZhang-William, capachino, evanotero, heltonduarte, pedrour and shrishabh as code owners October 13, 2025 17:11

add privacy specific taxonomy to security analyze command

c01365b

jajanet force-pushed the main branch from 1c60530 to c01365b Compare October 13, 2025 17:21

capachino changed the title ~~Add privacy specific taxonomy~~ feat: Add privacy specific taxonomy Oct 13, 2025

heltonduarte reviewed Oct 14, 2025

jajanet added 2 commits October 15, 2025 17:17

Relocate privacy skillset, remove datamap table in favor of additonal…

26b4986

… privacy fields where relevant

Extra space and some cleanup

60aa578

heltonduarte approved these changes Oct 20, 2025

Merge branch 'gemini-cli-extensions:main' into main

690c9e0

shrishabh reviewed Oct 23, 2025

jajanet added 2 commits October 27, 2025 10:25

add period

580ea8b

move and modify privacy violations check under sast vuln analysis ski…

b93b996

…llset

capachino reviewed Nov 5, 2025

Fix spacing and accidentally removed line per PR comment

8986f3d

capachino reviewed Nov 12, 2025
