-
Notifications
You must be signed in to change notification settings - Fork 17
feat: Add privacy specific taxonomy #84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
commands/security/analyze.toml
Outdated
| * **Action:** Read the entire `DRAFT_SECURITY_REPORT.md` file. | ||
| * **Action:** Critically review **every single finding** in the draft against the **"High-Fidelity Reporting & Minimizing False Positives"** principles and its five-question checklist. | ||
| * **Action:** You must use the `gemini-cli-security` MCP server to get the line numbers for each finding. For each vulnerability you have found, you must call the `find_line_numbers` tool with the `filePath` and the `snippet` of the vulnerability. You will then add the `startLine` and `endLine` to the final report. | ||
| * **Action:** After reviewing the detailed findings, you will synthesize all identified privacy violations into a summary table. This table must be included at the top of the final report under a `## Privacy Data Map` heading. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this pollutes the output too much without bringing extra value compared to the "vulnerability" it already surfaces. One idea is just to add source and sink to the summary of the privacy violation when generating the report.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it! I wasn't sure the best way to rectify this -- currently, I added fields to the Skillset: Reporting in GEMINI.md that are conditional on a vulnerability being privacy related along with a vulnerability type field
I guess the main question I have is: should the privacy and security issues commingle in the final report?
As of recent changes, they commingle -- for example, we could have a single report which lists a security issue, followed by a couple of privacy issues, which is followed by a security one: XSS, PII in Logs, PII to 3P, SSRF
Alternatively, we could be a separate security section and privacy section. Meaning, the Security section would have XSS, SSRF and Privacy would have PII in Logs, PII to 3P for the same example
Thoughts?
commands/security/analyze.toml
Outdated
| The core principle is to trace untrusted data from its entry point (**Source**) to a location where it is executed or rendered (**Sink**). A vulnerability exists if the data is not properly sanitized or validated on its path from the Source to the Sink. | ||
| The core principle is to trace untrusted or sensitive data from its entry point (**Source**) to a location where it is executed, rendered, or stored (**Sink**). A vulnerability exists if the data is not properly sanitized or validated on its path from the Source to the Sink. | ||
| ### Extended Skillset: Privacy Taint Analysis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered merging this "Privacy Taint Analysis" into the current taxonomy of "Logging of Sensitive Information" and "PII Handling Violations" in Gemini.md?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, that looks like a better spot to put it! Let me move it there!
… privacy fields where relevant
| * **Severity:** Critical, High, Medium, or Low. | ||
| * **Location:** The file path where the vulnerability was introduced and the line numbers if that is available. | ||
| * **Source Location:** The file path where the vulnerability was introduced and the line numbers if that is available. | ||
| * **Sink Location:** If this is a privacy issue, include this location where sensitive data is exposed or leaves the application's trust boundary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: add a final period here.
GEMINI.md
Outdated
| --- | ||
| ## Skillset: Privacy Taint Analysis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are effectively expanding the taxonomy, would it be better to have this included as 1.7 in the section above? This is essentially insecure data handling category, I think? cc: @heltonduarte @capachino
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this, I agree -- keeping it under a new 1.7 section would be better because of that and it would keep the tool as a single unified workflow!
| - Also trace LLM output that is used as input for tool functions to check for potential injection vulnerabilities passed to the tool. | ||
| ### 1.7. Privacy Violations | ||
| * **Action:** Identify where sensitive data (PII/SPI) is exposed or leaves the application's trust boundary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can you fix the space to be consistent with the other sections, specficially the other sections have two spaces after the bullet point marker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the 3 lines, thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I meant this new section 1.7 should follow the existing indent spacing of the other sections. It seems to differ?
| 2. **Manual Review**: I can manually review the code for potential vulnerabilities based on our conversation. | ||
| ``` | ||
| * Explicitly ask the user which they would prefer before proceeding. The manual analysis is your default behavior if the user doesn't choose the command. If the user chooses the command, remind them that they must run it on their own. | ||
| * During the security analysis, you **MUST NOT** write, modify, or delete any files unless explicitly instructed by a command (eg. `/security:analyze`). Artifacts created during security analysis should be stored in a `.gemini_security/` directory in the user's workspace. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was the removal of this sentence intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wasn't! Thanks for pointing it out!
capachino
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, a few more things before merging:
- I believe you'll also need to update
commands/security/analyze-github-pr.tomlso maybe after everyone LGTM the changes toanalyze.toml - Should update the repo README.md to reflect these new capabilities.
| * **Privacy Taint Analysis:** Trace data from "Privacy Sources" to "Privacy Sinks." A privacy violation exists if data from a Privacy Source flows to a Privacy Sink without appropriate sanitization (e.g., masking, redaction, tokenization). Key terms include: | ||
| * **Privacy Sources** Locations that can be both untrusted external input or any variable that is likely to contain Personally Identifiable Information (PII) or Sensitive Personal Information (SPI). Look for variable names and data structures containing terms like: `email`, `password`, `ssn`, `firstName`, `lastName`, `address`, `phone`, `dob`, `creditCard`, `apiKey`, `token` | ||
| * **Privacy Sinks** Locations where sensitive data is exposed or leaves the application's trust boundary. Key sinks to look for include: | ||
| * **Logging Functions:** Any function that write unmasked sensitive data to a log file or console (e.g., `console.log`, `logging.info`, `logger.debug`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: write -> writes
| - Also trace LLM output that is used as input for tool functions to check for potential injection vulnerabilities passed to the tool. | ||
| ### 1.7. Privacy Violations | ||
| * **Action:** Identify where sensitive data (PII/SPI) is exposed or leaves the application's trust boundary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I meant this new section 1.7 should follow the existing indent spacing of the other sections. It seems to differ?
As part of #47, this PR helps ensure P0 CUJ-1 (log data leak ID and removal) and P0 CUJ-2 (ID sensitive flow to 3P) is addressed in the
security:analyzecommandThis also helps cover more privacy specific features via outputting a simple datamap with source and sinks that the end of the analysis
Pending more test cases, this is an example of what a run would look like with a small set of tests: https://screenshot.googleplex.com/8nuFzxWcS5V2X6b (computer settings won't let me paste or upload an image to GH for some reason)
In short, this mainly adds: