Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .claude/skills/.gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -8,3 +8,5 @@
!test/**
!btrace-perfetto/
!btrace-perfetto/**
!check-code-attribution/
!check-code-attribution/**
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
# Third-Party Code Attribution

When adapting code from third-party libraries:

Comment thread
markushi marked this conversation as resolved.
1. Add a license header at the top of the adapted file (before the `package` statement):
```java
// Adapted from <Library Name>.
// Copyright <year> <copyright holder>.
// Licensed under the <License Name>.
// <source URL>
```

2. Add a full attribution entry to `THIRD_PARTY_NOTICES.md` following the existing format (Source, License, Copyright, Scope, full license text)
135 changes: 135 additions & 0 deletions .claude/skills/check-code-attribution/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,135 @@
---
name: check-code-attribution
description: Check vendored code attributions in branch diff and flag any that are deficient. Use when asked to "check attribution", "check licenses", "verify vendored code attribution", or "check code attribution".
allowed-tools: Bash, Read
---

# Check Code Attribution

Verify that vendored/adapted third-party code in the current branch diff has correct license headers and `THIRD_PARTY_NOTICES.md` entries.

## Step 1: Run the Candidate Detection Script

Run the pre-filter script:

```bash
bash .claude/skills/check-code-attribution/find-attribution-candidates.sh
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have kept this one a bit simpler: Do a git diff and leave all output interpretation to the agent. But of course, this would also put all trust into the model. How about a hybrid: Do the git diff and the check for common patterns as part of the script, but leave all output interpretation to the model?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are only running on a git diff in PR and not the entire codebase why not just use an to check the code instead of using this script?
Or add something to warden to check for attribution candidates?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to defer to folks here.

My original approach used a script less to keep judgment out of the LLMs hands, and more to keep costs down. (We want to run this skill for every sentry-java CI build, very few of which will actually touch license headers or THIRD_PARTY_NOTICES.)

I still think those cost-savings are worth the extra effort of creating our own Github actions workflow rather than going with Warden (which doesn't permit scripts but would otherwise be a great choice). I've posted a version that applies @markushi's "hybrid" approach; feel free to chime in if you have a strong view about the Warden route.

Copy link
Copy Markdown
Contributor

@runningcode runningcode May 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not bash expert so maybe it just my bias here. LLMs obviously help a lot but 1400 lines of bash is quite a maintenance burden.

I executed the script locally to test and found that it would match things like CHANGELOG.md and gradlew but not io.sentry.android.core.ApplicationNotResponding.java. That's the opposite of what we want.

If just ask claude Can you check the codebase to find any third party code that isn't attributed in the @THIRD_PARTY_NOTICES.md? I get a more accurate answer which includes the above as well as some other pieces of our codebase that are not attributed correctly.
That is why I think running an LLM on the diff with that question is a more accurate and more easily maintainable solution. We already run warden, seer, bugbot and cursor on each PR. We can just add a rule to one of these that checks this.

That's just my 2cents. Feel free to merge this though.

```

Stdout **always** starts with **global metadata** (first two lines):

```
notices_file_exists: true|false
notices_file_changed: true|false
```

If the script exits with any code besides 0 or 10, it failed (e.g., could not determine merge-base). Print the stderr output and **stop**.

If the script exits with code **0**, there are no file candidates **and** `THIRD_PARTY_NOTICES.md` is unchanged vs the merge-base. Print "โœ… No attribution issues found." and **stop**.

If the script exits with code **10**, there is at least one file candidate **and/or** `THIRD_PARTY_NOTICES.md` changed (including NOTICES-only edits that produce **zero** candidate blocks). After the two metadata lines, stdout may contain zero or more **candidate blocks**:

```
---
file: <path>
status: A|M|D|R (i.e., "added", "modified", "deleted", "renamed")
reasons: <comma-separated list of why this file was flagged>
---
```

The script handles candidate identification deterministically โ€” including committed, staged, and unstaged changes โ€” so trust its output. Do not dismiss a candidate as a false positive based on the committed diff alone.

Parse the metadata and any candidate blocks, then proceed to Step 2. If there are **zero** candidate blocks but `notices_file_changed` is `true`, skip Step 3 and still run Step 4.

## Step 2: Gather Context

Read `CODE_ATTRIBUTION_CRITERIA.md` (in this skill's directory) for the canonical attribution format.

If `notices_file_exists` is `true`, read `THIRD_PARTY_NOTICES.md` to understand existing entries.

## Step 3: Analyze Each Candidate

**Skip analysis for deleted files** (`status: D`) โ€” they only need a ๐Ÿ‘€ verify finding.

For each non-deleted candidate:

1. **Read the file** and check for the required attribution fields from `CODE_ATTRIBUTION_CRITERIA.md`: The criteria shows a canonical template, but the exact wording, comment style, and formatting don't need to match exactly. Only flag **missing** fields. For candidates whose reasons include "removed", also read the merge-base version (`git show "$MB:<path>"`) to see what attribution was there before โ€” you need both versions to determine whether attribution was stripped vs. never present.

2. **Match to a `THIRD_PARTY_NOTICES.md` entry** โ€” Try to find a corresponding entry by URL, library name, copyright holder, or other context. Record whether you found a match, and if so, which entry.

3. **Check license compatibility** โ€” Identify the license in the file's header and classify it. sentry-java is MIT-licensed. Sentry's Open Source Legal Policy (https://open.sentry.io/licensing/) defines four tiers:
- **Permissive** (MIT, BSD, Apache 2.0, ISC, CC-BY, CC0, Unlicense, WTFPL, Zlib, etc.) โ€” allowed. No action needed.
- **Weak copyleft** (LGPL, MPL, EPL, CDDL, CPL, etc.) โ€” may be allowed for vendoring but requires review. Flag as **Critical** with a note to verify against the policy.
- **Strong copyleft** (GPL, QPL, Sleepycat, OSL, etc.) โ€” flag as **Critical**, requires legal review before vendoring.
- **AGPL** โ€” **absolute ban**, must not be used at Sentry for any use case. Flag as **Critical** and block.
- **No license** โ€” assume no permission to use. Flag as **Critical**.

Also check whether this license type is already represented in `THIRD_PARTY_NOTICES.md` headings; if it's new, note it.

## Step 4: Check for NOTICES Entry Changes

If `notices_file_changed` is `true`, compare the merge-base revision of `THIRD_PARTY_NOTICES.md` to the current file. Resolve merge-base the same way as `find-attribution-candidates.sh` (`origin/main`, else `main`):

```bash
MB=$(git merge-base HEAD origin/main 2>/dev/null || git merge-base HEAD main)
```

- **Old (merge-base) content:** `git show "$MB:THIRD_PARTY_NOTICES.md"` โ€” if this fails (file absent at merge-base), treat the old side as empty.
- **New (current) content:** read `THIRD_PARTY_NOTICES.md` from the repo root when `notices_file_exists` is `true`. When `notices_file_exists` is `false`, the file is gone from the worktree (for example deleted on the branch); treat the new side as empty so every merge-base entry shows up as removed for analysis.

Compare old vs. new and verify that every `THIRD_PARTY_NOTICES.md` entry is consistent with the source file headers from the diff: metadata matches, no orphaned or missing entries, no stale Scope paths. Skip entries whose source files were already analyzed in Step 3 โ€” they're covered there. Any entries with new license types (e.g., AGPL where no other entry has an AGPL license) must be flagged as **Critical**.

**Also check even when `notices_file_changed` is `false`:** if the branch deletes or renames a source file (status D or R), verify that the corresponding NOTICES entry was updated or removed. This catches the case where NOTICES *should* have changed but didn't.

## Step 5: Output Results

If there are no issues, print:

```
โœ… No attribution issues found.
```

Otherwise, print findings as a numbered list. Use fully qualified class names (e.g., `io.sentry.cache.tape.FileObjectQueue`). Guidelines:

- **๐Ÿšจ** = license issue (AGPL, strong copyleft, weak copyleft, new license type, unlicensed code). Goes in the **Critical** section.
- **โš ๏ธ** = must fix before merging (missing fields, stripped attribution, inconsistent or orphaned NOTICES entries). Goes in the **Urgent** section.
- **๐Ÿ‘€** = author should verify (deleted/renamed files, matched NOTICES entries, consistent NOTICES modifications). Goes in the **Verify** section.
- Keep license-header issues and `THIRD_PARTY_NOTICES.md` issues in separate bullets.
- For license concerns, link the policy: https://open.sentry.io/licensing/
- Be **very, very** concise โ€” say what's wrong and what to do in as few words as possible!
- If any candidates are false positives, list them at the end with a one-line reason each.
- Separate each numbered entry with an empty line for readability (see example "Urgent" output below).
- Omit any section that has no entries.

Example output:

```
Code Attribution Check
โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

Critical
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
1. ๐Ÿšจ io.sentry.util.AgplHelper
AGPL-licensed code โ€” absolute ban per Sentry policy. Must be removed.
- Policy: https://open.sentry.io/licensing/

Urgent
โ”€โ”€โ”€โ”€โ”€โ”€
2. โš ๏ธ io.sentry.util.TokenBucket
Vendored code (Guava) โ€” header is missing the source URL and copyright
year.
- No corresponding `THIRD_PARTY_NOTICES.md` entry; add one.

3. โš ๏ธ io.sentry.android.core.ANRWatchDog
MIT license header was stripped. Restore the attribution header.

Verify
โ”€โ”€โ”€โ”€โ”€โ”€
4. ๐Ÿ‘€ io.sentry.cache.tape.FileObjectQueue
Vendored code (Square Tape) โ€” verify `THIRD_PARTY_NOTICES.md` reflects
your updates.

False positives
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
a. AGENTS.md โ€” project documentation, not vendored code.
```
Loading
Loading