diff --git a/.claude/skills/.gitignore b/.claude/skills/.gitignore
index 229f4495ee3..2dd55eba801 100644
--- a/.claude/skills/.gitignore
+++ b/.claude/skills/.gitignore
@@ -8,3 +8,5 @@
 !test/**
 !btrace-perfetto/
 !btrace-perfetto/**
+!check-code-attribution/
+!check-code-attribution/**
diff --git a/.claude/skills/check-code-attribution/CODE_ATTRIBUTION_CRITERIA.md b/.claude/skills/check-code-attribution/CODE_ATTRIBUTION_CRITERIA.md
new file mode 100644
index 00000000000..6d5d1e57b0f
--- /dev/null
+++ b/.claude/skills/check-code-attribution/CODE_ATTRIBUTION_CRITERIA.md
@@ -0,0 +1,13 @@
+# Third-Party Code Attribution
+
+When adapting code from third-party libraries:
+
+1. Add a license header at the top of the adapted file (before the `package` statement):
+   ```java
+   // Adapted from .
+   // Copyright .
+   // Licensed under the .
+   //
+   ```
+
+2. Add a full attribution entry to `THIRD_PARTY_NOTICES.md` following the existing format (Source, License, Copyright, Scope, full license text)
diff --git a/.claude/skills/check-code-attribution/SKILL.md b/.claude/skills/check-code-attribution/SKILL.md
new file mode 100644
index 00000000000..1b511ef0afe
--- /dev/null
+++ b/.claude/skills/check-code-attribution/SKILL.md
@@ -0,0 +1,135 @@
+---
+name: check-code-attribution
+description: Check vendored code attributions in branch diff and flag any that are deficient. Use when asked to "check attribution", "check licenses", "verify vendored code attribution", or "check code attribution".
+allowed-tools: Bash, Read
+---
+
+# Check Code Attribution
+
+Verify that vendored/adapted third-party code in the current branch diff has correct license headers and `THIRD_PARTY_NOTICES.md` entries.
+
+## Step 1: Run the Candidate Detection Script
+
+Run the pre-filter script:
+
+```bash
+bash .claude/skills/check-code-attribution/find-attribution-candidates.sh
+```
+
+Stdout **always** starts with **global metadata** (first two lines):
+
+```
+notices_file_exists: true|false
+notices_file_changed: true|false
+```
+
+If the script exits with any code besides 0 or 10, it failed (e.g., could not determine merge-base). Print the stderr output and **stop**.
+
+If the script exits with code **0**, there are no file candidates **and** `THIRD_PARTY_NOTICES.md` is unchanged vs the merge-base. Print "✅ No attribution issues found." and **stop**.
+
+If the script exits with code **10**, there is at least one file candidate **and/or** `THIRD_PARTY_NOTICES.md` changed (including NOTICES-only edits that produce **zero** candidate blocks). After the two metadata lines, stdout may contain zero or more **candidate blocks**:
+
+```
+---
+file:
+status: A|M|D|R (i.e., "added", "modified", "deleted", "renamed")
+reasons:
+---
+```
+
+The script handles candidate identification deterministically — including committed, staged, and unstaged changes — so trust its output. Do not dismiss a candidate as a false positive based on the committed diff alone.
+
+Parse the metadata and any candidate blocks, then proceed to Step 2. If there are **zero** candidate blocks but `notices_file_changed` is `true`, skip Step 3 and still run Step 4.
+
+## Step 2: Gather Context
+
+Read `CODE_ATTRIBUTION_CRITERIA.md` (in this skill's directory) for the canonical attribution format.
+
+If `notices_file_exists` is `true`, read `THIRD_PARTY_NOTICES.md` to understand existing entries.
+
+## Step 3: Analyze Each Candidate
+
+**Skip analysis for deleted files** (`status: D`) — they only need a 👀 verify finding.
+
+For each non-deleted candidate:
+
+1. **Read the file** and check for the required attribution fields from `CODE_ATTRIBUTION_CRITERIA.md`: The criteria shows a canonical template, but the exact wording, comment style, and formatting don't need to match exactly. Only flag **missing** fields. For candidates whose reasons include "removed", also read the merge-base version (`git show "$MB:"`) to see what attribution was there before — you need both versions to determine whether attribution was stripped vs. never present.
+
+2. **Match to a `THIRD_PARTY_NOTICES.md` entry** — Try to find a corresponding entry by URL, library name, copyright holder, or other context. Record whether you found a match, and if so, which entry.
+
+3. **Check license compatibility** — Identify the license in the file's header and classify it. sentry-java is MIT-licensed. Sentry's Open Source Legal Policy (https://open.sentry.io/licensing/) defines four tiers:
+   - **Permissive** (MIT, BSD, Apache 2.0, ISC, CC-BY, CC0, Unlicense, WTFPL, Zlib, etc.) — allowed. No action needed.
+   - **Weak copyleft** (LGPL, MPL, EPL, CDDL, CPL, etc.) — may be allowed for vendoring but requires review. Flag as **Critical** with a note to verify against the policy.
+   - **Strong copyleft** (GPL, QPL, Sleepycat, OSL, etc.) — flag as **Critical**, requires legal review before vendoring.
+   - **AGPL** — **absolute ban**, must not be used at Sentry for any use case. Flag as **Critical** and block.
+   - **No license** — assume no permission to use. Flag as **Critical**.
+
+   Also check whether this license type is already represented in `THIRD_PARTY_NOTICES.md` headings; if it's new, note it.
+
+## Step 4: Check for NOTICES Entry Changes
+
+If `notices_file_changed` is `true`, compare the merge-base revision of `THIRD_PARTY_NOTICES.md` to the current file. Resolve merge-base the same way as `find-attribution-candidates.sh` (`origin/main`, else `main`):
+
+```bash
+MB=$(git merge-base HEAD origin/main 2>/dev/null || git merge-base HEAD main)
+```
+
+- **Old (merge-base) content:** `git show "$MB:THIRD_PARTY_NOTICES.md"` — if this fails (file absent at merge-base), treat the old side as empty.
+- **New (current) content:** read `THIRD_PARTY_NOTICES.md` from the repo root when `notices_file_exists` is `true`. When `notices_file_exists` is `false`, the file is gone from the worktree (for example deleted on the branch); treat the new side as empty so every merge-base entry shows up as removed for analysis.
+
+Compare old vs. new and verify that every `THIRD_PARTY_NOTICES.md` entry is consistent with the source file headers from the diff: metadata matches, no orphaned or missing entries, no stale Scope paths. Skip entries whose source files were already analyzed in Step 3 — they're covered there. Any entries with new license types (e.g., AGPL where no other entry has an AGPL license) must be flagged as **Critical**.
+
+**Also check even when `notices_file_changed` is `false`:** if the branch deletes or renames a source file (status D or R), verify that the corresponding NOTICES entry was updated or removed. This catches the case where NOTICES *should* have changed but didn't.
+
+## Step 5: Output Results
+
+If there are no issues, print:
+
+```
+✅ No attribution issues found.
+```
+
+Otherwise, print findings as a numbered list. Use fully qualified class names (e.g., `io.sentry.cache.tape.FileObjectQueue`). Guidelines:
+
+- **🚨** = license issue (AGPL, strong copyleft, weak copyleft, new license type, unlicensed code). Goes in the **Critical** section.
+- **⚠️** = must fix before merging (missing fields, stripped attribution, inconsistent or orphaned NOTICES entries). Goes in the **Urgent** section.
+- **👀** = author should verify (deleted/renamed files, matched NOTICES entries, consistent NOTICES modifications). Goes in the **Verify** section.
+- Keep license-header issues and `THIRD_PARTY_NOTICES.md` issues in separate bullets.
+- For license concerns, link the policy: https://open.sentry.io/licensing/
+- Be **very, very** concise — say what's wrong and what to do in as few words as possible!
+- If any candidates are false positives, list them at the end with a one-line reason each.
+- Separate each numbered entry with an empty line for readability (see example "Urgent" output below).
+- Omit any section that has no entries.
+
+Example output:
+
+```
+Code Attribution Check
+══════════════════════
+
+Critical
+────────
+1. 🚨 io.sentry.util.AgplHelper
+   AGPL-licensed code — absolute ban per Sentry policy. Must be removed.
+   - Policy: https://open.sentry.io/licensing/
+
+Urgent
+──────
+2. ⚠️ io.sentry.util.TokenBucket
+   Vendored code (Guava) — header is missing the source URL and copyright
+   year.
+   - No corresponding `THIRD_PARTY_NOTICES.md` entry; add one.
+
+3. ⚠️ io.sentry.android.core.ANRWatchDog
+   MIT license header was stripped. Restore the attribution header.
+
+Verify
+──────
+4. 👀 io.sentry.cache.tape.FileObjectQueue
+   Vendored code (Square Tape) — verify `THIRD_PARTY_NOTICES.md` reflects
+   your updates.
+
+False positives
+───────────────
+a. AGENTS.md — project documentation, not vendored code.
+```
diff --git a/.claude/skills/check-code-attribution/find-attribution-candidates.sh b/.claude/skills/check-code-attribution/find-attribution-candidates.sh
new file mode 100755
index 00000000000..4dad6e6ddd5
--- /dev/null
+++ b/.claude/skills/check-code-attribution/find-attribution-candidates.sh
@@ -0,0 +1,345 @@
+#!/usr/bin/env bash
+# shellcheck shell=bash
+#
+# Script for finding files in the current branch diff that may have added, modified, or removed an
+# attribution for vendored code (those files are referred to as "attribution candidates"). This
+# script handles cheap, deterministic identification and filtering; all interpretation (cross-
+# referencing with THIRD_PARTY_NOTICES.md, license classification, etc.) is left to the LLM.
+#
+# Every run prints global metadata first (two lines), then zero or more candidate blocks:
+#
+#   notices_file_exists: true|false
+#   notices_file_changed: true|false
+#   ---
+#   file:
+#   status: A|M|D|R (A = "added", M = "modified", D = "deleted", R = "renamed")
+#   reasons:
+#   ---
+#
+# Exit code 0 = no file candidates and THIRD_PARTY_NOTICES.md unchanged vs merge-base.
+# Exit code 10 = one or more file candidates and/or THIRD_PARTY_NOTICES.md changed (including
+# NOTICES-only edits with zero candidate blocks when diff hunks do not match attribution patterns).
+
+set -euo pipefail
+
+MERGE_BASE=$(git merge-base HEAD origin/main 2>/dev/null || git merge-base HEAD main 2>/dev/null) || {
+  echo "Error: could not determine merge-base. Neither 'origin/main' nor 'main' is reachable from HEAD." >&2
+  echo "If this is a shallow clone, try: git fetch --unshallow origin main" >&2
+  exit 2
+}
+NOTICES_FILE="THIRD_PARTY_NOTICES.md"
+notices_file_exists="true"
+[[ ! -f "$NOTICES_FILE" ]] && notices_file_exists="false"
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+EXCLUSIONS_FILE="$SCRIPT_DIR/generated-code-exclusions.txt"
+
+# --- Filter terms ---
+# Maintainer-friendly lists of terms used to identify vendored/attributed code.
+# Update these when new attribution patterns or license types are encountered.
+
+# Strong indicators that code was adapted/copied from an external source
+VENDORING_MARKERS='adapted from|backported from|copied from|derived from|ported from|translated from|vendored'
+
+# Recognized open-source license names (regex alternation)
+LICENSE_NAMES='Apache 2\.0|Apache License|BSD [0-9]|BSD License|CC-BY|Creative Commons|Eclipse Public License|EPL|GNU General Public|GPL|ISC License|LGPL|MIT License|Mozilla Public|Public Domain|SPDX-License-Identifier|Unlicense'
+
+# Combined pattern for diff-hunk scanning of modified files. Intentionally broader
+# than VENDORING_MARKERS: it includes generic terms ("copyright", "licensed") so the
+# diff-hunk scan catches any attribution-related change. False positives from first-
+# party headers are filtered out downstream by has_third_party_attribution() checks
+# against the full file content (for additions) or merge-base content (for removals).
+ATTRIBUTION_PATTERN="$VENDORING_MARKERS|copyright|licensed|$LICENSE_NAMES"
+
+# Sentry entity names — copyright lines mentioning these are treated as first-party
+SENTRY_ENTITIES='functional software|getsentry|sentry software'
+
+# Build sed expression from SENTRY_ENTITIES (keeps entity list and strip patterns in sync)
+_sentry_sed=""
+IFS='|' read -ra _sentry_parts <<< "$SENTRY_ENTITIES"
+for _part in "${_sentry_parts[@]}"; do
+  _sentry_sed+="s/${_part}//g; "
+done
+SENTRY_STRIP_SED="${_sentry_sed}s/sentry//g; s/copyright//g; s/(c)//g"
+unset _sentry_sed _sentry_parts _part
+
+# Path segments that suggest vendored/external code
+VENDOR_PATH_MARKERS='external|shaded|third-party|third_party|thirdparty|vendor|vendored'
+
+is_binary_file() {
+  # -I treats binary files as non-matching → exit 1; empty pattern matches any text file → exit 0
+  ! grep -Iq '' "$1" 2>/dev/null
+}
+
+# Infrastructure/config directories that never contain vendored source code.
+# For generated-file filename patterns, see generated-code-exclusions.txt.
+is_excluded_path() {
+  [[ "$1" =~ ^\.claude/ || "$1" =~ ^\.github/ || "$1" =~ ^\.gradle/ || "$1" =~ ^\.idea/ || "$1" =~ ^\.mvn/ || "$1" =~ ^buildSrc/ || "$1" =~ ^build-logic/ || "$1" =~ ^gradle/ ]]
+}
+
+# Load exclusion patterns once into a temp file. Each pattern is validated before inclusion:
+# invalid regexes are skipped with a warning, and patterns over 200 chars are rejected to
+# limit ReDoS surface (the file is repo-controlled, but validation catches accidental breakage).
+WORK_DIR=$(mktemp -d)
+trap 'rm -rf "$WORK_DIR"' EXIT
+EXCLUSION_PATTERNS_FILE="$WORK_DIR/exclusions"
+if [[ -f "$EXCLUSIONS_FILE" ]]; then
+  while IFS= read -r pattern; do
+    [[ -z "$pattern" || "$pattern" == \#* ]] && continue
+    if [[ ${#pattern} -gt 200 ]]; then
+      echo "Warning: skipping exclusion pattern exceeding 200 chars: ${pattern:0:40}..." >&2
+      continue
+    fi
+    rc=0; printf '' | grep -qE -- "$pattern" 2>/dev/null || rc=$?
+    if [[ $rc -eq 2 ]]; then
+      echo "Warning: skipping invalid regex in exclusions: $pattern" >&2
+      continue
+    fi
+    printf '%s\n' "$pattern"
+  done < "$EXCLUSIONS_FILE" > "$EXCLUSION_PATTERNS_FILE"
+fi
+
+is_generated_file() {
+  [[ -s "$EXCLUSION_PATTERNS_FILE" ]] && grep -qE -f "$EXCLUSION_PATTERNS_FILE" <<< "$1"
+}
+
+has_vendor_path() {
+  local path_lower
+  path_lower=$(echo "$1" | tr '[:upper:]' '[:lower:]')
+  [[ "$path_lower" =~ (^|/)($VENDOR_PATH_MARKERS)(/|$) ]]
+}
+
+has_third_party_attribution() {
+  local filepath="$1"
+
+  # Vendoring markers are strong standalone indicators
+  grep -qiE "$VENDORING_MARKERS" "$filepath" && return 0
+
+  local copyright_lines
+  copyright_lines=$(grep -iE 'copyright' "$filepath" 2>/dev/null) || true
+  [[ -z "$copyright_lines" ]] && return 1
+
+  # Fast path: any copyright line without a Sentry entity is definitively third-party
+  echo "$copyright_lines" | grep -qivE "$SENTRY_ENTITIES" && return 0
+
+  # Slow path: all copyright lines mention a Sentry entity. Check for dual-copyright
+  # lines (e.g., "Copyright Functional Software and Example Corp") by stripping Sentry
+  # names and common boilerplate, then looking for remaining substantive words.
+  echo "$copyright_lines" | \
+    tr '[:upper:]' '[:lower:]' | \
+    sed "$SENTRY_STRIP_SED" | \
+    sed 's/[0-9]//g; s/[^a-z]/ /g' | \
+    tr -s ' ' '\n' | \
+    grep -vxE '(and|the|inc|llc|ltd|or|of|by|all|rights|reserved)' | \
+    grep -qE '[a-z]{3,}' && return 0
+
+  # License keywords alone (without a non-Sentry copyright or vendoring marker) do NOT
+  # indicate vendored code — many first-party files carry the project's own license header.
+
+  return 1
+}
+
+# Check if diff hunks contain added attribution-related lines
+has_added_attribution_lines() {
+  local diff_output="$1"
+  grep -E '^\+' <<< "$diff_output" | grep -vE '^\+\+\+ (b/|/dev/null)' | grep -qiE "$ATTRIBUTION_PATTERN"
+}
+
+# Check if diff hunks contain removed attribution-related lines
+has_removed_attribution_lines() {
+  local diff_output="$1"
+  grep -E '^-' <<< "$diff_output" | grep -vE '^--- (a/|/dev/null)' | grep -qiE "$ATTRIBUTION_PATTERN"
+}
+
+# Collect changed files from all sources (committed, staged, unstaged, untracked),
+# deduplicated by current filepath (last occurrence wins). Sources are listed oldest
+# to newest, so the most recent state takes precedence — e.g. a file committed as
+# "M" (modified) then staged for deletion resolves to "D".
+collect_changed_files() {
+  {
+    git diff "$MERGE_BASE"..HEAD --name-status --find-renames 2>/dev/null || true
+    git diff --cached --name-status --find-renames 2>/dev/null || true
+    git diff --name-status --find-renames 2>/dev/null || true
+    git ls-files --others --exclude-standard 2>/dev/null | while IFS= read -r path; do
+      [[ -n "$path" ]] && printf 'A\t%s\n' "$path"
+    done
+  } | awk -F'\t' '{
+    if ($1 ~ /^R/ && NF >= 3) key = $3
+    else key = $NF
+    data[key] = $0
+    if (!seen[key]++) order[++n] = key
+  }
+  END {
+    for (i = 1; i <= n; i++) print data[order[i]]
+  }'
+}
+
+# Diff from merge-base to the current worktree state (committed + staged + unstaged)
+# in a single pass. No duplicate hunks because git diffs the merge-base blob
+# directly against the current worktree file.
+combined_diff() {
+  local filepath="$1"
+  git diff "$MERGE_BASE" -- "$filepath" 2>/dev/null || echo "Warning: git diff failed for $filepath" >&2
+}
+
+# Check if THIRD_PARTY_NOTICES.md was modified in this branch (committed, staged, or unstaged)
+notices_file_changed="false"
+if git diff "$MERGE_BASE"..HEAD --name-only -- "$NOTICES_FILE" 2>/dev/null | grep -q . \
+  || git diff --cached --name-only -- "$NOTICES_FILE" 2>/dev/null | grep -q . \
+  || git diff --name-only -- "$NOTICES_FILE" 2>/dev/null | grep -q .; then
+  notices_file_changed="true"
+fi
+
+# Global metadata is always printed first so consumers can run NOTICES review even when there are
+# zero file candidates (e.g. Scope-only edits to THIRD_PARTY_NOTICES.md).
+echo "notices_file_exists: $notices_file_exists"
+echo "notices_file_changed: $notices_file_changed"
+
+found_any=false
+
+# Process each changed file
+while IFS=$'\t' read -r status filepath old_filepath; do
+  [[ -z "$status" ]] && continue
+
+  status_char="${status:0:1}"
+
+  # For renames, `read` assigns: status=R###, filepath=old_path, old_filepath=new_path.
+  # Swap so filepath holds the current (new) path and old_filepath holds the original.
+  if [[ "$status_char" == "R" ]]; then
+    current_path="$old_filepath"
+    old_filepath="$filepath"
+    filepath="$current_path"
+  fi
+
+  # NOTICES changes are tracked via the notices_file_changed metadata field, not as a candidate.
+  [[ "$filepath" == "$NOTICES_FILE" ]] && continue
+  is_excluded_path "$filepath" && continue
+  is_generated_file "$filepath" && continue
+
+  # Skip binary files
+  if [[ "$status_char" != "D" ]] && is_binary_file "$filepath"; then
+    continue
+  fi
+  is_candidate=false
+  reasons=()
+
+  # --- Determine content and candidate status ---
+
+  if [[ "$status_char" == "A" ]]; then
+    # Cap at 100KB to avoid overhead on large generated files
+    if [[ $(wc -c < "$filepath" 2>/dev/null) -gt 102400 ]]; then
+      echo "Warning: $filepath exceeds 100KB — only the first 100KB will be scanned for attribution markers." >&2
+    fi
+    head -c 102400 "$filepath" > "$WORK_DIR/content" 2>/dev/null || continue
+
+    if has_vendor_path "$filepath"; then
+      is_candidate=true
+      reasons+=("path suggests vendored code")
+    fi
+
+    if has_third_party_attribution "$WORK_DIR/content"; then
+      is_candidate=true
+      reasons+=("attribution markers in file")
+    fi
+
+  elif [[ "$status_char" == "D" ]]; then
+    git show "$MERGE_BASE":"$filepath" > "$WORK_DIR/content" 2>/dev/null || continue
+
+    if has_vendor_path "$filepath"; then
+      is_candidate=true
+      reasons+=("deleted file in vendor path")
+    fi
+
+    if has_third_party_attribution "$WORK_DIR/content"; then
+      is_candidate=true
+      reasons+=("deleted file with attribution markers")
+    fi
+
+  elif [[ "$status_char" == "R" ]]; then
+    # Renamed file: old_filepath has the original path
+    if [[ $(wc -c < "$filepath" 2>/dev/null) -gt 102400 ]]; then
+      echo "Warning: $filepath exceeds 100KB — only the first 100KB will be scanned for attribution markers." >&2
+    fi
+    head -c 102400 "$filepath" > "$WORK_DIR/content" 2>/dev/null || continue
+
+    if has_vendor_path "$filepath" || has_vendor_path "${old_filepath:-}"; then
+      is_candidate=true
+      reasons+=("renamed file in vendor path")
+    fi
+
+    if has_third_party_attribution "$WORK_DIR/content"; then
+      is_candidate=true
+      reasons+=("renamed file with attribution markers")
+    elif [[ "$is_candidate" == "false" ]]; then
+      git show "$MERGE_BASE":"${old_filepath:-$filepath}" > "$WORK_DIR/old_content" 2>/dev/null || true
+      if [[ -s "$WORK_DIR/old_content" ]] && has_third_party_attribution "$WORK_DIR/old_content"; then
+        is_candidate=true
+        reasons+=("attribution markers stripped during rename")
+      fi
+    fi
+
+  elif [[ "$status_char" == "M" ]]; then
+    if [[ $(wc -c < "$filepath" 2>/dev/null) -gt 102400 ]]; then
+      echo "Warning: $filepath exceeds 100KB — only the first 100KB will be scanned for attribution markers." >&2
+    fi
+    head -c 102400 "$filepath" > "$WORK_DIR/content" 2>/dev/null || continue
+    diff_output=$(combined_diff "$filepath")
+
+    has_added=false
+    has_removed=false
+    has_added_attribution_lines "$diff_output" && has_added=true
+    has_removed_attribution_lines "$diff_output" && has_removed=true
+
+    if [[ "$has_added" == "false" && "$has_removed" == "false" ]]; then
+      continue
+    fi
+
+    if [[ "$has_added" == "true" && "$has_removed" == "true" ]]; then
+      git show "$MERGE_BASE":"$filepath" > "$WORK_DIR/old_content" 2>/dev/null || continue
+      if ! has_third_party_attribution "$WORK_DIR/content" && ! has_third_party_attribution "$WORK_DIR/old_content" && ! has_vendor_path "$filepath"; then
+        continue
+      fi
+      reasons+=("attribution markers modified")
+    elif [[ "$has_added" == "true" ]]; then
+      # The diff-hunk scan uses broad patterns (e.g., "copyright", "licensed") that
+      # match first-party headers too. When markers were only added (not removed),
+      # filter out files whose full content has no third-party attribution — those
+      # are Sentry's own license headers being added.
+      if ! has_third_party_attribution "$WORK_DIR/content" && ! has_vendor_path "$filepath"; then
+        continue
+      fi
+      reasons+=("attribution markers added")
+    else
+      # Mirror the added-only guard: check the merge-base content for third-party
+      # attribution so we don't flag removal of Sentry's own copyright headers.
+      git show "$MERGE_BASE":"$filepath" > "$WORK_DIR/old_content" 2>/dev/null || continue
+      if ! has_third_party_attribution "$WORK_DIR/old_content" && ! has_vendor_path "$filepath"; then
+        continue
+      fi
+      reasons+=("attribution markers removed")
+    fi
+    is_candidate=true
+
+    has_vendor_path "$filepath" && reasons+=("file in vendor path")
+  fi
+
+  if [[ "$is_candidate" == "false" ]]; then
+    continue
+  fi
+
+  # Format reasons as comma-separated string
+  reasons_str=$(printf '%s, ' "${reasons[@]+${reasons[@]}}" | sed 's/, $//')
+
+  # Output structured block
+  echo "---"
+  echo "file: $filepath"
+  echo "status: $status_char"
+  echo "reasons: $reasons_str"
+  echo "---"
+
+  found_any=true
+
+done < <(collect_changed_files)
+
+if [[ "$found_any" == "true" ]] || [[ "$notices_file_changed" == "true" ]]; then
+  exit 10
+fi
diff --git a/.claude/skills/check-code-attribution/generated-code-exclusions.txt b/.claude/skills/check-code-attribution/generated-code-exclusions.txt
new file mode 100644
index 00000000000..235c1cf4d63
--- /dev/null
+++ b/.claude/skills/check-code-attribution/generated-code-exclusions.txt
@@ -0,0 +1,38 @@
+# Filename patterns for auto-generated files that don't need to satisfy the
+# attribution requirements for vendored code found in CODE_ATTRIBUTION_CRITERIA.md.
+
+# Android (AIDL, data binding, R class, BuildConfig — only under build/generated dirs)
+\.aidl$
+/databinding/.*Binding\.java$
+/generated.*/R\.java$
+/generated.*/BuildConfig\.java$
+
+# ANTLR (generated output files — match only under generated dirs or with ANTLR naming)
+/generated.*/.*Lexer\.java$
+/generated.*/.*Parser\.java$
+/generated.*/.*Listener\.java$
+/generated.*/.*Visitor\.java$
+\.interp$
+\.tokens$
+
+# API dump files (binary compatibility tracking)
+\.api$
+
+# Generated source directories
+/generated/
+
+# Gradle build scripts
+build\.gradle(\.kts)?$
+settings\.gradle(\.kts)?$
+
+# gRPC (generated stubs end with Grpc.java and live under generated-source dirs)
+/grpc/.*Grpc\.java$
+
+# Kotlin code-generator convention
+\.g\.kt$
+
+# KSP (Kotlin Symbol Processing) generated output
+/ksp/
+
+# Protocol Buffers
+\.pb\.java$
diff --git a/.claude/skills/check-code-attribution/test/test-find-attribution-candidates.sh b/.claude/skills/check-code-attribution/test/test-find-attribution-candidates.sh
new file mode 100644
index 00000000000..0e170d3fe30
--- /dev/null
+++ b/.claude/skills/check-code-attribution/test/test-find-attribution-candidates.sh
@@ -0,0 +1,1064 @@
+#!/usr/bin/env bash
+# shellcheck shell=bash
+#
+# Tests for find-attribution-candidates.sh
+#
+# Each test creates a temporary git repo, sets up a specific scenario,
+# runs the script, and asserts on its output and exit code.
+#
+# Usage: bash test/test-find-attribution-candidates.sh
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+SCRIPT="$SCRIPT_DIR/../find-attribution-candidates.sh"
+
+TESTS_RUN=0
+TESTS_PASSED=0
+TESTS_FAILED=0
+
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+NC='\033[0m'
+
+# --- Helpers ---
+
+setup_repo() {
+  local tmpdir
+  tmpdir=$(mktemp -d)
+  git -C "$tmpdir" init -b main --quiet
+  git -C "$tmpdir" config user.email "test@test.com"
+  git -C "$tmpdir" config user.name "Test"
+
+  cat > "$tmpdir/THIRD_PARTY_NOTICES.md" << 'NOTICES'
+# Third-Party Notices
+
+## Example Library (Apache 2.0)
+
+- Source: https://github.com/example/library
+- License: Apache License 2.0
+- Copyright: Copyright 2024 Example Inc.
+- Scope: `src/main/java/com/example/Foo.java`
+NOTICES
+  git -C "$tmpdir" add THIRD_PARTY_NOTICES.md
+  git -C "$tmpdir" commit -m "Initial commit" --quiet
+
+  echo "$tmpdir"
+}
+
+setup_branch() {
+  git checkout -b test-branch --quiet
+}
+
+cleanup_repo() {
+  rm -rf "$1"
+}
+
+run_script() {
+  bash "$SCRIPT" 2>&1
+}
+
+get_field() {
+  local output="$1" field="$2"
+  echo "$output" | grep "^${field}: " | head -1 | sed "s/^${field}: //"
+}
+
+assert_eq() {
+  local actual="$1" expected="$2" msg="$3"
+  if [[ "$actual" == "$expected" ]]; then
+    return 0
+  fi
+  echo -e " ${RED}FAIL${NC}: $msg"
+  echo " expected: '$expected'"
+  echo " actual: '$actual'"
+  return 1
+}
+
+assert_contains() {
+  local haystack="$1" needle="$2" msg="$3"
+  if [[ "$haystack" == *"$needle"* ]]; then
+    return 0
+  fi
+  echo -e " ${RED}FAIL${NC}: $msg"
+  echo " expected to contain: '$needle'"
+  echo " actual: '$haystack'"
+  return 1
+}
+
+# Script contract: stdout always begins with these two lines (in order).
+assert_global_metadata_prefix() {
+  local output="$1" exists="$2" changed="$3" msg="$4"
+  assert_eq "$(echo "$output" | head -n 1)" "notices_file_exists: $exists" "${msg} — notices_file_exists line" || return 1
+  assert_eq "$(echo "$output" | sed -n '2p')" "notices_file_changed: $changed" "${msg} — notices_file_changed line" || return 1
+}
+
+assert_line_count() {
+  local output="$1" expected="$2" msg="$3"
+  local count
+  count=$(printf '%s\n' "$output" | wc -l | tr -d '[:space:]')
+  assert_eq "$count" "$expected" "$msg" || return 1
+}
+
+run_test() {
+  local test_name="$1" test_fn="$2"
+  TESTS_RUN=$((TESTS_RUN + 1))
+
+  local tmpdir original_dir
+  tmpdir=$(setup_repo)
+  original_dir=$(pwd)
+  cd "$tmpdir"
+
+  local failed=false
+  echo -n "$test_name ... "
+
+  if ! $test_fn; then
+    failed=true
+  fi
+
+  cd "$original_dir"
+  cleanup_repo "$tmpdir"
+
+  if [[ "$failed" == "true" ]]; then
+    TESTS_FAILED=$((TESTS_FAILED + 1))
+    echo -e "${RED}FAILED${NC}"
+  else
+    TESTS_PASSED=$((TESTS_PASSED + 1))
+    echo -e "${GREEN}ok${NC}"
+  fi
+}
+
+# --- Test cases ---
+
+test_clean_branch_no_candidates() {
+  setup_branch
+  mkdir -p src
+  echo "package com.example; public class Clean {}" > src/Clean.java
+  git add src/Clean.java
+  git commit -m "Add clean file" --quiet
+
+  local output exit_code=0
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "0" "should exit 0 when no candidates and NOTICES unchanged" || return 1
+  assert_global_metadata_prefix "$output" true false "clean branch" || return 1
+  assert_line_count "$output" 2 "exit 0 with no work should print exactly metadata lines" || return 1
+}
+
+test_new_file_with_attribution() {
+  setup_branch
+  mkdir -p src
+  cat > src/Adapted.java << 'JAVA'
+// Adapted from SomeLibrary.
+// Copyright 2024 Some Author.
+// Licensed under the Apache License 2.0.
+// https://github.com/some/library
+package com.example;
+public class Adapted {}
+JAVA
+  git add src/Adapted.java
+  git commit -m "Add adapted file" --quiet
+
+  local output exit_code=0 ok=true
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "10" "should exit 10 when candidates found" || ok=false
+  assert_global_metadata_prefix "$output" true false "candidate with NOTICES unchanged" || ok=false
+  assert_eq "$(echo "$output" | sed -n '3p')" "---" "metadata lines precede first candidate block" || ok=false
+  assert_eq "$(get_field "$output" "file")" "src/Adapted.java" "file path" || ok=false
+  assert_eq "$(get_field "$output" "status")" "A" "status" || ok=false
+  assert_contains "$(get_field "$output" "reasons")" "attribution markers in file" "reasons" || ok=false
+  [[ "$ok" == "true" ]]
+}
+
+test_new_file_in_vendor_path() {
+  setup_branch
+  mkdir -p vendor/lib
+  echo "public class Foo {}" > vendor/lib/Foo.java
+  git add vendor/lib/Foo.java
+  git commit -m "Add vendored file" --quiet
+
+  local output exit_code=0 ok=true
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "10" "should exit 10 when vendor path candidate found" || ok=false
+  assert_eq "$(get_field "$output" "file")" "vendor/lib/Foo.java" "file path" || ok=false
+  assert_contains "$(get_field "$output" "reasons")" "path suggests vendored code" "reasons" || ok=false
+  [[ "$ok" == "true" ]]
+}
+
+test_new_file_under_io_sentry_vendor_path() {
+  setup_branch
+  mkdir -p io/sentry/vendor
+  echo "public class VendorStub {}" > io/sentry/vendor/VendorStub.java
+  git add io/sentry/vendor/VendorStub.java
+  git commit -m "Add file under io/sentry/vendor" --quiet
+
+  local output exit_code=0 ok=true
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "10" "should exit 10 when io/sentry/vendor path candidate found" || ok=false
+  assert_eq "$(get_field "$output" "file")" "io/sentry/vendor/VendorStub.java" "file path" || ok=false
+  assert_contains "$(get_field "$output" "reasons")" "path suggests vendored code" "reasons" || ok=false
+  [[ "$ok" == "true" ]]
+}
+
+test_deleted_file_with_attribution() {
+  mkdir -p src
+  cat > src/Licensed.java << 'JAVA'
+// Adapted from OldLib.
+// Copyright 2023 Old Author.
+// Licensed under the MIT License.
+package com.example;
+public class Licensed {}
+JAVA
+  git add src/Licensed.java
+  git commit -m "Add licensed file" --quiet
+
+  setup_branch
+  git rm src/Licensed.java --quiet
+  git commit -m "Remove licensed file" --quiet
+
+  local output exit_code=0 ok=true
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "10" "should exit 10 when candidates found" || ok=false
+  assert_eq "$(get_field "$output" "file")" "src/Licensed.java" "file path" || ok=false
+  assert_eq "$(get_field "$output" "status")" "D" "status" || ok=false
+  assert_contains "$(get_field "$output" "reasons")" "deleted file with attribution markers" "reasons" || ok=false
+  [[ "$ok" == "true" ]]
+}
+
+test_modified_file_attribution_added() {
+  mkdir -p src
+  echo "package com.example; public class Mod {}" > src/Mod.java
+  git add src/Mod.java
+  git commit -m "Add file" --quiet
+
+  setup_branch
+  cat > src/Mod.java << 'JAVA'
+// Adapted from NewLib.
+// Copyright 2024 New Author.
+// Licensed under the Apache License 2.0.
+package com.example;
+public class Mod {}
+JAVA
+  git add src/Mod.java
+  git commit -m "Add attribution" --quiet
+
+  local output exit_code=0 ok=true
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "10" "should exit 10 when candidates found" || ok=false
+  assert_eq "$(get_field "$output" "status")" "M" "status" || ok=false
+  assert_contains "$(get_field "$output" "reasons")" "attribution markers added" "reasons" || ok=false
+  [[ "$ok" == "true" ]]
+}
+
+test_modified_file_attribution_removed() {
+  mkdir -p src
+  cat > src/Strip.java << 'JAVA'
+// Adapted from StripLib.
+// Copyright 2024 Strip Author.
+// Licensed under the MIT License.
+package com.example;
+public class Strip {}
+JAVA
+  git add src/Strip.java
+  git commit -m "Add attributed file" --quiet
+
+  setup_branch
+  cat > src/Strip.java << 'JAVA'
+package com.example;
+public class Strip {}
+JAVA
+  git add src/Strip.java
+  git commit -m "Remove attribution" --quiet
+
+  local output exit_code=0 ok=true
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "10" "should exit 10 when candidates found" || ok=false
+  assert_eq "$(get_field "$output" "status")" "M" "status" || ok=false
+  assert_contains "$(get_field "$output" "reasons")" "attribution markers removed" "reasons" || ok=false
+  [[ "$ok" == "true" ]]
+}
+
+test_renamed_file_has_correct_path() {
+  mkdir -p src/old
+  cat > src/old/Lib.java << 'JAVA'
+// Adapted from RenameLib.
+// Copyright 2024 Rename Author.
+// Licensed under the Apache License 2.0.
+package com.example;
+public class Lib {}
+JAVA
+  git add src/old/Lib.java
+  git commit -m "Add file" --quiet
+
+  setup_branch
+  mkdir -p src/new
+  git mv src/old/Lib.java src/new/Lib.java
+  git commit -m "Rename file" --quiet
+
+  local output exit_code=0 ok=true
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "10" "should exit 10 when candidates found" || ok=false
+  assert_eq "$(get_field "$output" "file")" "src/new/Lib.java" "file should be new path" || ok=false
+  assert_eq "$(get_field "$output" "status")" "R" "status" || ok=false
+  assert_contains "$(get_field "$output" "reasons")" "renamed file with attribution markers" "reasons" || ok=false
+  [[ "$ok" == "true" ]]
+}
+
+test_staged_new_file_detected() {
+  setup_branch
+  echo "placeholder" > placeholder.txt
+  git add placeholder.txt
+  git commit -m "Placeholder" --quiet
+
+  mkdir -p src
+  cat > src/Staged.java << 'JAVA'
+// Adapted from StagedLib.
+// Copyright 2024 Staged Author.
+// Licensed under the MIT License.
+package com.example;
+public class Staged {}
+JAVA
+  git add src/Staged.java
+  # NOT committed
+
+  local output exit_code=0 ok=true
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "10" "should exit 10 when staged candidate found" || ok=false
+  assert_contains "$output" "src/Staged.java" "should detect staged file" || ok=false
+  assert_contains "$(get_field "$output" "reasons")" "attribution markers in file" "reasons" || ok=false
+  [[ "$ok" == "true" ]]
+}
+
+test_staged_modification_detected() {
+  mkdir -p src
+  echo "package com.example; public class StagedMod {}" > src/StagedMod.java
+  git add src/StagedMod.java
+  git commit -m "Add file" --quiet
+
+  setup_branch
+  echo "x" > placeholder.txt
+  git add placeholder.txt
+  git commit -m "Placeholder" --quiet
+
+  cat > src/StagedMod.java << 'JAVA'
+// Adapted from StagedModLib.
+// Copyright 2024 StagedMod Author.
+// Licensed under the Apache License 2.0.
+package com.example;
+public class StagedMod {}
+JAVA
+  git add src/StagedMod.java
+  # NOT committed — staged only
+
+  local output exit_code=0 ok=true
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "10" "should exit 10 when staged modification detected" || ok=false
+  assert_contains "$output" "src/StagedMod.java" "should detect staged modification" || ok=false
+  assert_contains "$(get_field "$output" "reasons")" "attribution markers added" "reasons" || ok=false
+  [[ "$ok" == "true" ]]
+}
+
+test_untracked_file_detected() {
+  setup_branch
+  echo "placeholder" > placeholder.txt
+  git add placeholder.txt
+  git commit -m "Placeholder" --quiet
+
+  mkdir -p src
+  cat > src/Untracked.java << 'JAVA'
+// Adapted from UntrackedLib.
+// Copyright 2024 Untracked Author.
+// Licensed under the Apache License 2.0.
+package com.example;
+public class Untracked {}
+JAVA
+  # NOT staged, NOT committed
+
+  local output exit_code=0 ok=true
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "10" "should exit 10 when untracked candidate found" || ok=false
+  assert_contains "$output" "src/Untracked.java" "should detect untracked file" || ok=false
+  [[ "$ok" == "true" ]]
+}
+
+test_notices_file_changed_true() {
+  setup_branch
+  mkdir -p src
+  cat > src/Noticed.java << 'JAVA'
+// Adapted from NoticedLib.
+// Copyright 2024 Noticed Author.
+// Licensed under the Apache License 2.0.
+// https://github.com/example/library
+package com.example;
+public class Noticed {}
+JAVA
+  echo "## New Entry" >> THIRD_PARTY_NOTICES.md
+  git add src/Noticed.java THIRD_PARTY_NOTICES.md
+  git commit -m "Add attributed file and update notices" --quiet
+
+  local output exit_code=0 ok=true
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "10" "should exit 10 when candidates or NOTICES changed" || ok=false
+  assert_global_metadata_prefix "$output" true true "committed NOTICES + candidate" || ok=false
+  assert_eq "$(echo "$output" | sed -n '3p')" "---" "metadata precedes candidate block" || ok=false
+  [[ "$ok" == "true" ]]
+}
+
+test_notices_only_change_triggers_without_file_candidates() {
+  setup_branch
+  printf '\n### Doc-only tweak\nPaths refreshed.\n' >> THIRD_PARTY_NOTICES.md
+  git add THIRD_PARTY_NOTICES.md
+  git commit -m "Doc-only NOTICES update" --quiet
+
+  local output exit_code=0 ok=true
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "10" "THIRD_PARTY_NOTICES.md change must exit 10" || ok=false
+  assert_global_metadata_prefix "$output" true true "NOTICES-only committed doc tweak" || ok=false
+  local count
+  count=$(echo "$output" | grep -c "^file: " || true)
+  assert_eq "$count" "0" "no file candidate blocks when diff avoids attribution keywords" || ok=false
+  assert_line_count "$output" 2 "NOTICES-only signal should be exactly two metadata lines" || ok=false
+  [[ "$ok" == "true" ]]
+}
+
+test_notices_only_unstaged_triggers_without_file_candidates() {
+  setup_branch
+  echo "placeholder" > placeholder.txt
+  git add placeholder.txt
+  git commit -m "Placeholder" --quiet
+
+  printf '\n### Unstaged doc tweak\nNo license keywords here.\n' >> THIRD_PARTY_NOTICES.md
+  # NOT staged — worktree-only change to NOTICES
+
+  local output exit_code=0 ok=true
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "10" "unstaged NOTICES change must exit 10" || ok=false
+  assert_global_metadata_prefix "$output" true true "unstaged NOTICES only" || ok=false
+  local count
+  count=$(echo "$output" | grep -c "^file: " || true)
+  assert_eq "$count" "0" "no file blocks for NOTICES-only unstaged edit" || ok=false
+  assert_line_count "$output" 2 "stdout should be metadata only" || ok=false
+  [[ "$ok" == "true" ]]
+}
+
+# NOTICES exists at merge-base (main) but is removed on the feature branch only.
+test_notices_deleted_on_branch() {
+  setup_branch
+  git rm THIRD_PARTY_NOTICES.md --quiet
+  git commit -m "Remove THIRD_PARTY_NOTICES on branch" --quiet
+
+  local output exit_code=0 ok=true
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "10" "NOTICES deletion on branch must exit 10" || ok=false
+  assert_global_metadata_prefix "$output" false true "NOTICES deleted vs merge-base" || ok=false
+  local count
+  count=$(echo "$output" | grep -c "^file: " || true)
+  assert_eq "$count" "0" "no file candidates when only NOTICES is deleted" || ok=false
+  assert_line_count "$output" 2 "metadata only when NOTICES-only deletion" || ok=false
+  [[ "$ok" == "true" ]]
+}
+
+test_excluded_paths_skipped() {
+  setup_branch
+  mkdir -p .github/workflows
+  cat > .github/workflows/Licensed.java << 'JAVA'
+// Adapted from ExcludedLib.
+// Copyright 2024 Excluded Author.
+// Licensed under the MIT License.
+package com.example;
+public class Licensed {}
+JAVA
+  git add .github/workflows/Licensed.java
+  git commit -m "Add file in excluded path" --quiet
+
+  local output exit_code=0
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "0" "excluded paths should produce no candidates" || return 1
+  assert_global_metadata_prefix "$output" true false "excluded path" || return 1
+  assert_line_count "$output" 2 "exit 0 should print metadata only" || return 1
+}
+
+test_generated_files_skipped() {
+  setup_branch
+  mkdir -p src/generated/com/example
+  cat > src/generated/com/example/R.java << 'JAVA'
+// Adapted from GeneratedLib.
+// Copyright 2024 Generated Author.
+// Licensed under the Apache License 2.0.
+package com.example;
+public class R {}
+JAVA
+  git add src/generated/com/example/R.java
+  git commit -m "Add generated file" --quiet
+
+  local output exit_code=0
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "0" "generated files should produce no candidates" || return 1
+  assert_global_metadata_prefix "$output" true false "generated file skipped" || return 1
+  assert_line_count "$output" 2 "exit 0 should print metadata only" || return 1
+}
+
+test_sentry_copyright_not_flagged() {
+  setup_branch
+  mkdir -p src
+  cat > src/SentryOwned.java << 'JAVA'
+// Copyright 2024 Functional Software, Inc.
+package com.example;
+public class SentryOwned {}
+JAVA
+  git add src/SentryOwned.java
+  git commit -m "Add Sentry-owned file" --quiet
+
+  local output exit_code=0
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "0" "Sentry copyright should not trigger attribution" || return 1
+  assert_global_metadata_prefix "$output" true false "Sentry-only copyright" || return 1
+  assert_line_count "$output" 2 "exit 0 should print metadata only" || return 1
+}
+
+
+test_notices_staged_change_detected() {
+  setup_branch
+  mkdir -p src
+  cat > src/StagedNotice.java << 'JAVA'
+// Adapted from StagedNoticeLib.
+// Copyright 2024 StagedNotice Author.
+// Licensed under the Apache License 2.0.
+package com.example;
+public class StagedNotice {}
+JAVA
+  git add src/StagedNotice.java
+  git commit -m "Add file" --quiet
+
+  # Stage a change to THIRD_PARTY_NOTICES.md but don't commit
+  echo "## Staged Entry" >> THIRD_PARTY_NOTICES.md
+  git add THIRD_PARTY_NOTICES.md
+  # NOT committed
+
+  local output exit_code=0 ok=true
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "10" "staged NOTICES change must exit 10" || ok=false
+  assert_global_metadata_prefix "$output" true true "staged NOTICES with committed candidate" || ok=false
+  [[ "$ok" == "true" ]]
+}
+
+test_modified_vendor_path_file_attribution_changed() {
+  mkdir -p vendor/lib
+  cat > vendor/lib/Vendored.java << 'JAVA'
+// Adapted from VendoredLib.
+// Copyright 2024 Vendored Author.
+// Licensed under the Apache License 2.0.
+// https://github.com/example/library
+package com.example;
+public class Vendored {
+  void oldMethod() {}
+}
+JAVA
+  git add vendor/lib/Vendored.java
+  git commit -m "Add vendored file" --quiet
+
+  setup_branch
+  cat > vendor/lib/Vendored.java << 'JAVA'
+// Adapted from VendoredLib.
+// Copyright 2025 Vendored Author.
+// Licensed under the Apache License 2.0.
+// https://github.com/example/library
+package com.example;
+public class Vendored {
+  void newMethod() {}
+}
+JAVA
+  git add vendor/lib/Vendored.java
+  git commit -m "Update vendored file" --quiet
+
+  local output exit_code=0 ok=true
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "10" "should exit 10 when vendor-path attribution change detected" || ok=false
+  assert_eq "$(get_field "$output" "status")" "M" "status" || ok=false
+  assert_contains "$(get_field "$output" "reasons")" "attribution markers modified" "reasons should include attribution change" || ok=false
+  assert_contains "$(get_field "$output" "reasons")" "file in vendor path" "reasons should include vendor path" || ok=false
+  [[ "$ok" == "true" ]]
+}
+
+test_binary_file_skipped() {
+  setup_branch
+  mkdir -p src
+  # Create a file with a null byte so git treats it as binary
+  printf '// Adapted from BinaryLib.\n// Copyright 2024 Binary Author.\n\x00binary content' > src/Binary.dat
+  git add src/Binary.dat
+  git commit -m "Add binary file" --quiet
+
+  local output exit_code=0
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "0" "binary files should be skipped" || return 1
+  assert_global_metadata_prefix "$output" true false "binary skipped" || return 1
+  assert_line_count "$output" 2 "exit 0 should print metadata only" || return 1
+}
+
+# Many first-party Sentry files carry the project's own Apache 2.0 license header.
+# A license header alone (without a vendoring marker like "Adapted from" or a
+# non-Sentry copyright) does not indicate vendored code and should not be flagged.
+test_license_only_header_not_flagged() {
+  setup_branch
+  mkdir -p src
+  cat > src/LicenseOnly.java << 'JAVA'
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ */
+package com.example;
+public class LicenseOnly {}
+JAVA
+  git add src/LicenseOnly.java
+  git commit -m "Add file with license-only header" --quiet
+
+  local output exit_code=0
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "0" "license-only header without third-party copyright should not trigger attribution" || return 1
+  assert_global_metadata_prefix "$output" true false "license-only header" || return 1
+  assert_line_count "$output" 2 "exit 0 should print metadata only" || return 1
+}
+
+test_modified_vendored_file_no_attribution_change() {
+  mkdir -p src
+  cat > src/Vendored.java << 'JAVA'
+// Adapted from VendoredLib.
+// Copyright 2024 Vendored Author.
+// Licensed under the Apache License 2.0.
+// https://github.com/example/library
+package com.example;
+public class Vendored {
+  void oldMethod() {}
+}
+JAVA
+  git add src/Vendored.java
+  git commit -m "Add vendored file" --quiet
+
+  setup_branch
+  cat > src/Vendored.java << 'JAVA'
+// Adapted from VendoredLib.
+// Copyright 2024 Vendored Author.
+// Licensed under the Apache License 2.0.
+// https://github.com/example/library
+package com.example;
+public class Vendored {
+  void newMethod() {}
+}
+JAVA
+  git add src/Vendored.java
+  git commit -m "Fix bug in vendored file" --quiet
+
+  local output exit_code=0
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "0" "should not flag vendored file when only non-attribution lines changed" || return 1
+  assert_global_metadata_prefix "$output" true false "vendored file non-attribution edit" || return 1
+  assert_line_count "$output" 2 "exit 0 should print metadata only" || return 1
+}
+
+# Adding Sentry's own license header to a first-party file should not trigger.
+test_modified_first_party_license_header_not_flagged() {
+  mkdir -p src
+  echo "package com.example; public class FirstParty {}" > src/FirstParty.java
+  git add src/FirstParty.java
+  git commit -m "Add file" --quiet
+
+  setup_branch
+  cat > src/FirstParty.java << 'JAVA'
+/*
+ * Copyright 2025 Functional Software, Inc.
+ * Licensed under the Apache License, Version 2.0.
+ */
+package com.example;
+public class FirstParty {}
+JAVA
+  git add src/FirstParty.java
+  git commit -m "Add Sentry license header" --quiet
+
+  local output exit_code=0
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "0" "adding Sentry's own license header should not trigger attribution" || return 1
+  assert_global_metadata_prefix "$output" true false "first-party license header" || return 1
+  assert_line_count "$output" 2 "exit 0 should print metadata only" || return 1
+}
+
+test_unstaged_modification_detected() {
+  mkdir -p src
+  echo "package com.example; public class Unstaged {}" > src/Unstaged.java
+  git add src/Unstaged.java
+  git commit -m "Add file" --quiet
+
+  setup_branch
+  echo "x" > placeholder.txt
+  git add placeholder.txt
+  git commit -m "Placeholder" --quiet
+
+  cat > src/Unstaged.java << 'JAVA'
+// Adapted from UnstagedLib.
+// Copyright 2024 Unstaged Author.
+// Licensed under the Apache License 2.0.
+package com.example;
+public class Unstaged {}
+JAVA
+  # NOT staged, NOT committed — worktree only
+
+  local output exit_code=0 ok=true
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "10" "should exit 10 when unstaged modification detected" || ok=false
+  assert_contains "$output" "src/Unstaged.java" "should detect unstaged modification" || ok=false
+  assert_contains "$(get_field "$output" "reasons")" "attribution markers added" "reasons" || ok=false
+  [[ "$ok" == "true" ]]
+}
+
+test_multiple_candidates() {
+  setup_branch
+  mkdir -p src
+  cat > src/First.java << 'JAVA'
+// Adapted from FirstLib.
+// Copyright 2024 First Author.
+// Licensed under the MIT License.
+package com.example;
+public class First {}
+JAVA
+  cat > src/Second.java << 'JAVA'
+// Adapted from SecondLib.
+// Copyright 2024 Second Author.
+// Licensed under the Apache License 2.0.
+package com.example;
+public class Second {}
+JAVA
+  git add src/First.java src/Second.java
+  git commit -m "Add two attributed files" --quiet
+
+  local output exit_code=0 ok=true
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "10" "should exit 10 when candidates found" || ok=false
+  local count
+  count=$(echo "$output" | grep -c "^file: " || true)
+  assert_eq "$count" "2" "should find exactly 2 candidates" || ok=false
+  assert_contains "$output" "src/First.java" "should include First.java" || ok=false
+  assert_contains "$output" "src/Second.java" "should include Second.java" || ok=false
+  [[ "$ok" == "true" ]]
+}
+
+test_missing_notices_file() {
+  git rm THIRD_PARTY_NOTICES.md --quiet
+  git commit -m "Remove notices file" --quiet
+
+  setup_branch
+  mkdir -p src
+  cat > src/Vendored.java << 'JAVA'
+// Adapted from SomeLib.
+// Copyright 2024 Some Author.
+// Licensed under the Apache License 2.0.
+package com.example;
+public class Vendored {}
+JAVA
+  git add src/Vendored.java
+  git commit -m "Add vendored file" --quiet
+
+  local output exit_code=0 ok=true
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "10" "should still find candidates" || ok=false
+  assert_global_metadata_prefix "$output" false false "missing NOTICES with vendored candidate" || ok=false
+  assert_eq "$(echo "$output" | sed -n '3p')" "---" "metadata precedes candidate block" || ok=false
+  [[ "$ok" == "true" ]]
+}
+
+test_merge_base_failure() {
+  git checkout --orphan no-main --quiet 2>/dev/null
+  echo "orphan" > orphan.txt
+  git add orphan.txt
+  git commit -m "Orphan commit" --quiet
+  git branch -D main --quiet 2>/dev/null || true
+
+  local exit_code=0 output
+  output=$(run_script) || exit_code=$?
+
+  local ok=true
+  assert_eq "$exit_code" "2" "should exit 2 when merge-base fails" || ok=false
+  assert_contains "$output" "could not determine merge-base" "error message" || ok=false
+  [[ "$ok" == "true" ]]
+}
+
+test_renamed_into_vendor_path() {
+  mkdir -p src
+  echo "public class Moved {}" > src/Moved.java
+  git add src/Moved.java
+  git commit -m "Add file" --quiet
+
+  setup_branch
+  mkdir -p vendor/lib
+  git mv src/Moved.java vendor/lib/Moved.java
+  git commit -m "Move to vendor path" --quiet
+
+  local output exit_code=0 ok=true
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "10" "should exit 10 when renamed into vendor path" || ok=false
+  assert_eq "$(get_field "$output" "file")" "vendor/lib/Moved.java" "file should be new vendor path" || ok=false
+  assert_eq "$(get_field "$output" "status")" "R" "status" || ok=false
+  assert_contains "$(get_field "$output" "reasons")" "renamed file in vendor path" "reasons" || ok=false
+  [[ "$ok" == "true" ]]
+}
+
+test_renamed_out_of_vendor_path() {
+  mkdir -p vendor/lib
+  cat > vendor/lib/Leaving.java << 'JAVA'
+// Adapted from LeavingLib.
+// Copyright 2024 Leaving Author.
+// Licensed under the MIT License.
+package com.example;
+public class Leaving {}
+JAVA
+  git add vendor/lib/Leaving.java
+  git commit -m "Add vendored file" --quiet
+
+  setup_branch
+  mkdir -p src
+  git mv vendor/lib/Leaving.java src/Leaving.java
+  git commit -m "Move out of vendor path" --quiet
+
+  local output exit_code=0 ok=true
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "10" "should exit 10 when renamed out of vendor path" || ok=false
+  assert_eq "$(get_field "$output" "file")" "src/Leaving.java" "file should be new path" || ok=false
+  assert_eq "$(get_field "$output" "status")" "R" "status" || ok=false
+  assert_contains "$(get_field "$output" "reasons")" "renamed file in vendor path" "reasons" || ok=false
+  [[ "$ok" == "true" ]]
+}
+
+test_renamed_with_attribution_stripped() {
+  mkdir -p src/old
+  # File must be large enough that removing the 3-line header stays above git's
+  # 50% rename-detection similarity threshold.
+  cat > src/old/Stripped.java << 'JAVA'
+// Adapted from StrippedLib.
+// Copyright 2024 Stripped Author.
+// Licensed under the MIT License.
+package com.example;
+public class Stripped {
+  private int field1;
+  private int field2;
+  private int field3;
+  public void method1() { field1 = 1; }
+  public void method2() { field2 = 2; }
+  public void method3() { field3 = 3; }
+  public int getField1() { return field1; }
+  public int getField2() { return field2; }
+  public int getField3() { return field3; }
+}
+JAVA
+  git add src/old/Stripped.java
+  git commit -m "Add attributed file" --quiet
+
+  setup_branch
+  mkdir -p src/new
+  git mv src/old/Stripped.java src/new/Stripped.java
+  cat > src/new/Stripped.java << 'JAVA'
+package com.example;
+public class Stripped {
+  private int field1;
+  private int field2;
+  private int field3;
+  public void method1() { field1 = 1; }
+  public void method2() { field2 = 2; }
+  public void method3() { field3 = 3; }
+  public int getField1() { return field1; }
+  public int getField2() { return field2; }
+  public int getField3() { return field3; }
+}
+JAVA
+  git add src/new/Stripped.java
+  git commit -m "Rename and strip attribution" --quiet
+
+  local output exit_code=0 ok=true
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "10" "should exit 10 when attribution stripped during rename" || ok=false
+  assert_eq "$(get_field "$output" "file")" "src/new/Stripped.java" "file should be new path" || ok=false
+  assert_eq "$(get_field "$output" "status")" "R" "status" || ok=false
+  assert_contains "$(get_field "$output" "reasons")" "attribution markers stripped during rename" "reasons" || ok=false
+  [[ "$ok" == "true" ]]
+}
+
+test_modified_sentry_copyright_year_bump_not_flagged() {
+  mkdir -p src
+  cat > src/SentryYear.java << 'JAVA'
+// Copyright 2024 Functional Software, Inc.
+// Licensed under the Apache License, Version 2.0.
+package com.example;
+public class SentryYear {}
+JAVA
+  git add src/SentryYear.java
+  git commit -m "Add file with Sentry copyright" --quiet
+
+  setup_branch
+  cat > src/SentryYear.java << 'JAVA'
+// Copyright 2025 Functional Software, Inc.
+// Licensed under the Apache License, Version 2.0.
+package com.example;
+public class SentryYear {}
+JAVA
+  git add src/SentryYear.java
+  git commit -m "Bump copyright year" --quiet
+
+  local output exit_code=0
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "0" "Sentry copyright year bump should not trigger attribution" || return 1
+  assert_global_metadata_prefix "$output" true false "Sentry copyright year bump" || return 1
+  assert_line_count "$output" 2 "exit 0 should print metadata only" || return 1
+}
+
+test_dual_copyright_sentry_and_third_party() {
+  setup_branch
+  mkdir -p src
+  cat > src/DualCopyright.java << 'JAVA'
+// Copyright 2024 Functional Software, Inc. and Example Corp
+// Licensed under the Apache License 2.0.
+package com.example;
+public class DualCopyright {}
+JAVA
+  git add src/DualCopyright.java
+  git commit -m "Add dual-copyright file" --quiet
+
+  local output exit_code=0 ok=true
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "10" "should exit 10 for dual-copyright file" || ok=false
+  assert_eq "$(get_field "$output" "file")" "src/DualCopyright.java" "file path" || ok=false
+  assert_contains "$(get_field "$output" "reasons")" "attribution markers in file" "reasons" || ok=false
+  [[ "$ok" == "true" ]]
+}
+
+test_committed_modified_then_staged_delete() {
+  mkdir -p src
+  cat > src/Conflict.java << 'JAVA'
+// Adapted from ConflictLib.
+// Copyright 2024 Conflict Author.
+// Licensed under the MIT License.
+package com.example;
+public class Conflict {}
+JAVA
+  git add src/Conflict.java
+  git commit -m "Add attributed file" --quiet
+
+  setup_branch
+  cat > src/Conflict.java << 'JAVA'
+// Adapted from ConflictLib.
+// Copyright 2025 Conflict Author.
+// Licensed under the MIT License.
+package com.example;
+public class Conflict { void updated() {} }
+JAVA
+  git add src/Conflict.java
+  git commit -m "Modify file" --quiet
+
+  git rm src/Conflict.java --quiet
+
+  local output exit_code=0 ok=true
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "10" "should exit 10 when candidates found" || ok=false
+  assert_eq "$(get_field "$output" "file")" "src/Conflict.java" "file path" || ok=false
+  assert_eq "$(get_field "$output" "status")" "D" "staged delete should override committed modify" || ok=false
+  assert_contains "$(get_field "$output" "reasons")" "deleted file with attribution markers" "reasons" || ok=false
+  [[ "$ok" == "true" ]]
+}
+
+test_notices_file_not_emitted_as_candidate() {
+  setup_branch
+  mkdir -p src
+  cat > src/Vendored.java << 'JAVA'
+// Adapted from SomeLib.
+// Copyright 2024 Some Author.
+// Licensed under the Apache License 2.0.
+package com.example;
+public class Vendored {}
+JAVA
+  echo "## New Entry" >> THIRD_PARTY_NOTICES.md
+  git add src/Vendored.java THIRD_PARTY_NOTICES.md
+  git commit -m "Add vendored file and update notices" --quiet
+
+  local output exit_code=0 ok=true
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "10" "should exit 10 when candidates found" || ok=false
+  assert_global_metadata_prefix "$output" true true "NOTICES changed with candidate" || ok=false
+  local count
+  count=$(echo "$output" | grep -c "^file: " || true)
+  assert_eq "$count" "1" "THIRD_PARTY_NOTICES.md must not appear as a candidate" || ok=false
+  assert_eq "$(get_field "$output" "file")" "src/Vendored.java" "only the source file should be a candidate" || ok=false
+  [[ "$ok" == "true" ]]
+}
+
+test_removed_sentry_copyright_not_flagged() {
+  mkdir -p src
+  cat > src/SentryFile.java << 'JAVA'
+// Copyright 2025 Functional Software, Inc.
+// Licensed under the Apache License, Version 2.0.
+package com.example;
+public class SentryFile {}
+JAVA
+  git add src/SentryFile.java
+  git commit -m "Add file with Sentry copyright" --quiet
+
+  setup_branch
+  cat > src/SentryFile.java << 'JAVA'
+package com.example;
+public class SentryFile {}
+JAVA
+  git add src/SentryFile.java
+  git commit -m "Remove Sentry copyright" --quiet
+
+  local output exit_code=0
+  output=$(run_script) || exit_code=$?
+  assert_eq "$exit_code" "0" "removing Sentry-only copyright should not trigger attribution" || return 1
+  assert_global_metadata_prefix "$output" true false "removed Sentry copyright" || return 1
+  assert_line_count "$output" 2 "exit 0 should print metadata only" || return 1
+}
+
+# --- Run all tests ---
+
+run_test "Clean branch — no candidates" test_clean_branch_no_candidates
+run_test "New file with attribution markers" test_new_file_with_attribution
+run_test "New file in vendor path" test_new_file_in_vendor_path
+run_test "New file under io/sentry/vendor" test_new_file_under_io_sentry_vendor_path
+run_test "Deleted file with attribution" test_deleted_file_with_attribution
+run_test "Modified file — attribution added" test_modified_file_attribution_added
+run_test "Modified file — attribution removed" test_modified_file_attribution_removed
+run_test "Renamed file — correct path" test_renamed_file_has_correct_path
+run_test "Staged new file detected" test_staged_new_file_detected
+run_test "Staged modification detected" test_staged_modification_detected
+run_test "Untracked file detected" test_untracked_file_detected
+run_test "THIRD_PARTY_NOTICES.md change — committed" test_notices_file_changed_true
+run_test "THIRD_PARTY_NOTICES.md only — exit 10, no file blocks" test_notices_only_change_triggers_without_file_candidates
+run_test "THIRD_PARTY_NOTICES.md only — unstaged, no file blocks" test_notices_only_unstaged_triggers_without_file_candidates
+run_test "THIRD_PARTY_NOTICES.md deleted on branch" test_notices_deleted_on_branch
+run_test "THIRD_PARTY_NOTICES.md change — staged" test_notices_staged_change_detected
+run_test "THIRD_PARTY_NOTICES file not emitted as candidate" test_notices_file_not_emitted_as_candidate
+run_test "Missing THIRD_PARTY_NOTICES.md" test_missing_notices_file
+run_test "Excluded paths skipped" test_excluded_paths_skipped
+run_test "Generated files skipped" test_generated_files_skipped
+run_test "Binary file skipped" test_binary_file_skipped
+run_test "Sentry copyright not flagged" test_sentry_copyright_not_flagged
+run_test "Removed Sentry copyright — not flagged" test_removed_sentry_copyright_not_flagged
+run_test "License-only header not flagged" test_license_only_header_not_flagged
+run_test "Modified vendor-path file — attribution changed" test_modified_vendor_path_file_attribution_changed
+run_test "Modified vendored file — no attribution change" test_modified_vendored_file_no_attribution_change
+run_test "Modified first-party file — Sentry license header not flagged" test_modified_first_party_license_header_not_flagged
+run_test "Unstaged modification detected" test_unstaged_modification_detected
+run_test "Multiple candidates in single run" test_multiple_candidates
+run_test "Merge-base failure" test_merge_base_failure
+run_test "Renamed into vendor path" test_renamed_into_vendor_path
+run_test "Renamed out of vendor path" test_renamed_out_of_vendor_path
+run_test "Renamed with attribution stripped" test_renamed_with_attribution_stripped
+run_test "Modified Sentry copyright year — not flagged" test_modified_sentry_copyright_year_bump_not_flagged
+run_test "Dual copyright — Sentry and third-party" test_dual_copyright_sentry_and_third_party
+run_test "Committed M + staged D resolves to D" test_committed_modified_then_staged_delete
+
+# --- Summary ---
+
+echo ""
+echo "=============================="
+echo "Tests run: $TESTS_RUN"
+echo -e "Passed: ${GREEN}$TESTS_PASSED${NC}"
+if [[ $TESTS_FAILED -gt 0 ]]; then
+  echo -e "Failed: ${RED}$TESTS_FAILED${NC}"
+  exit 1
+else
+  echo "Failed: 0"
+  echo -e "${GREEN}All tests passed.${NC}"
+fi
diff --git a/AGENTS.md b/AGENTS.md
index 42a8e651004..a32e4b3a3c1 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -145,15 +145,7 @@ The repository is organized into multiple modules:
 5. Consider backwards compatibility

 ### Third-Party Code Attribution
-When adapting code from third-party libraries:
-1. Add a license header at the top of the adapted file (before the `package` statement):
-   ```java
-   // Adapted from .
-   // Copyright .
-   // Licensed under the .
-   //
-   ```
-2. Add a full attribution entry to `THIRD_PARTY_NOTICES.md` following the existing format (Source, License, Copyright, Scope, full license text)
+See [`.claude/skills/check-code-attribution/CODE_ATTRIBUTION_CRITERIA.md`](.claude/skills/check-code-attribution/CODE_ATTRIBUTION_CRITERIA.md).

 ### Getting PR Information