Skip to content

Conversation

Mearman
Copy link
Member

@Mearman Mearman commented Jul 30, 2025

Summary

Implements comprehensive authentication detection and handling for external link validation. This feature distinguishes between truly broken links and links that require authentication, addressing false positives from Firebase Console, GitHub private repos, and other auth-protected URLs.

Key Features

  • Smart Auth Detection: Recognizes auth-protected domains, status codes, and redirect patterns
  • Credential Support: API keys and custom headers for authenticated validation
  • Enhanced CLI: New options for auth detection with detailed output formatting
  • Comprehensive Testing: 65 test cases covering unit, integration, and end-to-end scenarios

Implementation Highlights

  • AuthDetector Class: Core authentication detection with configurable patterns and providers
  • LinkValidator Integration: Seamless integration with existing validation workflow
  • CLI Enhancement: New authentication options with proper TypeScript typing (no any types)
  • Statistics Tracking: Separate counting of auth-required vs truly broken links

Example Usage

# Enable authentication detection
markmv validate docs/ --check-external --enable-auth-detection

# Provide API credentials for validation
markmv validate docs/ --check-external --enable-auth-detection \
  --auth-credentials '{"api.github.com":"Bearer token123"}'

# Strict mode: treat auth-required as broken
markmv validate docs/ --check-external --enable-auth-detection --disallow-auth-required

Output Enhancement

📊 Validation Summary
Files processed: 5
Total links found: 42
Broken links: 3
🔒 Authentication-protected links: 2
❌ Truly broken links: 1

🔗 Broken Links Found:

  📄 docs/firebase.md (2 broken):
    ❌ [external] https://console.firebase.google.com/project/test 🔒
       Auth: URL appears to be authentication-protected (Firebase)
       Suggestion: This is likely working correctly but requires authentication to access
    ❌ [external] https://example.com/missing
       Reason: external-error

Domain Pattern Support

  • Firebase Console: console.firebase.google.com
  • GitHub Settings: github.com/*/settings
  • Google Cloud Console: console.cloud.google.com
  • AWS Console: console.aws.amazon.com
  • Microsoft 365: *.sharepoint.com, *.onedrive.com
  • And many more with wildcard support

Technical Details

  • Domain-based Detection: Immediate recognition of known auth-protected domains
  • Status-based Detection: 401/403 responses identified as authentication requirements
  • Redirect-based Detection: Follows redirects to detect auth pages (Google, Microsoft, etc.)
  • Provider Recognition: Automatic detection of auth providers for better messaging
  • Credential Management: Support for domain-specific API keys and custom headers

Resolves #34

Test Plan

  • All existing tests pass (675 tests)
  • New comprehensive test suites for authentication detection (65 tests total)
  • Unit tests for AuthDetector functionality (35 tests)
  • Integration tests for LinkValidator integration (19 tests)
  • End-to-end tests for complete validation pipeline (11 tests)
  • CLI integration testing with various authentication options
  • Cross-platform compatibility verification
  • No any types used - full TypeScript type safety maintained

Implements comprehensive authentication detection and handling for external link validation.
This feature distinguishes between truly broken links and links that require authentication,
addressing false positives from Firebase Console, GitHub private repos, and other auth-protected URLs.

## Core Features

### Authentication Detection System
- **AuthDetector class**: Core authentication detection with configurable patterns and thresholds
- **Domain-based detection**: Recognizes auth-protected domains (Firebase, GitHub, AWS, etc.)
- **Status-based detection**: Identifies 401/403 responses as authentication requirements
- **Redirect-based detection**: Detects redirects to authentication pages (Google, Microsoft, etc.)
- **Pattern-based detection**: Recognizes content with staleness indicators

### Enhanced LinkValidator Integration
- **Seamless integration**: Authentication detection integrated into existing validation workflow
- **Credential support**: API keys and custom headers for authenticated validation
- **Smart request handling**: Uses GET requests when authentication is enabled for content analysis
- **Error differentiation**: Distinguishes auth-required from truly broken links

### CLI Integration
- **--enable-auth-detection**: Enable authentication-aware validation
- **--disallow-auth-required**: Treat auth-required links as broken (strict mode)
- **--auth-credentials**: JSON object with domain:credential mapping
- **--auth-headers**: JSON object with domain-specific headers
- **Enhanced output**: Shows auth-protected links with 🔒 indicator and detailed information

### Configuration & Flexibility
- **Configurable patterns**: Domain patterns and redirect patterns for different auth providers
- **Credential management**: Support for API keys, bearer tokens, and custom headers
- **Provider detection**: Automatic detection of auth providers (Google, GitHub, Microsoft, etc.)
- **Statistics tracking**: Separate counting of auth-required vs truly broken links

## Implementation Details

### New Files
- `src/utils/auth-detection.ts`: Core authentication detection utilities
- `src/utils/auth-detection.test.ts`: Comprehensive unit tests (35 test cases)
- `src/core/link-validator-auth.test.ts`: Integration tests (19 test cases)
- `src/commands/validate-auth.test.ts`: End-to-end tests (11 test cases)

### Enhanced Files
- `src/core/link-validator.ts`: Integrated authentication detection
- `src/commands/validate.ts`: Added auth options and statistics
- `src/types/config.ts`: Extended BrokenLink interface with auth info
- `src/cli.ts`: Added authentication CLI options with proper typing

### Key Patterns Supported
- Firebase Console: `console.firebase.google.com`
- GitHub Settings: `github.com/*/settings`
- Google Cloud Console: `console.cloud.google.com`
- AWS Console: `console.aws.amazon.com`
- Microsoft 365: `*.sharepoint.com`, `*.onedrive.com`
- Many others with wildcard support

## Example Usage

```bash
# Enable auth detection with default settings
markmv validate docs/ --check-external --enable-auth-detection

# Provide API credentials for validation
markmv validate docs/ --check-external --enable-auth-detection \
  --auth-credentials '{"api.github.com":"Bearer token123"}'

# Use custom headers for specific domains
markmv validate docs/ --check-external --enable-auth-detection \
  --auth-headers '{"api.example.com":{"X-API-Key":"key123"}}'

# Strict mode: treat auth-required as broken
markmv validate docs/ --check-external --enable-auth-detection --disallow-auth-required
```

## Test Coverage
- **Unit tests**: 35 tests for AuthDetector functionality
- **Integration tests**: 19 tests for LinkValidator integration
- **End-to-end tests**: 11 tests for complete validation pipeline
- **Total coverage**: 65 authentication-related test cases

Resolves #34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Authentication-aware link validation

1 participant