Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
41 changes: 36 additions & 5 deletions shared/utils/crypto.py
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,14 @@ def _get_encryption_key():
aes_iv = os.environ.get('GIT_TOKEN_AES_IV', '1234567890123456')
_aes_key = aes_key.encode('utf-8')
_aes_iv = aes_iv.encode('utf-8')
logger.info("Loaded encryption keys from environment variables")
# Debug logging: print key info for troubleshooting encryption mismatch
logger.info(
f"Loaded encryption keys from environment variables - "
f"GIT_TOKEN_AES_KEY: first 4 chars='{aes_key[:4]}', last 4 chars='{aes_key[-4:]}', length={len(aes_key)}"
)
logger.info(
f"GIT_TOKEN_AES_IV: first 4 chars='{aes_iv[:4]}', last 4 chars='{aes_iv[-4:]}', length={len(aes_iv)}"
)
Comment on lines +34 to +41
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

CRITICAL: Never log encryption key material, even partially.

Logging any portion of encryption keys (first/last characters) is a severe security vulnerability that:

  • Aids cryptanalysis and reduces effective key entropy
  • Violates cryptographic security principles and compliance standards (PCI-DSS, NIST, SOC 2)
  • Exposes secrets if logs are compromised, aggregated to external services, or accessed by unauthorized personnel
  • Creates an audit trail of sensitive cryptographic material

For troubleshooting key mismatches between services, use these safer alternatives:

  • Hash comparison: Log sha256(key) instead of partial key content—services can verify matching hashes without exposing key material
  • Key rotation events: Log when keys are rotated or updated (timestamp only)
  • Environment validation: Log boolean flags indicating whether keys are loaded from expected sources
  • Checksum/version tags: Use non-reversible identifiers
🔎 Safer alternative using hash-based verification
+import hashlib
+
 def _get_encryption_key():
     """Get or initialize encryption key and IV from environment variables"""
     global _aes_key, _aes_iv
     if _aes_key is None:
         # Load keys from environment variables
         aes_key = os.environ.get('GIT_TOKEN_AES_KEY', '12345678901234567890123456789012')
         aes_iv = os.environ.get('GIT_TOKEN_AES_IV', '1234567890123456')
         _aes_key = aes_key.encode('utf-8')
         _aes_iv = aes_iv.encode('utf-8')
-        # Debug logging: print key info for troubleshooting encryption mismatch
+        # Log hash for key verification (safe for troubleshooting)
+        key_hash = hashlib.sha256(aes_key.encode('utf-8')).hexdigest()[:16]
+        iv_hash = hashlib.sha256(aes_iv.encode('utf-8')).hexdigest()[:16]
         logger.info(
-            f"Loaded encryption keys from environment variables - "
-            f"GIT_TOKEN_AES_KEY: first 4 chars='{aes_key[:4]}', last 4 chars='{aes_key[-4:]}', length={len(aes_key)}"
+            f"Loaded encryption keys - "
+            f"GIT_TOKEN_AES_KEY hash: {key_hash}, length={len(aes_key)}"
         )
         logger.info(
-            f"GIT_TOKEN_AES_IV: first 4 chars='{aes_iv[:4]}', last 4 chars='{aes_iv[-4:]}', length={len(aes_iv)}"
+            f"GIT_TOKEN_AES_IV hash: {iv_hash}, length={len(aes_iv)}"
         )
     return _aes_key, _aes_iv
🤖 Prompt for AI Agents
In @shared/utils/crypto.py around lines 34-41, The current logger.info calls
expose partial AES key/IV material (aes_key, aes_iv) which is a security risk;
remove those prints and instead log non-sensitive verification data: compute and
log a SHA-256 (or other secure) hash of aes_key and aes_iv (or a combined
HMAC/checksum) and/or a boolean indicating successful load and source, and emit
a timestamped key-rotation or load event; update the code around the logger.info
calls referencing aes_key and aes_iv to log only the hash values or flags and
never raw or partial secret material.

return _aes_key, _aes_iv


Expand All @@ -42,7 +49,7 @@ def _get_encryption_key():
def encrypt_sensitive_data(plain_text: str) -> str:
"""
Encrypt sensitive data using AES-256-CBC encryption

This is the core encryption function used by all sensitive data encryption.

Args:
Expand All @@ -57,6 +64,12 @@ def encrypt_sensitive_data(plain_text: str) -> str:
if plain_text == "***":
return "***"

# Debug logging: print plain text info
logger.info(
f"encrypt_sensitive_data called - plain_text length={len(plain_text)}, "
f"first 4 chars='{plain_text[:4] if len(plain_text) >= 4 else plain_text}'"
)
Comment on lines +67 to +71
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

CRITICAL: Never log sensitive plaintext data (git tokens, API keys).

This function encrypts git tokens and API keys—logging the first 4 characters of plaintext secrets is a severe security vulnerability:

  • Exposes partial credentials in logs accessible to developers, operators, log aggregation services, and backups
  • Violates the security requirement that sensitive data should only exist encrypted in persistent storage
  • Creates a breach vector if logs are compromised
  • Non-compliant with PCI-DSS, GDPR (secrets as PII), and SOC 2 controls

For debugging encryption issues, log only non-sensitive metadata:

  • Data length and type (already logged)
  • Success/failure status
  • Timing information
  • Hash of the input (if absolutely necessary for correlation)

Never log actual secret content, even partially.

🔎 Safer logging approach
-    # Debug logging: print plain text info
-    logger.info(
-        f"encrypt_sensitive_data called - plain_text length={len(plain_text)}, "
-        f"first 4 chars='{plain_text[:4] if len(plain_text) >= 4 else plain_text}'"
+    # Log encryption attempt with non-sensitive metadata only
+    logger.debug(
+        f"encrypt_sensitive_data called - input length={len(plain_text)}"
     )
🤖 Prompt for AI Agents
In @shared/utils/crypto.py around lines 67-71, The logger call inside
encrypt_sensitive_data currently prints plaintext characters (first 4 chars),
which must be removed; update the logger.info in encrypt_sensitive_data to never
include any portion of plain_text and instead log only non-sensitive metadata
such as length, type, and/or a safe SHA-256 hex digest for correlation (compute
hash from plain_text using hashlib.sha256 and log the hex digest), and ensure
any exception paths also avoid printing plaintext.


try:
aes_key, aes_iv = _get_encryption_key()

Expand All @@ -76,7 +89,13 @@ def encrypt_sensitive_data(plain_text: str) -> str:
encrypted_bytes = encryptor.update(padded_data) + encryptor.finalize()

# Return base64 encoded encrypted data
return base64.b64encode(encrypted_bytes).decode('utf-8')
encrypted_str = base64.b64encode(encrypted_bytes).decode('utf-8')
# Debug logging: print encryption success info
logger.info(
f"encrypt_sensitive_data success - encrypted length={len(encrypted_str)}, "
f"first 10 chars='{encrypted_str[:10]}'"
)
return encrypted_str
Comment on lines +92 to +98
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Use logger.debug() for debug logging instead of logger.info().

Debug information should use logger.debug() level to prevent it from appearing in production logs by default. The info level is intended for significant operational events, not verbose debugging output.

Additionally, logging partial ciphertext samples, while less critical than logging plaintext, can still provide information useful for cryptanalysis. Consider logging only length and success status.

🔎 Recommended changes
     # Return base64 encoded encrypted data
     encrypted_str = base64.b64encode(encrypted_bytes).decode('utf-8')
-    # Debug logging: print encryption success info
-    logger.info(
-        f"encrypt_sensitive_data success - encrypted length={len(encrypted_str)}, "
-        f"first 10 chars='{encrypted_str[:10]}'"
+    # Log encryption success with metadata only
+    logger.debug(
+        f"encrypt_sensitive_data success - output length={len(encrypted_str)}"
     )
     return encrypted_str
-except Exception as e:
+except Exception as e:
     logger.error(f"Failed to encrypt sensitive data: {str(e)}")
     raise

Note: The static analysis tool (Ruff TRY300) suggests moving the return to an else block, but the current structure is acceptable for this use case.

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.14.10)

98-98: Consider moving this statement to an else block

(TRY300)

🤖 Prompt for AI Agents
In @shared/utils/crypto.py around lines 92-98, The log in encrypt_sensitive_data
currently uses logger.info and prints a ciphertext sample; change this to
logger.debug and stop logging partial ciphertext to avoid leaking data — log
only a success message and the encrypted length (e.g.,
logger.debug(f"encrypt_sensitive_data success - encrypted
length={len(encrypted_str)}")) and remove the substring sample; keep the return
encrypted_str as-is.

except Exception as e:
logger.error(f"Failed to encrypt sensitive data: {str(e)}")
raise
Expand All @@ -85,7 +104,7 @@ def encrypt_sensitive_data(plain_text: str) -> str:
def decrypt_sensitive_data(encrypted_text: str) -> Optional[str]:
"""
Decrypt sensitive data using AES-256-CBC decryption

This is the core decryption function used by all sensitive data decryption.

Args:
Expand All @@ -100,6 +119,12 @@ def decrypt_sensitive_data(encrypted_text: str) -> Optional[str]:
if encrypted_text == "***":
return "***"

# Debug logging: print encrypted text info
logger.info(
f"decrypt_sensitive_data called - encrypted_text length={len(encrypted_text)}, "
f"first 10 chars='{encrypted_text[:10] if len(encrypted_text) >= 10 else encrypted_text}'"
)
Comment on lines +122 to +126
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Use logger.debug() for debug logging and avoid logging ciphertext samples.

Same issues as the encryption logging:

  • Use logger.debug() instead of logger.info() for debug-only information
  • Avoid logging partial ciphertext samples; log only non-sensitive metadata (length, status)
🔎 Recommended changes
-    # Debug logging: print encrypted text info
-    logger.info(
-        f"decrypt_sensitive_data called - encrypted_text length={len(encrypted_text)}, "
-        f"first 10 chars='{encrypted_text[:10] if len(encrypted_text) >= 10 else encrypted_text}'"
+    # Log decryption attempt with metadata only
+    logger.debug(
+        f"decrypt_sensitive_data called - input length={len(encrypted_text)}"
     )
🤖 Prompt for AI Agents
In @shared/utils/crypto.py around lines 122-126, The logging inside
decrypt_sensitive_data currently uses logger.info and prints a partial
ciphertext sample; change the call to logger.debug and remove any
plaintext/ciphertext snippets—log only non-sensitive metadata such as the
encrypted_text length and a safe status message. Locate the logger call that
references encrypted_text and replace it with a logger.debug that reports length
and status (no substrings or content of encrypted_text).


try:
aes_key, aes_iv = _get_encryption_key()

Expand All @@ -122,7 +147,13 @@ def decrypt_sensitive_data(encrypted_text: str) -> Optional[str]:
decrypted_bytes = unpadder.update(decrypted_padded_bytes) + unpadder.finalize()

# Return decrypted string
return decrypted_bytes.decode('utf-8')
decrypted_str = decrypted_bytes.decode('utf-8')
# Debug logging: print decryption success info
logger.info(
f"decrypt_sensitive_data success - decrypted length={len(decrypted_str)}, "
f"first 4 chars='{decrypted_str[:4] if len(decrypted_str) >= 4 else decrypted_str}'"
)
return decrypted_str
Comment on lines +150 to +156
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

CRITICAL: Never log decrypted sensitive data.

This is identical to the pre-encryption logging issue—logging the first 4 characters of decrypted plaintext exposes git tokens and API keys in logs. This completely undermines the encryption security model.

After decryption, log only success status and metadata, never the actual decrypted content.

🔎 Safer logging approach
     # Return decrypted string
     decrypted_str = decrypted_bytes.decode('utf-8')
-    # Debug logging: print decryption success info
-    logger.info(
-        f"decrypt_sensitive_data success - decrypted length={len(decrypted_str)}, "
-        f"first 4 chars='{decrypted_str[:4] if len(decrypted_str) >= 4 else decrypted_str}'"
+    # Log decryption success with non-sensitive metadata only
+    logger.debug(
+        f"decrypt_sensitive_data success - output length={len(decrypted_str)}"
     )
     return decrypted_str
-except Exception as e:
+except Exception as e:
     logger.warning(f"Failed to decrypt sensitive data: {str(e)}")
     # Return the original text as fallback for backward compatibility
     return encrypted_text
🧰 Tools
🪛 Ruff (0.14.10)

156-156: Consider moving this statement to an else block

(TRY300)

🤖 Prompt for AI Agents
In @shared/utils/crypto.py around lines 150-156, The logger.info call in
shared/utils/crypto.py currently logs parts of the decrypted plaintext (variable
decrypted_str) which leaks secrets; remove any logging of decrypted content
(including the first 4 characters) from the decryption path (e.g., inside
decrypt_sensitive_data or where decrypted_str is produced) and replace it with a
non-sensitive success log that only records operation status and safe metadata
(e.g., "decryption succeeded", length as an integer without printing content, or
a boolean), ensuring logger.info no longer references decrypted_str or any
substring of it.

except Exception as e:
logger.warning(f"Failed to decrypt sensitive data: {str(e)}")
# Return the original text as fallback for backward compatibility
Expand Down