Skip to content

Refactor: Improve code quality and architecture across Lambda functions #247

@lakshayman

Description

@lakshayman

Issue Description

The identity-service Lambda functions have several code quality and architecture issues that impact maintainability, reliability, and production readiness. These issues include inconsistent error handling, logging inconsistencies, code duplication, resource management problems, and concurrency concerns that could lead to memory leaks or unexpected failures.

The proposed changes will:

  • Standardize error handling across all Lambda functions
  • Implement structured logging with JSON format for CloudWatch
  • Eliminate code duplication by extracting common utilities
  • Add proper resource management with timeouts and context cancellation
  • Fix concurrency issues in batch processing functions
  • Improve type safety by replacing type assertions with proper struct types

Expected Behaviour

  • All Lambda functions should handle errors gracefully without using log.Fatalf() in handlers
  • Structured JSON logging should be used consistently across all functions
  • Firestore client initialization should be centralized and reusable
  • HTTP clients should have proper timeouts and context support
  • Batch processing functions should use worker pools or semaphores to limit concurrent goroutines
  • Type assertions should be replaced with proper struct types using DataTo() method
  • All functions should have proper context timeouts for long-running operations

Current Behaviour

  • Error Handling: Functions use log.Fatalf() in handlers which can cause unexpected Lambda terminations. Error handling is inconsistent - some functions return errors as strings (Getdata() returns string instead of error)
  • Logging: Mixed usage of fmt.Println, log.Println, log.Fatalf, and custom logging with no structured format
  • Code Duplication: Firestore client initialization is duplicated across all functions. URL normalization logic is repeated in multiple places
  • Resource Management: Missing context timeouts in HTTP clients and Firestore operations. No request timeouts on HTTP clients (except health-check). Potential goroutine leaks if panics occur
  • Concurrency Issues: call-profiles/main.go creates unbounded goroutines without rate limiting. No graceful shutdown handling
  • Type Safety: Excessive type assertions (dsnap.Data()["field"].(string)) instead of using proper struct types

Reproducibility

This issue is reproducible

Examples:

  • Run any Lambda function and check CloudWatch logs for inconsistent log formats
  • Check call-profiles/main.go for unbounded goroutine creation
  • Review error handling in call-profile/main.go handler function
  • Check for missing HTTP client timeouts in layer/utils/firestore.go

Severity/Priority

🔴 High

Detailed Issues

1. Error Handling

  • Location: All Lambda handlers (call-profile/main.go, call-profiles/main.go, verify/main.go, health-check/main.go)
  • Issue: Using log.Fatalf() in Lambda handlers terminates the function
  • Impact: Can cause unexpected failures and failed invocations
  • Files Affected:
    • call-profile/main.go:125
    • call-profiles/main.go:36, 58, 77
    • verify/main.go:128
    • health-check/main.go:62, 84
    • layer/utils/firestore.go:50, 54

2. Logging Inconsistencies

  • Location: health-check/main.go:33, 48, 65
  • Issue: Using fmt.Println in production code instead of structured logging
  • Impact: Difficult to parse logs in CloudWatch, no correlation IDs
  • Files Affected:
    • health-check/main.go
    • Various handlers using mixed logging approaches

3. Code Duplication

  • Location: All Lambda main functions
  • Issue: Firestore client initialization duplicated, URL normalization logic repeated
  • Impact: Maintenance burden, potential for inconsistencies
  • Files Affected:
    • call-profile/main.go:122-131
    • call-profiles/main.go:73-85
    • verify/main.go:124-137
    • health-check/main.go:80-92

4. Resource Management

  • Location: layer/utils/firestore.go:115-194, call-profile/main.go:87-89
  • Issue: Missing context timeouts, inconsistent HTTP client timeouts
  • Impact: Potential for hanging requests, Lambda timeout issues
  • Files Affected:
    • layer/utils/firestore.go
    • call-profile/main.go
    • verify/main.go

5. Concurrency Issues

  • Location: call-profiles/main.go:40-70, health-check/main.go:52-72
  • Issue: Unbounded goroutine creation, no rate limiting, potential goroutine leaks
  • Impact: Memory exhaustion, Lambda timeouts, AWS account limits
  • Files Affected:
    • call-profiles/main.go
    • health-check/main.go

6. Type Safety

  • Location: call-profile/main.go:37-71, verify/main.go:229-269
  • Issue: Excessive type assertions instead of struct types
  • Impact: Runtime panics, difficult to catch type errors
  • Files Affected:
    • call-profile/main.go
    • layer/utils/firestore.go:229-271

Proposed Solutions

1. Error Handling Standardization

  • Create a common error handler wrapper
  • Replace log.Fatalf() with proper error returns
  • Use structured errors with context
  • Return proper HTTP status codes

2. Structured Logging

  • Create a logging utility with JSON format
  • Replace all fmt.Println, log.Println with structured logger
  • Add correlation IDs for request tracing
  • Use appropriate log levels (DEBUG, INFO, WARN, ERROR)

3. Code Deduplication

  • Create internal/client package for Firestore initialization
  • Extract URL utilities to layer/utils/url.go
  • Create shared handler wrapper for common setup

4. Resource Management

  • Add context timeouts to all HTTP operations
  • Standardize HTTP client creation with timeouts
  • Implement proper cleanup in defer statements
  • Use context for cancellation

5. Concurrency Control

  • Implement worker pool pattern for batch operations
  • Add semaphore/rate limiting for concurrent goroutines
  • Add graceful shutdown handling
  • Limit concurrent Lambda invocations

6. Type Safety

  • Create proper struct types for Firestore documents
  • Replace type assertions with DataTo() method calls
  • Add validation functions
  • Use type-safe constants

Additional Information

Testing Considerations

  • Need to add unit tests for error handling paths
  • Integration tests for concurrent operations
  • Load testing for batch processing functions

Breaking Changes

  • Error response formats may change (should maintain backward compatibility where possible)
  • Log format changes will affect existing log parsing tools

Dependencies

  • May require updates to logging libraries
  • Consider adding structured logging library (e.g., zerolog or zap)

Checklist

  • I have read and followed the project's code of conduct.
  • I have searched for similar issues before creating this one.
  • I have provided all the necessary information to understand and reproduce the issue.
  • I am willing to contribute to the resolution of this issue.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions