
Summary

This PR adds a RateLimiter utility class that implements token bucket rate limiting, helping developers manage API rate limits and avoid being throttled by the Gradient service.

Problem

The Gradient API enforces rate limits that developers must respect to avoid being throttled or blocked. Currently, developers have no built-in way to manage request rates, leading to:

  • Unexpected throttling errors during high-traffic periods
  • Difficulty implementing proper rate limiting logic
  • Poor user experience when requests are rejected
  • Manual implementation of rate limiting across different parts of applications

Solution

Add a RateLimiter class built on the token bucket algorithm (a minimal implementation sketch follows the list below):

  • Configurable requests per minute limit
  • Automatic token refill based on elapsed time
  • Simple API for checking if requests can be made
  • Wait time calculation for rate limit management
  • Thread-safe implementation using only the standard library

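The merged implementation lives in the diff rather than in this description; the sketch below only illustrates how a token bucket limiter matching the bullets above could work. The class and method names (RateLimiter, acquire, wait_time, requests_per_minute) come from the usage examples further down; the internals are assumptions, not the actual code.

import threading
import time


class RateLimiter:
    """Illustrative token bucket limiter; not the merged implementation."""

    def __init__(self, requests_per_minute: int = 60) -> None:
        self.capacity = requests_per_minute
        self.tokens = float(requests_per_minute)
        self.refill_rate = requests_per_minute / 60.0  # tokens added per second
        self.last_refill = time.monotonic()
        self._lock = threading.Lock()  # standard-library thread safety

    def _refill(self) -> None:
        # Replenish tokens in proportion to the time elapsed since the last refill.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now

    def acquire(self) -> bool:
        # Consume one token if available; report whether the request may proceed.
        with self._lock:
            self._refill()
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    def wait_time(self) -> float:
        # Seconds until at least one token will be available.
        with self._lock:
            self._refill()
            if self.tokens >= 1:
                return 0.0
            return (1 - self.tokens) / self.refill_rate
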
Key Features

  • Token Bucket Algorithm: Industry-standard rate limiting
  • Configurable Limits: Adjustable requests per minute
  • Automatic Refill: Tokens replenish over time
  • Wait Time Calculation: Know how long to wait before the next request
  • Thread Safe: Uses only the standard library, no external dependencies (see the multi-thread sketch after this list)
  • Simple API: Easy to integrate into existing code

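Because the limiter guards its state with a standard-library lock (as in the sketch above), a single instance can be shared by several worker threads. A hypothetical example, assuming the same acquire/wait_time API shown in the usage examples below:

import threading
import time

from gradient._utils import RateLimiter

limiter = RateLimiter(requests_per_minute=30)

def worker(worker_id: int) -> None:
    # Block until a token is available, then perform the rate-limited call.
    while not limiter.acquire():
        time.sleep(limiter.wait_time())
    print(f"worker {worker_id} got a token")

threads = [threading.Thread(target=worker, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
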
Benefits

  • Prevents API throttling errors
  • Smooths out request patterns
  • Improves application reliability
  • Helps stay within API quotas
  • Better user experience during high load

Testing

Added a comprehensive test suite covering:

  • Basic rate limiting behavior
  • Token acquisition and exhaustion
  • Wait time calculations
  • Token refill over time
  • Custom rate limit configurations

All tests pass with full coverage of the rate-limiting functionality (a sketch of representative cases is shown below).

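The actual tests are part of the PR diff; the cases below are only a sketch of what the listed scenarios might look like, written against the assumed acquire/wait_time semantics from the implementation sketch above.

from gradient._utils import RateLimiter


def test_tokens_exhaust_at_the_configured_limit():
    limiter = RateLimiter(requests_per_minute=2)
    assert limiter.acquire()
    assert limiter.acquire()
    # A third request in the same instant should be rejected.
    assert not limiter.acquire()


def test_wait_time_reports_time_until_next_token():
    limiter = RateLimiter(requests_per_minute=1)
    assert limiter.acquire()
    # The bucket is now empty; the next token arrives within one minute.
    assert 0.0 < limiter.wait_time() <= 60.0
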
Usage Examples

from gradient._utils import RateLimiter
import time

# Create rate limiter for 30 requests per minute
limiter = RateLimiter(requests_per_minute=30)

# Before making API calls
if limiter.acquire():
    # Make API request
    response = client.chat.completions.create(...)
else:
    # Wait for a token to become available, then consume it before sending
    wait_seconds = limiter.wait_time()
    time.sleep(wait_seconds)
    limiter.acquire()
    response = client.chat.completions.create(...)

# Or integrate into request loop
def make_rate_limited_request():
    while not limiter.acquire():
        wait_seconds = limiter.wait_time()
        time.sleep(wait_seconds)
    
    return client.chat.completions.create(...)
