Azure's rate limit headers don't behave exactly the same as OpenAI's API. No time to add support right now; this would make a good first PR, or I will get to it later.
From Azure:
MAIN | 2024-03-02 13:35:10 | HEADERS | {'cache-control': 'no-cache, must-revalidate', 'content-length': '831', 'content-type': 'application/json', 'access-control-allow-origin': '*', 'apim-request-id': 'bb943c68-99ed-46ab-bda9-a5efbdf5897a', 'strict-transport-security': 'max-age=31536000; includeSubDomains; preload', 'x-content-type-options': 'nosniff', 'x-ms-region': 'North Central US', 'x-ratelimit-remaining-requests': '79', 'x-ratelimit-remaining-tokens': '61353', 'x-accel-buffering': 'no', 'x-request-id': '3a8b1a44-a02c-4a07-b51d-1c88acec4c4f', 'x-ms-client-request-id': 'bb943c68-99ed-46ab-bda9-a5efbdf5897a', 'azureml-model-session': 'd008-20240215231538', 'date': 'Sat, 02 Mar 2024 21:35:09 GMT'}
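Azure seems wholly reliant on x-ratelimit-remaining-requests (the response above carries only the remaining-requests and remaining-tokens headers, with no x-ratelimit-limit-* headers), whereas the library relies on OpenAI's x-ratelimit-limit-tokens headers, since OpenAI's remaining-requests and remaining-tokens values are cached for long periods and do not update.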
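For anyone picking this up as a first PR, here is a rough sketch of how the two header styles could be told apart. It is not the library's actual code: `read_rate_limit_headers` and the keys of the dict it returns are hypothetical names made up for illustration. Only the header names themselves come from the Azure log above and from OpenAI's documented `x-ratelimit-*` headers.

```python
from typing import Mapping, Optional


def _to_int(value: Optional[str]) -> Optional[int]:
    """Convert a header value to int, returning None if missing or malformed."""
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def read_rate_limit_headers(headers: Mapping[str, str]) -> dict:
    """Hypothetical helper: collect the rate limit headers that are present.

    OpenAI responses include x-ratelimit-limit-* headers; the Azure response
    logged above only includes the x-ratelimit-remaining-* headers.
    """
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    info = {
        "limit_requests": _to_int(lowered.get("x-ratelimit-limit-requests")),
        "limit_tokens": _to_int(lowered.get("x-ratelimit-limit-tokens")),
        "remaining_requests": _to_int(lowered.get("x-ratelimit-remaining-requests")),
        "remaining_tokens": _to_int(lowered.get("x-ratelimit-remaining-tokens")),
    }
    # Assumption: a response without any limit-* header is treated as Azure-style.
    info["has_limit_headers"] = (
        info["limit_requests"] is not None or info["limit_tokens"] is not None
    )
    return info


# Values copied from the Azure log line above.
azure_headers = {
    "x-ratelimit-remaining-requests": "79",
    "x-ratelimit-remaining-tokens": "61353",
}
azure_info = read_rate_limit_headers(azure_headers)
```

With the Azure headers above, `azure_info["has_limit_headers"]` is `False`, so a caller could fall back to the remaining-* values instead of expecting x-ratelimit-limit-tokens. Whether that fallback is the right behavior for the library is an open question for whoever takes the PR.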