
support for open ai cache #6

Open
bixia wants to merge 2 commits into BradMoonUESTC:main from bixia:support-openai-cache

Conversation


@bixia bixia commented Dec 24, 2024

Improve OpenAI API Response Caching System

Overview

This PR enhances the caching system for OpenAI API calls to improve performance and reduce API costs. The changes provide consistent caching behavior across different API endpoints (OpenAI, Azure, Claude) and request types (chat completions, embeddings).

Key Changes

Cache Implementation

  • Unified caching approach for all API endpoints (OpenAI, Azure, Claude)
  • Consistent cache key generation using request data
  • JSON serialization for embedding responses
  • Zero-vector fallback for embedding errors
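
The PR text does not include the implementation, but the cache key and serialization scheme described above could be sketched roughly as follows. The function names (`make_cache_key`, `serialize_embedding`, `deserialize_embedding`) are hypothetical, not taken from the PR:

```python
import hashlib
import json

def make_cache_key(endpoint: str, request_data: dict) -> str:
    """Derive a deterministic cache key from the endpoint name and request payload.

    sort_keys=True makes the JSON serialization order-independent, so the same
    logical request hashes to the same key regardless of dict insertion order.
    """
    payload = json.dumps(request_data, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(f"{endpoint}:{payload}".encode("utf-8")).hexdigest()

def serialize_embedding(vector: list) -> str:
    """Store embedding vectors as JSON text so any key-value backend can hold them."""
    return json.dumps(vector)

def deserialize_embedding(raw: str) -> list:
    """Restore a JSON-serialized embedding vector to a Python list."""
    return json.loads(raw)
```

Because the key covers both the endpoint and the full request data, identical prompts sent to different endpoints (e.g. OpenAI vs. Claude) get separate cache entries.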

Supported Endpoints

  • OpenAI chat completions
  • Azure OpenAI completions
  • Claude chat completions
  • OpenAI embeddings
  • Custom embeddings service

Error Handling

  • Improved error handling with fallback responses
  • Zero-vector returns for embedding failures
  • Cache miss handling with proper error messages
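
A zero-vector fallback like the one described could look like this minimal sketch. The wrapper name, the `fetch_fn` parameter, and the embedding dimension are assumptions for illustration, not the PR's actual code:

```python
import logging

EMBEDDING_DIM = 1536  # assumed dimension (e.g. OpenAI text-embedding-ada-002)

def safe_get_embedding(text: str, fetch_fn) -> list:
    """Call fetch_fn(text) and fall back to a zero vector on any failure,
    so downstream similarity code always receives a vector of the right shape."""
    try:
        return fetch_fn(text)
    except Exception as exc:
        logging.warning("embedding request failed, returning zero vector: %s", exc)
        return [0.0] * EMBEDDING_DIM
```

Returning a correctly shaped zero vector keeps batch pipelines running through transient API failures, at the cost of one meaningless similarity score rather than a crash.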

Benefits

  • Reduced API costs through efficient response caching
  • Improved performance for repeated queries
  • Consistent caching behavior across all endpoints
  • Better error recovery and fallback mechanisms

Testing

The changes have been tested with:

  • Standard OpenAI endpoints
  • Azure OpenAI endpoints
  • Claude API endpoints
  • Custom embedding services
  • Error scenarios and fallbacks

Usage Example

```python
# The cache is automatically used for all API calls
response = common_ask(prompt)  # will use the cache if available

# Embeddings are also cached
embedding = common_get_embedding(text)  # cached with proper JSON serialization
```

Notes

  • No database schema changes required
  • Backwards compatible with existing cache entries
  • Thread-safe implementation
  • Proper JSON serialization for embedding vectors
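
The notes claim the implementation is thread-safe; one common way to get that for an in-process cache is a lock around the lookup/store path. This is a hypothetical sketch (class and method names are not from the PR), and a real backend might be a database rather than a dict:

```python
import threading

class ThreadSafeCache:
    """Minimal lock-guarded in-memory cache."""

    def __init__(self):
        self._lock = threading.Lock()
        self._store = {}

    def get(self, key):
        with self._lock:
            return self._store.get(key)

    def set(self, key, value):
        with self._lock:
            self._store[key] = value

    def get_or_compute(self, key, compute):
        # Check under the lock, but run compute() outside it so a slow
        # API call does not block other threads, then store the result.
        cached = self.get(key)
        if cached is not None:
            return cached
        value = compute()
        self.set(key, value)
        return value
```

Note that two threads racing on the same missing key may both call `compute()`; for an idempotent, cacheable API response that is wasteful but harmless.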

Related Issues

  • Reduces API costs through caching
  • Improves response times for repeated queries
  • Provides consistent behavior across different API endpoints

Future Improvements

  • Add cache expiration policies
  • Implement cache size limits
  • Add cache statistics tracking
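
The proposed expiration policy could be as simple as storing an expiry timestamp alongside each value; a hypothetical TTL check (not part of this PR) might look like:

```python
import time

def make_entry(value, ttl_seconds: float) -> dict:
    """Store a value together with its expiry timestamp (monotonic clock)."""
    return {"value": value, "expires_at": time.monotonic() + ttl_seconds}

def get_if_fresh(entry):
    """Return the cached value, or None if the entry is absent or its TTL elapsed."""
    if entry is None or time.monotonic() >= entry["expires_at"]:
        return None
    return entry["value"]
```

Treating an expired entry as a cache miss lets the existing miss-handling path refetch and overwrite it, so no separate eviction sweep is strictly required.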

@K2
Collaborator

K2 commented Feb 23, 2025

@bixia Cool stuff! I'd love to merge something like this, but I'm too new here to check it in just yet. I wrote a similar cache in my dev fork that will probably get you a limited amount of this functionality. I have a bit of testing to finish up, and then the current stack will have everything on async, with some more formalized pipelines that guide flow through the taskmgr/aimgr. That will let you play with scripts and make run-time testing/changes very easy. I hope to have some of the new user-defined graph queries ready in a couple of days as well.

I would ping @BradMoonUESTC for this one, however. If you need a cache to help with suspending/resuming sessions (and to save cost too), the implementation I wrote is at https://github.com/K2/finite-monkey-engine/blob/dev/services/cache.py. It's a very simple Python object cache over the translation engine. If I can get it working well with everything else, there will be automatic CN/EN translation that I hope works well for some people.

