2 changes: 2 additions & 0 deletions README.md
@@ -76,3 +76,5 @@ graph TD
A reference implementation of how to integrate the `kvcache.Indexer` into a scheduler like the `llm-d-inference-scheduler`.
* [**KV-Events**](examples/kv_events/README.md):
Demonstrates how the KV-Cache Manager handles KV-Events through both an offline example with a dummy ZMQ publisher and an online example using a vLLM Helm chart.
* [**Valkey Backend**](examples/valkey_example/README.md):
Shows how to configure and use Valkey as the backend for KV-block indexing, including RDMA support for high-performance scenarios.
1 change: 1 addition & 0 deletions docs/architecture.md
@@ -127,6 +127,7 @@ The `kvblock.Index` is an interface with swappable backends.
* **In-Memory (Default)**: A very fast, thread-safe, two-level LRU cache using `hashicorp/golang-lru`. The first level maps a block key to a second-level cache of pods that have the block. It prioritizes speed over persistence, which is usually the right trade-off for ephemeral cache data.
* **Cost-Aware Memory (Optional)**: A memory-efficient implementation using the `hypermodeinc/ristretto` cache library that provides cost-aware eviction based on actual memory usage. Unlike the basic in-memory backend, this implementation calculates the memory footprint of each cache entry and uses this information for intelligent eviction decisions. This is particularly useful when memory usage patterns vary significantly across different keys.
* **Redis (Optional)**: A distributed backend that can be shared by multiple indexer replicas. It can offer scalability and persistence, but this may be overkill given the short lifetime of most KV-cache blocks.
* **Valkey (Optional)**: A Redis-compatible, open-source alternative that provides the same distributed capabilities as Redis but remains under the BSD license. Valkey offers additional performance benefits through RDMA support for reduced latency, making it particularly suitable for high-performance LLM inference workloads. Since Valkey is API-compatible with Redis, it can be used as a drop-in replacement.
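
To make the "block key to pods" mapping shared by these backends concrete, here is a minimal sketch in Go. It is an illustration only: `toyIndex` and its methods are hypothetical names, not the real `kvblock.Index` interface, and it omits LRU eviction, locking, and any network backend.

```go
package main

import (
	"fmt"
	"sort"
)

// toyIndex maps a block key to the set of pods holding that block.
// Hypothetical illustration; not the project's kvblock.Index.
type toyIndex struct {
	pods map[string]map[string]struct{}
}

func newToyIndex() *toyIndex {
	return &toyIndex{pods: make(map[string]map[string]struct{})}
}

// Add records that pod holds the block identified by key.
func (t *toyIndex) Add(key, pod string) {
	if t.pods[key] == nil {
		t.pods[key] = make(map[string]struct{})
	}
	t.pods[key][pod] = struct{}{}
}

// Lookup returns the pods holding key, sorted for stable output.
func (t *toyIndex) Lookup(key string) []string {
	out := make([]string, 0, len(t.pods[key]))
	for pod := range t.pods[key] {
		out = append(out, pod)
	}
	sort.Strings(out)
	return out
}

func main() {
	idx := newToyIndex()
	idx.Add("block-a", "pod-1")
	idx.Add("block-a", "pod-2")
	fmt.Println(idx.Lookup("block-a")) // prints [pod-1 pod-2]
	fmt.Println(idx.Lookup("block-b")) // prints []
}
```

A distributed backend such as Redis or Valkey would hold the same mapping on a server so that several indexer replicas can share it.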

#### Tokenization Caching Process

22 changes: 22 additions & 0 deletions docs/configuration.md
@@ -132,6 +132,28 @@ Configures the Redis-backed KV block index implementation.
| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `address` | `string` | Redis server address (can include auth: `redis://user:pass@host:port/db`) | `"redis://127.0.0.1:6379"` |
| `backendType` | `string` | Backend type: "redis" or "valkey" (optional, mainly for documentation) | `"redis"` |
| `enableRDMA` | `boolean` | Enable RDMA transport for Valkey (experimental, requires Valkey with RDMA support) | `false` |

### Valkey Index Configuration (`RedisIndexConfig`)

Configures the Valkey-backed KV block index implementation. Valkey is a Redis-compatible, open-source alternative that supports RDMA for improved latency.

```json
{
  "address": "valkey://127.0.0.1:6379",
  "backendType": "valkey",
  "enableRDMA": false
}
```

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `address` | `string` | Valkey server address. Supports `valkey://`, `valkeys://` (TLS), `redis://`, or a plain `host:port` address | `"valkey://127.0.0.1:6379"` |
| `backendType` | `string` | Should be "valkey" for Valkey instances | `"valkey"` |
| `enableRDMA` | `boolean` | Enable RDMA transport (requires Valkey server with RDMA support) | `false` |

**Note**: Both Redis and Valkey configurations use the same `RedisIndexConfig` structure since Valkey is API-compatible with Redis.
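
As a rough sketch of how the three fields in the table above could be decoded, the Go snippet below unmarshals the JSON over the documented Valkey defaults. The struct `indexConfig` and the helper `parseIndexConfig` are hypothetical names chosen for illustration; the project's actual `RedisIndexConfig` type may be defined differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// indexConfig mirrors the three documented JSON fields.
// Hypothetical struct; the real RedisIndexConfig may differ.
type indexConfig struct {
	Address     string `json:"address"`
	BackendType string `json:"backendType"`
	EnableRDMA  bool   `json:"enableRDMA"`
}

// parseIndexConfig decodes raw JSON over the documented Valkey defaults,
// so fields omitted from the input keep their default values.
func parseIndexConfig(raw []byte) (indexConfig, error) {
	cfg := indexConfig{
		Address:     "valkey://127.0.0.1:6379",
		BackendType: "valkey",
		EnableRDMA:  false,
	}
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}

func main() {
	raw := []byte(`{"address": "valkey://valkey-server:6379", "enableRDMA": true}`)
	cfg, err := parseIndexConfig(raw)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Pre-filling the struct before calling `json.Unmarshal` is one simple way to apply defaults, because the decoder only overwrites fields present in the input.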

## Token Processing Configuration

100 changes: 100 additions & 0 deletions examples/valkey_configuration.md
@@ -0,0 +1,100 @@
# Valkey Configuration Example for KV-Cache Manager

This example demonstrates how to configure the KV-Cache Manager to use Valkey as the backend for KV-block indexing.

## Basic Valkey Configuration

```json
{
  "kvBlockIndexConfig": {
    "valkeyConfig": {
      "address": "valkey://127.0.0.1:6379",
      "backendType": "valkey",
      "enableRDMA": false
    },
    "enableMetrics": true,
    "metricsLoggingInterval": "30s"
  }
}
```

## Valkey with RDMA Support

```json
{
  "kvBlockIndexConfig": {
    "valkeyConfig": {
      "address": "valkey://valkey-server:6379",
      "backendType": "valkey",
      "enableRDMA": true
    },
    "enableMetrics": true
  }
}
```

Review comment (Copilot AI, Oct 4, 2025) on the `"backendType": "valkey",` line above: Extra trailing space after comma should be removed.

## Valkey with SSL/TLS

```json
{
  "kvBlockIndexConfig": {
    "valkeyConfig": {
      "address": "valkeys://valkey-cluster:6380",
      "backendType": "valkey",
      "enableRDMA": false
    }
  }
}
```
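
Many Redis client libraries only recognize the `redis://` and `rediss://` (TLS) URL schemes, so one plausible way to support the `valkey://` and `valkeys://` addresses used in these examples is to rewrite the scheme before handing the URL to the client. The sketch below shows that idea; `normalizeAddress` is a hypothetical helper, and whether the KV-Cache Manager does this internally is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeAddress rewrites Valkey URL schemes to their Redis equivalents
// and adds a scheme to plain host:port addresses.
// Hypothetical helper for illustration only.
func normalizeAddress(addr string) string {
	switch {
	case strings.HasPrefix(addr, "valkeys://"):
		return "rediss://" + strings.TrimPrefix(addr, "valkeys://")
	case strings.HasPrefix(addr, "valkey://"):
		return "redis://" + strings.TrimPrefix(addr, "valkey://")
	case !strings.Contains(addr, "://"):
		return "redis://" + addr
	default:
		return addr
	}
}

func main() {
	for _, addr := range []string{
		"valkey://127.0.0.1:6379",
		"valkeys://valkey-cluster:6380",
		"127.0.0.1:6379",
		"redis://127.0.0.1:6379",
	} {
		fmt.Printf("%s -> %s\n", addr, normalizeAddress(addr))
	}
}
```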

## Environment Variables

You can also configure Valkey using environment variables:

```bash
export VALKEY_ADDR="valkey://127.0.0.1:6379"
export VALKEY_ENABLE_RDMA="false"
```

## Migration from Redis

To migrate from Redis to Valkey, replace the `redisConfig` block with a `valkeyConfig` block and update the address scheme:

**Before (Redis):**
```json
{
  "kvBlockIndexConfig": {
    "redisConfig": {
      "address": "redis://127.0.0.1:6379"
    }
  }
}
```

**After (Valkey):**
```json
{
  "kvBlockIndexConfig": {
    "valkeyConfig": {
      "address": "valkey://127.0.0.1:6379",
      "backendType": "valkey"
    }
  }
}
```

## Benefits of Using Valkey

1. **Open Source**: Valkey remains under the BSD license, ensuring long-term availability
2. **Redis Compatibility**: Drop-in replacement for Redis with full API compatibility
3. **RDMA Support**: Lower latency networking for high-performance workloads
4. **Community Backed**: Supported by major cloud vendors and the Linux Foundation
5. **Performance**: Optimizations specifically for modern hardware

## RDMA Configuration Notes

When `enableRDMA: true` is set:
- Ensure your Valkey server is compiled with RDMA support
- Verify that RDMA hardware and drivers are properly configured
- Note that RDMA support in the Go client is experimental
- The connection will fall back to standard TCP if RDMA is not available
138 changes: 138 additions & 0 deletions examples/valkey_example/README.md
@@ -0,0 +1,138 @@
# Valkey Example for KV-Cache Manager

This example demonstrates how to use Valkey as the backend for the KV-Cache Manager's KV-block indexing system.

## What is Valkey?

Valkey is a community fork of Redis that remains under the original BSD license. It is API-compatible with Redis and adds features such as RDMA support, which can reduce latency in high-performance scenarios.

## Benefits of Using Valkey

- **Open Source**: Remains under the BSD license
- **Redis Compatibility**: Drop-in replacement for Redis
- **RDMA Support**: Lower latency networking for high-performance workloads
- **Community Backed**: Supported by major cloud vendors and the Linux Foundation
- **Performance**: Optimizations for modern hardware

## Prerequisites

1. **Valkey Server**: Install and run a Valkey server
   ```bash
   # Using Docker
   docker run -d -p 6379:6379 valkey/valkey:latest

   # Or install from source/package manager
   ```

2. **Go Environment**: Go 1.24.1 or later

3. **Optional**: Hugging Face token for tokenizer access
   ```bash
   export HF_TOKEN="your-huggingface-token"
   ```

## Running the Example

### Basic Usage

```bash
# Run with default Valkey configuration
go run main.go

# Run with custom Valkey address
VALKEY_ADDR="valkey://your-valkey-server:6379" go run main.go
```

### With RDMA Support

If your Valkey server supports RDMA:

```bash
VALKEY_ADDR="valkey://rdma-valkey-server:6379" \
VALKEY_ENABLE_RDMA="true" \
go run main.go
```

### Environment Variables

- `VALKEY_ADDR`: Valkey server address (default: `valkey://127.0.0.1:6379`)
- `VALKEY_ENABLE_RDMA`: Enable RDMA transport (default: `false`)
- `HF_TOKEN`: Hugging Face token for tokenizer access (optional)
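
For illustration, the variables and defaults listed above could be read in Go roughly as follows. This is a sketch, not the example's actual `main.go`: `envSettings` and `loadEnvSettings` are hypothetical names, and the lookup function is passed in as a parameter so the logic can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envSettings holds the values derived from the environment.
// Hypothetical type for illustration only.
type envSettings struct {
	Addr       string
	EnableRDMA bool
}

// loadEnvSettings applies the documented defaults and overrides them with
// any non-empty values returned by lookup (for example os.LookupEnv).
func loadEnvSettings(lookup func(string) (string, bool)) (envSettings, error) {
	s := envSettings{Addr: "valkey://127.0.0.1:6379", EnableRDMA: false}
	if v, ok := lookup("VALKEY_ADDR"); ok && v != "" {
		s.Addr = v
	}
	if v, ok := lookup("VALKEY_ENABLE_RDMA"); ok && v != "" {
		enabled, err := strconv.ParseBool(v)
		if err != nil {
			return s, fmt.Errorf("VALKEY_ENABLE_RDMA: %w", err)
		}
		s.EnableRDMA = enabled
	}
	return s, nil
}

func main() {
	s, err := loadEnvSettings(os.LookupEnv)
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Printf("addr=%s rdma=%t\n", s.Addr, s.EnableRDMA)
}
```

`strconv.ParseBool` accepts values such as `true`, `false`, `1`, and `0`, and returns an error for anything else, so a typo in `VALKEY_ENABLE_RDMA` is reported instead of being silently ignored.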

## What the Example Does

1. **Configuration**: Sets up a KV-Cache Manager with Valkey backend
2. **Cache Operations**: Demonstrates adding prompts to the cache
3. **Cache Hits**: Shows how repeated prompts result in cache hits
4. **Multi-Pod Lookup**: Demonstrates cache sharing across multiple pods
5. **Metrics**: Enables metrics collection for monitoring cache performance

## Expected Output

```
I0104 10:30:00.123456 1 main.go:45] Initializing KV-Cache Manager with Valkey backend valkeyAddr="valkey://127.0.0.1:6379" rdmaEnabled=false
I0104 10:30:00.234567 1 main.go:109] Processing prompt iteration=1 prompt="Hello, how are you today?"
I0104 10:30:00.345678 1 main.go:122] Cache score prompt="Hello, how are you today?" score=1.0 podID="demo-pod-1"
I0104 10:30:00.456789 1 main.go:109] Processing prompt iteration=3 prompt="Hello, how are you today?"
I0104 10:30:00.567890 1 main.go:122] Cache score prompt="Hello, how are you today?" score=1.0 podID="demo-pod-1"
...
I0104 10:30:02.123456 1 main.go:65] Valkey example completed successfully
```

## Comparison with Redis

The Valkey backend is API-compatible with Redis, so you can easily switch between them:

### Redis Configuration
```json
{
  "kvBlockIndexConfig": {
    "redisConfig": {
      "address": "redis://127.0.0.1:6379"
    }
  }
}
```

### Valkey Configuration
```json
{
  "kvBlockIndexConfig": {
    "valkeyConfig": {
      "address": "valkey://127.0.0.1:6379",
      "backendType": "valkey",
      "enableRDMA": false
    }
  }
}
```

## Performance Considerations

- **RDMA**: Enable RDMA to reduce network latency if your infrastructure supports it
- **Connection Pooling**: The underlying Redis client handles connection pooling
- **Persistence**: Valkey can retain data across restarts when RDB or AOF persistence is enabled on the server (unlike in-memory backends)
- **Scalability**: Suitable for distributed deployments with multiple indexer replicas

## Troubleshooting

### Connection Issues
- Ensure Valkey server is running and accessible
- Check network connectivity and firewall rules
- Verify the address format (supports `valkey://`, `valkeys://`, `redis://`, or plain addresses)

### RDMA Issues
- Confirm Valkey server is compiled with RDMA support
- Verify RDMA hardware and drivers are properly configured
- Check that both client and server are on RDMA-enabled networks

### Performance Issues
- Monitor cache hit rates using the built-in metrics
- Adjust block size in `TokenProcessorConfig` for your use case
- Consider using multiple Valkey instances for horizontal scaling

## See Also

- [Valkey Configuration Guide](../valkey_configuration.md)
- [KV-Cache Manager Architecture](../../docs/architecture.md)
- [Configuration Reference](../../docs/configuration.md)