This guide covers configuration options for the Visual Search system.
Create a .env file or set environment variables:
# Storage
VISUAL_SEARCH_STORAGE_PATH=./data/images
VISUAL_SEARCH_INDEX_PATH=./data/index
# Model
VISUAL_SEARCH_MODEL_NAME=clip-ViT-B-32
VISUAL_SEARCH_EMBEDDING_DIM=512
# Performance
VISUAL_SEARCH_BATCH_SIZE=32
VISUAL_SEARCH_NUM_WORKERS=4
# Device (auto, cpu, cuda, mps)
VISUAL_SEARCH_DEVICE=auto

from visual_search.indexing.storage.local_storage import LocalStorage

storage = LocalStorage(
    root_path="./data/images",  # Base directory
    default_format="PNG",       # Default save format
    create_dirs=True,           # Create directories if missing
)

Supported formats:
- PNG (lossless, default)
- JPEG (lossy, smaller files)
- WebP (modern, good compression)
- BMP (uncompressed)
- TIFF (high quality)
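
As an illustration of how a save format could be chosen from the list above, here is a minimal sketch that maps a file extension to a format name and falls back to the configured default. The mapping and the `resolve_format` helper are hypothetical and not part of `visual_search`; the format names follow the usual Pillow spelling.

```python
from pathlib import Path

# Extension -> format name for the formats listed above.
# Hypothetical mapping for illustration; not part of visual_search.
EXTENSION_TO_FORMAT = {
    ".png": "PNG",
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
    ".webp": "WEBP",
    ".bmp": "BMP",
    ".tif": "TIFF",
    ".tiff": "TIFF",
}


def resolve_format(filename: str, default_format: str = "PNG") -> str:
    """Pick a save format from the file extension, else use the default."""
    suffix = Path(filename).suffix.lower()
    if not suffix:
        # No extension given: fall back to the configured default format.
        return default_format
    if suffix not in EXTENSION_TO_FORMAT:
        raise ValueError(f"Unsupported image extension: {suffix}")
    return EXTENSION_TO_FORMAT[suffix]
```

Rejecting unknown extensions early keeps unsupported files from reaching the storage layer with a misleading name.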
from visual_search.prediction.embedding_service import EmbeddingService
embedding_service = EmbeddingService(
    model_name="clip-ViT-B-32",  # CLIP model variant
    device=None,                 # Auto-detect (cuda/mps/cpu)
)

Available models:
| Model | Dimension | Speed | Quality |
|---|---|---|---|
| clip-ViT-B-32 | 512 | Fast | Good |
| clip-ViT-B-16 | 512 | Medium | Better |
| clip-ViT-L-14 | 768 | Slow | Best |
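
Because the embedding dimension depends on the model, changing `VISUAL_SEARCH_MODEL_NAME` without updating `VISUAL_SEARCH_EMBEDDING_DIM` is an easy mistake. The sketch below shows one way to catch it at startup. The lookup table copies the values from the table above; `check_dimension` is a hypothetical helper, not part of `visual_search`.

```python
# Embedding dimension per model, copied from the table above.
MODEL_DIMENSIONS = {
    "clip-ViT-B-32": 512,
    "clip-ViT-B-16": 512,
    "clip-ViT-L-14": 768,
}


def check_dimension(model_name: str, configured_dim: int) -> int:
    """Return the model's dimension, or raise if the configuration disagrees."""
    if model_name not in MODEL_DIMENSIONS:
        raise ValueError(f"Unknown model: {model_name}")
    expected = MODEL_DIMENSIONS[model_name]
    if configured_dim != expected:
        raise ValueError(
            f"{model_name} produces {expected}-dim embeddings, "
            f"but {configured_dim} was configured"
        )
    return expected
```

For example, `check_dimension("clip-ViT-L-14", 512)` raises, because that model produces 768-dimensional vectors.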
from visual_search.indexing.index_table import VectorIndex
index = VectorIndex(
    dimension=512,  # Must match embedding dimension
)

Index characteristics:
- Uses FAISS IndexFlatL2
- Exact search (no approximation)
- L2 (Euclidean) distance metric
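
To make "exact search" concrete, the following pure-Python sketch compares the query against every stored vector and keeps the closest `k`. It illustrates the technique only and is not the FAISS implementation. The function name is hypothetical. Like FAISS's flat L2 index, it reports squared L2 distances, which give the same ranking as the true distances.

```python
from typing import List, Sequence, Tuple


def exact_l2_search(
    vectors: Sequence[Sequence[float]],
    query: Sequence[float],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (position, squared L2 distance) for the k nearest vectors."""
    scored = []
    for position, vector in enumerate(vectors):
        if len(vector) != len(query):
            raise ValueError("vector and query dimensions differ")
        # Squared Euclidean distance; skipping the square root keeps the order.
        squared = sum((a - b) ** 2 for a, b in zip(vector, query))
        scored.append((position, squared))
    # Every vector was scored, so the result is exact, not approximate.
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]
```

The cost grows linearly with the number of indexed vectors, which is the trade-off an exact index makes for accuracy.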
from visual_search.prediction.reranking import Reranker
reranker = Reranker(
    max_distance=10.0,  # Maximum expected L2 distance
)

Reranking options:
| Option | Type | Description |
|---|---|---|
| normalize | bool | Convert L2 to similarity [0,1] |
| min_score | float | Filter by minimum score |
| top_k | int | Limit number of results |
| diversity_threshold | float | Remove too-similar results |
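
The sketch below shows one plausible way the `normalize`, `min_score`, and `top_k` options could combine. It assumes a linear mapping `score = 1 - distance / max_distance`, clamped to [0, 1]; the actual `Reranker` may use a different formula. The `rerank` function is hypothetical, and `diversity_threshold` is left out because it needs the embeddings themselves.

```python
from typing import List, Optional, Tuple


def rerank(
    results: List[Tuple[str, float]],
    max_distance: float = 10.0,
    min_score: Optional[float] = None,
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Turn (image_id, l2_distance) pairs into (image_id, score) pairs."""
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")
    scored = []
    for image_id, distance in results:
        # Assumed linear mapping: distance 0 -> 1.0, max_distance or more -> 0.0.
        clamped = min(max(distance, 0.0), max_distance)
        score = 1.0 - clamped / max_distance
        if min_score is None or score >= min_score:
            scored.append((image_id, score))
    # Highest similarity first, then apply the optional result limit.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]
```

With `max_distance=10.0`, a result at distance 2.0 gets a score of about 0.8, and anything at distance 10.0 or beyond gets 0.0.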
The system applies CLIP-specific preprocessing:
from visual_search.prediction.preprocessing import ImagePreprocessor
preprocessor = ImagePreprocessor(
    target_size=(224, 224),  # CLIP input size
)

# Preprocessing steps:
# 1. Resize with center crop to 224x224
# 2. Convert to RGB
# 3. Normalize with CLIP mean/std

CLIP normalization values:

mean = [0.48145466, 0.4578275, 0.40821073]
std = [0.26862954, 0.26130258, 0.27577711]

For indexing many images:
# Process in batches
BATCH_SIZE = 32
for i in range(0, len(images), BATCH_SIZE):
    batch = images[i:i + BATCH_SIZE]
    batch_ids = image_ids[i:i + BATCH_SIZE]
    indexing_service.index_batch(batch, batch_ids)

The CLIP model uses about 400 MB of GPU memory. To run on CPU only:

embedding_service = EmbeddingService(device="cpu")

Approximate index memory usage:
- 512-dim float32 vectors
- ~2KB per image
- 1M images ≈ 2GB RAM
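
The figures above follow from simple arithmetic: each float32 value takes 4 bytes, so a 512-dimensional vector takes 512 × 4 = 2048 bytes. The sketch below reproduces that estimate for other collection sizes and dimensions. The helper name is hypothetical. It counts raw vectors only, not metadata or allocator overhead.

```python
def index_memory_bytes(
    num_vectors: int,
    dimension: int = 512,
    bytes_per_value: int = 4,  # float32
) -> int:
    """Estimate the memory taken by the raw vectors in a flat index."""
    return num_vectors * dimension * bytes_per_value


per_image = index_memory_bytes(1)            # 2048 bytes, about 2 KB
one_million = index_memory_bytes(1_000_000)  # 2_048_000_000 bytes, about 2 GB
print(f"{per_image} bytes per image, {one_million / 1e9:.2f} GB per million images")
```

Switching to a 768-dimensional model raises the per-image cost to 3072 bytes, so the same million images need roughly 3 GB.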
Configure logging for debugging:
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("visual_search")
logger.setLevel(logging.DEBUG)

Recommended project structure:
project/
├── data/
│   ├── images/          # Raw image storage
│   ├── index/           # FAISS index files
│   │   ├── index.faiss
│   │   └── metadata.json
│   └── evaluation/      # Evaluation datasets
├── config/
│   └── settings.yaml    # Configuration file
├── logs/
│   └── visual_search.log
└── .env
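
The `.env` file at the project root holds the `VISUAL_SEARCH_*` variables listed at the top of this guide. As a minimal sketch of how an application might read them, the code below collects the variables into a typed settings object, using the values shown earlier as defaults. The `Settings` class and `load_settings` function are hypothetical, not part of `visual_search`. The sketch reads the process environment only, so a `.env` file must first be loaded by a tool such as python-dotenv or exported by the shell.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "VISUAL_SEARCH_"
VALID_DEVICES = {"auto", "cpu", "cuda", "mps"}


@dataclass(frozen=True)
class Settings:
    storage_path: str
    index_path: str
    model_name: str
    embedding_dim: int
    batch_size: int
    num_workers: int
    device: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, with the guide's defaults."""
    source = os.environ if env is None else env

    def get(name: str, default: str) -> str:
        return source.get(PREFIX + name, default)

    device = get("DEVICE", "auto")
    if device not in VALID_DEVICES:
        raise ValueError(f"Unsupported device: {device}")
    return Settings(
        storage_path=get("STORAGE_PATH", "./data/images"),
        index_path=get("INDEX_PATH", "./data/index"),
        model_name=get("MODEL_NAME", "clip-ViT-B-32"),
        embedding_dim=int(get("EMBEDDING_DIM", "512")),
        batch_size=int(get("BATCH_SIZE", "32")),
        num_workers=int(get("NUM_WORKERS", "4")),
        device=device,
    )
```

Passing a plain dictionary instead of the real environment, as in `load_settings({"VISUAL_SEARCH_BATCH_SIZE": "64"})`, makes the loader easy to test.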