Configuration Guide

This guide covers configuration options for the Visual Search system.

Environment Variables

Create a .env file or set environment variables:

# Storage
VISUAL_SEARCH_STORAGE_PATH=./data/images
VISUAL_SEARCH_INDEX_PATH=./data/index

# Model (EMBEDDING_DIM must match the chosen model's output dimension)
VISUAL_SEARCH_MODEL_NAME=clip-ViT-B-32
VISUAL_SEARCH_EMBEDDING_DIM=512

# Performance
VISUAL_SEARCH_BATCH_SIZE=32
VISUAL_SEARCH_NUM_WORKERS=4

# Device (auto, cpu, cuda, mps)
VISUAL_SEARCH_DEVICE=auto
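
As a rough illustration of how these variables might be consumed, the sketch below reads them from the process environment and falls back to the defaults listed above. The `Settings` class and `load_settings` function are hypothetical names, not part of the Visual Search package, and loading the .env file itself is left to whatever mechanism the project already uses.

```python
import os
from dataclasses import dataclass

PREFIX = "VISUAL_SEARCH_"
VALID_DEVICES = {"auto", "cpu", "cuda", "mps"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables documented above."""
    storage_path: str
    index_path: str
    model_name: str
    embedding_dim: int
    batch_size: int
    num_workers: int
    device: str


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    def get(name, default):
        return env.get(PREFIX + name, default)

    device = get("DEVICE", "auto")
    if device not in VALID_DEVICES:
        raise ValueError(f"Unsupported device: {device!r}")

    return Settings(
        storage_path=get("STORAGE_PATH", "./data/images"),
        index_path=get("INDEX_PATH", "./data/index"),
        model_name=get("MODEL_NAME", "clip-ViT-B-32"),
        embedding_dim=int(get("EMBEDDING_DIM", "512")),
        batch_size=int(get("BATCH_SIZE", "32")),
        num_workers=int(get("NUM_WORKERS", "4")),
        device=device,
    )
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test without touching the real environment.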

Component Configuration

Storage Backend

Local Storage

from visual_search.indexing.storage.local_storage import LocalStorage

storage = LocalStorage(
    root_path="./data/images",     # Base directory
    default_format="PNG",          # Default save format
    create_dirs=True,              # Create directories if missing
)

Supported formats:

  • PNG (lossless, default)
  • JPEG (lossy, smaller files)
  • WebP (modern, good compression)
  • BMP (uncompressed)
  • TIFF (high quality)

Embedding Service

from visual_search.prediction.embedding_service import EmbeddingService

embedding_service = EmbeddingService(
    model_name="clip-ViT-B-32",  # CLIP model variant
    device=None,                  # Auto-detect (cuda/mps/cpu)
)
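
The guide does not spell out how auto-detection works when `device=None`. One plausible approach, shown here as a sketch, prefers CUDA, then MPS, then CPU. The `resolve_device` helper is a hypothetical name, and the real EmbeddingService may resolve the device differently.

```python
def resolve_device(requested=None):
    """Return an explicit device string for None/"auto", else pass through."""
    if requested not in (None, "auto"):
        return requested
    try:
        import torch  # optional dependency; fall back to CPU if missing
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is absent in older PyTorch releases
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```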

Available models:

Model          Dimension  Speed   Quality
clip-ViT-B-32  512        Fast    Good
clip-ViT-B-16  512        Medium  Better
clip-ViT-L-14  768        Slow    Best
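
Because the index dimension has to agree with the model, a small lookup can catch mismatches before any vectors are written. The sketch below copies the dimensions from the table above; `MODEL_DIMENSIONS` and `check_dimension` are hypothetical names, not part of the package.

```python
# Dimensions taken from the "Available models" table
MODEL_DIMENSIONS = {
    "clip-ViT-B-32": 512,
    "clip-ViT-B-16": 512,
    "clip-ViT-L-14": 768,
}


def check_dimension(model_name, dimension):
    """Raise ValueError if the configured dimension does not fit the model."""
    try:
        expected = MODEL_DIMENSIONS[model_name]
    except KeyError:
        raise ValueError(f"Unknown model: {model_name!r}") from None
    if dimension != expected:
        raise ValueError(
            f"{model_name} produces {expected}-dim embeddings, "
            f"but the index is configured for {dimension}"
        )
    return expected
```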

Vector Index

from visual_search.indexing.index_table import VectorIndex

index = VectorIndex(
    dimension=512,      # Must match embedding dimension
)

Index characteristics:

  • Uses FAISS IndexFlatL2
  • Exact search (no approximation)
  • L2 (Euclidean) distance metric
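
To make "exact search" concrete, the following pure-Python sketch compares the query against every stored vector and returns the closest ones. It only illustrates the idea and is not how VectorIndex or FAISS is implemented. Note that FAISS IndexFlatL2 reports squared L2 distances, so the sketch does the same; `l2_search` is a hypothetical name.

```python
def l2_search(vectors, query, k):
    """Brute-force nearest-neighbour search.

    Returns up to k (index, squared_l2_distance) pairs, closest first.
    """
    scored = []
    for idx, vec in enumerate(vectors):
        if len(vec) != len(query):
            raise ValueError("vector and query dimensions differ")
        dist_sq = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist_sq, idx))
    scored.sort()  # ascending distance, ties broken by index
    return [(idx, dist_sq) for dist_sq, idx in scored[:k]]
```

Every query touches every vector, so the cost grows linearly with the number of indexed images. That is the price of results without approximation.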

Reranker

from visual_search.prediction.reranking import Reranker

reranker = Reranker(
    max_distance=10.0,  # Maximum expected L2 distance
)

Reranking options:

Option               Type   Description
normalize            bool   Convert L2 to similarity [0,1]
min_score            float  Filter by minimum score
top_k                int    Limit number of results
diversity_threshold  float  Remove too-similar results
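
The guide does not give the formula used to convert distances into scores. A common choice, assumed here, is a linear mapping clamped to [0, 1] using `max_distance`. The sketch below applies that mapping together with `min_score` and `top_k`; diversity filtering is omitted because it needs the embeddings themselves. The `rerank` function is a hypothetical stand-in, not the Reranker API.

```python
def rerank(results, max_distance=10.0, min_score=None, top_k=None):
    """Score, filter, and truncate (image_id, l2_distance) pairs.

    Assumed scoring: score = max(0, 1 - distance / max_distance).
    """
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")
    scored = [
        (image_id, max(0.0, 1.0 - dist / max_distance))
        for image_id, dist in results
    ]
    if min_score is not None:
        scored = [item for item in scored if item[1] >= min_score]
    # Highest similarity first
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored if top_k is None else scored[:top_k]
```

With this mapping, any distance at or beyond `max_distance` scores 0, so the value should be chosen from distances actually observed in the index.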

Image Preprocessing

The system applies CLIP-specific preprocessing:

from visual_search.prediction.preprocessing import ImagePreprocessor

preprocessor = ImagePreprocessor(
    target_size=(224, 224),  # CLIP input size
)

# Preprocessing steps:
# 1. Resize with center crop to 224x224
# 2. Convert to RGB
# 3. Normalize with CLIP mean/std

CLIP normalization values:

mean = [0.48145466, 0.4578275, 0.40821073]
std = [0.26862954, 0.26130258, 0.27577711]
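
As a worked example of the normalization step, the sketch below applies the values above to a single 8-bit RGB pixel: each channel is first scaled to [0, 1], then shifted by the mean and divided by the standard deviation. The `normalize_pixel` helper is hypothetical; ImagePreprocessor presumably does the same on whole image tensors.

```python
CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)


def normalize_pixel(rgb):
    """Normalize one (R, G, B) pixel with 0-255 channel values."""
    return tuple(
        (channel / 255.0 - mean) / std
        for channel, mean, std in zip(rgb, CLIP_MEAN, CLIP_STD)
    )
```

For instance, a pure white pixel `(255, 255, 255)` maps to roughly 1.93 in the red channel, and a black pixel to roughly -1.79.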

Performance Tuning

Batch Processing

For indexing many images:

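# Process in batches.
# Assumes `images` and `image_ids` are equal-length lists and that
# `indexing_service` has already been constructed.
BATCH_SIZE = 32

for i in range(0, len(images), BATCH_SIZE):
    batch = images[i:i + BATCH_SIZE]
    batch_ids = image_ids[i:i + BATCH_SIZE]
    indexing_service.index_batch(batch, batch_ids)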
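
The slicing loop above can also be wrapped in a small generator that checks the two lists line up before any work starts. This is a sketch only; `iter_batches` is a hypothetical helper, not part of the package.

```python
def iter_batches(items, ids, batch_size=32):
    """Yield (items_batch, ids_batch) slices of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if len(items) != len(ids):
        raise ValueError("items and ids must have the same length")
    for start in range(0, len(items), batch_size):
        end = start + batch_size
        yield items[start:end], ids[start:end]
```

Each yielded pair could then be passed to `indexing_service.index_batch(batch, batch_ids)` as in the loop above.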

Memory Management

The CLIP model uses roughly 400MB of GPU memory. To run on CPU only:

embedding_service = EmbeddingService(device="cpu")

Index Size

Approximate memory usage:

  • 512-dim float32 vectors (4 bytes per value)
  • ~2KB per image (512 × 4 bytes = 2,048 bytes)
  • 1M images ≈ 2GB RAM for the vectors alone
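
The same arithmetic can be written as a small helper for other dimensions or collection sizes. It counts raw vector storage only, ignoring metadata and any index overhead; `index_memory_bytes` is a hypothetical name.

```python
def index_memory_bytes(num_vectors, dimension=512, bytes_per_value=4):
    """Raw storage for num_vectors float32 vectors, in bytes."""
    return num_vectors * dimension * bytes_per_value


per_image = index_memory_bytes(1)             # 2048 bytes, about 2KB
one_million = index_memory_bytes(1_000_000)   # 2,048,000,000 bytes, about 2GB
large_model = index_memory_bytes(1_000_000, dimension=768)  # about 3GB
```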

Logging

Configure logging for debugging:

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("visual_search")
logger.setLevel(logging.DEBUG)
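
To send the same logger's output to a file, such as the logs/visual_search.log path in the recommended layout, a file handler can be attached. The sketch below uses only the standard library; `configure_logging` and its format string are assumptions, not something the package provides.

```python
import logging
from pathlib import Path


def configure_logging(log_file="logs/visual_search.log", level=logging.DEBUG):
    """Attach a file handler to the "visual_search" logger and return it."""
    path = Path(log_file)
    path.parent.mkdir(parents=True, exist_ok=True)  # create logs/ if missing

    logger = logging.getLogger("visual_search")
    logger.setLevel(level)

    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Calling the function more than once adds another handler each time, so it is best called once at startup.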

Directory Structure

Recommended project structure:

project/
├── data/
│   ├── images/          # Raw image storage
│   ├── index/           # FAISS index files
│   │   ├── index.faiss
│   │   └── metadata.json
│   └── evaluation/      # Evaluation datasets
├── config/
│   └── settings.yaml    # Configuration file
├── logs/
│   └── visual_search.log
└── .env