diff --git a/README.md b/README.md index 47e6f19..40f17ef 100644 --- a/README.md +++ b/README.md @@ -34,7 +34,7 @@ Performance is fundamental to vector database utility, directly impacting user e | **Marqo** | 72.11ms P50, 140ms P99 (V2) | 157.7 QPS | Fast (Vespa backend) | 97% (V2) | Multi‑billion | GPU support | | **TypeSense** | <50ms (lexical) | Moderate | Real-time | Competitive | Millions‑Billions | Optional GPU | | **OpenSearch** | 10s+ to <200ms* | 16‑147 QPS | 9.5x faster (v3.0 GPU) | 87.9% | Billions+ | GPU acceleration (v3.0) | -| **Weaviate** | <200ms | 15 QPS | Moderate | 80.6% | Billions+ | Modular processing | +| **Weaviate** | ~30–150ms | 300–1500 QPS | Moderate | 85–97%+ | Billions+ | Modular index + compression | *OpenSearch: Highly variable performance - requires significant tuning, can achieve A-grade with proper configuration @@ -50,6 +50,8 @@ Performance is fundamental to vector database utility, directly impacting user e - **Qdrant**: 626.5 QPS with excellent recall - **Marqo V2**: 157.7 QPS - **Pinecone Serverless**: 180–320 QPS (elastic auto‑scaling) +- **Weaviate**: 300–1500+ QPS with RQ/BQ compression + rescoring + (Strong choice for mid-scale, hybrid workloads) **📈 Best Recall Accuracy** - **Qdrant**: 99.5% recall with high performance @@ -80,7 +82,7 @@ Modern applications demand systems capable of handling billions of vectors while | **Marqo** | Multi‑billion | High | Distributed (Vespa backend) | Horizontal | Managed + Self‑hosted | Eventually consistent | | **TypeSense** | Millions to billions | High | Distributed cluster | Horizontal | Managed + Self‑hosted | Eventually consistent | | **Qdrant** | Billions+ vectors | Very high | Distributed BASE model | Horizontal + Vertical | Managed + Self‑hosted | Eventually consistent | -| **Weaviate** | Billions+ vectors | High | Distributed + Sharding | Horizontal | Managed + Self‑hosted | Eventually consistent | +| **Weaviate** | Billions+ vectors | High | Raft-backed distributed cluster, 
per-tenant sharding, ACORN filtering, compression (BQ/RQ) | Horizontal | Managed + Self‑hosted | Eventually consistent | | **SingleStore** | Petabyte‑scale | Very high | Distributed SQL + ACID | Horizontal + Vertical | Managed + Self‑hosted | Strong consistency | ### 🏛️ Architecture Highlights @@ -122,7 +124,7 @@ Efficient indexing algorithms are fundamental to fast similarity search. The cho | **Marqo** | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | Vespa-optimized HNSW | | **TypeSense** | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | Standard HNSW | | **Qdrant** | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | Filterable HNSW | -| **Weaviate** | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | Custom HNSW with CRUD | +| **Weaviate** | ✅ | ❌ | RQ/BQ | ❌ | ❌ | ❌ | HNSW + RQ/BQ (full CRUD) | | **SingleStore** | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ (AUTO) | Faiss-based implementations | ### 🎯 Indexing Innovations @@ -161,7 +163,7 @@ Advanced querying capabilities determine real-world applicability. The trend tow | **Marqo** | ✅ | ✅ | Euclidean, Angular, Dot, Hamming | ✅ Query DSL | Pre-filtering | ✅ Multimodal | | **TypeSense** | ✅ | ✅ | Cosine (primary) | ✅ | Standard filtering | ✅ Rank Fusion | | **Qdrant** | ✅ | ✅ | Cosine, Euclidean, Dot Product | ✅ JSON payload | **In-flight filtering** | ✅ (External) | -| **Weaviate** | ✅ | ✅ | Cosine, Euclidean, Dot, Hamming | ✅ Property-based | Standard filtering | ✅ BM25 + Vector | +| **Weaviate** | ✅ | ✅ | Cosine, Euclidean, Dot, Hamming | ✅ Property-based | ACORN filtering (filter-aware graph traversal) | ✅ BM25 + Vector | | **SingleStore** | ✅ | ✅ | Euclidean, Dot Product | ✅ SQL predicates | **SQL-integrated** | ✅ Re-ranking | ### 🎯 Querying Excellence @@ -202,7 +204,7 @@ Comprehensive data management capabilities are essential for production deployme | **Marqo** | ✅ Built-in inference, Marqtune for fine-tuning | ✅ Full | ✅ Good | ✅ Custom models | ⚠️ Basic | ✅ Standard | ✅ Multimodal | | **TypeSense** | ✅ Built-in + External | ✅ Full | ✅ Good | ✅ OpenAI, Google PaLM | ⚠️ Collection-based | ✅ Standard | ✅ Rich metadata | | **Qdrant** | 
External only | ✅ Full with real-time | ✅ Excellent | ✅ LangChain, custom | ✅ Payload-based | ✅ Enterprise-ready | ✅ JSON, geo, nested | -| **Weaviate** | ✅ Modular vectorizers | ✅ Full CRUD | ✅ GraphQL + REST | ✅ Extensive modules | ⚠️ Schema-based | ✅ Good | ✅ Rich schema | +| **Weaviate** | ✅ Modular vectorizers | ✅ Full CRUD | ✅ GraphQL + REST | ✅ Extensive modules | ✅ True multi-tenancy (1 tenant = 1 shard, lazy-loaded) | ✅ Good | ✅ Rich schema | | **SingleStore** | External only | ✅ SQL CRUD | ✅ SQL + drivers | ✅ Standard SQL tools | ✅ Database-level | ✅ Enterprise RDBMS | ✅ Full SQL types | ### 🏆 Feature Excellence @@ -246,7 +248,7 @@ Understanding financial implications requires analyzing not just subscription co | **Marqo** | Hybrid | Free (OSS) / Cloud pricing | 🔄 Mixed | 🟡 Medium | 🟢 Moderate | 💰💰 | | **TypeSense** | Hybrid | Free (OSS) / $20+ / month | 🔄 Mixed | 🟡 Medium | 🟢 Low | 💰 | | **Qdrant** | Hybrid | Free tier → $25+ / month | 🔄 Mixed | 🟡 Medium | 🟢 Moderate | 💰💰 | -| **Weaviate** | Hybrid | Free (OSS) / $25+ / month | 🔄 Mixed | 🟡 Medium | ⚠️ Scaling complexity | 💰💰 | +| **Weaviate** | Hybrid | Free (OSS) / ~$45 (Flex), ~$280 (Plus) / month | 🔄 Mixed | 🟡 Medium | ⚠️ Scaling complexity | 💰💰 | | **SingleStore** | Resource-based | Enterprise pricing | 🔄 Mixed | 🟡 Medium (SQL expertise) | 🟢 Consolidation savings | 💰💰💰 | ### 💡 Cost Strategy Recommendations @@ -255,6 +257,7 @@ Understanding financial implications requires analyzing not just subscription co - **TypeSense**: Most cost-effective overall - **Qdrant Free Tier**: Excellent performance at no cost - **OpenSearch Self-hosted**: If expertise available (steep learning curve) +- **Weaviate (Flex)**: hybrid search, predictable cost, modular vectorization **🏢 Best for Scale (100M+ vectors)** - **Self-hosted Qdrant**: Best performance per dollar @@ -267,6 +270,7 @@ Understanding financial implications requires analyzing not just subscription co - **AWS OpenSearch Serverless**: OCU-based pricing, no 
management overhead - **TypeSense Cloud**: Best balance of speed and affordability - **Qdrant Cloud**: Good performance with reasonable pricing +- **Weaviate Cloud (Flex/Plus)**: fastest for hybrid RAG + multi-tenant prototypes # 📚 Individual Database Deep Dives @@ -284,7 +288,7 @@ For detailed technical analysis, implementation guides, and specific use case re | **🎯 Marqo** | [Complete Analysis →](./databases/MARQO_REVIEW.md) | Multimodal capabilities, built-in ML inference, Marqtune fine-tuning | AI applications requiring image/text search | Moderate scaling costs, GPU-intensive workloads can escalate costs | | **🚀 TypeSense** | [Complete Analysis →](./databases/TYPESENSE_REVIEW.md) | Cost-effective, typo-tolerant search, easy setup | Small to medium scale with budget constraints | Best cost-performance ratio | | **🧊 Qdrant** | [Complete Analysis →](./databases/QDRANT_REVIEW.md) | High performance, Rust optimization, flexible filtering | High-throughput applications requiring speed | Excellent value at scale | -| **🧠 Weaviate** | [Complete Analysis →](./databases/WEAVIATE_REVIEW.md) | Modular vectorization, GraphQL API, extensive ML integrations | AI applications requiring flexible data schemas and ML workflows | Schema-based multi-tenancy, scaling complexity | +| **🧠 Weaviate** | [Complete Analysis →](./databases/WEAVIATE_REVIEW.md) | Modular vectorization, GraphQL API, extensive ML integrations | AI applications requiring flexible data schemas and ML workflows | Tenant-level sharding + lifecycle, scaling complexity | | **⚙️ SingleStore** | [Complete Analysis →](./databases/SINGLESTORE_REVIEW.md) | SQL integration, strong consistency (ACID), petabyte-scale, fastest indexing | Enterprise applications requiring SQL compatibility and transactional guarantees | Enterprise pricing, consolidation potential for existing SQL workloads | --- diff --git a/databases/WEAVIATE_REVIEW.md b/databases/WEAVIATE_REVIEW.md index f6c9204..4f8e8d0 100644 --- 
a/databases/WEAVIATE_REVIEW.md +++ b/databases/WEAVIATE_REVIEW.md @@ -2,242 +2,314 @@ ## Overview -Weaviate is an open-source vector database designed with a cloud-native architecture and GraphQL API, focusing on storing both data objects and their vector embeddings. Built to handle AI-powered applications at scale, Weaviate distinguishes itself through its modular design with pluggable vectorization modules and strong hybrid search capabilities. It offers flexible deployment options including fully-managed cloud services and self-hosted installations, making it accessible to teams with varying technical expertise and infrastructure preferences. +Weaviate is an open-source vector database designed with a cloud-native architecture and GraphQL/REST/gRPC APIs, focusing on storing both data objects and their vector embeddings. It distinguishes itself through modular, pluggable vectorization, strong hybrid (keyword + vector) search, and native multi-tenancy. It can be run as **Weaviate Cloud** (Shared/Dedicated) or self-hosted on Kubernetes/Docker, giving teams flexibility across MVPs and large-scale deployments. + +Since late 2025, Weaviate Cloud uses a **new pricing model** based on **vector dimensions, storage, and backups**, with three plans (Flex, Plus, Premium) replacing the older Standard/Professional/Business Critical tiers and AIU-based Enterprise pricing. ## Architecture ### Core Architecture -- **Collections**: Named sets of vectors sharing the same dimensionality and distance metric, supporting multi-tenancy -- **Segmented Storage**: Data divided into smaller segments with individual HNSW indexes for concurrent search and indexing -- **GraphQL API**: Primary interface for complex data queries and traversals, with REST endpoints also available -- **Modular Vectorization**: Pluggable system supporting various embedding models (OpenAI, Cohere, Hugging Face, etc.) 
-- **Distributed System**: Horizontal scaling through sharding and replication with Raft consensus + +- **Collections**: Named sets of data objects and their vectors sharing the same dimensionality, distance metric, and index/compression settings; collections can be single-tenant or multi-tenant. +- **Segmented Storage**: Data is partitioned into shards and segments; each shard holds one tenant in multi-tenant setups, providing isolation and efficient querying. +- **GraphQL API**: Primary interface for complex queries and traversals, with REST and gRPC available for ingestion and high-throughput use cases. +- **Modular Vectorization**: Pluggable modules and integrations for popular embedding providers (OpenAI, Cohere, etc.), plus Weaviate’s own embedding service. +- **Distributed System**: Horizontal scaling via sharding and replication, backed by Raft-based metadata consensus. ### Key Features -- Flexible vectorization with built-in module system for automatic embedding generation -- Native hybrid search combining semantic vector and keyword (BM25) search -- Multi-target vector search with various join strategies -- Real-time CRUD operations with full update support -- AutoCut feature for intelligent result limiting based on metric discontinuities + +- Flexible vectorization with built-in modules or external pipelines. +- Native hybrid search combining semantic vector search with BM25 keyword search. +- Multi-vector and multimodal support (multiple vector fields per object). +- Real-time CRUD with full update support. +- **ACORN-based filtering** as the default strategy (v1.34+), significantly improving filtered search performance on large datasets. 
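+The hybrid-search feature above fuses two rankings (BM25 and vector similarity) into one. A minimal sketch of the relative-score-fusion idea follows; the function, document IDs, and scores are illustrative assumptions, not Weaviate's implementation:

```python
def relative_score_fusion(vector_hits, bm25_hits, alpha=0.5):
    """Fuse two ranked result sets (doc id -> raw score) into one ranking.

    Each result set is min-max normalized to [0, 1], then combined as
    alpha * vector + (1 - alpha) * keyword, so alpha=1.0 is pure vector
    search and alpha=0.0 is pure keyword search.
    """
    def normalize(hits):
        if not hits:
            return {}
        lo, hi = min(hits.values()), max(hits.values())
        span = (hi - lo) or 1.0
        return {doc: (score - lo) / span for doc, score in hits.items()}

    v, k = normalize(vector_hits), normalize(bm25_hits)
    fused = {doc: alpha * v.get(doc, 0.0) + (1 - alpha) * k.get(doc, 0.0)
             for doc in set(v) | set(k)}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# toy example: "b" scores well on both signals, so it ranks first
ranked = relative_score_fusion(
    vector_hits={"a": 0.91, "b": 0.88, "c": 0.20},
    bm25_hits={"b": 12.4, "c": 9.1, "d": 2.0},
    alpha=0.5,
)
```

+The `alpha` knob here plays the same role as the hybrid query weight exposed by Weaviate's API: it decides how much the semantic signal outweighs the lexical one for a given corpus.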
## Pricing Models ### Serverless Cloud Pricing -**Standard Tier:** -- Base rate: $0.095 per 1M vector dimensions stored/month -- Minimum: $25/month -- Support: Business hours email support -- Response times: 1-5 business days depending on severity - -**Professional Tier:** -- Base rate: $0.145 per 1M vector dimensions stored/month -- Minimum: $135/month -- Support: 24/7 phone escalation -- Response times: 4h-2bd depending on severity - -**Business Critical Tier:** -- Base rate: $0.175 per 1M vector dimensions stored/month -- Minimum: $450/month -- Support: 24/7 phone escalation -- Response times: 1h-1bd depending on severity + +> Former “Serverless Cloud” with Standard/Professional/Business Critical tiers is now Shared Cloud with three plans: Flex, Plus, Premium, all billed on vector dimensions + storage (GiB) + backups (GiB). (weaviate.io) +> + +**Flex (Shared Cloud, pay-as-you-go):** + +- Typical base: **≈ $45/month** (varies by region). +- Vector dimensions: **≈ $0.000327 per 1M dimensions/month** (reference rate from internal 2025 docs). +- Storage: **≈ $0.2125 per GiB/month** (persistent data). +- Backups: **≈ $0.022 per GiB/month**, daily backups with 7-day retention for new subscriptions. +- Designed for evaluation and small production workloads (99.5%–99.9% SLA depending on configuration). + +**Plus (Shared or small Dedicated):** + +- Typical base: **≈ $280/month**. +- Uses the same unit metrics (vector dimensions, storage, backups), often at similar or discounted rates relative to Flex, depending on contract. +- Daily backups stored for **30 days**. +- Higher SLA (up to 99.9%) and stronger support options. + +**Premium (Dedicated Cloud / Enterprise):** + +- Annual contracts with higher base commitments (low five figures/year and up). +- Dedicated infrastructure (single-tenant), 99.9%–99.95% SLA, private networking, and compliance options (SOC 2, HIPAA). 
+- Same *pricing dimensions* (vector dimensions, storage, backups) but with enterprise discounts and custom sizing. **Example Cost (1M products, 768-dim embeddings):** -- 768M dimensions = 768 × $0.095 = ~$73/month (Standard tier) -- With Professional tier: 768 × $0.145 = ~$111/month -- With Business Critical: 768 × $0.175 = ~$134/month + +Assume: + +- 1M product embeddings at 768 dimensions → **768M vector dimensions**. +- ~2.9 GiB of total data (vectors + indexes + metadata). +- Single replica (for simplicity; HA would scale dimensions and storage proportionally). + +**Flex (Shared Cloud):** + +- Vector dimensions: 768M × $0.000327 ≈ **$0.25/month** +- Storage: 2.9 GiB × $0.2125 ≈ **$0.62/month** +- Backups: 2.9 GiB × $0.022 ≈ **$0.06/month** +- Base fee: **≈ $45/month** + +**Total Flex:** ~**$46/month** + +**Plus (Shared/Dedicated):** + +- Same usage charges (~$0.25 + $0.62 + $0.06 ≈ $0.93/month) +- Base fee: **≈ $280/month** + +**Total Plus:** ~**$281/month** + +> Older pricing references like $0.095 / 1M dims with a $25/month “Standard” tier and separate HA multipliers are now deprecated and replaced by the above model. (Weaviate Community Forum) +> ### Enterprise Cloud Pricing -**AI Unit (AIU) Based Pricing:** -- Starting from $2.64 per AIU -- Flexible storage tiers: - - HOT: For frequently accessed data - - WARM: For less-frequently accessed data - - COLD: For archived data with fast activation -- Dedicated resources with customer isolation -- Annual contracts for predictable billing -- High availability options (3x multiplier) - -**Example Cost (same workload):** -- Custom pricing based on AIU consumption -- Contact sales for detailed quotes -- Significant savings possible with storage tier optimization + +Previously, Enterprise Cloud used an **AI Unit (AIU)** system with HOT/WARM/COLD storage tiers and a 3× multiplier for HA. That model has been **retired**. 
+ +Today’s **Dedicated Cloud (Premium / some Plus contracts)**: + +- Uses the **same metrics** as Shared Cloud: vector dimensions, storage, backups. +- Adds a **contracted base commit** (e.g., starting around $10k/year on marketplaces) plus overage charges per 1M dimensions. +- Makes most sense when you consolidate multiple workloads and need SLAs, security, and isolation that Shared Cloud cannot provide, rather than for a single small index. ## Performance Characteristics -| Configuration | QPS | P99 Latency | Recall | Use Case | -|---------------|-----|-------------|---------|----------| -| WeaviateCloud-standard | 15.33 | ~250ms | 0.806 | Small-scale prototyping | -| WeaviateCloud-bus_crit | Similar | ~200ms | Similar | Production workloads | -| Self-hosted (optimized) | Higher | <200ms | 0.85+ | Large-scale deployments | -| With external vectorizers | Variable | +50-200ms | Same | Real-time embedding generation | +| Configuration | QPS (approx) | P99 Latency (typical) | Recall (typical) | Use Case | +| --- | --- | --- | --- | --- | +| Shared Cloud (Flex/Plus) | 100s–low 1000s | ~50–200ms (warm) | 0.80–0.97 | Small–mid production / RAG | +| Dedicated Cloud (Premium, tuned) | 1000s+ | ~10–100ms (warm) | 0.9–0.99+ | Latency-sensitive search at scale | +| Self-hosted (optimized, BQ/RQ) | up to ~10,000 | <100ms (warm) | 0.96–0.99+ | Large-scale custom deployments | +| With external vectorizers | Variable | +50–200ms extra | Same | Online embedding generation | + +These ranges align with Weaviate’s own ANN benchmarks (showing high QPS for tuned HNSW and compressed indexes) and third-party comparisons where Weaviate achieves several hundred to 1000+ QPS on realistic datasets when sized correctly. 
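+Tail-latency numbers like the P99 column above are nearest-rank percentiles over a latency sample, and they are worth measuring on your own workload rather than trusting vendor tables. A small sketch with synthetic latencies (assumed numbers, not Weaviate measurements):

```python
import math
import random

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample covering p% of observations."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# synthetic latencies: ~40ms typical, with a rare (~1%) slow tail
random.seed(7)
latencies = [random.gauss(40, 8) + (300 if random.random() < 0.01 else 0)
             for _ in range(10_000)]

p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
```

+The gap between `p50` and `p99` is the point of reporting tail latency: a small fraction of slow queries dominates worst-case user experience, and it is what degrades first on an undersized cluster.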
## Pros ### ✅ Flexible Vectorization System -- Pluggable modules for various embedding models -- Automatic vectorization during ingestion or query -- Easy model switching without data migration + +- Pluggable modules for popular embedding providers plus a managed embedding add-on. +- Option to either vectorize at ingestion/query time or use pre-computed embeddings. +- Model changes often require only config updates, not data migrations. ### ✅ Strong Hybrid Search -- Native BM25 + vector search combination -- Sophisticated score fusion techniques -- Multi-target vector search capabilities + +- Native BM25 + vector search with score fusion strategies. +- Works well for catalog, documentation, and help-center search. +- Supports multiple vector fields per object for different modalities or models. ### ✅ Open Source Foundation -- Apache 2.0 license with no vendor lock-in -- Active community and development -- Self-hosting option for full control + +- Core engine is open source and can be self-hosted. +- Active community, documentation, and ecosystem tooling. +- Self-hosted Weaviate uses the same engine as Weaviate Cloud. ### ✅ Developer-Friendly Features -- GraphQL API for expressive queries -- Good integration with AI frameworks (LangChain, LlamaIndex) -- Comprehensive documentation and tutorials -## Cons (Areas for Awareness & Planning) +- GraphQL, REST, and gRPC APIs. +- Integrations and examples for LangChain, LlamaIndex, and other frameworks. +- Best-practice guidance for indexing, multi-tenancy, and operations. + +## Cons (Areas for Awareness & Planning) ### ⚙️ Performance Challenges at Scale -- Reports of degradation with billions of vectors -- Query latencies can reach 300-500ms under heavy load -- Indexing bottlenecks in high-concurrency scenarios + +- For **multi-billion vectors** with tight (<50ms) global SLOs, you must carefully tune sharding, compression, and hardware; naive configs can see P99 latencies in the 200–500ms range under heavy load. 
+- Hybrid search (BM25 + vector) and complex filters are more expensive than pure vector ANN and require ACORN and filter tuning to maintain performance. ### ⚙️ Complex Horizontal Scaling -- Manual intervention often required for scale-up -- Cannot be performed automatically per some reports -- Requires assistance from Weaviate engineers + +- Effective cluster design (shards, replicas, memory limits) still requires experience. +- **Multi-tenancy uses one shard per tenant**, which simplifies isolation but means a single “heavy” tenant is bound to one shard’s capacity; you must split that tenant at the application level if it grows too large. +- Multi-region or cross-cloud architectures typically need direct involvement from experienced operators or Weaviate’s team. ### ⚙️ Vectorizer Dependencies -- External API calls can introduce latency -- Connectivity issues with vectorizer endpoints -- Configuration complexity for optimal performance + +- In-database vectorization depends on external LLM/embedding providers; latency and rate limits can affect ingestion and query-time performance. +- For strict latency and reliability, many teams pre-compute vectors and treat Weaviate purely as a vector+metadata store. ### ⚙️ Known Issues -Recent reports indicate: -- Shard assignment problems in clusters -- Node desynchronization after OOM errors -- "Context Deadline Exceeded" errors -- Lack of usable UI for self-hosted deployments + +Recent community discussions highlight: + +- Operational complexity for self-hosted clusters (upgrades, scaling, observability). +- Challenges scaling very large tenants under the one-shard-per-tenant constraint. +- Lack of a full GUI admin console for self-hosted; monitoring relies on metrics/logs and external tooling. 
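+The compression levers mentioned above matter mostly through memory footprint. Back-of-envelope arithmetic for raw vector storage only (HNSW graph links, metadata, and any rescoring cache are extra, so real-world savings are smaller than this sketch suggests):

```python
def vector_memory_gib(n_vectors: int, dims: int, bits_per_dim: int) -> float:
    """Raw vector memory in GiB; index structures and metadata are extra."""
    return n_vectors * dims * bits_per_dim / 8 / 2**30

# assumed workload: a large RAG corpus, 100M chunks at 1536 dims
n, d = 100_000_000, 1536
full = vector_memory_gib(n, d, 32)  # float32: 32 bits per dimension
bq = vector_memory_gib(n, d, 1)     # binary quantization: 1 bit per dimension

ratio = full / bq  # BQ cuts raw vector memory by 32x
```

+This is why quantization choice, not vector count alone, usually decides whether a dataset fits in memory on a given node size.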
## Benchmarks -Performance data from various sources: -- **Zilliz VectorDBBench**: Ranked 17th-18th in QPS (0.28-0.29 score) -- **Lyzr Migration**: Moved away due to 300-500ms latencies at scale -- **Community Reports**: Sub-200ms average latency in optimal conditions +Performance information comes from: + +- Official docs and blogs (e.g., ACORN filtering and quantization benchmarks). +- Community comparisons (e.g., Weaviate vs Qdrant vs Pinecone). +- Marketplace and independent reviews that include basic QPS/tail latency numbers. + +Key points: -**⚠️ Note**: Performance heavily depends on vectorizer configuration and cluster setup. Self-hosted deployments require significant tuning for optimal results. +- HNSW with uncompressed vectors can achieve **high QPS with sub-100ms** latency on properly sized hardware. +- Quantization (Binary and Rotational) trades a small recall loss for significant memory savings and often similar or better throughput. +- ACORN improves filtered query performance significantly on large datasets with low-correlation filters. + +Actual production performance is highly workload-dependent; custom benchmarks on representative data are still required. ## When to Choose Weaviate ### ✅ Good Fit -- Need flexible vectorization with multiple model options -- Strong GraphQL API requirements -- Hybrid search is critical to your use case -- Want open-source solution with managed option -- Building multimodal search applications + +- You need **hybrid search** (semantic + keyword) for catalogs, docs, or support portals. +- You value **open-source**, with the option to move between self-hosted and managed cloud. +- You’re building **multi-tenant SaaS** where tenant isolation, lazy loading, and lifecycle management matter. +- You want flexibility in embedding providers and may change models over time. +- You’re willing to invest some effort into schema and index tuning for performance/cost efficiency. 
### ❌ Consider Alternatives -- Need sub-100ms latency at billion+ scale -- Require fully automated horizontal scaling -- Limited operational expertise for self-hosting -- Cost-sensitive at scale (serverless can be expensive) -- Need simple vector-only search without complexity + +- You need ultra-low latency (<30ms P99) at **massive global scale** and prefer a more opinionated, specialized platform. +- You want a “zero-ops” experience with minimal tuning and configuration. +- Your team lacks capacity for basic infra/observability, and you don’t want managed cloud either. +- You only need simple vector-only search without hybrid/multi-tenancy features and want the simplest possible API. ## Alternatives to Consider -- **For Performance**: Qdrant, Pinecone (specialized vector DBs) -- **For Simplicity**: Pinecone, Typesense -- **For SQL Integration**: SingleStore, PostgreSQL + pgvector -- **For Cost**: Self-hosted Qdrant, OpenSearch +- **For peak performance / simplicity**: highly opinionated managed vector stores with serverless RU-style pricing. +- **For simplicity and limited features**: lighter-weight hosted vector search services. +- **For SQL-centric stacks**: PostgreSQL + pgvector, SingleStore, and similar. +- **For minimum infra cost**: self-hosted Weaviate, Qdrant, or OpenSearch if you already have strong DevOps capacity. ## Real-World Cost Examples +> All previous cost examples using $0.095 / $0.145 / $0.175 per 1M dims and AIU multipliers are stale. The following recomputes each scenario using the 2025 Flex/Plus/Premium model (vector dimensions + storage + backups), with approximate rates from updated docs. +> + ### Example 1: E-commerce Product Search + **Scenario**: Online retailer with 1M products, daily catalog updates -- **Vectors**: 1M product embeddings (768-dim) = 768M dimensions + +- **Vectors**: 1M product embeddings (768-dim) = **768M dimensions**, ~**2.9 GiB** storage. 
- **Daily writes**: 50K product updates - **Monthly reads**: 2M customer searches - **Metadata**: Product attributes (category, price, etc.) -**Monthly Cost Breakdown:** +**Monthly Cost Breakdown (Shared Cloud):** + ``` -Serverless Standard: -- Storage: 768M dimensions × $0.095/M = $72.96 -- Minimum tier fee: $25 (covered by usage) -Total: ~$73/month - -Serverless Professional (for better SLA): -- Storage: 768M dimensions × $0.145/M = $111.36 -- Minimum tier fee: $135 -Total: $135/month (minimum applies) +Flex: +- Base fee: ≈ $45 +- Vector dimensions: 768M × $0.000327 ≈ $0.25 +- Storage: 2.9 GiB × $0.2125 ≈ $0.62 +- Backups: 2.9 GiB × $0.022 ≈ $0.06 +Total: ~ $46/month + +Plus: +- Base fee: ≈ $280 +- Vector dimensions: same ≈ $0.25 +- Storage: same ≈ $0.62 +- Backups: same ≈ $0.06 +Total: ~ $281/month + ``` ### Example 2: Large-Scale RAG Application + **Scenario**: Enterprise knowledge base with document chunks -- **Vectors**: 100M document chunks (1536-dim) = 153.6B dimensions -- **Daily writes**: 1M new chunks (content updates) + +- **Vectors**: 100M document chunks (1536-dim) = **153.6B dimensions** (153,600M). +- **Daily writes**: 1M new chunks - **Monthly reads**: 50M user queries -- **High metadata**: Document metadata, timestamps, permissions +- **Metadata**: Rich document metadata, timestamps, permissions +- **Storage**: ~**586 GiB** (vectors + indexes + metadata, assuming modest compression). 
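+The cost breakdowns in these examples all follow one linear formula: plan base fee plus dimension, storage, and backup charges. A minimal sketch using the approximate unit rates assumed in this document (assumed reference rates, not published list prices):

```python
def monthly_cost_usd(base_fee, million_dims, storage_gib, replicas=1,
                     dim_rate=0.000327, storage_rate=0.2125, backup_rate=0.022):
    """Estimated monthly bill: base fee + replicated usage charges.

    Usage (dimensions, storage, backups) scales roughly linearly with the
    replication factor; the plan base fee does not. Rates are per month.
    """
    usage = replicas * (million_dims * dim_rate
                        + storage_gib * (storage_rate + backup_rate))
    return base_fee + usage

# Example 1, Flex: 1M products x 768 dims, ~2.9 GiB -> roughly $46/month
flex = monthly_cost_usd(base_fee=45, million_dims=768, storage_gib=2.9)
# Example 2, Plus: 100M chunks x 1536 dims, ~586 GiB -> roughly $468/month
plus_rag = monthly_cost_usd(base_fee=280, million_dims=153_600, storage_gib=586)
```

+Under these assumed rates the plan minimum dominates small workloads, while storage and backups dominate large ones, which is why compression and tenant offloading are the main cost levers.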
+ +**Monthly Cost Breakdown (Plus, Shared or small Dedicated):** -**Monthly Cost Breakdown:** ``` -Serverless Business Critical: -- Storage: 153,600M dimensions × $0.175/M = $26,880 -- High availability (3x): $80,640 -Total: ~$80,640/month - -Enterprise Cloud: -- Custom AIU-based pricing -- Potential savings with storage tiering -- Contact sales for quote +Plus: +- Base fee: ≈ $280 +- Vector dimensions: 153,600M × $0.000327 ≈ $50.23 +- Storage: 586 GiB × $0.2125 ≈ $124.53 +- Backups: 586 GiB × $0.022 ≈ $12.89 +Total: ~ $468/month + ``` +> For HA with multiple replicas, both dimensions and storage scale roughly linearly with the number of replicas; this would increase the usage component accordingly, but not reintroduce the old AIU “3× multiplier” semantics. +> + ### Example 3: Real-time Recommendation Engine + **Scenario**: Streaming platform with user behavior vectors -- **Vectors**: 10M user profiles + 1M content items (512-dim) = 5.6B dimensions + +- **Vectors**: 10M user profiles + 1M content items (512-dim) +- **Total**: 11M × 512 = **5.6B dimensions** (≈ 5,632M). - **High write volume**: 5M daily interactions - **Very high reads**: 100M monthly recommendations -- **Real-time requirements**: Sub-50ms latency +- **Real-time requirements**: Sub-50ms latency (for warm data paths) +- **Storage**: ~**21 GiB** combined vectors + metadata. 
+ +**Monthly Cost Breakdown (Plus, Shared Cloud):** -**Monthly Cost Breakdown:** ``` -Serverless Professional: -- Storage: 5,600M dimensions × $0.145/M = $812 -- Performance may not meet sub-50ms requirement -Total: ~$812/month - -Enterprise Cloud recommended: -- Dedicated resources for consistent latency -- Custom pricing based on requirements +Plus: +- Base fee: ≈ $280 +- Vector dimensions: 5,632M × $0.000327 ≈ $1.84 +- Storage: 21 GiB × $0.2125 ≈ $4.46 +- Backups: 21 GiB × $0.022 ≈ $0.46 +Total: ~ $287/month + ``` +For this workload, **Dedicated Cloud (Premium)** becomes attractive mainly for: + +- Stricter global SLOs (e.g., P95 < 50ms across regions), +- Compliance and private networking, +- Or consolidating many similar workloads on dedicated infrastructure. + ### Cost Tipping Points -| Workload Type | Serverless Wins | Enterprise Wins | -|---------------|-----------------|-----------------| -| **Prototyping/Development** | <10M dimensions | N/A (use serverless) | -| **Production Search** | <1B dimensions | >1B dimensions | -| **High-Performance RAG** | <500M dimensions | >500M dimensions | -| **Multi-tenant SaaS** | <100 tenants | >100 tenants | +| Workload Type | Serverless (Flex / Plus) Wins | Enterprise / Premium Wins | +| --- | --- | --- | +| **Prototyping/Dev** | <10M vectors, relaxed SLAs, low budgets | Rarely needed | +| **Production Search** | Up to ~100M vectors in one region | Very high QPS, strict SLAs, multi-region reads | +| **High-Performance RAG** | Medium corpora with compression + tuned ACORN | Massive corpora with complex filters + compliance | +| **Multi-tenant SaaS** | Hundreds–low thousands of tenants per cluster | Tens of thousands of high-traffic tenants | -**⚠️ Critical Insight**: Serverless costs scale linearly with dimensions stored, making it expensive for large datasets. Enterprise Cloud with storage tiering can be more cost-effective at scale. 
+**Critical Insight**: Under the 2025 model, **vector dimensions are cheap**; storage, backups, and plan minimums dominate. Compression (RQ/BQ) and tenant lifecycle management (offloading inactive tenants) are the main levers to control cost at scale. ## Bottom Line -Weaviate excels for teams needing flexible vectorization and strong hybrid search capabilities without the burden of building embedding pipelines. The modular architecture offers compelling advantages for multimodal applications and scenarios requiring model flexibility, but **performance degradation at scale** has been a recurring issue: +Weaviate remains a strong choice for teams that: + +- Need **hybrid search**, multi-tenancy, and flexible vectorization. +- Want the option to move between self-hosted OSS and fully managed cloud. +- Are willing to design schemas, indexes, and multi-tenant layouts with some care. -- **Sweet spot**: 100M-1B vectors with moderate query loads and hybrid search requirements -- **Danger zone**: Multi-billion vectors with sub-100ms latency requirements -- **Break-even**: ~1B dimensions where Enterprise Cloud becomes more cost-effective than Serverless +The **old** Weaviate pricing story—“$0.095 per 1M dims, AIUs for Enterprise, and HA as a simple 3× multiplier”—is now outdated. The **current** model is: -For production deployments, carefully evaluate: +- **Conceptually simple** (vector dimensions + storage + backups), +- **Operationally realistic** (HA, backups, and retention are explicit), +- And **better aligned** with how engineers actually tune cost (compression, data tiering, tenant lifecycle). -1. **Vectorizer overhead** vs. pre-computed embeddings -2. **Operational complexity** at your expected scale -3. **Total cost** including potential performance issues -4. **Hybrid search importance** for your use case +**Recommendation**: -**Recommendation**: Start with Serverless for prototyping and small-scale production. 
**Consider Enterprise Cloud or alternatives** for large-scale deployments requiring consistent sub-100ms performance. +- Start on **Flex** or self-hosted OSS for experiments and smaller RAG/search workloads. +- Move to **Plus** once you have stable production traffic and need stronger SLAs. +- Consider **Premium Dedicated Cloud** when you require strict latency/compliance guarantees or are consolidating many production tenants into a single enterprise platform. --- -*Last updated: January 2025 | Based on official documentation, benchmark studies, and user reports* \ No newline at end of file +*Last updated: November 2025 | Reflects Weaviate’s October 2025 pricing update and current multi-tenancy and performance guidance.* \ No newline at end of file