@@ -22,13 +22,53 @@ We have a number of precomputed data sets. All data sets have been pre-split int
2222
2323| Dataset | Dimensions | Train size | Test size | Neighbors | Distance |
2424| ----------------------------------------------------------------------------------------------------------- | ---------: | ---------: | --------: | --------: | --------- |
25- | [LAION-1M: subset of LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 1,000,000 | 10,000 | 100 | Angular |
26- | [LAION-10M: subset of LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 10,000,000 | 10,000 | 100 | Angular |
27- | [LAION-20M: subset of LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 20,000,000 | 10,000 | 100 | Angular |
28- | [LAION-40M: subset of LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 40,000,000 | 10,000 | 100 | Angular |
29- | [LAION-100M: subset of LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 100,000,000 | 10,000 | 100 | Angular |
30- | [LAION-200M: subset of LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 200,000,000 | 10,000 | 100 | Angular |
31- | [LAION-400M: from LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 400,000,000 | 10,000 | 100 | Angular |
25+ | **LAION Image Embeddings (512D)** | | | | | |
26+ | [LAION-1M: subset of LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 1,000,000 | 10,000 | 100 | Cosine |
27+ | [LAION-10M: subset of LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 10,000,000 | 10,000 | 100 | Cosine |
28+ | [LAION-20M: subset of LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 20,000,000 | 10,000 | 100 | Cosine |
29+ | [LAION-40M: subset of LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 40,000,000 | 10,000 | 100 | Cosine |
30+ | [LAION-100M: subset of LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 100,000,000 | 10,000 | 100 | Cosine |
31+ | [LAION-200M: subset of LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 200,000,000 | 10,000 | 100 | Cosine |
32+ | [LAION-400M: from LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 400,000,000 | 10,000 | 100 | Cosine |
33+ | **LAION Image Embeddings (768D)** | | | | | |
34+ | [LAION-1M: 768D image embeddings](https://laion.ai/blog/laion-400-open-dataset/) | 768 | 1,000,000 | 10,000 | 100 | Cosine |
35+ | [LAION-1B: 768D image embeddings](https://laion.ai/blog/laion-400-open-dataset/) | 768 | 1,000,000,000 | 10,000 | 100 | Cosine |
36+ | **Standard Benchmarks** | | | | | |
37+ | [GloVe-25: Word vectors](http://ann-benchmarks.com) | 25 | 1,183,514 | 10,000 | 100 | Cosine |
38+ | [GloVe-100: Word vectors](http://ann-benchmarks.com) | 100 | 1,183,514 | 10,000 | 100 | Cosine |
39+ | [Deep Image-96: CNN image features](http://ann-benchmarks.com) | 96 | 9,990,000 | 10,000 | 100 | Cosine |
40+ | [GIST-960: Image descriptors](http://ann-benchmarks.com) | 960 | 1,000,000 | 1,000 | 100 | L2 |
41+ | **Text and Knowledge Embeddings** | | | | | |
42+ | [DBpedia OpenAI-1M: Knowledge embeddings](https://www.dbpedia.org/) | 1,536 | 1,000,000 | 10,000 | 100 | Cosine |
43+ | [LAION Small CLIP: Small CLIP embeddings](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 100,000 | 1,000 | 100 | Cosine |
44+ | **Yandex Datasets** | | | | | |
45+ | [Yandex T2I: Text-to-image embeddings](https://research.yandex.com/) | 200 | 1,000,000 | 100,000 | 100 | Dot |
46+ | **Random and Synthetic** | | | | | |
47+ | Random-100: Small synthetic dataset | 100 | 100 | 9 | 9 | Cosine |
48+ | Random-100-Euclidean: Small synthetic dataset | 100 | 100 | 9 | 9 | L2 |
49+ | **Filtered Search Datasets** | | | | | |
50+ | H&M-2048: Fashion product embeddings (with filters) | 2,048 | 105,542 | 2,000 | 100 | Cosine |
51+ | H&M-2048: Fashion product embeddings (no filters) | 2,048 | 105,542 | 2,000 | 100 | Cosine |
52+ | ArXiv-384: Academic paper embeddings (with filters) | 384 | 2,205,995 | 10,000 | 100 | Cosine |
53+ | ArXiv-384: Academic paper embeddings (no filters) | 384 | 2,205,995 | 10,000 | 100 | Cosine |
54+ | Random Match Keyword-100: Synthetic keyword matching (with filters) | 100 | 1,000,000 | 10,000 | 100 | Cosine |
55+ | Random Match Keyword-100: Synthetic keyword matching (no filters) | 100 | 1,000,000 | 10,000 | 100 | Cosine |
56+ | Random Match Int-100: Synthetic integer matching (with filters) | 100 | 1,000,000 | 10,000 | 100 | Cosine |
57+ | Random Match Int-100: Synthetic integer matching (no filters) | 100 | 1,000,000 | 10,000 | 100 | Cosine |
58+ | Random Range-100: Synthetic range queries (with filters) | 100 | 1,000,000 | 10,000 | 100 | Cosine |
59+ | Random Range-100: Synthetic range queries (no filters) | 100 | 1,000,000 | 10,000 | 100 | Cosine |
60+ | Random Geo Radius-100: Synthetic geo queries (with filters) | 100 | 1,000,000 | 10,000 | 100 | Cosine |
61+ | Random Geo Radius-100: Synthetic geo queries (no filters) | 100 | 1,000,000 | 10,000 | 100 | Cosine |
62+ | Random Match Keyword-2048: Large synthetic keyword matching (with filters) | 2,048 | 100,000 | 1,000 | 100 | Cosine |
63+ | Random Match Keyword-2048: Large synthetic keyword matching (no filters) | 2,048 | 100,000 | 1,000 | 100 | Cosine |
64+ | Random Match Int-2048: Large synthetic integer matching (with filters) | 2,048 | 100,000 | 1,000 | 100 | Cosine |
65+ | Random Match Int-2048: Large synthetic integer matching (no filters) | 2,048 | 100,000 | 1,000 | 100 | Cosine |
66+ | Random Range-2048: Large synthetic range queries (with filters) | 2,048 | 100,000 | 1,000 | 100 | Cosine |
67+ | Random Range-2048: Large synthetic range queries (no filters) | 2,048 | 100,000 | 1,000 | 100 | Cosine |
68+ | Random Geo Radius-2048: Large synthetic geo queries (with filters) | 2,048 | 100,000 | 1,000 | 100 | Cosine |
69+ | Random Geo Radius-2048: Large synthetic geo queries (no filters) | 2,048 | 100,000 | 1,000 | 100 | Cosine |
70+ | Random Match Keyword Small Vocab-256: Small vocabulary keyword matching (with filters) | 256 | 1,000,000 | 10,000 | 100 | Cosine |
71+ | Random Match Keyword Small Vocab-256: Small vocabulary keyword matching (no filters) | 256 | 1,000,000 | 10,000 | 100 | Cosine |
3272
3373
3474## 🐳 Docker Usage
@@ -39,41 +79,43 @@ The easiest way to run vector-db-benchmark is using Docker. We provide pre-built
3979
4080``` bash
4181# Pull the latest image
42- docker pull redis-performance/vector-db-benchmark:latest
82+ docker pull filipe958/vector-db-benchmark:latest
4383
4484# Run with help
45- docker run --rm redis-performance/vector-db-benchmark:latest run.py --help
85+ docker run --rm filipe958/vector-db-benchmark:latest run.py --help
4686
4787# Basic Redis benchmark with local Redis
48- docker run --rm --network=host redis-performance/vector-db-benchmark:latest \
49- run.py --host localhost --engines redis --dataset random-100 --experiment redis-m-16-ef-64
88+ docker run --rm --network=host filipe958/vector-db-benchmark:latest \
89+ run.py --host localhost --engines redis --dataset random-100 --experiment redis-default-simple
5090
5191# With results output (mount current directory)
5292docker run --rm -v $(pwd)/results:/app/results --network=host \
53- redis-performance/vector-db-benchmark:latest \
54- run.py --host localhost --engines redis --dataset random-100 --experiment redis-m-16-ef-64
93+ filipe958/vector-db-benchmark:latest \
94+ run.py --host localhost --engines redis --dataset random-100 --experiment redis-default-simple
5595```
5696
57- ### Using Docker Compose
97+ ### Using with Redis
5898
59- For a complete setup with Redis included:
99+ For testing with Redis, start a Redis container first:
60100
61101``` bash
62- # Start Redis
63- docker-compose up redis
102+ # Start Redis container
103+ docker run -d --name redis-test -p 6379:6379 redis:8.2-rc1-bookworm
64104
65105# Run benchmark against Redis
66- docker-compose run --rm vector-db-benchmark run.py --host redis --engines redis --dataset random-100 --experiment redis-m-16-ef-64
106+ docker run --rm --network=host filipe958/vector-db-benchmark:latest \
107+ run.py --host localhost --engines redis --dataset random-100 --experiment redis-default-simple
67108
68109# Or use the convenience script
69- ./docker-run.sh -H redis -e redis -d random-100 -x redis-m-16-ef-64
110+ ./docker-run.sh -H localhost -e redis -d random-100 -x redis-default-simple
111+
112+ # Clean up Redis container when done
113+ docker stop redis-test && docker rm redis-test
70114```
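
The commands above start Redis in detached mode and then launch the benchmark right away, so the first connection may race the server's startup. A minimal sketch of a readiness wait is shown below. The `wait_for` helper is hypothetical (it is not part of this repository), and the usage lines assume the `redis-test` container name from the block above and that `redis-cli` is available inside the Redis image.

```bash
#!/usr/bin/env bash
# Hypothetical helper: retry a command once per second until it succeeds
# or the given number of attempts is used up.
wait_for() {
  local attempts="$1"
  shift
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # Run the readiness check; discard its output.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage (assumes the redis-test container is already started):
# wait_for 30 docker exec redis-test redis-cli ping \
#   && docker run --rm --network=host filipe958/vector-db-benchmark:latest \
#        run.py --host localhost --engines redis --dataset random-100 --experiment redis-default-simple
```

The helper returns a non-zero status when the check never succeeds, so chaining it with `&&` skips the benchmark instead of running it against a server that is not yet accepting connections.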
71115
72116### Available Docker Images
73117
74- - **Latest**: `redis-performance/vector-db-benchmark:latest`
75- - **Specific versions**: `redis-performance/vector-db-benchmark:v1.0.0`
76- - **Development builds**: `redis-performance/vector-db-benchmark:update-redisearch-{sha}`
118+ - **Latest**: `filipe958/vector-db-benchmark:latest`
77119
78120For detailed Docker setup and publishing information, see [DOCKER_SETUP.md](DOCKER_SETUP.md).
79121