Commit 13c2700

peruna committed
Harden search egress auditing and disable telemetry defaults
1 parent 278aa24 commit 13c2700


7 files changed (+69 additions, -13 deletions)


README.md

Lines changed: 7 additions & 2 deletions
````diff
@@ -84,6 +84,8 @@ const test=(h)=>new Promise(r=>{\
 
 Edit the allowlist and restart the proxy to change which domains are permitted.
 
+When the local-search overlay is enabled, SearXNG/Firecrawl egress is intentionally routed through Squid with full access logging so outbound search fetches are auditable.
+
 ---
 
 ## Model Presets
@@ -184,7 +186,7 @@ Search context is bounded by default (3 results, 2 scraped sources, 2 highlights
 
 - SearXNG config (`optional/local-search/searxng/settings.yml`): JSON output enabled (LibreChat requires it), noisy engines removed, timeout lowered to 4 s.
 - Rate limiting uses Valkey with a private-IP allowlist so LibreChat doesn't trip bot detection.
-- Firecrawl, its Redis, RabbitMQ, and Postgres sit on a dedicated `search` network. Only SearXNG, the Firecrawl API, and Playwright get WAN access (they need it to fetch pages).
+- Firecrawl, its Redis, RabbitMQ, and Postgres sit on a dedicated `search` network. SearXNG + Firecrawl web-fetch traffic is routed through Squid on a dedicated `search_egress` subnet so requests are auditable in proxy logs.
 - The Jina compatibility patch makes `batch_size` optional — LibreChat's client omits it.
 - A mounted search patch caps scraped text, requests `markdown` + `onlyMainContent`, and strips raw `content` from the artifact returned to the model.
 - After editing files under `optional/local-search/jina/`, recreate the container to pick up changes.
@@ -245,6 +247,9 @@ MEILI_LOG_LEVEL=WARN
 CODE_INTERPRETER_LOG_LEVEL=WARNING
 FIRECRAWL_LOG_LEVEL=warn
 JINA_RERANKER_LOG_LEVEL=WARNING
+DO_NOT_TRACK=1
+FIRECRAWL_NO_TELEMETRY=1
+HF_HUB_DISABLE_TELEMETRY=1
 ```
 
 **Useful log commands:**
@@ -253,7 +258,7 @@ docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'
 docker logs -f --tail=200 LibreChat
 docker logs -f --tail=200 egress-proxy
 
-# Squid logs denied destinations by default (allowed CONNECTs are suppressed).
+# Squid logs all local-search egress (auditable) plus denied requests elsewhere.
 # Look for TCP_DENIED/403 when debugging allowlist misses.
 docker exec -u proxy egress-proxy sh -lc 'tail -f /var/log/squid/access.log'
 ```
````
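With this change the Squid access log becomes the audit trail for local-search fetches. A minimal sketch for slicing that log, assuming Squid's default native `squid` log format (client address in field 3, ACTION/STATUS in field 4, URL in field 7) and the `172.29.240.0/24` `search_egress` subnet defined in this commit; the function names are hypothetical helpers, not part of the repository:

```sh
#!/bin/sh
# Assumes Squid's default native log format:
#   time elapsed client ACTION/STATUS bytes method URL ident hierarchy type

# Print only entries whose client address is in 172.29.240.0/24.
# A plain prefix match is valid here only because /24 ends on an octet boundary.
search_entries() {
  awk '$3 ~ /^172\.29\.240\.[0-9]+$/'
}

# Print only denied entries (e.g. TCP_DENIED/403) from any client.
denied_entries() {
  awk '$4 ~ /^TCP_DENIED\//'
}
```

For example, something like `docker exec -u proxy egress-proxy cat /var/log/squid/access.log | search_entries` would list only search-client requests, while piping into `denied_entries` instead surfaces the TCP_DENIED/403 allowlist misses the README mentions.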

docker-compose.yml

Lines changed: 3 additions & 0 deletions
````diff
@@ -124,6 +124,7 @@ services:
       http_proxy: ${EGRESS_PROXY_URL:-http://egress-proxy:3128}
       https_proxy: ${EGRESS_PROXY_URL:-http://egress-proxy:3128}
       NODE_OPTIONS: --require /app/proxy-bootstrap.cjs
+      DO_NOT_TRACK: ${DO_NOT_TRACK:-1}
       NO_PROXY: ${NO_PROXY:-localhost,127.0.0.1,::1,mongodb,chat-mongodb,meilisearch,vectordb,rag_api,sandpack,sandpack-static,caddy-static-proxy,api-proxy,egress-proxy}
       no_proxy: ${NO_PROXY:-localhost,127.0.0.1,::1,mongodb,chat-mongodb,meilisearch,vectordb,rag_api,sandpack,sandpack-static,caddy-static-proxy,api-proxy,egress-proxy}
       CONFIG_PATH: /run/secrets/librechat_yaml
@@ -289,6 +290,8 @@ services:
       RAG_HOST: 0.0.0.0
       DB_HOST: vectordb
       RAG_PORT: ${RAG_PORT:-8000}
+      DO_NOT_TRACK: ${DO_NOT_TRACK:-1}
+      HF_HUB_DISABLE_TELEMETRY: ${HF_HUB_DISABLE_TELEMETRY:-1}
       EGRESS_PROXY_URL: ${EGRESS_PROXY_URL:-http://egress-proxy:3128}
       HTTP_PROXY: ${EGRESS_PROXY_URL:-http://egress-proxy:3128}
       HTTPS_PROXY: ${EGRESS_PROXY_URL:-http://egress-proxy:3128}
````
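The new telemetry variables all use the `${VAR:-default}` form. Docker Compose interpolation applies the same rule as POSIX sh here: the default is used when the variable is unset or set to an empty string. A small sh illustration of that rule; `effective_do_not_track` is a hypothetical helper, not something in the repository:

```sh
#!/bin/sh
# ${DO_NOT_TRACK:-1}: use "1" when DO_NOT_TRACK is unset OR empty,
# otherwise pass the configured value through unchanged.
effective_do_not_track() {
  printf '%s\n' "${DO_NOT_TRACK:-1}"
}

( unset DO_NOT_TRACK; effective_do_not_track )  # prints 1 (unset -> default)
( DO_NOT_TRACK=;      effective_do_not_track )  # prints 1 (empty -> default)
( DO_NOT_TRACK=0;     effective_do_not_track )  # prints 0 (explicit value kept)
```

So leaving `DO_NOT_TRACK=` blank in `.env` keeps the opt-out in place; only an explicit non-empty value changes what the containers receive, and how each tool interprets a value such as `0` is tool-specific.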

optional/code-interpreter/compose.yml

Lines changed: 2 additions & 0 deletions
````diff
@@ -43,6 +43,8 @@ services:
       API_KEY: ${LIBRECHAT_CODE_API_KEY:-local-code-interpreter-key-change-me}
       REDIS_HOST: code-interpreter-redis
       REDIS_PORT: 6379
+      DO_NOT_TRACK: ${DO_NOT_TRACK:-1}
+      HF_HUB_DISABLE_TELEMETRY: ${HF_HUB_DISABLE_TELEMETRY:-1}
       MINIO_ENDPOINT: code-interpreter-minio:9000
       MINIO_ACCESS_KEY: ${CODE_INTERPRETER_MINIO_ACCESS_KEY:-minioadmin}
       MINIO_SECRET_KEY: ${CODE_INTERPRETER_MINIO_SECRET_KEY:-minioadmin}
````

optional/egress-proxy/squid.conf

Lines changed: 7 additions & 2 deletions
````diff
@@ -20,6 +20,9 @@ acl allowed_domains dstdomain "/run/secrets/squid_allowlist"
 # Vertex AI regional endpoints follow LOCATION-aiplatform.googleapis.com.
 # CONNECT checks may include ":443", so allow optional port suffix.
 acl allowed_vertex_regional dstdom_regex -i ^[a-z0-9-]+-aiplatform\.googleapis\.com(:[0-9]+)?$
+# Local-search overlay egress network (optional/local-search/compose.yml).
+# These clients are allowed outbound web access, but all requests are audited.
+acl search_clients src 172.29.240.0/24
 
 # Restrict ports/methods.
 acl SSL_ports port 443
@@ -31,11 +34,13 @@ http_access deny !Safe_ports
 http_access deny CONNECT !SSL_ports
 
 # Allow only explicit destinations.
+http_access allow search_clients
 http_access allow allowed_vertex_regional
 http_access allow allowed_domains
 http_access deny all
 
-# Keep runtime noise low: only log denied destinations.
-access_log stdio:/var/log/squid/access.log !allowed_vertex_regional !allowed_domains
+# Audit all local-search proxy traffic, plus denied traffic from other clients.
+access_log stdio:/var/log/squid/access.log search_clients
+access_log stdio:/var/log/squid/access.log !search_clients !allowed_vertex_regional !allowed_domains
 cache_log /dev/null
 cache_store_log none
````
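Squid evaluates `http_access` rules top to bottom and stops at the first match, so placing `allow search_clients` above the allowlist lets search containers reach any destination that passes the port checks. Each `access_log` line logs the requests matching all of its ACLs; the two lines here select mutually exclusive sets, so no request is written twice. The sketch below models the combined outcome in plain sh. It is a simplification, assuming IPv4 client addresses without leading-zero octets, ignoring the `Safe_ports`/`SSL_ports` rules, and collapsing `allowed_domains`/`allowed_vertex_regional` into one yes/no argument; all function names are hypothetical:

```sh
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split "a.b.c.d" into four positional parameters
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if IPv4 address $1 lies inside CIDR block $2 (like "acl ... src").
in_cidr() {
  net=${2%/*}
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# $1 = client IP, $2 = "yes" if the destination is allowlisted.
# Prints the http_access verdict and whether an access_log line matches.
squid_decision() {
  if in_cidr "$1" 172.29.240.0/24; then
    echo "allow logged"     # allow search_clients; logged by line 1
  elif [ "$2" = yes ]; then
    echo "allow unlogged"   # allowlist match; neither access_log line matches
  else
    echo "deny logged"      # deny all; logged by the !search_clients line
  fi
}
```

For instance, `squid_decision 172.29.240.17 no` prints `allow logged`, while the same call from an address outside the subnet prints `deny logged`. The model also makes one coupling visible: the `src` ACL and the compose `ipam` subnet must stay in sync, or search traffic silently falls into the second branch set.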

optional/local-search/compose.yml

Lines changed: 40 additions & 9 deletions
````diff
@@ -8,6 +8,11 @@ volumes:
 networks:
   search:
     internal: true
+  search_egress:
+    internal: true
+    ipam:
+      config:
+        - subnet: 172.29.240.0/24
 
 x-json-logging: &json_logging
   driver: json-file
@@ -16,6 +21,12 @@ x-json-logging: &json_logging
     max-file: "${DOCKER_LOG_MAX_FILE:-3}"
 
 services:
+  egress-proxy:
+    networks:
+      - lan
+      - wan
+      - search_egress
+
   api:
     depends_on:
       searxng:
@@ -51,13 +62,19 @@ services:
       SEARXNG_BASE_URL: ${SEARXNG_BASE_URL:-http://searxng:8080/}
       SEARXNG_LIMITER: "true"
       SEARXNG_SECRET: ${SEARXNG_SECRET:-change-me-search-secret}
+      HTTP_PROXY: ${EGRESS_PROXY_URL:-http://egress-proxy:3128}
+      HTTPS_PROXY: ${EGRESS_PROXY_URL:-http://egress-proxy:3128}
+      http_proxy: ${EGRESS_PROXY_URL:-http://egress-proxy:3128}
+      https_proxy: ${EGRESS_PROXY_URL:-http://egress-proxy:3128}
+      NO_PROXY: ${NO_PROXY:-localhost,127.0.0.1,::1,mongodb,chat-mongodb,meilisearch,vectordb,rag_api,sandpack,sandpack-static,caddy-static-proxy,api-proxy,egress-proxy},searxng,searxng-valkey
+      no_proxy: ${NO_PROXY:-localhost,127.0.0.1,::1,mongodb,chat-mongodb,meilisearch,vectordb,rag_api,sandpack,sandpack-static,caddy-static-proxy,api-proxy,egress-proxy},searxng,searxng-valkey
     volumes:
       - ./optional/local-search/searxng/settings.yml:/etc/searxng/settings.yml:ro
       - ./optional/local-search/searxng/limiter.toml:/etc/searxng/limiter.toml:ro
       - searxng_data:/var/cache/searxng
     networks:
       - search
-      - wan
+      - search_egress
     ports:
       - "127.0.0.1:${SEARXNG_PORT:-8080}:8080"
     depends_on:
@@ -91,8 +108,14 @@ services:
       PORT: 3002
       ENV: local
       LOGGING_LEVEL: ${FIRECRAWL_LOG_LEVEL:-warn}
-      FIRECRAWL_NO_TELEMETRY: 1
-      DO_NOT_TRACK: 1
+      FIRECRAWL_NO_TELEMETRY: ${FIRECRAWL_NO_TELEMETRY:-1}
+      DO_NOT_TRACK: ${DO_NOT_TRACK:-1}
+      HTTP_PROXY: ${EGRESS_PROXY_URL:-http://egress-proxy:3128}
+      HTTPS_PROXY: ${EGRESS_PROXY_URL:-http://egress-proxy:3128}
+      http_proxy: ${EGRESS_PROXY_URL:-http://egress-proxy:3128}
+      https_proxy: ${EGRESS_PROXY_URL:-http://egress-proxy:3128}
+      NO_PROXY: ${NO_PROXY:-localhost,127.0.0.1,::1,mongodb,chat-mongodb,meilisearch,vectordb,rag_api,sandpack,sandpack-static,caddy-static-proxy,api-proxy,egress-proxy},firecrawl-redis,firecrawl-rabbitmq,firecrawl-postgres,firecrawl-playwright,searxng,searxng-valkey
+      no_proxy: ${NO_PROXY:-localhost,127.0.0.1,::1,mongodb,chat-mongodb,meilisearch,vectordb,rag_api,sandpack,sandpack-static,caddy-static-proxy,api-proxy,egress-proxy},firecrawl-redis,firecrawl-rabbitmq,firecrawl-postgres,firecrawl-playwright,searxng,searxng-valkey
       REDIS_URL: redis://firecrawl-redis:6379
       REDIS_RATE_LIMIT_URL: redis://firecrawl-redis:6379
       NUQ_RABBITMQ_URL: amqp://firecrawl-rabbitmq:5672
@@ -110,7 +133,7 @@ services:
       EXTRACT_WORKER_PORT: 3004
       WORKER_PORT: 3005
       USE_DB_AUTHENTICATION: "false"
-      PROXY_SERVER: ""
+      PROXY_SERVER: ${EGRESS_PROXY_URL:-http://egress-proxy:3128}
       PROXY_USERNAME: ""
       PROXY_PASSWORD: ""
       OPENAI_API_KEY: ""
@@ -138,7 +161,7 @@ services:
         condition: service_healthy
     networks:
       - search
-      - wan
+      - search_egress
     healthcheck:
       test:
         [
@@ -158,15 +181,21 @@ services:
     environment:
       PORT: 3000
       MAX_CONCURRENT_PAGES: ${FIRECRAWL_MAX_CONCURRENT_PAGES:-4}
-      FIRECRAWL_NO_TELEMETRY: 1
-      DO_NOT_TRACK: 1
+      FIRECRAWL_NO_TELEMETRY: ${FIRECRAWL_NO_TELEMETRY:-1}
+      DO_NOT_TRACK: ${DO_NOT_TRACK:-1}
+      HTTP_PROXY: ${EGRESS_PROXY_URL:-http://egress-proxy:3128}
+      HTTPS_PROXY: ${EGRESS_PROXY_URL:-http://egress-proxy:3128}
+      http_proxy: ${EGRESS_PROXY_URL:-http://egress-proxy:3128}
+      https_proxy: ${EGRESS_PROXY_URL:-http://egress-proxy:3128}
+      NO_PROXY: ${NO_PROXY:-localhost,127.0.0.1,::1,mongodb,chat-mongodb,meilisearch,vectordb,rag_api,sandpack,sandpack-static,caddy-static-proxy,api-proxy,egress-proxy},firecrawl-api,firecrawl-redis,firecrawl-rabbitmq,firecrawl-postgres,searxng,searxng-valkey
+      no_proxy: ${NO_PROXY:-localhost,127.0.0.1,::1,mongodb,chat-mongodb,meilisearch,vectordb,rag_api,sandpack,sandpack-static,caddy-static-proxy,api-proxy,egress-proxy},firecrawl-api,firecrawl-redis,firecrawl-rabbitmq,firecrawl-postgres,searxng,searxng-valkey
       BLOCK_MEDIA: ${FIRECRAWL_BLOCK_MEDIA:-}
-      PROXY_SERVER: ""
+      PROXY_SERVER: ${EGRESS_PROXY_URL:-http://egress-proxy:3128}
       PROXY_USERNAME: ""
       PROXY_PASSWORD: ""
     networks:
       - search
-      - wan
+      - search_egress
     healthcheck:
       test:
         [
@@ -239,6 +268,8 @@ services:
       MODEL_NAME: ${JINA_RERANKER_MODEL_NAME:-jinaai/jina-reranker-v1-tiny-en}
       CACHE_DIR: /app/.cache
       TOKENIZERS_PARALLELISM: "false"
+      DO_NOT_TRACK: ${DO_NOT_TRACK:-1}
+      HF_HUB_DISABLE_TELEMETRY: ${HF_HUB_DISABLE_TELEMETRY:-1}
       OMP_NUM_THREADS: ${JINA_RERANKER_OMP_NUM_THREADS:-1}
       JINA_RERANKER_MAX_BATCH_SIZE: ${JINA_RERANKER_MAX_BATCH_SIZE:-2}
       JINA_RERANKER_LOG_LEVEL: ${JINA_RERANKER_LOG_LEVEL:-WARNING}
````
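Each local-search service appends its own peers (for example `searxng-valkey` or `firecrawl-redis`) after the `${NO_PROXY:-...}` expansion, so those names keep bypassing Squid even when an operator overrides `NO_PROXY` in `.env`, while all other hosts go through `egress-proxy`. A rough sketch of that split, assuming exact host matching; real HTTP clients (curl, Node, Python, Go) each apply their own suffix, port, and CIDR rules to `NO_PROXY`. The helper name and the shortened default list are illustrative only:

```sh
#!/bin/sh
# Hypothetical helper: succeed if host $1 appears verbatim in the
# comma-separated bypass list $2 (exact, case-sensitive match only).
bypasses_proxy() {
  old_ifs=$IFS
  IFS=,
  for entry in $2; do            # split the list on commas
    if [ "$entry" = "$1" ]; then
      IFS=$old_ifs
      return 0
    fi
  done
  IFS=$old_ifs
  return 1
}

# Mirrors the SearXNG service: default list (shortened here) plus the
# names appended outside the ${NO_PROXY:-...} default.
searxng_no_proxy="${NO_PROXY:-localhost,127.0.0.1,::1,egress-proxy},searxng,searxng-valkey"

bypasses_proxy searxng-valkey "$searxng_no_proxy" && echo "searxng-valkey: direct"
bypasses_proxy duckduckgo.com "$searxng_no_proxy" || echo "duckduckgo.com: via egress-proxy"
```

Appending outside the default, rather than baking service names into it, means an operator-supplied `NO_PROXY` extends the bypass list instead of accidentally sending in-network traffic (Valkey, Redis, RabbitMQ) to Squid.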

optional/local-search/searxng/settings.yml

Lines changed: 5 additions & 0 deletions
````diff
@@ -8,6 +8,8 @@ use_default_settings:
 general:
   instance_name: "LibreChat Local Search"
   debug: false
+  enable_metrics: false
+  open_metrics: ""
 
 search:
   safe_search: 1
@@ -17,6 +19,9 @@ search:
 
 outgoing:
   request_timeout: 4.0
+  proxies:
+    all://:
+      - http://egress-proxy:3128
 
 server:
   base_url: http://searxng:8080/
````

template_dot_env

Lines changed: 5 additions & 0 deletions
````diff
@@ -138,6 +138,11 @@ DOCKER_LOG_MAX_FILE=3
 # Internal hosts that should bypass proxying.
 NO_PROXY=localhost,127.0.0.1,::1,mongodb,chat-mongodb,meilisearch,vectordb,rag_api,sandpack,sandpack-static,caddy-static-proxy,api-proxy,egress-proxy
 
+# Disable telemetry where supported.
+DO_NOT_TRACK=1
+FIRECRAWL_NO_TELEMETRY=1
+HF_HUB_DISABLE_TELEMETRY=1
+
 # ------------------------------------------------------------------------------
 # Optional: LibreCodeInterpreter (custom execute_code backend)
 # Enabled by adding -f optional/code-interpreter/compose.yml or
````
