feat: Support experimental `querier` component #267

kavirajk · 2024-10-15T15:20:22Z

This shouldn't impact any existing use cases of the Helm chart.
You can enable the querier (a Deployment) by setting the `experimental.querier.enabled` flag in your values overrides.
You can find example overrides here.
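
For reference, a minimal override file that turns the feature on might look like the sketch below. Only the `experimental.querier.enabled` key comes from this PR; the file name is an assumption.

```yaml
# querier-overrides.yaml (hypothetical file name)
# Enables the experimental querier Deployment and Service.
experimental:
  querier:
    enabled: true
```

It would be passed to Helm with the usual `-f querier-overrides.yaml` option, or the same key can be set inline with `--set experimental.querier.enabled=true`.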

The querier has three main dependencies:

Address of core Weaviate (for schema info, v1/schema)
Object storage address (e.g. MinIO, S3, or GCS)
Vectorizer address (for example, to vectorize the query string)

Changes

Querier deployment
Querier service that exposes its gRPC and HTTP servers (the HTTP server currently serves only metrics).

set `experimental.querier.enabled=true` Signed-off-by: Kaviraj <[email protected]>

jfrancoa · 2024-10-16T05:23:58Z

weaviate/templates/querierDeployment.yaml

+kind: Deployment
+metadata:
+  name: querier
+  labels:


I guess this is the first of many new "Weaviate components". Have you thought about adding a label to each of these components so that you can act on all of them at once? Say we have the querier, the batcher, and the schema_handler (all invented :-D): if you want to monitor them or perform some action on all of them at once (they are independent deployments), you can assign a common label to the three deployments, for example "weaviate: serverless", and list all those pods with:

kubectl get pods -l weaviate=serverless

Of course, this is something you can add later. It's just a suggestion, not anything blocking for the review.

That's a very good idea, using labels to group these new components. I'm happy to add it later as we add more components. I just don't want to add anything that is not absolutely necessary for now, to keep things simple (plus I'm not a big fan of the name serverless tbh :P and things can change in the future).
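
If the grouping label is added later, it could look roughly like the sketch below. This is not part of the PR: the `weaviate: serverless` key and value are just the placeholder from the suggestion above, and the selector is an assumption. Note that the label has to sit on the pod template for `kubectl get pods -l weaviate=serverless` to match the pods.

```yaml
# Hypothetical sketch of a querier Deployment carrying a shared grouping label.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: querier
  labels:
    app: querier
    app.kubernetes.io/name: querier
    weaviate: serverless        # placeholder grouping label on the Deployment
spec:
  selector:
    matchLabels:
      app: querier              # the selector stays component-specific
  template:
    metadata:
      labels:
        app: querier
        weaviate: serverless    # on the pod template so pod label selectors match
    spec:
      containers:
        - name: querier
          image: {{ .Values.experimental.querier.image }}
```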

jfrancoa · 2024-10-16T05:30:29Z

weaviate/templates/querierDeployment.yaml

+    app: querier
+    app.kubernetes.io/name: querier
+spec:
+  replicas: {{ .Values.replicas }}


Will the number of querier replicas always be tied to the total number of replicas? I mean, couldn't there end up being more queriers than any other subcomponent? If so, it might be worth allowing an override with:
{{ .Values.experimental.querier.replicas }}

Absolutely. That's a copy-paste error. We should have a separate replicas setting for the querier. Will add it.
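
A sketch of what the separate setting could look like, using the `experimental.querier.replicas` key that appears in this PR's values.yaml. The two excerpts are shown in one block only for brevity.

```yaml
# values.yaml (excerpt)
experimental:
  querier:
    enabled: false
    replicas: 3   # independent of the top-level .Values.replicas used by core Weaviate
---
# templates/querierDeployment.yaml (excerpt)
spec:
  replicas: {{ .Values.experimental.querier.replicas }}
```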

jfrancoa · 2024-10-16T05:33:02Z

weaviate/templates/querierDeployment.yaml

+                name: {{ $secret }}
+                key: {{ $key }}
+          {{- end }}
+          {{- end }}


I'm unable to match this end to any other opening statement, is that correct?

{{- if or $.Values.experimental.querier.env $.Values.experimental.querier.envSecrets }}
{{- range $key, $value := $.Values.env }}
- name: {{ $key }}
  value: {{ $value | quote }}
{{- end }}
{{- range $key, $secret := $.Values.experimental.querier.envSecrets }}
- name: {{ $key }}
  valueFrom:
    secretKeyRef:
      name: {{ $secret }}
      key: {{ $key }}
{{- end }}
{{- end }}

Two `end`s for the two `range`s and one `end` for the outermost `if`. I copy-pasted this from the existing StatefulSet, btw :)

jfrancoa · 2024-10-16T05:35:05Z

weaviate/templates/querierDeployment.yaml

+        {{- end }}
+        resources:
+{{ toYaml .Values.resources | indent 10 }}
+        env:


@antas-marcin isn't there any way to isolate such a long env section? maybe moving it to _helpers? we do that for Weaviate's sts, right?

I copy-pasted this from the Weaviate sts. I don't see any helpers there either.

But this should actually be .Values.experimental.querier.resources instead. I will update it.
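
The resources fix described above could be sketched as follows. The values key is the one named in the reply; the request and limit numbers are made-up placeholders, not defaults from the chart.

```yaml
# templates/querierDeployment.yaml (excerpt)
        resources:
{{ toYaml .Values.experimental.querier.resources | indent 10 }}
---
# values.yaml (excerpt, hypothetical numbers)
experimental:
  querier:
    resources:
      requests:
        cpu: 500m
        memory: 300Mi
      limits:
        cpu: 1000m
        memory: 1Gi
```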

@jfrancoa yeah, good idea, since now we will have 2 weaviate deployments we can think of unifying those settings. I will add this task to our board @jfrancoa

jfrancoa · 2024-10-16T05:39:10Z

weaviate/templates/querierDeployment.yaml

+  replicas: {{ .Values.replicas }}
+  updateStrategy:
+{{ toYaml .Values.updateStrategy | indent 4}}
+  serviceName: {{ .Values.experimental.querier.service.name }}-headless


I asked GPT:

In Kubernetes, the serviceName field is not typically used directly in a Deployment object. Instead, it is more commonly found in the context of StatefulSets or Headless Services and some other resources. However, here are some scenarios where serviceName is relevant:

1. StatefulSet

In the context of a StatefulSet, the serviceName field is used to associate a Headless Service with the StatefulSet. This ensures that the StatefulSet pods are discoverable by a stable DNS name based on their ordinal index (e.g., pod-0, pod-1, etc.). Each pod gets a unique DNS entry such as <pod-name>.<serviceName>, which is useful for stateful applications like databases.

Was it added intentionally, or was it just a copy-paste from Weaviate's sts?

Yes, you are right. It's copy-pasted. I don't think we need this. Will remove it.
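
For comparison, a minimal Deployment spec without the StatefulSet-only fields might look like the sketch below. In the `apps/v1` API a Deployment has no `serviceName`, and its rollout behaviour is configured under `strategy` rather than `updateStrategy`. The `RollingUpdate` choice and the selector labels are assumptions for illustration, not what the PR ended up with.

```yaml
# Hypothetical trimmed-down querier Deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: querier
spec:
  replicas: {{ .Values.experimental.querier.replicas }}
  # Deployments use "strategy"; "updateStrategy" and "serviceName" belong to StatefulSets.
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: querier
  template:
    metadata:
      labels:
        app: querier
    spec:
      containers:
        - name: querier
          image: {{ .Values.experimental.querier.image }}
```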

jfrancoa · 2024-10-16T08:34:46Z

weaviate/values.yaml

+    storage_size: 10Gi
+    pullPolicy: IfNotPresent
+    command: ["/bin/weaviate"]
+    args:


What happens if we don't pass these args? I mean, could we provide some default values so that the querier would work out of the box? Mostly to improve the user experience and avoid requiring whoever deploys it to know Weaviate's architecture internals. I know this is a first PR to test the new querier, but for example, the contextionary endpoint should be added only if contextionary is configured. Otherwise it will fail when contextionary is not enabled as a module.
Also, the monitoring seems a bit redundant, don't we have an env var when we want monitoring to be working (PROMETHEUS_BLABLA_ENABLED)? we should enable this only if that env var is enabled, imho

could we provide some default values so that the querier would work out of the box?

Absolutely we should. I thought I had some sane default values. But that doesn't work well with our helm chart values. I will update it 👍

I know this is a first PR to test the new querier, but for example, the contextionary endpoint should be added only if contextionary is configured. Otherwise it will fail when contextionary is not enabled as a module.

Currently the querier has a hardcoded dependency on contextionary, unfortunately. We will remove it in the future, and I will then update the chart to load modules conditionally. Very good catch.
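
Once the hardcoded dependency is gone, the conditional wiring could be sketched roughly as below. Everything here is an assumption: that the chart exposes a `modules.text2vec-contextionary.enabled` value, the `--contextionary_url` flag spelling, and the `experimental.querier.contextionaryUrl` key are all illustrative and not taken from this PR.

```yaml
# templates/querierDeployment.yaml (hypothetical excerpt)
        args:
          {{- /* Assumes .Values.modules has a "text2vec-contextionary" entry with an "enabled" field. */}}
          {{- if (index .Values.modules "text2vec-contextionary").enabled }}
          # Hypothetical flag and values key; added only when the module is enabled.
          - "--contextionary_url={{ .Values.experimental.querier.contextionaryUrl }}"
          {{- end }}
```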

Also, the monitoring seems a bit redundant, don't we have an env var when we want monitoring to be working (PROMETHEUS_BLABLA_ENABLED)? we should enable this only if that env var is enabled, imho

Good point. Currently the querier doesn't respect the PROMETHEUS_MONITORING_ENABLED env var. I think it should (I can take it up in a follow-up PR if that's ok?).

Having said that, I have some opinions about configs :) I'd like to keep these CLI flags as well. IMHO CLI flags should take higher priority than ENVs (and every config should be settable via CLI flags in addition to ENVs). I'm not a big fan of ENV-based configs tbh; in my experience they're hard to manage at scale (10s is fine, 100s is crazy). I prefer CLI or file-based config instead. The main rationale is that I want my configs to be "local" and "explicit". ENVs are by design "global" and "implicit" (any other process that shares the same pod or shell can set different values).

Absolutely we should. I thought I had some sane default values. But that doesn't work well with our helm chart values. I will update it 👍

I think we still cannot provide sane defaults for some flags like schema_url, contextionary_url and minio_url, because these are totally different when running in k8s: all of them will be local service names.

I removed some args to make it work with defaults.

This PR also changes port mapping 9091 (used by prometheus server in our setup) to 7071
weaviate/weaviate#6043

weaviate-git-bot · 2024-10-16T10:00:23Z

To avoid any confusion in the future about your contribution to Weaviate, we work with a Contributor License Agreement. If you agree, you can simply add a comment to this PR that you agree with the CLA so that we can merge.

beep boop - the Weaviate bot 👋🤖

PS:
Are you already a member of the Weaviate Slack channel?

Signed-off-by: Kaviraj <[email protected]>

antas-marcin · 2024-10-17T12:35:55Z

weaviate/templates/querierDeployment.yaml

+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: querier


I would prefix all of our services with weaviate, so the name I suggest here is weaviate-querier. All of the labels should also have that prefix, to make it easy to find "our" services / deployments.

antas-marcin · 2024-10-17T13:26:54Z

weaviate/templates/querierService.yaml

+metadata:
+  name: {{ .Values.experimental.querier.service.name }}
+  labels:
+    app.kubernetes.io/name: weaviate


Can we agree on the weaviate-querier naming, to be used throughout all of those files?
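
A sketch of how the suggested naming might be applied consistently across the two templates. The `weaviate-querier` name comes from the comments above; which labels carry it, and whether the name is hardcoded or read from `experimental.querier.service.name`, is an assumption.

```yaml
# templates/querierDeployment.yaml (hypothetical excerpt)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: weaviate-querier
  labels:
    app: weaviate-querier
    app.kubernetes.io/name: weaviate-querier
---
# templates/querierService.yaml (hypothetical excerpt)
apiVersion: v1
kind: Service
metadata:
  name: weaviate-querier
  labels:
    app.kubernetes.io/name: weaviate-querier
```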

antas-marcin · 2024-10-17T13:29:07Z

weaviate/templates/querierDeployment.yaml

+            {{- end }}
+          {{- end  }}
+          - name: CLUSTER_JOIN
+            value: {{ .Values.service.name }}-headless.{{ .Release.Namespace }}.svc.{{ .Values.clusterDomain }}


shouldn't we use a separate headless service for weaviate-querier deployment? how about scaling this service? and the raft response from this service? should querier be aware of all Weaviate pods?

antas-marcin · 2024-10-17T13:29:46Z

weaviate/templates/querierDeployment.yaml

@@ -0,0 +1,456 @@
+{{ if .Values.experimental.querier.enabled }}


@kavirajk can we add some unit tests for your deployment?

antas-marcin · 2024-10-17T13:33:53Z

weaviate/values.yaml

+  querier:
+    enabled: false
+    replicas: 3
+    image: semitechnologies/weaviate-experimental:preview-chore-change-default-configs-on-querier-23d39eb


can we follow the same pattern for defining the Weaviate image as we do it in the Weaviate STS:

Signed-off-by: Kaviraj <[email protected]>

kavirajk added 19 commits October 14, 2024 11:23

feat(querier): Support running querier component behind feature flag

cc20626

set `experimental.querier.enabled=true` Signed-off-by: Kaviraj <[email protected]>

fix some disk and volume claims

39644cb

Signed-off-by: Kaviraj <[email protected]>

fixing some template parser error

20bc41e

Signed-off-by: Kaviraj <[email protected]>

remove querier-data

aacc05b

Signed-off-by: Kaviraj <[email protected]>

replace right image tag

6b1511f

Signed-off-by: Kaviraj <[email protected]>

tweak image tag

290b9e8

Signed-off-by: Kaviraj <[email protected]>

remove the heath checks for querier

6f5c09a

Signed-off-by: Kaviraj <[email protected]>

remove the template properly

357694f

Signed-off-by: Kaviraj <[email protected]>

set the right configs for schema addr and minio dependencies

2284d7c

Signed-off-by: Kaviraj <[email protected]>

fix some envs and args

46e2721

Signed-off-by: Kaviraj <[email protected]>

Add metrics configs by default

675b55d

Signed-off-by: Kaviraj <[email protected]>

fix bool argument

2bbcae7

Signed-off-by: Kaviraj <[email protected]>

map different ports to work with core weaviate helm chart

fdf50f7

Signed-off-by: Kaviraj <[email protected]>

remap ports on service

b995741

Signed-off-by: Kaviraj <[email protected]>

fix targetPort for querier service

4187d98

Signed-off-by: Kaviraj <[email protected]>

hack: contextionary module enable and make it default

c6ddd85

Signed-off-by: Kaviraj <[email protected]>

hack: using main tag for core image

e9359c5

Signed-off-by: Kaviraj <[email protected]>

svc name resolution fix

decfeea

Signed-off-by: Kaviraj <[email protected]>

fix contextionary URL

ea878c2

Signed-off-by: Kaviraj <[email protected]>

kavirajk mentioned this pull request Oct 15, 2024

feat: Support running querier weaviate/weaviate-local-k8s#16

Open

kavirajk added 4 commits October 15, 2024 17:29

revert back original values.yaml

a39fb4b

Signed-off-by: Kaviraj <[email protected]>

change image pull policy

709cc7b

Signed-off-by: Kaviraj <[email protected]>

fix image pull policy

6ddde70

Signed-off-by: Kaviraj <[email protected]>

conditional enable service and deployment of querier

5264e4f

Signed-off-by: Kaviraj <[email protected]>

kavirajk marked this pull request as ready for review October 15, 2024 16:01

kavirajk requested review from jfrancoa and antas-marcin October 15, 2024 16:20

jfrancoa reviewed Oct 16, 2024

PR remarks

b328678

Signed-off-by: Kaviraj <[email protected]>

kavirajk added 2 commits October 17, 2024 11:31

use different port as 9091 is generally used by prometheus server

f1b8aa6

Signed-off-by: Kaviraj <[email protected]>

replace the newer querier image

b954d43

Signed-off-by: Kaviraj <[email protected]>

kavirajk requested a review from jfrancoa October 17, 2024 09:57

antas-marcin reviewed Oct 17, 2024

kavirajk added 2 commits October 18, 2024 11:57

change metrics port of querier to 2112

ea9c096

Signed-off-by: Kaviraj <[email protected]>

force download from s3 for demo

8847ae0

Signed-off-by: Kaviraj <[email protected]>

reyreaud-l approved these changes Nov 4, 2024
