-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support experimental querier
component
#267
base: master
Are you sure you want to change the base?
feat: Support experimental querier
component
#267
Conversation
set `experimental.querier.enabled=true` Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
kind: Deployment | ||
metadata: | ||
name: querier | ||
labels: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that this is the first of many new "Weaviate components", have you though on adding a label to each of these components that will allow performing actions on all of them at once? Let's say, if we will have the querier, the batcher, the schema_handler (all invented :-D), if you want to monitor or perform some action on them at once (they are independent deployments) you can leverage labels by assigning a common label to the three deployments, for example "weaviate: serverless" and print all those pods with:
kubectl get pods -l weaviate=serverless
Of course ,this is something you can add later, it's just a suggestion, not anything blocking for the review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's very good idea. using labels to group these new components. I'm happy to add later as we add more components. Just don't want to add anything that's is not absolute necessary for now. Just to keep things simple (plus not a big fan of the name serverless
tbh :P and things can change in the future ).
app: querier | ||
app.kubernetes.io/name: querier | ||
spec: | ||
replicas: {{ .Values.replicas }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will always the number of replicas for the querier be associated to the number of total replicas? I mean, couldn't it happen that there will be more queriers than any other subcomponent? If that's the case, it might be interesting allowing overriding with a :
{{.Values.experimental.querier.replicas}}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely. That's copy-paste error. We should have separate replicas for querier
. Will add it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
name: {{ $secret }} | ||
key: {{ $key }} | ||
{{- end }} | ||
{{- end }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unable to match this end to any other opening statement, is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{{- if or $.Values.experimental.querier.env $.Values.experimental.querier.envSecrets }}
{{- range $key, $value := $.Values.env }}
- name: {{ $key }}
value: {{ $value | quote }}
{{- end }}
{{- range $key, $secret := $.Values.experimental.querier.envSecrets }}
- name: {{ $key }}
valueFrom:
secretKeyRef:
name: {{ $secret }}
key: {{ $key }}
{{- end }}
{{- end }}
two end
for two range
and one end
for outer most if
. I copy pasted this from existing statefulset btw :)
{{- end }} | ||
resources: | ||
{{ toYaml .Values.resources | indent 10 }} | ||
env: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@antas-marcin isn't there any way to isolate such a long env section? maybe moving it to _helpers? we do that for Weaviate's sts, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copy pasted this from Weaviate sts. Don't see any helpers there as well.
But this should be actually .Values.experimental.querier.resources
instead. I will update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replicas: {{ .Values.replicas }} | ||
updateStrategy: | ||
{{ toYaml .Values.updateStrategy | indent 4}} | ||
serviceName: {{ .Values.experimental.querier.service.name }}-headless |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I asked GPT:
In Kubernetes, the serviceName field is not typically used directly in a Deployment object. Instead, it is more commonly found in the context of StatefulSets or Headless Services and some other resources.
However, here are some scenarios where serviceName is relevant:
1. StatefulSet
In the context of a StatefulSet, the serviceName field is used to associate a Headless Service with the StatefulSet. This ensures that the StatefulSet pods are discoverable by a stable DNS name based on their ordinal index (e.g., pod-0, pod-1, etc.). Each pod gets a unique DNS entry such as <pod-name>.<serviceName>, which is useful for stateful applications like databases.
Was it added intentionally or was it just a copy-paste from the Weaviate's sts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you are right. It's copy pasted. I don't think we need this. Will remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
storage_size: 10Gi | ||
pullPolicy: IfNotPresent | ||
command: ["/bin/weaviate"] | ||
args: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if we don't pass these args? I mean, could we provide some default values so that the querier would work out of the box? Mostly to improve the user experience and avoid requesting the user who deploys to know about the weaviate's architecture internals. I know this is a first PR to test the new querier, but for example, the contextionary endpoint should be added conditionally if contextionary is configured only. Otherwise it will fail when it won't be enabled as a module.
Also, the monitoring seems a bit redundant, don't we have an env var when we want monitoring to be working (PROMETHEUS_BLABLA_ENABLED)? we should enable this only if that env var is enabled, imho
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we provide some default values so that the querier would work out of the box?
Absolutely we should. I thought I had some sane default values. But that doesn't work well with our helm chart values. I will update it 👍
I know this is a first PR to test the new querier, but for example, the contextionary endpoint should be added conditionally if contextionary is configured only. Otherwise it will fail when it won't be enabled as a module.
Currently querier
has hardcoded dependency with contextionary
unfortunately. We will remove it in the future and I will update the conditional loading of modules. Very good catch.
Also, the monitoring seems a bit redundant, don't we have an env var when we want monitoring to be working (PROMETHEUS_BLABLA_ENABLED)? we should enable this only if that env var is enabled, imho
Good point. Currently querier
doesn't respect the ENV PROMETHEUS_MONITORING_ENABLED
. I think we should (I can take it up in the follow up PR if that's ok?).
Having said that I have some opinions about configs :) I like to keep this cli-flags as well. IMHO cli-flags should take highest priority than ENVs (and every config should be able to pass via cli-flags in addition to ENVs). Not a big fan of ENV based configs tbh, in my experience it's hard to manage at scale (10s is fine, 100s is crazy). Prefer cli or file based config instead. Main rationale is I want my configs to be "local" and "explicit". ENVs are by design "global" and "implicit" (any other process that share same pod, shell can set different values).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely we should. I thought I had some sane default values. But that doesn't work well with our helm chart values. I will update it 👍
I think we still cannot provide sane default for some flags like schema_url
, contextionary_url
and minio_url
. Because this is totally different when running in k8s, all those will be local service
name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed some args to make it work with defaults.
This PR also changes port mapping 9091 (used by prometheus server in our setup) to 7071
weaviate/weaviate#6043
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
To avoid any confusion in the future about your contribution to Weaviate, we work with a Contributor License Agreement. If you agree, you can simply add a comment to this PR that you agree with the CLA so that we can merge. |
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
apiVersion: apps/v1 | ||
kind: Deployment | ||
metadata: | ||
name: querier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefix all of our services with weaviate
so the name that I suggest here would be then weaviate-querier
and all of the labels also should have that prefix to easily find "our" services / deployments
metadata: | ||
name: {{ .Values.experimental.querier.service.name }} | ||
labels: | ||
app.kubernetes.io/name: weaviate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we agree on weaviate-querier
naming? to be used throughout all of those files?
{{- end }} | ||
{{- end }} | ||
- name: CLUSTER_JOIN | ||
value: {{ .Values.service.name }}-headless.{{ .Release.Namespace }}.svc.{{ .Values.clusterDomain }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we use a separate headless service for weaviate-querier
deployment? how about scaling this service? and the raft response from this service? should querier be aware of all Weaviate pods?
@@ -0,0 +1,456 @@ | |||
{{ if .Values.experimental.querier.enabled }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kavirajk can we add some unit tests? for you deployment?
querier: | ||
enabled: false | ||
replicas: 3 | ||
image: semitechnologies/weaviate-experimental:preview-chore-change-default-configs-on-querier-23d39eb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
This shouldn't impact any of the existing uses cases of helm chart.
You can enable querier (Deployment) by have overrides
values.experimental.querier.enabled
flag.You can find example overrides here.
Querier takes mainly three dependencies
v1/schema
)Changes
gPRC
andhttp
(currently only metrics server) servers.