AWS SQS Queue Filtering #2066
Thinking about the implementation - I think we probably should spawn a pod that reads from the real queue, then change the current service to use the "forked" queue, and the local service to also use the forked queue (but with a filter).
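The routing idea above can be sketched as pure logic (a minimal illustration under my own assumptions, not mirrord's actual implementation; all names here are hypothetical): the reader takes each message's attributes, checks them against each local client's filter, and forwards the message to the first matching client's forked queue, falling back to the service's forked queue when no filter matches.

```python
import re


def route_message(attributes: dict, client_filters: dict, service_queue: str) -> str:
    """Decide which forked queue a message should be forwarded to.

    attributes: SQS message attributes, e.g. {"author": "me"}.
    client_filters: maps a client's forked-queue name to a per-attribute
    regex; a client matches only if every one of its patterns matches.
    """
    for queue, attr_patterns in client_filters.items():
        if all(
            attr in attributes and re.match(pattern, attributes[attr])
            for attr, pattern in attr_patterns.items()
        ):
            return queue
    # No filtering client matched: the deployed service keeps the message.
    return service_queue
```

For example, `route_message({"author": "me"}, {"local-q": {"author": "^me$"}}, "main-q")` routes to `"local-q"`, while a message with a non-matching `author` goes to `"main-q"`.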
I think it might be possible to implement without requiring the user to create a fork resource ahead of time. All the necessary info can be passed in the mirrord configuration, and the queue splitter can be started with the first filtering client. What do you think, @aviramha?
Yup, I tried to go there in my explanation but might have been less clear.
So we probably don't need a CRD, right?
I thought we should have the CRD so the configuration will be centralized - I think having it in the mirrord json is complicated, and you don't want users to mistakenly cause issues on the cluster through misconfiguration.
from original issue:
I imagine having a CRD that specifies what kind of fun stuff happens when you start mirrord on a specific target, which is a list of options; for now the list will only support SQS forking.
Is splitting queues from multiple cloud services at the same time a common use-case? Otherwise we could make it a single enum object instead of a list of them.
Imagine a service using SQS + Kafka, or when we introduce DB splitting or similar features - it'd be convenient to have this defined per "workflow" / deployment.
I created CRD and configuration code at #2173 according to this issue. The configuration code isn't valid yet but already roughly represents the structure of the configuration. An example:

```yaml
apiVersion: splitters.mirrord.metalbear.co/v1alpha
kind: MirrordQueueSplitter
metadata:
  name: whatever-q-splitter
  namespace: default
spec:
  queues:
    updates-queue:
      SQS:
        queueNameSource:
          configMap:
            name: whatever-config-map
            queueNameKey: sqsUpdatesQueueName
        region: whatever-1
    tasks-queue:
      SQS:
        queueNameSource:
          configMap:
            name: whatever-config-map
            queueNameKey: sqsTasksQueueName
        region: whatever-1
  consumer:
    deployment: my-deploy
```

And the mirrord configuration file of a user could look something like this:

```json
{
  "feature": {
    "split_queues": {
      "whatever-q-splitter": {
        "updates-queue": {
          "SQS": {
            "wows": {
              "number_attribute": [
                { "greater_than_int": 2 },
                { "lesser_than_int": 6 }
              ]
            },
            "coolz": {
              "string_attribute": "^very .*"
            }
          }
        }
      }
    }
  }
}
```
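To make the filter semantics concrete, here's a rough sketch of how I read the config above (my own interpretation, not actual mirrord code): a `string_attribute` filter is a regex over the attribute value, a `number_attribute` filter is a list of integer conditions that must all hold, and a message matches a queue's filter only if every listed attribute is present and passes.

```python
import re


def attribute_matches(filter_spec: dict, value) -> bool:
    """Evaluate one attribute filter from the split_queues config.

    filter_spec is either {"string_attribute": "<regex>"} or
    {"number_attribute": [{"greater_than_int": n}, {"lesser_than_int": m}]}.
    """
    if "string_attribute" in filter_spec:
        return re.match(filter_spec["string_attribute"], str(value)) is not None
    for condition in filter_spec.get("number_attribute", []):
        if "greater_than_int" in condition and not int(value) > condition["greater_than_int"]:
            return False
        if "lesser_than_int" in condition and not int(value) < condition["lesser_than_int"]:
            return False
    return True


def message_matches(queue_filters: dict, attributes: dict) -> bool:
    """A message matches only if every filtered attribute is present and passes."""
    return all(
        attr in attributes and attribute_matches(spec, attributes[attr])
        for attr, spec in queue_filters.items()
    )
```

With the example config, a message carrying `wows=4` and `coolz="very cool"` would be routed to the local user, while `wows=7` would not.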
In my opinion that's a bit more complicated than it has to be. I recommend that instead, we create a CRD which we call
Thinking about long term here, I think the CRD should be
I see what you're saying. Just so we're on the same page (replies annotated inline):

```
{
  "feature": {
    "split_queues": {
      "whatever-q-splitter": { <-- I imagine the mutation of the env var / config map would happen from the operator side - for example when the user fetches env it patches it (could be a cool feature regardless tbh ;) replace env for users)
        "first-queue": { <-- if the crd contains many queues, we need to identify the specific one - agree
          "SQS": { <-- I'm not sure we can drop that if we have multiple q types in the same CRD - we can, just make it an internally tagged enum. In any case, we can also make it untagged so it will just see if it can parse it as the queue that exists in the remote - for example, you'd send this configuration as JSON upstream, and the operator knows it's SQS, so it tries to parse it as SQS.
            "wows": { <-- This is the name of the first attribute to filter SQS messages by. agree
              "number_attribute": [ <-- Attribute filter
                { "greater_than_int": 2 },
                { "lesser_than_int": 6 }
              ]
            },
            "coolz": { <-- Another attribute
              "string_attribute": "^very .*"
            }
          }
        }
      }
    }
  }
}
```
So you want the operator to auto-detect the splitter resource based on the target's deployment/rollout?
On another note - how did you think to do the queue filtering? In the operator itself? Initially I thought we'd spawn a job/pod for it, but then realized it's better to start it in the operator. Also, if it were another pod, I'd reuse the operator image so enterprise users won't need to copy many images to their internal registry.
Yeah, I actually think that's kind of a policy. I do think maybe later on we'd add a feature to opt out of the splitting, but initially the default would force the user's queue to be split.
The user has to specify filters anyway, so I don't see why/when there would be forced splitting. If the user does not specify a filter for a queue, we don't split it.
It's a UX decision that matters - what happens if a user without a configuration does that when it's already split by other users? You can:
I think our mandate should go towards safety of sharing environments by default, which would mean option 1, and that's why I'm leaning towards always forcing users to get a split queue when a split is defined.
Yeah, number 1 is reasonable (imo 3 is also good) - should we do it like our stealing policy:
What unsafe behaviour? |
Taking all messages |
Why is that unsafe? |
Just throwing another idea out there - we could also go in the direction of a microservice architecture and have a splitter service that the operator uses for splitting by sending requests to start/stop splitting. |
Because it means one service gets all the messages and other users can't filter.
I initially had the same idea, but the problem is that it adds complexity and usually means another image, which means another image for the enterprise people to copy to their internal registries etc.
In this example from the original issue comment, the queue name is defined inside a file/text field of a config map. I forgot about this example at some point. The interface we talked about needs a small change to support it. It's not a big change and doesn't affect the schedule much.
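To illustrate the two cases, here is a hedged sketch (a hypothetical helper, not the operator's real code): the simple case reads the queue name directly from a ConfigMap key, while the file/text case has to locate the name inside a blob of text stored under the key.

```python
import re


def queue_name_from_configmap(data: dict, key: str, pattern=None) -> str:
    """Extract a queue name from a ConfigMap's `data` dict.

    Simple case: the value under `key` IS the queue name.
    File case: the value is a text blob (e.g. an env-style file) and
    `pattern` is a regex with one capture group locating the name in it.
    """
    value = data[key]
    if pattern is None:
        return value
    match = re.search(pattern, value)
    if match is None:
        raise ValueError(f"queue name not found under key {key!r}")
    return match.group(1)
```

For instance, `queue_name_from_configmap({"sqsUpdatesQueueName": "updates-q"}, "sqsUpdatesQueueName")` covers the simple key case, and passing a pattern like `r"QUEUE_NAME=(\S+)"` covers a file-valued key.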
I think we can go for the first case, then extend later.
Hi, anyone who subscribed to it - we have an alpha release ready if you want to try it out. |
First version released, tracking next steps in https://github.com/metalbear-co/operator/issues/630 |
As part of the mirrord for Teams solution, we'd like developers to be able to "fork" a queue based on a filter.
This means the original queue would then split into sub-queues based on a message attribute, so each engineer can consume their relevant messages.
This is currently implemented for SQS, with Rabbit/Kafka (and more!) to follow later.
Current plan:
Example user config map:
As env from configmap
As mounted configmap
Issue state:
Follow-up issues / things we're leaving for future versions:
TODOs:
- `PodSpec`s so that they can be created with `kubectl apply` (~1 day).
- `HashMap`s to `BTreeMap`s in CRDs. (no time)

Next Version: