feat(server): meter GET /records egress bytes #6648
@@ -0,0 +1,76 @@

```typescript
import { makeDataTransferEvent, pubsub } from '@nangohq/shared';
import { Batcher, ENVS, getLogger, parseEnvs } from '@nangohq/utils';

import type { DataTransferCallsite } from '@nangohq/types';
import type { Grouping, Result } from '@nangohq/utils';

const envs = parseEnvs(ENVS);
const logger = getLogger('server.egress.telemetry');

export interface ServerEgressTelemetry {
    package: 'server';
    callsite: Extract<DataTransferCallsite, 'get_/records'>;
    accountId: number;
    connectionId: string;
    integrationId: string;
    environmentId: number;
    environmentName: string;
    egressedBytes: number;
    count: number;
}

const grouping: Grouping<ServerEgressTelemetry> = {
    groupingKey: (t) => `${t.callsite}:${t.accountId}:${t.environmentId}:${t.integrationId}:${t.connectionId}`,
    aggregate: (acc, t) => ({
        ...acc,
        egressedBytes: Math.min(acc.egressedBytes + t.egressedBytes, Number.MAX_SAFE_INTEGER),
        count: acc.count + t.count
    })
};

const batcher = new Batcher<ServerEgressTelemetry>({
    maxBatchSize: envs.SERVER_EGRESS_TELEMETRY_BATCH_SIZE,
    flushIntervalMs: envs.SERVER_EGRESS_TELEMETRY_FLUSH_INTERVAL_MS,
    maxQueueSize: envs.SERVER_EGRESS_TELEMETRY_MAX_QUEUE_SIZE,
    grouping,
    logger,
    process: async (events) => {
        const res = await pubsub.publisher.publishBatch({
            subject: 'usage',
            events: events.map((t) =>
                makeDataTransferEvent({
                    pkg: t.package,
                    callsite: t.callsite,
                    accountId: t.accountId,
                    connectionId: t.connectionId,
                    integrationId: t.integrationId,
                    environmentId: t.environmentId,
                    environmentName: t.environmentName,
                    meteredBytes: { sent: t.egressedBytes, received: 0 },
                    count: t.count
                })
            )
        });
        if (res.isErr()) {
            // throw so the Batcher re-queues and retries
            throw res.error;
        }
    }
});

export const egressTelemetryRecorder = {
    record(entry: ServerEgressTelemetry): void {
        const res = batcher.add(entry);
        if (res.isErr()) {
```
Contributor: Your call, but do we need a cohesive metric here so we know when the batcher is struggling and when we are dropping metrics? For ClickHouse we implemented this metric to know when events are dropped, whether because the queue was full, we reached the maximum number of retries, etc.

Contributor (Author): I'm currently relying on log-based filters and was planning to extract metrics from those logs with this. So the tl;dr is that I'm tracking it, just not with a regular metric maintained at the app level. Alternatively, since the

Contributor: I'm OK with doing this in a new PR!
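The reviewer's suggestion, an app-level counter of dropped events broken down by reason, could look roughly like the sketch below. It is hypothetical and not part of this PR: `DropCounter`, `recordWithMetric`, `AddOutcome`, and the `DropReason` values are illustrative names, and the sketch assumes the caller can tell why an event was dropped, which the `Result` returned by `batcher.add()` may or may not expose. Exporting the counts to a metrics backend is left out.

```typescript
// Hypothetical sketch of an app-level drop metric, as suggested in review; not part of this PR.
type DropReason = 'queue_full' | 'max_retries';

type AddOutcome = { ok: true } | { ok: false; reason: DropReason };

class DropCounter {
    private readonly counts = new Map<DropReason, number>();

    increment(reason: DropReason, n = 1): void {
        this.counts.set(reason, this.get(reason) + n);
    }

    get(reason: DropReason): number {
        return this.counts.get(reason) ?? 0;
    }

    // Plain snapshot that a metrics exporter (not shown) could publish periodically.
    snapshot(): Record<DropReason, number> {
        return { queue_full: this.get('queue_full'), max_retries: this.get('max_retries') };
    }
}

const drops = new DropCounter();

// Hypothetical wrapper in the shape of egressTelemetryRecorder.record(): keep the log line, also count the drop.
function recordWithMetric(add: () => AddOutcome, log: (msg: string) => void): void {
    const res = add();
    if (!res.ok) {
        drops.increment(res.reason);
        log(`Dropped server egress telemetry: ${res.reason}`);
    }
}
```

Keeping the log call next to the counter preserves the log-based filters the author relies on, while the counter yields a number that can be graphed or alerted on without parsing logs.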
```typescript
            logger.error(`Dropped server egress telemetry: ${res.error.message}`);
        }
    },
    async shutdown(opts?: { timeoutMs: number }): Promise<Result<void>> {
        const res = await batcher.shutdown(opts);
        if (res.isErr()) {
            logger.error(`Server egress telemetry recorder shutdown error: ${res.error.message}`);
        }

        return res;
    }
};
```
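The recorder collapses telemetry entries that share a grouping key (callsite, account, environment, integration, connection) before publishing them to the `usage` subject. The sketch below illustrates that pattern without dependencies. It is a hypothetical stand-in, not the `@nangohq/utils` `Batcher`: `MiniBatcher` and `Egress` are made-up names, `maxQueueSize` is assumed to bound the number of distinct groups, and the retry behavior is inferred only from the "throw so the Batcher re-queues and retries" comment. Time-based flushing via `flushIntervalMs` is omitted.

```typescript
// Hypothetical sketch; NOT the real @nangohq/utils Batcher.
interface Grouping<T> {
    groupingKey: (t: T) => string;
    aggregate: (acc: T, t: T) => T;
}

type AddResult = { ok: true } | { ok: false; reason: 'queue_full' };

interface MiniBatcherOptions<T> {
    maxBatchSize: number;
    maxQueueSize: number; // assumed: maximum number of distinct groups held in memory
    grouping: Grouping<T>;
    process: (batch: T[]) => Promise<void>;
}

class MiniBatcher<T> {
    private readonly groups = new Map<string, T>();
    private readonly opts: MiniBatcherOptions<T>;

    constructor(opts: MiniBatcherOptions<T>) {
        this.opts = opts;
    }

    get size(): number {
        return this.groups.size;
    }

    // Merge into an existing group, or start a new group if there is room.
    add(item: T): AddResult {
        const key = this.opts.grouping.groupingKey(item);
        const existing = this.groups.get(key);
        if (existing !== undefined) {
            this.groups.set(key, this.opts.grouping.aggregate(existing, item));
            return { ok: true };
        }
        if (this.groups.size >= this.opts.maxQueueSize) {
            return { ok: false, reason: 'queue_full' }; // caller decides how to report the drop
        }
        this.groups.set(key, item);
        return { ok: true };
    }

    // Hand up to maxBatchSize groups to `process`; if it throws, merge them back and rethrow.
    async flush(): Promise<void> {
        const taken = Array.from(this.groups.entries()).slice(0, this.opts.maxBatchSize);
        for (const [key] of taken) {
            this.groups.delete(key);
        }
        try {
            await this.opts.process(taken.map(([, value]) => value));
        } catch (err) {
            for (const [key, value] of taken) {
                const newer = this.groups.get(key);
                // Re-queued groups skip the maxQueueSize check to keep the sketch short.
                this.groups.set(key, newer === undefined ? value : this.opts.grouping.aggregate(newer, value));
            }
            throw err;
        }
    }
}

interface Egress {
    connectionId: string;
    egressedBytes: number;
    count: number;
}

const grouping: Grouping<Egress> = {
    groupingKey: (t) => t.connectionId,
    aggregate: (acc, t) => ({
        ...acc,
        // Saturating add, as in the PR, so the sum never exceeds Number.MAX_SAFE_INTEGER.
        egressedBytes: Math.min(acc.egressedBytes + t.egressedBytes, Number.MAX_SAFE_INTEGER),
        count: acc.count + t.count
    })
};
```

Because entries are merged by key, memory in this sketch grows with the number of distinct connections rather than the number of requests, and a failed publish loses nothing as long as the merged-back groups are flushed later.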
```diff
@@ -33,6 +33,9 @@ export const ENVS = z.object({
     NANGO_ADMIN_INVITE_TOKEN: z.string().optional(),
     NANGO_SERVER_PUBLIC_BODY_LIMIT: z.string().optional().default('75mb'),
     SERVER_SHUTDOWN_DELAY_MS: z.coerce.number().optional().default(0),
+    SERVER_EGRESS_TELEMETRY_BATCH_SIZE: z.coerce.number().int().positive().default(1_000),
+    SERVER_EGRESS_TELEMETRY_FLUSH_INTERVAL_MS: z.coerce.number().int().nonnegative().default(60_000),
+    SERVER_EGRESS_TELEMETRY_MAX_QUEUE_SIZE: z.coerce.number().int().positive().default(100_000),
```
Comment on lines +36 to +38

Contributor: Should they be prefixed with

Contributor (Author): Hmm, if they should, that ship has kind of sailed already, as I introduced these two in a previous PR: But I also see a bunch of other env vars without the
```diff
     NANGO_PROXY_BASE_URL_OVERRIDE_ENABLED: z.stringbool().optional().default(true),
     NANGO_PROXY_BASE_URL_OVERRIDE_DENYLIST: z
         .string()
```
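The three new variables follow the same zod pattern as their neighbours: coerce the string to a number, require an integer, require it to be positive (or non-negative for the flush interval, where 0 is accepted), and fall back to a default when the variable is unset. The sketch below spells those rules out without zod. `readIntEnv` and `readEgressTelemetryEnvs` are hypothetical helpers for illustration only; they are not how `@nangohq/utils` parses `ENVS`.

```typescript
// Hypothetical helpers mirroring z.coerce.number().int().{positive,nonnegative}().default(n); not the real parser.
type Env = Record<string, string | undefined>;

// For integers, positive() is equivalent to >= 1 and nonnegative() to >= 0.
function readIntEnv(env: Env, name: string, defaultValue: number, min: 0 | 1): number {
    const raw = env[name];
    if (raw === undefined) {
        return defaultValue; // .default(): only applies when the variable is unset
    }
    const value = Number(raw); // zod's number coercion also calls Number(input)
    if (!Number.isInteger(value) || value < min) {
        throw new Error(`${name} must be an integer >= ${min}, got "${raw}"`);
    }
    return value;
}

function readEgressTelemetryEnvs(env: Env) {
    return {
        SERVER_EGRESS_TELEMETRY_BATCH_SIZE: readIntEnv(env, 'SERVER_EGRESS_TELEMETRY_BATCH_SIZE', 1_000, 1),
        SERVER_EGRESS_TELEMETRY_FLUSH_INTERVAL_MS: readIntEnv(env, 'SERVER_EGRESS_TELEMETRY_FLUSH_INTERVAL_MS', 60_000, 0),
        SERVER_EGRESS_TELEMETRY_MAX_QUEUE_SIZE: readIntEnv(env, 'SERVER_EGRESS_TELEMETRY_MAX_QUEUE_SIZE', 100_000, 1)
    };
}
```

With these rules an unset variable takes its default, while a malformed value (for example `1.5`, or `0` for the batch size) is rejected rather than silently replaced.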