fix: cache JWKS instance across verifyAccessToken calls#67
Conversation
The internal verifyAccessToken helper called createRemoteJWKSet from jose
on every invocation, which creates a fresh JWKS instance and throws away
jose's built-in per-instance key cache. This forces a network request to
the JWKS endpoint for every token verification, which has been observed
to take >3s under load.
Lazily initialize the JWKS instance at module scope so it is created
once and reused for the lifetime of the process. getWorkOS() and
getConfig('clientId') are still resolved on first use (not at module
load time), so consumer configuration remains respected.
Co-Authored-By: jonah <jonah+cursor@workos.com>
Original prompt from jonah
|
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
Keep the lazy per-process cache, but also remember the URL it was built
for. If getWorkOS()/getConfig('clientId') ever produce a different URL
(e.g. a multi-tenant worker re-configures per request), discard the old
instance and build a new one instead of silently serving stale keys.
Add a test that exercises the URL-change path alongside the existing
'reuse on same URL' test.
Co-Authored-By: jonah <jonah+cursor@workos.com>
| }); | ||
|
|
||
| describe('JWKS caching', () => { | ||
| const createLoaderArgs = (request: Request): LoaderFunctionArgs => |
There was a problem hiding this comment.
Since we're dealing with a cache, and the cache is at the module-level, I think we should sandbox these tests within jest.isolateModules() to ensure that anything going on here doesn't affect any other tests. It would be an unlikely ordering bug should such an issue occur, but a subtle and annoying one, to be sure.
There was a problem hiding this comment.
Done in 57e2138 — each test now loads a fresh copy of ./session.js via jest.isolateModules(...) and re-wires the mocks on the isolated jose/workos/sessionStorage/iron-session. That also kills the module-scoped JWKS cache at the end of every test, so no state leaks out of this describe.
| let cachedJWKSUrl: string | undefined; | ||
|
|
||
| function getJWKS(): ReturnType<typeof createRemoteJWKSet> { | ||
| const jwksUrl = getWorkOS().userManagement.getJwksUrl(getConfig('clientId')); |
There was a problem hiding this comment.
Since getJwksUrl issues an API call, I think it would make sense, if possible, to check that getConfig('clientId') is in the cache. As in, if getConfig('clientId')'s JWKS URL is fetched once, don't fetch it again. I guess it depends on how frequently one's JWKS URL would change -- I'm not sure if this is a real problem or not, but it does seem expensive to hit an endpoint every time you're investigating a cache.
There was a problem hiding this comment.
Good instinct to check, but getJwksUrl is pure string formatting — no API call. In @workos-inc/node:
getJwksUrl(clientId) {
if (!clientId) {
throw TypeError('clientId must be a valid clientId');
}
return `${this.workos.baseURL}/sso/jwks/${clientId}`;
}So the "fetch once per clientId" goal is already what this cache achieves: same clientId → same URL → cache hit; no network. The only thing we re-invoke each call is a template-literal concatenation, which is effectively free vs. the actual JWKS round-trip this PR is eliminating.
There was a problem hiding this comment.
Ok, sounds good! Thanks clanker.
| expect(createRemoteJWKSetMock).toHaveBeenCalledTimes(1); | ||
| } finally { | ||
| if (originalImpl) { | ||
| getJwksUrl.mockImplementation(originalImpl); |
There was a problem hiding this comment.
This should be done in some kind of afterEach, rather than finally.
There was a problem hiding this comment.
Good call — with the isolated-modules refactor in 57e2138 the try/finally is gone entirely. The getJwksUrl mock lives on a workos instance inside the isolated module, so it's torn down with the rest of the isolated graph at end of test; nothing to restore.
Load a fresh copy of ./session.js inside jest.isolateModules for each JWKS caching test so the module-scoped JWKS cache cannot leak across tests (or out of the describe block). This removes the need for the try/finally restore of the getJwksUrl mock. Co-Authored-By: jonah <jonah+cursor@workos.com>
Co-Authored-By: jonah <jonah+cursor@workos.com>
| function loadIsolated(): IsolatedModules { | ||
| let isolated!: IsolatedModules; | ||
| jest.isolateModules(() => { | ||
| /* eslint-disable @typescript-eslint/no-require-imports, @typescript-eslint/no-var-requires */ |
There was a problem hiding this comment.
Can the code be written such that these disables are not necessary?
There was a problem hiding this comment.
Done in 71ba2b2 — swapped the bare require() calls for a createRequire(__filename)-bound loader so the eslint disables are gone. Tests still pass and npm run lint is clean locally.
Replaces the bare require() calls inside jest.isolateModules with a createRequire-bound loader so the eslint disables for @typescript-eslint/no-require-imports and no-var-requires are no longer needed. Co-Authored-By: jonah <jonah+cursor@workos.com>
|
|
||
| function getJWKS(): ReturnType<typeof createRemoteJWKSet> { | ||
| const jwksUrl = getWorkOS().userManagement.getJwksUrl(getConfig('clientId')); | ||
| if (!cachedJWKS || cachedJWKSUrl !== jwksUrl) { |
There was a problem hiding this comment.
In src/session.ts:603, cachedJWKS and cachedJWKSUrl are module-level let bindings — in environments that share a module instance across concurrent requests (Node.js workers, edge runtimes), what happens if two requests race through the if (!cachedJWKS || cachedJWKSUrl !== jwksUrl) check simultaneously with different URLs? Could both write cachedJWKS and cachedJWKSUrl non-atomically, leaving the URL and instance out of sync?
There was a problem hiding this comment.
Good question, but getJWKS() is fully synchronous — there's no await between the check and the two assignments — and both Node.js workers and edge runtimes (CF Workers, Vercel Edge, etc.) run JS on a single-threaded event loop. The function runs to completion atomically from the loop's perspective, so two concurrent verifyAccessToken calls cannot interleave between the if and the writes, and cachedJWKS/cachedJWKSUrl cannot end up out of sync.
Within a single process the failure mode is the trivial one: if two calls genuinely arrive in the same tick before either has populated the cache (only possible if a prior microtask scheduled them both), each would compute jwksUrl, both would see !cachedJWKS, and the second would just overwrite the first with an equivalent instance — wasted construction, never inconsistent state.
True parallelism only shows up across separate workers / isolates, and those don't share module state at all — each gets its own cachedJWKS, which is the intended behavior (one JWKS instance per process, with jose's built-in key cache per instance).
|
❌ Cannot revive Devin session - the session is too old. Please start a new session instead. |
Summary
verifyAccessTokeninsrc/session.tscalledcreateRemoteJWKSetfromjoseon every invocation, creating a fresh JWKS instance for each token verification. Since each instance maintains its own key cache, this effectively bypassedjose's built-in caching and forced a network round trip to the JWKS endpoint on every request — observed to take >3s under load againstapi.workos.com/sso/jwks/{clientId}.This change lazily initializes the JWKS instance at module scope and reuses it across all
verifyAccessTokencalls for the lifetime of the process.cachedJWKS: ReturnType<typeof createRemoteJWKSet> | undefined(not exported).getJWKS()helper (not exported) that creates the instance on first use and returns the cached one on subsequent calls. Lazy, not top-level —getWorkOS()andgetConfig('clientId')are only read on first access, so consumer configuration viaconfigure()is still respected.verifyAccessTokenkeeps the same signature and behavior; it just pulls the JWKS fromgetJWKS().createRemoteJWKSetthat prime the module-scoped cache, clear the mock, then invoke the authenticated loader path multiple times and assertcreateRemoteJWKSetis not called again even thoughjwtVerifystill runs every time.Review & Testing Checklist for Human
configure()runs at request boundaries — the cache is keyed by process lifetime, not byclientId, so switchingclientIdat runtime would continue to use the first-seen URL.api.workos.com/sso/jwks/...).src/session.tsaroundverifyAccessToken/getJWKSto confirm nothing else in the file references the internal cache.Notes
jose'screateRemoteJWKSetreturns agetKeyfunction that already performs per-instance key caching and coalesces concurrent JWKS fetches, so caching the instance (rather than caching raw key material ourselves) gives us that behavior for free while staying on the library's supported API.Link to Devin session: https://app.devin.ai/sessions/d92cce9f168c44f1903291c8d723895b
Requested by: @ohjonah