Collaborator

@ncarbon ncarbon commented Sep 18, 2025

Description

Validates the LLM output schema (faker) against the input schema to ensure a one-to-one mapping.

  • Filters out fields that don't match the input schema structure.
  • Defaults unmapped input fields to the "Unrecognized" faker method.
Screen.Recording.2025-09-19.at.11.53.01.AM.mov

Checklist

  • New tests and/or benchmarks are included
  • Documentation is changed or added
  • If this change updates the UI, screenshots/videos are added and a design review is requested
  • I have signed the MongoDB Contributor License Agreement (https://www.mongodb.com/legal/contributor-agreement)

Motivation and Context

  • Bugfix
  • New feature
  • Dependency update
  • Misc

Open Questions

Dependents

Types of changes

  • Backport Needed
  • Patch (non-breaking change which fixes an issue)
  • Minor (non-breaking change which adds functionality)
  • Major (fix or feature that would cause existing functionality to change)

@ncarbon ncarbon added the no release notes Fix or feature not for release notes label Sep 18, 2025
@github-actions github-actions bot added the feat label Sep 18, 2025
@ncarbon ncarbon requested a review from jcobis September 18, 2025 12:38
@ncarbon ncarbon marked this pull request as ready for review September 18, 2025 12:44
@ncarbon ncarbon requested a review from a team as a code owner September 18, 2025 12:44
@ncarbon ncarbon requested review from Copilot and paula-stacho and removed request for Copilot September 18, 2025 12:44
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR implements LLM output validation for the mock data generator by adding a validation layer that ensures a one-to-one mapping between input schema and faker schema. The validation filters out invalid fields from the LLM response and defaults unmapped input fields to "Unrecognized" faker methods.

  • Refactors the validateFakerSchema function to accept input schema and validate against it
  • Adds logic to filter out faker schema fields that don't exist in the input schema
  • Implements fallback handling for unmapped input schema fields with "Unrecognized" faker method
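
The two rules above (drop faker fields that are not in the input schema, default unmapped input fields to "Unrecognized") can be sketched as follows. This is a minimal illustration, not the merged implementation: the type shapes, the `reconcileFakerSchema` name, and the `fakerArgs` property are assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; the real types live in the Compass codebase.
type SketchFakerMapping = {
  mongoType: string;
  fakerMethod: string;
  fakerArgs: unknown[];
};
type SketchInputSchema = Record<string, { type: string }>;
type SketchFakerSchema = Record<string, SketchFakerMapping>;

const UNRECOGNIZED_FAKER_METHOD = 'Unrecognized';

// Builds the result by walking the input schema, so the output has exactly
// one entry per input field (a one-to-one mapping).
function reconcileFakerSchema(
  inputSchema: SketchInputSchema,
  fakerSchemaRaw: SketchFakerSchema
): SketchFakerSchema {
  const result: SketchFakerSchema = {};
  for (const [fieldPath, inputField] of Object.entries(inputSchema)) {
    // Only own properties count as a mapping returned by the LLM.
    const mapping = Object.prototype.hasOwnProperty.call(fakerSchemaRaw, fieldPath)
      ? fakerSchemaRaw[fieldPath]
      : undefined;
    // Unmapped input fields fall back to the "Unrecognized" faker method.
    result[fieldPath] = mapping ?? {
      mongoType: inputField.type,
      fakerMethod: UNRECOGNIZED_FAKER_METHOD,
      fakerArgs: [],
    };
  }
  // Faker fields absent from the input schema are never copied, so they are filtered out.
  return result;
}
```

Iterating over the input schema rather than the LLM output is one way to get both behaviours from a single loop; the actual function may be structured differently.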

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File | Description
packages/compass-collection/src/modules/collection-tab.ts | Updated validateFakerSchema function to validate faker schema against input schema and handle unmapped fields
packages/compass-collection/src/components/mock-data-generator-modal/mock-data-generator-modal.spec.tsx | Added comprehensive tests for schema validation including filtered fields and unmapped field handling

@ncarbon ncarbon requested a review from gribnoysup September 18, 2025 12:44
@ncarbon ncarbon force-pushed the CLOUDP-333855/faker-schema-validation branch from 5cabab1 to 9b9bea4 Compare September 18, 2025 18:44
@ncarbon ncarbon force-pushed the CLOUDP-333855/faker-schema-validation branch from 9b9bea4 to 867b002 Compare September 18, 2025 21:23
[activeField]: {
...currentMapping,
- mongoType: newJsonType,
+ mongoType: newJsonType as MongoDBFieldType,
Collaborator

Instead of casting here, is it possible for us to just type the input arg as MongoDBFieldType?

Collaborator

@paula-stacho paula-stacho Sep 19, 2025

Tracking this down, the type is unclear because the onChange callback of the <Select /> component provides a string. I don't see any way to narrow the type for the LeafyGreen component (will ask them to confirm), so we might need the typecast there. Still, it would be better to do the typecast at the source rather than here; otherwise it's hard to tell why we are typecasting and how we even know that the type is correct.

onChange={(value) => onJsonTypeSelect(value as MongoDBFieldType)}

@ncarbon ncarbon requested a review from jcobis September 18, 2025 21:45
const { fieldPath, ...fieldMapping } = field;
result[fieldPath] = {
...fieldMapping,
mongoType: fieldMapping.mongoType as MongoDBFieldType,
Collaborator

@jcobis jcobis Sep 18, 2025

Maybe we can add a sanity check here that mongoType is a valid member of MongoDBFieldType and, if not, pass null? Then below in validateFakerSchema, if this mongoType is null, we can use the input schema's mongoType. Or something similar.
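
A rough sketch of that idea, under stated assumptions: the `KNOWN_MONGO_TYPES` set is an illustrative subset (the real MongoDBFieldType union comes from mongodb-schema), and `normalizeLlmFields` and `resolveMongoType` are hypothetical names, not functions from this PR.

```typescript
// Illustrative subset only; assumed values, not the full MongoDBFieldType union.
const KNOWN_MONGO_TYPES = new Set<string>([
  'String',
  'Int32',
  'Boolean',
  'Date',
  'ObjectId',
]);

type RawLlmField = { fieldPath: string; mongoType: string; fakerMethod: string };
type NormalizedLlmField = { mongoType: string | null; fakerMethod: string };

// Step 1: while keying the array by fieldPath, replace unknown types with null
// instead of casting them.
function normalizeLlmFields(
  fields: RawLlmField[]
): Record<string, NormalizedLlmField> {
  const result: Record<string, NormalizedLlmField> = {};
  for (const { fieldPath, ...mapping } of fields) {
    result[fieldPath] = {
      ...mapping,
      mongoType: KNOWN_MONGO_TYPES.has(mapping.mongoType) ? mapping.mongoType : null,
    };
  }
  return result;
}

// Step 2: during validation, a null (or missing) type falls back to the
// type recorded in the input schema.
function resolveMongoType(
  normalized: NormalizedLlmField | undefined,
  inputSchemaType: string
): string {
  return normalized?.mongoType ?? inputSchemaType;
}
```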

Collaborator

Alternatively, could we use a more fitting type in the zod definition?

Collaborator Author

Updated the type in the zod definition to use MongoDBFieldType.

@ncarbon ncarbon force-pushed the CLOUDP-333855/faker-schema-validation branch from 53384e8 to 70f02bf Compare September 19, 2025 15:54
@ncarbon ncarbon requested a review from jcobis September 19, 2025 18:01
z.object({
fieldPath: z.string(),
- mongoType: z.string(),
+ mongoType: z.string() as z.ZodType<MongoDBFieldType>,
Collaborator

At this point you can validate the string. Unfortunately it seems mongodb-schema doesn't export the values (only the type). You can copy them here and create a ticket to add this export from the mongodb-schema package.
You can use z.custom<MongoDBFieldType> with a validator function that checks the values. https://zod.dev/api?id=custom
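
The validator function could look roughly like the type guard below. It is kept free of the zod import so it runs on its own; with zod available, the predicate could presumably be passed as `z.custom<MongoDBFieldType>(isAssumedMongoDBFieldType)`. The value list is an assumed, incomplete copy for illustration, and `AssumedMongoDBFieldType` stands in for the real type from mongodb-schema.

```typescript
// Assumed, incomplete copy of the allowed values; the real list would be
// copied from (and eventually exported by) the mongodb-schema package.
const ASSUMED_MONGO_FIELD_TYPES = [
  'String',
  'Int32',
  'Boolean',
  'Date',
  'ObjectId',
] as const;

// Stand-in for MongoDBFieldType, derived from the copied values.
type AssumedMongoDBFieldType = (typeof ASSUMED_MONGO_FIELD_TYPES)[number];

// Type guard: narrows an unknown value to the union only when it is one of
// the listed strings, so no unchecked cast is needed.
function isAssumedMongoDBFieldType(
  value: unknown
): value is AssumedMongoDBFieldType {
  return (
    typeof value === 'string' &&
    ASSUMED_MONGO_FIELD_TYPES.some((known) => known === value)
  );
}
```

Deriving the type from the `as const` array keeps the runtime list and the compile-time union in sync within this sketch; once mongodb-schema exports the values, the copied list could be dropped.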

...field,
): FakerSchema => {
// Transform to keyed object structure
const fakerSchemaRaw = transformFakerSchemaToObject(fakerSchemaArray);
Collaborator

@paula-stacho paula-stacho Sep 22, 2025

It's generally better to have functions with a single, clear purpose (for readability and maintainability). If we don't need the array variant for validation, we can move the transformation out, so that this function only takes an object and returns a validated object.
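
One possible shape for that split, as a sketch only: `toKeyedSchema` and `keepKnownFields` are hypothetical names, and the field shape is simplified for the example. The transformation and the validation each take and return plain data, and the caller composes them.

```typescript
type ArrayFieldItem = { fieldPath: string; mongoType: string; fakerMethod: string };
type KeyedFieldMapping = Omit<ArrayFieldItem, 'fieldPath'>;
type KeyedSchema = Record<string, KeyedFieldMapping>;

// Single purpose: turn the LLM's array response into an object keyed by fieldPath.
function toKeyedSchema(fields: ArrayFieldItem[]): KeyedSchema {
  const result: KeyedSchema = {};
  for (const { fieldPath, ...mapping } of fields) {
    result[fieldPath] = mapping;
  }
  return result;
}

// Single purpose: take a keyed object and return a validated keyed object.
// Validation is reduced here to keeping only the allowed field paths.
function keepKnownFields(schema: KeyedSchema, allowedPaths: string[]): KeyedSchema {
  const validated: KeyedSchema = {};
  for (const path of allowedPaths) {
    if (Object.prototype.hasOwnProperty.call(schema, path)) {
      const mapping = schema[path];
      if (mapping !== undefined) validated[path] = mapping;
    }
  }
  return validated;
}

// The caller composes the two steps instead of the validator doing both.
const composedExample = keepKnownFields(
  toKeyedSchema([
    { fieldPath: 'name', mongoType: 'String', fakerMethod: 'person.fullName' },
    { fieldPath: 'bogus', mongoType: 'String', fakerMethod: 'lorem.word' },
  ]),
  ['name']
);
```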

@ncarbon ncarbon merged commit d07e301 into main Sep 22, 2025
57 of 58 checks passed
Labels
feat no release notes Fix or feature not for release notes
3 participants