
For developers


CLI & IO

The primary intention of the Main class in Schema Guru is to take the entered subcommand, create the corresponding subcommand object, and pass control to it. All subcommand-related code is contained in the cli package. Since the argonaut parser doesn't support anything similar to subcommands, we implemented them ourselves in the GuruCommand trait. It contains a separate argonaut parser, the name of the command (like "schema" or "ddl"), and its description for the help message. Everything else is up to the Command classes; they mostly look like usual App objects.
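A minimal sketch of that dispatch could look like this (member names and wiring here are assumptions, not the actual Schema Guru source):

```scala
// Simplified sketch of the subcommand dispatch; member names are assumptions.
trait GuruCommand {
  def title: String                           // subcommand name, e.g. "schema" or "ddl"
  def description: String                     // one-liner for the help message
  def processArgs(args: Array[String]): Unit  // parse own arguments and run
}

object SchemaCommand extends GuruCommand {
  val title = "schema"
  val description = "Derive JSON Schema from a set of JSON instances"
  def processArgs(args: Array[String]): Unit =
    println(s"schema invoked with ${args.toList}")
}

object Main extends App {
  val commands: List[GuruCommand] = List(SchemaCommand)
  args.headOption.flatMap(t => commands.find(_.title == t)) match {
    case Some(command) => command.processArgs(args.tail)
    case None          => commands.foreach(c => println(s"${c.title}\t${c.description}"))
  }
}
```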

Nothing in Schema Guru except the output method in the subcommand classes has the Unit type, and all code related to input from the file system is contained inside the utils package. Everything else works with pure functions and has no shared state.

Schema Derivation

Data

All schema types are described in the schema package. Most of them represent JSON Schema types like "string", "object", "null", etc., but there are a few auxiliary schema types not present in the JSON Schema Specification.

Each schema type needs to mix in the JsonSchema trait. This trait requires it to (a sketch follows the list):

  • implement the method toJson, which shows how to represent this type as a JSON object
  • implement the partial function mergeSameType, which provides fine-grained control over merging two schemas of the same type
  • implicitly provide a SchemaContext when creating (by deriving, merging, or just instantiating) a schema type object.
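A compressed sketch of that contract, using raw strings in place of the real JSON AST and an assumed SchemaContext field:

```scala
// Illustrative sketch only: the real trait lives in the schema package and
// renders a JSON AST, not strings; SchemaContext's fields are assumed here.
case class SchemaContext(enumCardinality: Int)

trait JsonSchema {
  def schemaContext: SchemaContext   // the context this schema was created with
  def toJson: String                 // how this type renders itself as JSON
  def mergeSameType(implicit ctx: SchemaContext): PartialFunction[JsonSchema, JsonSchema]
}

// Creating a schema type requires an implicit SchemaContext in scope
case class StringSchema(format: Option[String] = None)
                       (implicit val schemaContext: SchemaContext) extends JsonSchema {
  def toJson = """{"type": "string"}"""
  def mergeSameType(implicit ctx: SchemaContext): PartialFunction[JsonSchema, JsonSchema] = {
    // "format" survives a merge only when both sides agree on it
    case StringSchema(other) =>
      StringSchema(if (other == format) format else None)(ctx)
  }
}
```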

No property is required in any type of JSON Schema, thus all properties in all schema types are optional. The minimal JSON Schema is just an empty hash {}; it will validate any JSON value. It's represented as ZeroSchema in our types (one of those auxiliary schema types).

Another special schema type is ProductType. It helps us map the "dynamic world" of JSON to the "static world" of Scala types, because the JSON Schema Specification states that a value can have any of two or more types. When we try to merge two schemas of different types, like "string" and "object" or "null" and "integer", we get a ProductType as output with all the information present in those original types. ProductType optionally contains each of the schema types and provides correct output via toJson (because in the end, only this output matters). If we try to merge an "object" schema into the product type ["string", "null"], it will just place all the object's info into its slot in the ProductType case class. If we try to merge ["string", "object"] with another ["string", "object"], it will use mergeSameType for each corresponding type. If we try to merge "string" and "string"... well, we get a "string"; it's not a product case.
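A simplified model of those merge rules; the real ProductType keeps one optional slot per schema type rather than a set of tags, and merges those slots with mergeSameType:

```scala
// Simplified model of the product merge rules, assumptions only.
sealed trait Schema
case object StringT extends Schema
case object ObjectT extends Schema
case class ProductT(types: Set[Schema]) extends Schema

def merge(a: Schema, b: Schema): Schema = (a, b) match {
  case (x, y) if x == y             => x                    // "string" + "string": no product
  case (ProductT(xs), ProductT(ys)) => ProductT(xs ++ ys)   // merge slot by slot
  case (ProductT(xs), y)            => ProductT(xs + y)     // fill the type's slot
  case (x, ProductT(ys))            => ProductT(ys + x)
  case (x, y)                       => ProductT(Set(x, y))  // different types: new product
}

merge(StringT, ObjectT)                 // ProductT(Set(StringT, ObjectT))
merge(ProductT(Set(StringT)), ObjectT)  // ProductT(Set(StringT, ObjectT))
```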

merge for all schema types is defined in terms of partial functions. Each type must define a mergeSameType partial function containing the logic for merging two schemas of the same type, because each property needs its own merge rules: "minimum" takes the lesser value and eliminates the greater, "maximum" works the other way round, "format" is eliminated if a different value is encountered, and so on. merge sequentially tries each of four partial functions, with mergeSameType being the first; it stops as soon as one of the four is defined over the argument (the other three are already defined in the JsonSchema trait).
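For example, the per-property rules for an integer schema could look like this sketch (names and the reduced set of fields are assumptions):

```scala
// Sketch of per-property merge rules inside mergeSameType (simplified):
// "minimum" keeps the lesser bound, "maximum" keeps the greater one.
case class IntegerSchema(minimum: Option[Long], maximum: Option[Long]) {
  val mergeSameType: PartialFunction[IntegerSchema, IntegerSchema] = {
    case IntegerSchema(min2, max2) =>
      IntegerSchema(
        for (a <- minimum; b <- min2) yield math.min(a, b),  // lesser value survives
        for (a <- maximum; b <- max2) yield math.max(a, b))  // greater value survives
  }
}
```

The sequential dispatch then amounts to chaining the four partial functions with orElse and applying the combined function to the other schema.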

SchemaContext is a special case class which gives Schema Guru hints about how to create and merge schema types. It is basically something passed in from the outer world (like user preferences) that affects our schemas; it can be a limit on enum cardinality or rules for applying pattern suggestions. It is implicitly passed around by every function that creates and merges schema types.
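A sketch of how such a context threads through, with a hypothetical enumCardinality field standing in for the real preferences:

```scala
// Hypothetical SchemaContext field, purely to illustrate the implicit threading.
case class SchemaContext(enumCardinality: Int)

// Derive an enum only while the number of distinct values stays within the limit
def deriveEnum(values: List[String])(implicit ctx: SchemaContext): Option[List[String]] = {
  val distinct = values.distinct
  if (distinct.size <= ctx.enumCardinality) Some(distinct) else None
}

implicit val context = SchemaContext(enumCardinality = 3)
deriveEnum(List("a", "b", "a"))       // Some(List(a, b))
deriveEnum(List("a", "b", "c", "d"))  // None: exceeds the enum limit
```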

Business logic

All work happens in convertsJsonsToSchema and mergeAndTransform. They are called sequentially and probably could even be a single long function.

The first one takes a list of JSON instances (received from the file system or network) and tries to convert each one into a micro-schema. A micro-schema is a usual subclass of JsonSchema, but it is "micro" because it can validate only the one value it was derived against. For example, for the value 42 the micro-schema will be {"type": "integer", "minimum": 42, "maximum": 42, "enum": [42]}. No value except 42 will pass validation against such a schema.
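The derivation step might be sketched like this, with plain strings standing in for the real JSON AST:

```scala
// Sketch of micro-schema derivation for a single value (assumed helper name).
def toMicroSchema(value: Any): String = value match {
  case i: Int    => s"""{"type": "integer", "minimum": $i, "maximum": $i, "enum": [$i]}"""
  case s: String => s"""{"type": "string", "enum": ["$s"]}"""
  case _         => "{}"   // ZeroSchema: accepts anything
}

toMicroSchema(42)
// {"type": "integer", "minimum": 42, "maximum": 42, "enum": [42]}
```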

Then we pass the list of these micro-schemas to mergeAndTransform. Now all of these micro-schemas are merged (summed using a Monoid) into one which validates all of them. For example, if we're merging two integers with micro-schemas like {"maximum": 10, "minimum": 10} and {"maximum": 14, "minimum": 14}, it gives us {"maximum": 14, "minimum": 10}, converting them into something more sensible. So these boundaries only ever expand to validate the merged micro-schemas.
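A stripped-down sketch of that reduction over just the numeric bounds:

```scala
// Sketch of the Monoid-style reduction: ZeroSchema ({}) is the identity and
// merging is the associative operation, so a schema list can simply be summed.
case class IntRange(minimum: Long, maximum: Long) {
  def merge(that: IntRange): IntRange =
    IntRange(math.min(minimum, that.minimum), math.max(maximum, that.maximum))
}

val microSchemas = List(IntRange(10, 10), IntRange(14, 14))
microSchemas.reduce(_ merge _)   // IntRange(10, 14): bounds only ever widen
```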

The last step to create a meaningful JSON Schema is to apply the required transformations. The method transform, defined on complex schema types (object, array, product), recursively applies its argument (a partial function) to all nested primitive schema types (string, number, integer). One of those transformations, for example, is encaseNumericRange, which expands the above {"maximum": 14, "minimum": 10} to something more useful like {"maximum": 32767, "minimum": 0} (a positive 16-bit integer).
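A sketch of what such a transformation could do for integer ranges (the actual thresholds used by encaseNumericRange are assumptions here):

```scala
// Sketch of a range-encasing transformation in the spirit of encaseNumericRange;
// the thresholds below are assumed for illustration.
case class IntRange(minimum: Long, maximum: Long)

def encaseNumericRange(r: IntRange): IntRange = r match {
  case IntRange(min, max) if min >= 0 && max <= 32767 => IntRange(0, 32767) // positive 16-bit
  case IntRange(min, _)   if min >= 0                 => IntRange(0, Int.MaxValue)
  case other                                          => other
}

encaseNumericRange(IntRange(10, 14))   // IntRange(0, 32767)
```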

DDL Generation

All the DDL generation logic is contained in the schema-ddl package.
