Tuesday, January 30, 2018

Schema validation for a schemaless database: is it a contradiction?


With version 3.6, MongoDB recently introduced a validation capability based on JSON Schema syntax.  Since we keep hearing that one of the great benefits of NoSQL is the absence of a schema, isn’t this new feature an admission of the limitations of NoSQL databases?  The answer is a resounding NO: schema validation actually brings the best of both worlds to NoSQL databases!

Previously, with version 3.2, MongoDB had introduced a validation capability using its query operator syntax.  This was in response to requests from enterprises wishing to leverage the benefits of NoSQL without risk of losing control of their data.  JSON Schema is the de facto standard for describing the structure of JSON documents, sort of the equivalent of XSD for XML files.  So it was only natural that MongoDB would adopt it.  There are multiple reasons to leverage this capability: 

1)     Enforcing schema only when it matters: with JSON Schema, you can declare the fields where enforcement should take place, and let other fields be added with no enforcement at all.  This is controlled by the ‘additionalProperties’ keyword, which defaults to true and therefore accepts undeclared fields.  Some fields are more important than others in a document.  In particular, in the context of privacy laws such as the GDPR, you may want to track some aspects of your schema and ensure consistency.  You may also want to control data quality with field constraints such as string length or regular expressions, numeric upper and lower limits, etc.

2)     JSON polymorphism: having a schema declared and enforced does not limit your ability to have multi-type fields or flexible polymorphic structures.  It only ensures that such variations are intentional rather than the result of development mistakes.  With its oneOf, anyOf, allOf and not keywords, JSON Schema lets you declare in your validation rules exactly what is allowed and what is not.

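3)     Degree of enforcement: for each collection, MongoDB lets you choose the validation level (off, strict or moderate, where moderate only checks inserts and updates to documents that already satisfy the rules) and the validation action (error, which rejects invalid writes, or warn, which accepts them but records the violation in the server log).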
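To make the three points above concrete, here is a minimal sketch in Python of what such a validator could look like, written as a plain dictionary. The collection (people) and all of its fields are hypothetical examples: name, email and age are enforced with type, length, pattern and range constraints; address may be either a string or an object through oneOf; undeclared fields are accepted because additionalProperties is true; and the enforcement options sit alongside the validator.

```python
# Minimal sketch of a MongoDB $jsonSchema validator as a plain Python dict.
# The "people" collection and its fields are hypothetical examples.

people_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        # Only these fields are mandatory; everything else is optional.
        "required": ["name", "email", "age"],
        "properties": {
            # String constraints: maximum length and a regular expression.
            "name": {"bsonType": "string", "maxLength": 100},
            "email": {
                "bsonType": "string",
                "pattern": "^[^@\\s]+@[^@\\s]+$",
            },
            # Numeric lower and upper limits ("int" is BSON int32).
            "age": {"bsonType": "int", "minimum": 0, "maximum": 150},
            # Polymorphic field: free-text string OR structured object.
            "address": {
                "oneOf": [
                    {"bsonType": "string"},
                    {
                        "bsonType": "object",
                        "required": ["city"],
                        "properties": {"city": {"bsonType": "string"}},
                    },
                ]
            },
        },
        # True is the JSON Schema default: fields not declared above
        # are accepted without any enforcement.
        "additionalProperties": True,
    }
}

# Degree of enforcement, set per collection next to the validator.
enforcement_options = {
    "validationLevel": "moderate",  # "off", "strict" (default) or "moderate"
    "validationAction": "error",    # "error" (default) rejects; "warn" only logs
}
```

With the PyMongo driver, these would typically be passed as options of the create command, for example db.create_collection("people", validator=people_validator, **enforcement_options). With moderate, updates to documents that were already invalid before the validator was added are not checked.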

In effect, the $jsonSchema validator becomes the NoSQL equivalent of a DDL (data definition language), letting you apply just the right level of control to your database.
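To illustrate the DDL analogy, the sketch below builds, in Python with only the standard library, the two server commands involved: create, which plays the role of CREATE TABLE, and collMod, which plays the role of ALTER TABLE. The helper validator_command and the people schemas are hypothetical names used only for this example.

```python
# Minimal sketch: the two "DDL" commands behind a MongoDB validator,
# built as plain command documents. All names here are illustrative.
from typing import Any, Dict


def validator_command(verb: str, collection: str, schema: Dict[str, Any],
                      level: str = "strict", action: str = "error") -> Dict[str, Any]:
    """Build a create (~CREATE TABLE) or collMod (~ALTER TABLE) command."""
    if verb not in ("create", "collMod"):
        raise ValueError("verb must be 'create' or 'collMod'")
    # The command name must be the first key of the document;
    # Python dicts preserve insertion order (3.7+).
    return {
        verb: collection,
        "validator": {"$jsonSchema": schema},
        "validationLevel": level,
        "validationAction": action,
    }


person_schema = {
    "bsonType": "object",
    "required": ["name"],
    "properties": {"name": {"bsonType": "string"}},
}

# Initial definition of the collection.
create_cmd = validator_command("create", "people", person_schema)

# Later schema change: require an email as well, but only log violations
# (warn) and skip already-invalid documents (moderate) during the transition.
person_schema_v2 = {
    **person_schema,
    "required": ["name", "email"],
    "properties": {**person_schema["properties"],
                   "email": {"bsonType": "string"}},
}
alter_cmd = validator_command("collMod", "people", person_schema_v2,
                              level="moderate", action="warn")
```

With PyMongo, either document could be sent as is, for example db.command(alter_cmd). Note that collMod replaces the whole validator rather than merging it with the previous one, which is why the second version restates the full schema.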


Hackolade model dynamically generates MongoDB $jsonSchema validator
Because Hackolade was built from the ground up on JSON Schema, maintaining MongoDB certification with this v3.6 enhancement has been quite easy.  No JSON Schema knowledge is required!  You build your collection model with a few mouse clicks, and Hackolade dynamically generates the JSON Schema script for creating or updating the collection validator.
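The general idea of generating a validator from a model, rather than writing it by hand, can be sketched in a few lines of Python. This is purely illustrative and not how Hackolade is implemented: the Field type and the to_json_schema function are hypothetical, and a real model would carry much more information (nested objects, arrays, choices, descriptions).

```python
# Hypothetical sketch: turning a flat field model into a $jsonSchema validator.
# NOT Hackolade's implementation; Field and to_json_schema are invented here.
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Field:
    name: str
    bson_type: str
    required: bool = False
    constraints: Dict[str, Any] = field(default_factory=dict)


def to_json_schema(fields: List[Field]) -> Dict[str, Any]:
    properties: Dict[str, Any] = {}
    required: List[str] = []
    for f in fields:
        # Each field becomes a property with its type plus any constraints.
        properties[f.name] = {"bsonType": f.bson_type, **f.constraints}
        if f.required:
            required.append(f.name)
    schema: Dict[str, Any] = {"bsonType": "object", "properties": properties}
    # Only emit "required" when at least one field is mandatory.
    if required:
        schema["required"] = required
    return {"$jsonSchema": schema}


model = [
    Field("name", "string", required=True, constraints={"maxLength": 100}),
    Field("age", "int", constraints={"minimum": 0}),
]
validator = to_json_schema(model)
```

The resulting dictionary has the same shape as a hand-written validator and could be attached to a collection with either a create or a collMod command.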