Here I compare some of the leading Python data validation libraries as of 2021, analyzing each against checks that are likely to matter for your applications.
All the data validation libraries I used are open-source tools; they are listed below:
- Cerberus
- Colander
- Jsonschema
- Marshmallow
- Pydantic
- Schema
- Voluptuous
Below are some basic requirements/checks for choosing a suitable tool for your applications:
1. Mandatory field check
Whenever we send a request to an API, certain fields are mandatory and others are optional. So I have treated this check as the primary one. No surprise that all the leading libraries have this feature.
Data Validation Libraries | Mandatory field check |
---|---|
Cerberus | yes |
Colander | yes |
Jsonschema | yes |
Marshmallow | yes |
Pydantic | yes |
Schema | yes |
Voluptuous | yes |
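
To make the check concrete, here is a minimal, library-agnostic sketch in plain Python. It is not how any of the libraries above implement it (each has its own schema syntax for marking fields as required); the field names and the `missing_required` helper are hypothetical.

```python
# Hypothetical field names, chosen only for illustration.
REQUIRED = {"name", "email"}   # mandatory fields
OPTIONAL = {"nickname"}        # optional field, never reported as missing


def missing_required(payload: dict) -> list:
    """Return the sorted names of mandatory fields absent from the payload."""
    return sorted(REQUIRED - set(payload))


print(missing_required({"name": "Asha"}))
print(missing_required({"name": "Asha", "email": "asha@example.com"}))
```

The first call reports `email` as missing; the second returns an empty list because the optional `nickname` field is allowed to be absent.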
2. Data type check
Most libraries ship with standard data types (such as str, int, dict, etc.). Although it is always advisable to use the standard data types provided by the validation library, sometimes we may need to extend them or create our own custom data type checks.
Data Validation Libraries | Data type check |
---|---|
Cerberus | yes |
Colander | yes |
Jsonschema | yes |
Marshmallow | yes |
Pydantic | yes |
Schema | yes |
Voluptuous | yes |
3. Min and Max option
As a basic option, field values should be validated against the minimum and maximum allowed length (for strings) or value (for integers). No wonder most libraries have this option.
Data Validation Libraries | Min and Max option |
---|---|
Cerberus | yes |
Colander | yes |
Jsonschema | yes |
Marshmallow | yes |
Pydantic | yes |
Schema | No |
Voluptuous | yes |
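
As a rough illustration of what a min/max option checks, the plain-Python sketch below treats strings by length and numbers by value. The `check_bounds` helper is hypothetical and only stands in for the built-in options the libraries provide.

```python
def check_bounds(value, minimum, maximum):
    """Return True if the value is within [minimum, maximum].

    Strings are measured by length, numbers by their value.
    """
    size = len(value) if isinstance(value, str) else value
    return minimum <= size <= maximum


print(check_bounds("abc", 1, 5))   # 3 characters, within 1..5
print(check_bounds(12, 18, 99))    # 12 is below the minimum of 18
```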
4. Regex option
If we want to allow certain special characters or accept specific patterns, regex is the way to go, and all of the validation libraries below have this feature.
Data Validation Libraries | Regex option |
---|---|
Cerberus | yes |
Colander | yes |
Jsonschema | yes |
Marshmallow | yes |
Pydantic | yes |
Schema | yes |
Voluptuous | yes |
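
For readers unfamiliar with pattern checks, this is a small sketch using only the standard `re` module. The username pattern and the `matches_pattern` helper are assumptions made for the example; each library exposes regex validation through its own option.

```python
import re

# Hypothetical rule: 3 to 16 characters, letters, digits, or underscore only.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,16}")


def matches_pattern(value: str) -> bool:
    """Return True only if the whole value matches the pattern."""
    return USERNAME_PATTERN.fullmatch(value) is not None


print(matches_pattern("data_val_21"))
print(matches_pattern("no spaces!"))
```

Using `fullmatch` rather than `search` matters here: it rejects values that merely contain a valid substring.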
5. Dynamic field validation based on another field
This check validates a field based on the value of another field given in the same request. It can be significant for some applications.
Data Validation Libraries | Field validation based on other Dynamic field |
---|---|
Cerberus | No |
Colander | yes |
Jsonschema | yes |
Marshmallow | yes |
Pydantic | yes |
Schema | No |
Voluptuous | No |
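
A dependent-field rule can be sketched in plain Python as follows. The `payment_method` and `card_number` fields and the `validate_payment` helper are hypothetical; libraries that support this check usually offer a schema-level or model-level hook for it.

```python
def validate_payment(payload: dict) -> list:
    """Require card_number only when payment_method is 'card'."""
    errors = []
    if payload.get("payment_method") == "card" and not payload.get("card_number"):
        errors.append("card_number is required when payment_method is 'card'")
    return errors


print(validate_payment({"payment_method": "cash"}))
print(validate_payment({"payment_method": "card"}))
```

The same `card_number` field is optional for a cash payment and mandatory for a card payment, which is what makes the rule dynamic.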
6. Custom validation
The option to extend the standard validation or create custom validation for our own applications.
Data Validation Libraries | Custom validation |
---|---|
Cerberus | yes |
Colander | yes |
Jsonschema | yes |
Marshmallow | yes |
Pydantic | yes |
Schema | yes |
Voluptuous | yes |
7. Error response for all fields
If invalid values are passed for multiple fields in a request, the library should capture and report errors for all the invalid fields, not just the first one.
Data Validation Libraries | Error response for all fields |
---|---|
Cerberus | yes |
Colander | yes |
Jsonschema | yes |
Marshmallow | yes |
Pydantic | yes |
Schema | No |
Voluptuous | No |
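
The difference between failing on the first error and reporting every error can be shown with a short plain-Python sketch. The `RULES` mapping and `validate_all` helper are hypothetical and much simpler than what the libraries do internally.

```python
# Hypothetical schema: every field is mandatory and has an expected type.
RULES = {"name": str, "age": int, "email": str}


def validate_all(payload: dict) -> dict:
    """Check every field and collect one message per invalid field."""
    errors = {}
    for field, expected_type in RULES.items():
        if field not in payload:
            errors[field] = "missing required field"
        elif not isinstance(payload[field], expected_type):
            errors[field] = f"expected {expected_type.__name__}"
    return errors  # an empty dict means the payload is valid


print(validate_all({"name": 42, "age": "ten"}))
```

Because the loop never stops early, the caller gets an entry for `name`, `age`, and `email` in a single response instead of fixing one field at a time.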
8. Dynamic Alert message option based on error
Each library has standard error messages for the different error types applicable to each field. Sometimes, though, we may need to configure a custom error message for a few fields in our applications.
Data Validation Libraries | Dynamic alert message option |
---|---|
Cerberus | yes |
Colander | yes |
Jsonschema | yes |
Marshmallow | yes |
Pydantic | yes |
Schema | yes |
Voluptuous | yes |
9. Schema/Model Re-usability
On occasion, we may need to extend or reuse a schema/model created for one API in another. This is essentially class inheritance.
Data Validation Libraries | Schema/Model Re-usability |
---|---|
Cerberus | yes |
Colander | yes |
Jsonschema | yes |
Marshmallow | yes |
Pydantic | yes |
Schema | yes |
Voluptuous | yes |
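
The inheritance idea can be illustrated with standard-library dataclasses, which stand in here for the schema or model classes of the class-based libraries. `UserBase` and `AdminUser` are hypothetical names, and dictionary-based libraries achieve reuse differently (for example by composing schema dictionaries).

```python
from dataclasses import dataclass, fields


@dataclass
class UserBase:
    """Fields shared by every user-related API."""
    name: str
    email: str


@dataclass
class AdminUser(UserBase):
    """Reuses the base fields and adds one of its own."""
    role: str = "admin"


print([f.name for f in fields(AdminUser)])
```

The derived class lists `name` and `email` from the base model followed by its own `role` field, without redefining the shared fields.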
10. Python support
We also need to consider compatibility with the latest Python version and forum support for the chosen packages (assessed as of March 2021).
Data Validation Libraries | Python support |
---|---|
Cerberus | yes |
Colander | yes |
Jsonschema | yes |
Marshmallow | yes |
Pydantic | yes |
Schema | Limited support |
Voluptuous | yes |
Libraries to Pick
Clearly, only four of the seven data validation libraries satisfy all the criteria mentioned above:
- Pydantic
- Marshmallow
- Jsonschema
- Colander
Performance comparison
We chose the top three libraries from the above list and ran performance tests.
Number of records given in the request vs. time taken (in seconds) to process the request:
Number of records | Pydantic | Marshmallow | Jsonschema |
---|---|---|---|
100 | 0.0039 | 0.033 | 0.037 |
1000 (1k) | 0.036 | 0.41 | 0.37 |
10000 (10k) | 0.36 | 3.36 | 3.59 |
100000 (100k) | 3.60 | 33.52 | 35.84 |
1000000 (1M) | 50.52 | 644.06 | 797.40 |
Clearly, Pydantic is roughly 10X faster than the other leading data validation libraries, Marshmallow and Jsonschema (between about 8X and 16X across these runs).
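
The speedup can be checked directly from the timings in the table. The short script below only redoes that arithmetic; the `TIMINGS` dictionary copies the table values, and the `speedups` helper is a name introduced for this example.

```python
# Seconds per run, copied from the table: (Pydantic, Marshmallow, Jsonschema).
TIMINGS = {
    100: (0.0039, 0.033, 0.037),
    1_000: (0.036, 0.41, 0.37),
    10_000: (0.36, 3.36, 3.59),
    100_000: (3.60, 33.52, 35.84),
    1_000_000: (50.52, 644.06, 797.40),
}


def speedups() -> dict:
    """Return, per record count, how many times slower each library is than Pydantic."""
    return {
        records: (marshmallow / pydantic, jsonschema / pydantic)
        for records, (pydantic, marshmallow, jsonschema) in TIMINGS.items()
    }


for records, (vs_marshmallow, vs_jsonschema) in speedups().items():
    print(f"{records:>9} records: Marshmallow {vs_marshmallow:.1f}x, "
          f"Jsonschema {vs_jsonschema:.1f}x slower than Pydantic")
```

The ratios stay close to 10 up to 100,000 records and widen at 1,000,000 records, where Pydantic's advantage is largest.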
Hence Pydantic is the clear winner: it satisfies all our basic requirements with lightning performance (processing 1 million records in under a minute).
Note: Please refer to the linked repository on my GitHub account for sample Python code for each of the libraries, showing how to use the functionalities and checks briefly explained above.
PoC on data validation libraries python