Recently I was involved in a project where I had to load data from different providers, each one supplying it in a different format: JSON in the best case, but also XML or YAML.
The data I received was dirty and pretty far from what you would like to receive, for example:
- Extra white space in values
- Boolean values expressed using words like Yes or No
- Datetime values expressed in inconsistent formats
- Keys that didn't respect a standard naming convention like snake_case or camelCase
- Unexpected data structure changes without any warning by the provider
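
To make the kind of cleanup involved concrete, here is a minimal sketch using only the standard library. The helper names (to_snake_case, clean_value) and the set of accepted boolean words are assumptions made for illustration; they are not taken from the project or from any library.

```python
import re

# Assumed vocabulary for "wordy" booleans; extend it to match your providers.
TRUE_WORDS = {"yes", "true"}
FALSE_WORDS = {"no", "false"}


def to_snake_case(key):
    """Convert camelCase, kebab-case or spaced keys to snake_case."""
    key = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", key.strip())
    key = re.sub(r"[\s\-]+", "_", key)
    return key.lower()


def clean_value(value):
    """Strip strings, map yes/no words to booleans, recurse into containers."""
    if isinstance(value, str):
        text = value.strip()
        lowered = text.lower()
        if lowered in TRUE_WORDS:
            return True
        if lowered in FALSE_WORDS:
            return False
        return text
    if isinstance(value, dict):
        # Keys are normalized, values are cleaned recursively.
        return {to_snake_case(k): clean_value(v) for k, v in value.items()}
    if isinstance(value, list):
        return [clean_value(v) for v in value]
    return value


raw = {"firstName": "  Fabio ", "Is-Active": "Yes", "tags": [" a ", "No"]}
cleaned = clean_value(raw)
```

Converting every "Yes" or "No" string blindly is a simplification. In a real pipeline you would probably convert only the fields you know are boolean.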
To wake up from this nightmare, I decided to normalize and simplify the whole process, reducing unexpected errors to nearly zero.
The abstract problem was always the same:
- Reading/writing data from/to different formats in a standard way
- Accessing nested data values quickly, using a keypath
- Getting data values parsed into the expected type
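
The last two points can be sketched in plain Python as follows. This is only an illustration of the idea under stated assumptions: the function names, the "." separator, and the list of datetime formats are hypothetical choices, not the library's implementation.

```python
from datetime import datetime


def get_by_keypath(data, keypath, default=None, separator="."):
    """Walk nested dicts/lists following a keypath like 'a.b.0.c'."""
    current = data
    for part in keypath.split(separator):
        if isinstance(current, dict) and part in current:
            current = current[part]
        elif (isinstance(current, list) and part.isdecimal()
              and int(part) < len(current)):
            current = current[int(part)]
        else:
            return default
    return current


# Assumed set of formats; a real provider list would likely be longer.
DATETIME_FORMATS = (
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
    "%d/%m/%Y",
    "%Y-%m-%d",
)


def get_datetime(data, keypath, default=None, formats=DATETIME_FORMATS):
    """Fetch a value by keypath and try to parse it as a datetime."""
    value = get_by_keypath(data, keypath)
    if isinstance(value, datetime):
        return value
    if not isinstance(value, str):
        return default
    for fmt in formats:
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue  # try the next known format
    return default


data = {
    "order": {
        "created": "2023-12-24T10:30:00",
        "items": [{"shipped_at": " 25/12/2023 "}],
    }
}
created = get_datetime(data, "order.created")
shipped = get_datetime(data, "order.items.0.shipped_at")
```

Returning a default instead of raising means that a provider silently changing its structure does not crash the import; the caller decides how to handle the missing value.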
I decided to write my own library.
python-benedict is a dict subclass with keypath support, I/O shortcuts (Base64, JSON, TOML, XML, YAML, query-string) and many utility methods.
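
To show what I/O shortcuts on a dict subclass can look like, here is a toy sketch built only on the standard library. The class name IODict and its method names are hypothetical and chosen for illustration; for the real API, refer to the project README.

```python
import base64
import json
from urllib.parse import parse_qsl


class IODict(dict):
    """Toy dict subclass with a few I/O shortcuts (hypothetical example)."""

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        if not isinstance(data, dict):
            raise ValueError("expected a JSON object at the top level")
        return cls(data)

    @classmethod
    def from_base64(cls, text):
        # Assumes the payload is Base64-encoded JSON.
        return cls.from_json(base64.b64decode(text).decode("utf-8"))

    @classmethod
    def from_query_string(cls, text):
        # Values stay strings; repeated keys keep only the last value.
        return cls(parse_qsl(text.lstrip("?"), keep_blank_values=True))

    def to_json(self, **kwargs):
        return json.dumps(self, **kwargs)


d = IODict.from_json('{"name": "Fabio", "active": true}')
encoded = base64.b64encode(b'{"a": 1}').decode("ascii")
from_b64 = IODict.from_base64(encoded)
from_qs = IODict.from_query_string("?page=2&sort=name")
```

Because the class still is a dict, instances can be passed to any code that expects a plain dictionary.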
It's open-source on GitHub:
https://github.com/fabiocaccamo/python-benedict
Check it out, any feedback is appreciated.
Thanks