DEV Community

Joseph D. Marhee

Example of Yaml Generator and Validator in Python

Whether or not you work with Yaml regularly, the one thing most people know about it is that it cares about whitespace. Even careful practitioners can automate a bad process, and with Yaml that means a bad time, so validating is a must, particularly when generating Yaml (to say nothing of writing it by hand).
Let's take a common Yaml use case: Kubernetes manifests. In my case, I wanted to create different configurations, populate information on the fly (things like tokens of a known length, for example), and then dump the result to a Yaml file used elsewhere. I did this in Python using pyyaml.
To use encryption at rest in your cluster for resources like secrets, Kubernetes requires an EncryptionConfig file. It is a fairly short piece of Yaml to generate: it just needs the provider, a key, and which resource to encrypt at rest in Etcd. Before generating it as Yaml, I'm going to represent it as a Python dictionary, which reads much like JSON:

configIn = {
        "kind": "EncryptionConfig",
        "apiVersion": "v1",
        "resources": [
            {
            "resources": [
                "secrets"
            ],
            "providers": [
                {
                "aescbc": {
                    "keys": [
                    {
                        "name": "key1",
                        "secret": "%s" % (generateSecret(32))
                    }
                    ]
                }
                }
            ]
            }
        ]
        }

Next, we use the result of generateSecret (a lambda that takes a string length and returns a base64-encoded version of a random string of that length) to populate the secret value in that object:

import base64
import random
import string
import os
import sys
import yaml
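# Works on Python 3: string.ascii_lowercase replaces string.lowercase, and b64encode takes and returns bytes.
# SystemRandom().choice picks each character from the OS entropy source and allows repeats, unlike random.sample.
generateSecret = lambda length: base64.b64encode(''.join(random.SystemRandom().choice(string.ascii_lowercase + string.digits) for _ in range(length)).encode('ascii')).decode('ascii')  # 32 length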
def populateConfig():
    configIn = {
        "kind": "EncryptionConfig",
        "apiVersion": "v1",
        "resources": [
            {
            "resources": [
                "secrets"
            ],
            "providers": [
                {
                "aescbc": {
                    "keys": [
                    {
                        "name": "key1",
                        "secret": "%s" % (generateSecret(32))
                    }
                    ]
                }
                }
            ]
            }
        ]
        }

    configOut = yaml.dump(configIn)
    return configOut
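As an aside on the key itself: the Kubernetes documentation suggests generating a 32-byte random key and base64-encoding it. Here is a minimal sketch of that approach using only the standard library. Note that this is a swapped-in technique, not what the lambda above does (it draws from lowercase letters and digits), and generate_key is a hypothetical name of my own.

```python
import base64
import os


def generate_key(num_bytes=32):
    """Return num_bytes of OS-provided randomness, base64-encoded as text.

    generate_key is a hypothetical helper for illustration only.
    """
    return base64.b64encode(os.urandom(num_bytes)).decode("ascii")


key = generate_key(32)

# Decoding the string gives back exactly 32 raw bytes.
assert len(base64.b64decode(key)) == 32
```

The difference is that every byte value is possible here, so the key carries the full 32 bytes of randomness rather than 32 characters picked from a 36-character alphabet.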

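populateConfig then has yaml.dump return that object to us as Yaml (PyYAML releases before 5.1 use flow style for the innermost collections by default, which is what you see here):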

apiVersion: v1
kind: EncryptionConfig
resources:
- providers:
  - aescbc:
      keys:
      - {name: key1, secret: BASE64_STRING }
  resources: [secrets]

That is valid Yaml, but to make it idiomatic with the Kubernetes style (and because the experimental feature that consumes this file won't accept this form as of 1.11), we'll change the dump options on the configOut line to look like this:

    configOut = yaml.dump(configIn, default_flow_style=False)

to return:

apiVersion: v1
kind: EncryptionConfig
resources:
- providers:
  - aescbc:
      keys:
      - name: key1
        secret: BASE64_STRING
  resources:
  - secrets
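To convince yourself that the two styles differ only in presentation, a quick round trip helps. This is a small sketch on a trimmed-down dictionary (the data variable is made up for illustration); it dumps the same object with and without default_flow_style=False and loads both back.

```python
import yaml

# A cut-down stand-in for the config above, for illustration only.
data = {"resources": [{"resources": ["secrets"]}]}

default_style = yaml.dump(data)
block_style = yaml.dump(data, default_flow_style=False)

# Whatever the text looks like, both parse back to the same Python object.
assert yaml.safe_load(default_style) == data
assert yaml.safe_load(block_style) == data

print(block_style)
```

So the choice of style matters to whatever consumes the file, not to the data it carries.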

Okay, great, we've got our config, and it looks reasonably correct, but since it's automatically created, we probably want to double-check it.
There are a few ways to do this. My input was relatively simple, the schema wasn't being modified in any meaningful way (just populated with data), and I'd prefer to do this with the libraries already imported, so we can use the yaml package's built-in safe_load method to see if an incoming config (like the one returned by the above function) validates:

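def validateYaml(config):
    try:
        # safe_load raises a yaml.YAMLError subclass if the text cannot be parsed
        yaml.safe_load(config)
        return config
    except yaml.YAMLError:
        # catch only parse errors, so unrelated exceptions are not hidden
        sys.exit('Failed to validate config.')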
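To see what kind of input trips safe_load, here is a short sketch that feeds it one well-formed document and two broken ones. The is_valid_yaml helper and the sample strings are hypothetical, written just for this illustration; it returns a boolean instead of exiting so it is easy to try in a REPL.

```python
import yaml


def is_valid_yaml(text):
    """Return True if PyYAML can parse text, False on a parse error."""
    try:
        yaml.safe_load(text)
        return True
    except yaml.YAMLError:
        return False


good = "keys:\n- name: key1\n  secret: abc\n"
# A stray extra indent on the second key breaks the mapping.
bad_indent = "name: key1\n  secret: abc\n"
# A flow sequence that is never closed.
bad_bracket = "resources: [secrets\n"

assert is_valid_yaml(good)
assert not is_valid_yaml(bad_indent)
assert not is_valid_yaml(bad_bracket)
```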

validateYaml bails out if the config does not validate (which becomes important in a moment) and returns the config unchanged if it does. With that in place, we can move on to our program's entrypoint to stitch all this together, where we'll write the config to a file only if it is valid:

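if __name__ == '__main__':
    # validateYaml exits before this point returns if the config is bad
    config = validateYaml(populateConfig())
    # the with block closes the file even if the write fails
    with open("secrets.conf", "w") as EncryptionConfig:
        EncryptionConfig.write(config)
    print("OK")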

If validateYaml fails, it prevents us from writing a bad config, or at least one that is certain not to work. With more complicated Yaml input, other validation issues may still present themselves that safe_load does not detect by default.
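One example of what a parse check alone misses: a document can be perfectly good Yaml and still be the wrong shape. Below is a rough sketch of a structural check layered on top of safe_load. The check_encryption_config function and the specific checks are my own assumptions for illustration, not a Kubernetes-defined schema.

```python
import yaml


def check_encryption_config(text):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper: the fields checked mirror the example config only.
    """
    doc = yaml.safe_load(text)
    if not isinstance(doc, dict):
        return ["top level is not a mapping"]
    problems = []
    for field in ("kind", "apiVersion", "resources"):
        if field not in doc:
            problems.append("missing field: " + field)
    for entry in doc.get("resources") or []:
        if not isinstance(entry, dict) or not entry.get("providers"):
            problems.append("resource entry has no providers")
    return problems


# Parses fine as Yaml (a bare string), but is clearly not a config.
print(check_encryption_config("just a string"))
```

For anything larger than this, a proper schema validation library is probably a better fit than hand-written checks.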
