
Working with YAML

Part of Essentials

This is part of Essentials — core Python patterns for working platform engineers.

You work with YAML every day — Kubernetes manifests, Helm values files, Ansible playbooks, Docker Compose files. Editing them by hand is fine for one-offs. For bulk changes, generating manifests programmatically, or comparing configs across environments, you need Python.


Where You've Seen This

Every kubectl apply -f runs against YAML. Every Helm chart is YAML templates. Editing replica counts, image tags, or resource limits across 15 deployments by hand is error-prone and slow. Python can do it in a loop.

If you've used the config comparison script from Day One, you already know that json.load() turns JSON into Python dicts. YAML works the same way — yaml.safe_load() turns a YAML file into Python dicts and lists you can read, modify, and write back.


Setup

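PyYAML is a third-party package, not part of the standard library, though many Python environments already have it as a dependency of other tools. If it's not installed: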

Install PyYAML
pip install pyyaml

Reading a YAML File

Reading a YAML file
import yaml

with open("deployment.yaml") as f:
    manifest = yaml.safe_load(f)  # (1)!

print(manifest["metadata"]["name"])         # "myapp"
print(manifest["spec"]["replicas"])         # 3
containers = manifest["spec"]["template"]["spec"]["containers"]
print(containers[0]["image"])               # image string
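  1. Always safe_load, never yaml.load() with an unsafe loader. yaml.load() with yaml.Loader or yaml.UnsafeLoader can construct arbitrary Python objects, so a malicious file can execute code. Since PyYAML 6.0, yaml.load() requires an explicit Loader argument for this reason. safe_load only deserializes basic types (dicts, lists, strings, numbers, booleans). This is not optional.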
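To see the difference in practice, here is a minimal sketch that hands safe_load a document carrying a Python-specific tag. The payload string is made up for illustration and names a harmless function (os.getcwd). safe_load should refuse to construct it and raise a yaml.YAMLError subclass instead.

```python
import yaml

# Hypothetical untrusted input: a Python-specific tag that asks the loader
# to call a function (a harmless one here, os.getcwd).
payload = "!!python/object/apply:os.getcwd []"

try:
    yaml.safe_load(payload)
    rejected = False
except yaml.YAMLError as exc:
    # safe_load has no constructor for python/* tags, so it raises
    # instead of calling anything.
    rejected = True
    print(f"Rejected: {type(exc).__name__}")
```

Catching yaml.YAMLError also covers ordinary syntax errors, so the same try/except is a reasonable wrapper around any file you did not write yourself.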

The result is a Python dict. Every YAML key becomes a dict key, every YAML list becomes a Python list, every YAML value becomes the appropriate Python type.
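A short sketch of that type mapping, using an inline string instead of a file. The keys and values are invented for illustration. Note the last two entries: PyYAML follows YAML 1.1 rules, so an unquoted NO is read as a boolean, and quoting it keeps it a string.

```python
import yaml

# Illustrative document; none of these keys come from a real manifest.
text = """
name: myapp
replicas: 3
debug: true
ports: [80, 443]
owner: null
country: NO
country_quoted: "NO"
"""

config = yaml.safe_load(text)

# Print each value with the Python type it was converted to.
for key, value in config.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
```

When a value must stay a string (country codes, version numbers like "1.10"), quote it in the YAML.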


Modifying and Writing Back

Update image tag and write back
import yaml
import sys

image_tag = sys.argv[1]  # e.g., "myapp:v1.4.2"

with open("deployment.yaml") as f:
    manifest = yaml.safe_load(f)

# Navigate to the field and update it
manifest["spec"]["template"]["spec"]["containers"][0]["image"] = image_tag  # (1)!

with open("deployment.yaml", "w") as f:
    yaml.dump(manifest, f, default_flow_style=False)  # (2)!

print(f"✓ Updated image to {image_tag}")
  1. This is standard Python dict access — you're just navigating a nested structure. If the path doesn't exist, you get a KeyError. Add .get() calls at each level if you're not sure the structure is consistent.
  2. default_flow_style=False writes block-style YAML (the human-readable multi-line format) instead of inline JSON-like syntax. It has been the default since PyYAML 5.1, but passing it explicitly keeps the output consistent on older versions.
Using the script
python update_image.py myapp:v1.4.2
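For bulk changes, the same steps can be wrapped in a function and called once per file. This is a sketch under stated assumptions: the function name update_image, its container_index parameter, and the manifests/ directory in the usage comment are hypothetical, and sort_keys=False assumes PyYAML 5.1 or newer.

```python
from pathlib import Path

import yaml


def update_image(path, image_tag, container_index=0):
    """Set one container's image in a Deployment manifest; return the old image."""
    path = Path(path)
    manifest = yaml.safe_load(path.read_text())

    # Raises KeyError or IndexError if the manifest lacks this path.
    container = manifest["spec"]["template"]["spec"]["containers"][container_index]
    old_image = container["image"]
    container["image"] = image_tag

    # sort_keys=False keeps the key order that was loaded (PyYAML 5.1+).
    path.write_text(yaml.dump(manifest, default_flow_style=False, sort_keys=False))
    return old_image


# Usage sketch (hypothetical directory name):
# for manifest_path in sorted(Path("manifests").glob("*.yaml")):
#     old = update_image(manifest_path, "myapp:v1.4.2")
#     print(f"{manifest_path}: {old} -> myapp:v1.4.2")
```

Returning the old image makes it easy to log what changed, or to skip files that were already up to date.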

The Round-Trip Problem

PyYAML does not preserve comments. If your YAML file has comments like # This controls the replica count, they're gone after a safe_load → dump cycle.

Before (with comments)
spec:
  replicas: 3  # Adjust this for traffic spikes
  template:
    metadata:
      labels:
        app: myapp
After yaml.dump()
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: myapp

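The comment is lost. Key ordering may also change: yaml.dump() sorts mapping keys alphabetically by default, and passing sort_keys=False (PyYAML 5.1+) keeps the order that was loaded.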
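The key-ordering part is easy to check. A minimal sketch, assuming PyYAML 5.1 or newer (where the sort_keys parameter exists); the manifest dict is a trimmed-down illustration:

```python
import yaml

# Keys deliberately listed in non-alphabetical order.
manifest = {
    "kind": "Deployment",
    "apiVersion": "apps/v1",
    "metadata": {"name": "myapp"},
}

# Default behaviour: mapping keys are sorted alphabetically.
sorted_output = yaml.dump(manifest, default_flow_style=False)

# sort_keys=False keeps the dict's insertion order.
ordered_output = yaml.dump(manifest, default_flow_style=False, sort_keys=False)

print(sorted_output)
print(ordered_output)
```

Keeping the original order makes diffs of a rewritten file much smaller, although comments are still dropped either way.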

For scripts that modify files humans also edit, this is a problem. Two options:

ruamel.yaml

Install ruamel.yaml
pip install ruamel.yaml
Round-trip YAML with ruamel.yaml
from ruamel.yaml import YAML

yaml = YAML()
yaml.preserve_quotes = True

with open("deployment.yaml") as f:
    manifest = yaml.load(f)

manifest["spec"]["replicas"] = 5  # comment preserved

with open("deployment.yaml", "w") as f:
    yaml.dump(manifest, f)

ruamel.yaml was built specifically for this use case. Use it when comment preservation matters.

Standard PyYAML
import yaml

with open("deployment.yaml") as f:
    manifest = yaml.safe_load(f)

manifest["spec"]["replicas"] = 5

with open("deployment.yaml", "w") as f:
    yaml.dump(manifest, f, default_flow_style=False)

Use PyYAML when the YAML is generated by your pipeline (not hand-edited), or when you're generating new files from scratch.


Multi-Document YAML

Kubernetes manifests often contain multiple resources in one file, separated by ---:

multi-resource.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-service

safe_load() only reads the first document. Use safe_load_all() for multi-document files:

Reading multi-document YAML
import yaml

with open("multi-resource.yaml") as f:
    documents = list(yaml.safe_load_all(f))  # (1)!

for doc in documents:
    print(f"{doc['kind']}: {doc['metadata']['name']}")
# Deployment: myapp
# Service: myapp-service
  1. safe_load_all() returns a generator. list() materializes it so you can loop over it multiple times or index into it.

Writing multiple documents back:

Writing multi-document YAML
with open("multi-resource.yaml", "w") as f:
    yaml.dump_all(documents, f, default_flow_style=False)
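Putting the read and write halves together, here is a sketch that changes only the Deployment in a multi-document stream and leaves the other resources alone. It works on an inline string so it runs as-is. The stray empty document is included on purpose, because empty documents load as None and would break a doc.get() call.

```python
import yaml

text = """\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
---
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-service
"""

# Empty documents (for example a stray '---') load as None, so drop them.
documents = [doc for doc in yaml.safe_load_all(text) if doc is not None]

for doc in documents:
    if doc.get("kind") == "Deployment":
        # setdefault creates "spec" if the manifest does not have one yet.
        doc.setdefault("spec", {})["replicas"] = 5

# sort_keys=False (PyYAML 5.1+) keeps apiVersion/kind/metadata in order.
output = yaml.dump_all(documents, default_flow_style=False, sort_keys=False)
print(output)
```

Filtering on kind rather than on position means the script keeps working if someone reorders the resources in the file.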

Generating YAML Programmatically

Sometimes you need to create manifests from scratch — generating a ConfigMap for each environment, creating ServiceAccounts for a list of teams, or templating resources that Helm can't handle cleanly.

Generate a Kubernetes ConfigMap
import yaml

def make_configmap(name, namespace, data):
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": name,
            "namespace": namespace,
        },
        "data": data,
    }

environments = {
    "staging": {"LOG_LEVEL": "DEBUG", "API_URL": "https://staging.api.internal"},
    "production": {"LOG_LEVEL": "INFO", "API_URL": "https://api.internal"},
}

manifests = [
    make_configmap("myapp-config", env, data)  # one ConfigMap per namespace
    for env, data in environments.items()
]

with open("configmaps.yaml", "w") as f:
    yaml.dump_all(manifests, f, default_flow_style=False)

print(f"✓ Generated {len(manifests)} ConfigMaps")

This is where Python beats Helm templates for simple generation tasks — no template language to learn, no {{ }} syntax, just Python dicts.
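One thing to watch when generating ConfigMaps from Python data: Kubernetes requires every value under data to be a string, while Python settings often hold ints or booleans. The sketch below converts values before dumping. The helper name stringify_data and the settings dict are hypothetical.

```python
import yaml


def stringify_data(data):
    """Return a copy of data with every value converted to a string."""
    result = {}
    for key, value in data.items():
        if isinstance(value, bool):
            # Lowercase, since str(True) would give "True".
            result[key] = "true" if value else "false"
        else:
            result[key] = str(value)
    return result


# Illustrative settings mixing strings, ints, and booleans.
settings = {"LOG_LEVEL": "INFO", "MAX_RETRIES": 3, "FEATURE_X": True}

configmap = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {"name": "myapp-config", "namespace": "staging"},
    "data": stringify_data(settings),
}

output = yaml.dump(configmap, default_flow_style=False, sort_keys=False)
print(output)
```

PyYAML quotes strings such as "3" and "true" on output, so they load back as strings rather than as an int and a boolean.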


Handling Missing Keys Safely

Kubernetes manifests have optional fields. Navigating a path that might not exist raises KeyError:

Safe navigation of nested YAML
# ❌ KeyError if resources or limits isn't set
limit = manifest["spec"]["template"]["spec"]["containers"][0]["resources"]["limits"]["memory"]

# ✅ Safe navigation with .get()
containers = (manifest
    .get("spec", {})
    .get("template", {})
    .get("spec", {})
    .get("containers", []))

if containers:
    resources = containers[0].get("resources", {})
    limits = resources.get("limits", {})
    memory = limits.get("memory", "not set")
    print(f"Memory limit: {memory}")

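Chaining .get(key, {}) at each level returns an empty dict if the key is missing, letting the next .get() work without raising. Verbose but reliable. One caveat: a key that is present but set to null comes back as None rather than the default, so use (d.get(key) or {}) wherever that can happen.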
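If the chained calls get repetitive, they can be folded into a small helper. This is a sketch, not part of PyYAML: the name dig and its signature are hypothetical. It walks dict keys and list indexes, and returns the default on a missing key, an out-of-range index, or a null value.

```python
def dig(data, *path, default=None):
    """Follow keys and list indexes through nested data; return default on any miss."""
    current = data
    for step in path:
        if isinstance(current, dict):
            if step not in current:
                return default
            current = current[step]
        elif isinstance(current, list) and isinstance(step, int):
            if not -len(current) <= step < len(current):
                return default
            current = current[step]
        else:
            # Hit a scalar or None before the path was used up.
            return default
    return default if current is None else current


# Illustrative manifest where "resources" is present but null.
manifest = {
    "spec": {"template": {"spec": {"containers": [
        {"name": "app", "image": "myapp:v1", "resources": None},
    ]}}}
}

containers_path = ("spec", "template", "spec", "containers")
image = dig(manifest, *containers_path, 0, "image")
memory = dig(manifest, *containers_path, 0, "resources", "limits", "memory",
             default="not set")
print(f"Image: {image}, memory limit: {memory}")
```

The trade-off is that a typo in the path silently yields the default, so plain indexing is still the better choice for fields that must exist.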


Practice Exercises

Exercise 1: Update replica count across environments

You have staging.yaml and production.yaml, both Kubernetes Deployment manifests. Write a script that reads both, prints the current replica count for each, and updates staging to 2 replicas and production to 5 replicas.

Answer
Update replicas
import yaml

configs = {
    "staging": ("staging.yaml", 2),
    "production": ("production.yaml", 5),
}

for env, (path, replicas) in configs.items():
    with open(path) as f:
        manifest = yaml.safe_load(f)

    current = manifest["spec"]["replicas"]
    print(f"{env}: {current} → {replicas} replicas")

    manifest["spec"]["replicas"] = replicas

    with open(path, "w") as f:
        yaml.dump(manifest, f, default_flow_style=False)
Exercise 2: Extract all image names from a multi-document manifest

Given a multi-document YAML file containing several Deployments, print the name and image of every container across all Deployments.

Answer
Extract all images
import yaml

with open("manifests.yaml") as f:
    documents = list(yaml.safe_load_all(f))

for doc in documents:
    if doc.get("kind") != "Deployment":
        continue
    name = doc["metadata"]["name"]
    containers = (doc.get("spec", {})
                    .get("template", {})
                    .get("spec", {})
                    .get("containers", []))
    for container in containers:
        print(f"{name}/{container['name']}: {container['image']}")

Quick Recap

| Concept | What It Does |
| --- | --- |
| yaml.safe_load(f) | Parse YAML file → Python dict/list (always safe_load) |
| yaml.dump(data, f, default_flow_style=False) | Write dict/list → block-style YAML |
| yaml.safe_load_all(f) | Parse multi-document YAML → generator of dicts |
| yaml.dump_all(docs, f) | Write multiple dicts → multi-document YAML |
| ruamel.yaml | Round-trip YAML that preserves comments |
| .get("key", {}) chained | Navigate nested YAML without KeyError |

What's Next

The dict navigation patterns you've used here — nested key access, iteration, filtering — apply directly to JSON API responses. That's the next Essentials article, coming soon.

In the meantime, the config comparison script from Day One shows the same dict-diffing patterns applied to both JSON and YAML.

Further Reading

Official Documentation

Libraries

  • ruamel.yaml: Comment-preserving YAML for files humans also edit
  • PyYAML: The most widely used YAML library for Python (a third-party package, not part of the standard library)

Deep Dives

  • YAML specification — When you need to understand why YAML is behaving unexpectedly (the Norway problem, booleans that aren't, etc.)

Exploring Kubernetes

  • kubectl First Deploy — Applying the manifests you're building and modifying here with kubectl apply -f

Exploring Computer Science

  • How Parsers Work — What yaml.safe_load() is actually doing under the hood when it converts text into Python data structures