Working with YAML
This is part of Essentials — core Python patterns for working platform engineers.
You work with YAML every day — Kubernetes manifests, Helm values files, Ansible playbooks, Docker Compose files. Editing them by hand is fine for one-offs. For bulk changes, generating manifests programmatically, or comparing configs across environments, you need Python.
Where You've Seen This
Every kubectl apply -f runs against YAML. Every Helm chart is YAML templates. Editing replica counts, image tags, or resource limits across 15 deployments by hand is error-prone and slow. Python can do it in a loop.
If you've used the config comparison script from Day One, you already know that json.load() turns JSON into Python dicts. YAML works the same way — yaml.safe_load() turns a YAML file into Python dicts and lists you can read, modify, and write back.
Setup
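PyYAML is a third-party package rather than part of the standard library, though many Python environments already include it as a dependency of other tools. If it's not installed, run pip install pyyaml.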
Reading a YAML File
- Always safe_load, never yaml.load(). yaml.load() can deserialize arbitrary Python objects and execute code. safe_load only deserializes basic types (dicts, lists, strings, numbers, booleans). This is not optional.
The result is a Python dict. Every YAML key becomes a dict key, every YAML list becomes a Python list, every YAML value becomes the appropriate Python type.
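A minimal sketch of the read step described above. The manifest contents and the file name `deployment.yaml` are assumptions made up for illustration; the file is written to a temporary directory only so the example runs on its own.

```python
import tempfile
from pathlib import Path

import yaml  # PyYAML

# Hypothetical manifest, written to a temporary file so the sketch runs anywhere.
MANIFEST_TEXT = """\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: web
          image: myapp:1.0
"""

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "deployment.yaml"
    path.write_text(MANIFEST_TEXT)

    # safe_load accepts an open file object (or a string) and returns plain Python data.
    with path.open() as f:
        manifest = yaml.safe_load(f)

containers = manifest["spec"]["template"]["spec"]["containers"]
print(type(manifest).__name__)                      # YAML mapping  -> dict
print(type(containers).__name__)                    # YAML sequence -> list
print(type(manifest["spec"]["replicas"]).__name__)  # integer scalar -> int
print(manifest["metadata"]["name"])
```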
Modifying and Writing Back
- This is standard Python dict access: you're just navigating a nested structure. If the path doesn't exist, you get a KeyError. Add .get() calls at each level if you're not sure the structure is consistent.
- default_flow_style=False writes block-style YAML (the human-readable multi-line format) instead of inline JSON-like syntax. Almost always what you want.
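The modify-and-write cycle might look like the sketch below. The manifest, the new replica count, and the image tag are all hypothetical values chosen for illustration.

```python
import tempfile
from pathlib import Path

import yaml  # PyYAML

# Hypothetical manifest used only for this sketch.
MANIFEST_TEXT = """\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: web
          image: myapp:1.0
"""

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "deployment.yaml"
    path.write_text(MANIFEST_TEXT)

    with path.open() as f:
        manifest = yaml.safe_load(f)

    # Plain nested dict/list access; a missing key would raise KeyError.
    manifest["spec"]["replicas"] = 5
    manifest["spec"]["template"]["spec"]["containers"][0]["image"] = "myapp:1.1"

    # Write the modified structure back as block-style YAML.
    with path.open("w") as f:
        yaml.dump(manifest, f, default_flow_style=False)

    written = path.read_text()

print(written)
```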
The Round-Trip Problem
PyYAML does not preserve comments. If your YAML file has comments like # This controls the replica count, they're gone after a safe_load → dump cycle.
spec:
  replicas: 3  # Adjust this for traffic spikes
  template:
    metadata:
      labels:
        app: myapp
The comment is lost. Key ordering may also change, because yaml.dump() sorts mapping keys alphabetically unless you pass sort_keys=False.
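To see the loss for yourself, a short sketch that pushes the snippet above through a load and dump cycle (the variable names are illustrative):

```python
import yaml  # PyYAML

SOURCE = """\
spec:
  replicas: 3  # Adjust this for traffic spikes
  template:
    metadata:
      labels:
        app: myapp
"""

data = yaml.safe_load(SOURCE)

# The parser discards comments, so there is nothing left for dump() to write back.
round_tripped = yaml.dump(data, default_flow_style=False)
print(round_tripped)

# The data itself survives the cycle unchanged; only the comment is gone.
print(yaml.safe_load(round_tripped) == data)
```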
For scripts that modify files humans also edit, this is a problem. Two options:
Round-trip YAML with ruamel.yaml
ruamel.yaml was built specifically for this use case. Use it when comment preservation matters.
Standard PyYAML
Use PyYAML when the YAML is generated by your pipeline (not hand-edited), or when you're generating new files from scratch.
Multi-Document YAML
Kubernetes manifests often contain multiple resources in one file, separated by ---:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-service
safe_load() expects exactly one document and raises a ComposerError when the stream contains more than one. Use safe_load_all() for multi-document files.
safe_load_all() returns a generator. list() materializes it so you can loop over it multiple times or index into it.
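A short sketch of reading the two-resource manifest shown above. It parses from a string to stay self-contained; an open file object works the same way. Skipping `None` entries is an added precaution, since an empty document (for example after a trailing ---) loads as `None`.

```python
import yaml  # PyYAML

MULTI_DOC = """\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-service
"""

# Materialize the generator and drop empty documents, which load as None.
docs = [doc for doc in yaml.safe_load_all(MULTI_DOC) if doc is not None]

for doc in docs:
    print(doc["kind"], doc["metadata"]["name"])

# Because docs is a list, indexing and repeated iteration both work.
first_kind = docs[0]["kind"]
```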
Writing multiple documents back:
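A minimal sketch using yaml.dump_all(), which takes an iterable of documents and separates them with ---. The two dicts are stand-ins for resources you have already loaded or built; without a stream argument the function returns the YAML as a string, and passing an open file as the second argument writes it to disk instead.

```python
import yaml  # PyYAML

# Stand-in documents; in practice these come from safe_load_all() or your own code.
docs = [
    {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "myapp"}},
    {"apiVersion": "v1", "kind": "Service", "metadata": {"name": "myapp-service"}},
]

# sort_keys=False keeps the insertion order of each dict's keys.
text = yaml.dump_all(docs, default_flow_style=False, sort_keys=False)
print(text)
```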
Generating YAML Programmatically
Sometimes you need to create manifests from scratch — generating a ConfigMap for each environment, creating ServiceAccounts for a list of teams, or templating resources that Helm can't handle cleanly.
This is where Python beats Helm templates for simple generation tasks — no template language to learn, no {{ }} syntax, just Python dicts.
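As an illustration of the ConfigMap-per-environment case, here is a sketch under stated assumptions: the environment names, the log levels, the `myapp-config-` naming scheme, and the `build_configmap` helper are all hypothetical.

```python
import yaml  # PyYAML

# Hypothetical per-environment settings.
ENVIRONMENTS = {"dev": "debug", "staging": "info", "production": "warning"}


def build_configmap(env, log_level):
    """Return a ConfigMap manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": f"myapp-config-{env}", "labels": {"env": env}},
        # ConfigMap data values are strings.
        "data": {"ENVIRONMENT": env, "LOG_LEVEL": log_level},
    }


configmaps = [build_configmap(env, level) for env, level in ENVIRONMENTS.items()]

# Emit all of them as one multi-document manifest.
rendered = yaml.dump_all(configmaps, default_flow_style=False, sort_keys=False)
print(rendered)
```

Building the manifests as dicts first means you can inspect or filter them in Python before anything is written out.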
Handling Missing Keys Safely
Kubernetes manifests have optional fields. Navigating a path that might not exist raises KeyError:
Chaining .get("key", {}) at each level returns an empty dict if the key is missing, letting the next .get() work without raising. Verbose but reliable.
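A sketch of the pattern on a hypothetical Deployment whose container has no resources section:

```python
import yaml  # PyYAML

# Hypothetical manifest: the container defines no "resources" block.
MANIFEST_TEXT = """\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    spec:
      containers:
        - name: web
          image: myapp:1.0
"""

manifest = yaml.safe_load(MANIFEST_TEXT)
container = manifest["spec"]["template"]["spec"]["containers"][0]

# Direct access fails because the optional key is absent.
try:
    container["resources"]["limits"]["memory"]
    raised = False
except KeyError:
    raised = True

# Chained .get() calls fall through to None instead of raising.
memory_limit = container.get("resources", {}).get("limits", {}).get("memory")
annotations = manifest.get("metadata", {}).get("annotations", {})

print(raised, memory_limit, annotations)
```

One caveat: the default only applies when the key is absent. If a key is present with an empty value (for example a bare resources: line), .get() returns None, so `(container.get("resources") or {})` is the more defensive form.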
Practice Exercises
Exercise 1: Update replica count across environments
You have staging.yaml and production.yaml, both Kubernetes Deployment manifests. Write a script that reads both, prints the current replica count for each, and updates staging to 2 replicas and production to 5 replicas.
Answer
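One possible answer, as a sketch. The `set_replicas` helper and the sample manifests are hypothetical; the two files are created in a temporary directory only so the example runs on its own. In a real script you would point `set_replicas` at your existing staging.yaml and production.yaml.

```python
import tempfile
from pathlib import Path

import yaml  # PyYAML

TARGET_REPLICAS = {"staging.yaml": 2, "production.yaml": 5}


def set_replicas(path, replicas):
    """Print the current replica count, write the new one, return the old value."""
    with path.open() as f:
        manifest = yaml.safe_load(f)

    current = manifest["spec"]["replicas"]
    print(f"{path.name}: {current} -> {replicas}")

    manifest["spec"]["replicas"] = replicas
    with path.open("w") as f:
        yaml.dump(manifest, f, default_flow_style=False)
    return current


# Demo setup: stand-in manifests so the sketch is runnable.
SAMPLE = """\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: {replicas}
"""

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "staging.yaml").write_text(SAMPLE.format(replicas=1))
    (base / "production.yaml").write_text(SAMPLE.format(replicas=3))

    previous = {name: set_replicas(base / name, n) for name, n in TARGET_REPLICAS.items()}

    # Re-read both files to confirm what was written.
    updated = {
        name: yaml.safe_load((base / name).read_text())["spec"]["replicas"]
        for name in TARGET_REPLICAS
    }

print(updated)
```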
Exercise 2: Extract all image names from a multi-document manifest
Given a multi-document YAML file containing several Deployments, print the name and image of every container across all Deployments.
Answer
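One possible answer, as a sketch. The manifest text, resource names, and image tags are made up for illustration, and `iter_images` is a hypothetical helper. It parses from a string to stay self-contained; passing an open file to safe_load_all() works the same way.

```python
import yaml  # PyYAML

# Hypothetical multi-document manifest: two Deployments and one Service.
MANIFESTS = """\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  template:
    spec:
      containers:
        - name: nginx
          image: nginx:1.25
        - name: sidecar
          image: envoyproxy/envoy:v1.29.0
---
apiVersion: v1
kind: Service
metadata:
  name: web-service
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
spec:
  template:
    spec:
      containers:
        - name: worker
          image: myapp-worker:2.0
"""


def iter_images(text):
    """Yield (deployment name, container name, image) for every Deployment container."""
    for doc in yaml.safe_load_all(text):
        # Skip empty documents and anything that is not a Deployment.
        if not doc or doc.get("kind") != "Deployment":
            continue
        pod_spec = doc.get("spec", {}).get("template", {}).get("spec", {})
        for container in pod_spec.get("containers", []):
            yield doc["metadata"]["name"], container["name"], container["image"]


images = list(iter_images(MANIFESTS))
for deployment, container, image in images:
    print(f"{deployment}/{container}: {image}")
```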
Quick Recap
| Concept | What It Does |
|---|---|
| yaml.safe_load(f) | Parse YAML file → Python dict/list (always safe_load) |
| yaml.dump(data, f, default_flow_style=False) | Write dict/list → block-style YAML |
| yaml.safe_load_all(f) | Parse multi-document YAML → generator of dicts |
| yaml.dump_all(docs, f) | Write multiple dicts → multi-document YAML |
| ruamel.yaml | Round-trip YAML that preserves comments |
| .get("key", {}) chained | Navigate nested YAML without KeyError |
What's Next
The dict navigation patterns you've used here — nested key access, iteration, filtering — apply directly to JSON API responses. That's the next Essentials article, coming soon.
In the meantime, the config comparison script from Day One shows the same dict-diffing patterns applied to both JSON and YAML.
Further Reading
Official Documentation
- PyYAML documentation: safe_load, dump, safe_load_all
- yaml.safe_load vs yaml.load: Why safe_load is mandatory
Libraries
- ruamel.yaml: Comment-preserving YAML for files humans also edit
- PyYAML: The de facto standard YAML library for Python (a third-party package, not part of the standard library)
Deep Dives
- YAML specification — When you need to understand why YAML is behaving unexpectedly (the Norway problem, booleans that aren't, etc.)
Exploring Kubernetes
- kubectl First Deploy: Applying the manifests you're building and modifying here with kubectl apply -f
Exploring Computer Science
- How Parsers Work: What yaml.safe_load() is actually doing under the hood when it converts text into Python data structures