Run This Everywhere
This is part of Day One: Python for Platform Engineers.
The deploy finished. You need to confirm the service is healthy on all 15 app servers before you call it done. Or you need to check disk space on every node in the cluster before the storage migration. Or you need to verify the new config file landed on every host.
In bash, the loop is easy. The problem is what happens when one server fails, or times out, or is unreachable — and you need to know which one.
The Bash Loop and Its Gaps
**Bash: looping over servers**
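For reference, the naive loop, a minimal sketch with placeholder hostnames and a placeholder service name; `ConnectTimeout` and `BatchMode` keep a dead host from hanging the loop or prompting for a password:

```shell
#!/usr/bin/env bash
# Placeholder hostnames and service name; swap in your own.
for server in app-01 app-02 app-03; do
  echo "--- $server ---"
  ssh -o ConnectTimeout=5 -o BatchMode=yes "$server" \
    'systemctl is-active myservice' || echo "FAILED on $server"
done
```

Every host's output lands interleaved on the same stdout, which is where the trouble starts.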
This runs the command on each server. What it doesn't do well:
- Collecting which servers passed and which failed
- Handling SSH timeouts without hanging
- Producing a summary you can act on
- Exiting with a useful code for a pipeline
You end up grepping through interleaved stdout, or writing to temp files, or losing track of which output came from which host.
The Python Version
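A minimal sketch of the sequential checker, assuming a hypothetical `systemctl` health check and placeholder hostnames; the function and variable names are illustrative, not a fixed API:

```python
import subprocess

SERVERS = ["app-01", "app-02", "app-03"]  # placeholder inventory

def build_ssh_cmd(server, command):
    # BatchMode=yes makes ssh fail instead of prompting for a password
    return [
        "ssh",
        "-o", "ConnectTimeout=5",
        "-o", "BatchMode=yes",
        "-o", "StrictHostKeyChecking=no",
        server,
        command,
    ]

def check_server(server, command="systemctl is-active myservice", cmd_timeout=15):
    """Return True if the command succeeds on the server, False otherwise."""
    try:
        result = subprocess.run(
            build_ssh_cmd(server, command),
            capture_output=True, text=True, timeout=cmd_timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # ssh itself hung past the Python-level timeout
    return result.returncode == 0

def check_fleet(servers):
    failed = [s for s in servers if not check_server(s)]
    print(f"{len(servers) - len(failed)}/{len(servers)} passed")
    for s in failed:
        print(f"  FAILED: {s}")
    return failed
```

In a real script you would feed the returned `failed` list into `sys.exit(1 if failed else 0)` so a pipeline can react to the result.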
- `StrictHostKeyChecking=no` skips the "are you sure you want to connect?" prompt for new hosts. Necessary in automation; understand the security trade-off. For production tooling, use a known_hosts file instead.
- `timeout=cmd_timeout` is the Python-level timeout on the `subprocess.run()` call — if SSH hangs entirely (not just slow to connect), this catches it.
**Running it**

The script prints a pass/fail summary and exits non-zero if any server failed, so it can gate the next step in a pipeline.
Reading the Server List From a File
Hardcoding servers in the script is fine for one-off tasks. For anything you run regularly, keep the list in a file:
**Read server list from file**
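A sketch of the loader, assuming an inventory file named `servers.txt` (the path is arbitrary):

```python
from pathlib import Path

def load_servers(path="servers.txt"):
    """Return one server name per non-blank, non-comment line."""
    lines = Path(path).read_text().splitlines()
    return [
        line.strip()
        for line in lines
        if line.strip() and not line.strip().startswith("#")
    ]
```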
- Skip blank lines and lines starting with `#`. This lets you comment out servers temporarily in the inventory file without editing the script.
Collecting Per-Server Output
Sometimes you need to collect the output from each server, not just pass/fail:
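A sketch that collects root-disk usage per server and reports the worst offenders first; the 80% threshold and the function names are illustrative choices:

```python
import subprocess

def parse_pcent(output):
    # df --output=pcent prints a header line ("Use%") then the value, e.g. " 83%"
    return int(output.strip().splitlines()[-1].strip().rstrip("%"))

def disk_usage_pct(server, cmd_timeout=15):
    """Return root-filesystem usage as an int percent, or None on failure."""
    try:
        result = subprocess.run(
            ["ssh", "-o", "ConnectTimeout=5", server, "df --output=pcent /"],
            capture_output=True, text=True, timeout=cmd_timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    return parse_pcent(result.stdout) if result.returncode == 0 else None

def report(usages):
    """usages: {server: percent}. Print fullest disks first, flag >= 80%."""
    for server, pct in sorted(usages.items(), key=lambda kv: kv[1], reverse=True):
        flag = "  <-- needs attention" if pct >= 80 else ""
        print(f"{server:<12}{pct:>4}%{flag}")
```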
`df --output=pcent /` prints just the percentage used on the root filesystem. Keeps the output simple to parse.
Bash can produce this output. Python lets you sort it, flag the worst offenders, and format it clearly — without awk and sort pipelines.
Adding Parallelism for Large Fleets
The sequential version waits for each server before moving to the next. For 15 servers that's fine. For 150, it's slow. Python's concurrent.futures runs checks in parallel without requiring you to manage threads yourself:
```mermaid
flowchart TD
A([check_fleet starts]) --> B[Submit all servers<br/>to thread pool]
B --> C[app-01]
B --> D[app-02]
B --> E[app-03]
C -->|✓ passed| F[Collect results<br/>as they complete]
D -->|✓ passed| F
E -->|✗ failed| F
F --> G([Print summary])
style A fill:#1a202c,stroke:#cbd5e0,stroke-width:2px,color:#fff
style B fill:#2d3748,stroke:#cbd5e0,stroke-width:2px,color:#fff
style C fill:#2f855a,stroke:#cbd5e0,stroke-width:2px,color:#fff
style D fill:#2f855a,stroke:#cbd5e0,stroke-width:2px,color:#fff
style E fill:#c53030,stroke:#cbd5e0,stroke-width:2px,color:#fff
style F fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#fff
style G fill:#2d3748,stroke:#cbd5e0,stroke-width:2px,color:#fff
```
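A parallel sketch matching the flow above. It takes the per-server check as a callable so any check function plugs in; the 10-worker cap and the print format are choices, not requirements:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def check_fleet(servers, check):
    """Run check(server) across the fleet in parallel; return {server: bool}."""
    results = {}
    with ThreadPoolExecutor(max_workers=10) as pool:
        # Submit every server up front, then consume results as they finish.
        futures = {pool.submit(check, server): server for server in servers}
        for future in as_completed(futures):
            server = futures[future]
            ok = future.result()
            results[server] = ok
            print(f"{'PASS' if ok else 'FAIL'}  {server}")
    return results
```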
`max_workers=10` runs up to 10 SSH connections simultaneously. Set this based on your network and the load you're comfortable putting on your servers. Don't set it to 500.
The as_completed() loop prints results as they arrive rather than waiting for all of them — so you see fast servers immediately instead of staring at a blank screen.
Bash lets you run the same command on every server. Python lets you know what happened on each.
Practice Exercises
Exercise 1: Add a dry-run flag
Modify the script to accept --dry-run as a command-line argument. In dry-run mode, print what it would do on each server without actually connecting.
Exercise 2: Write results to a file
Extend the script to write a timestamped summary to a file after running. Each line should be: timestamp,server,status.
Answer
**Write results to CSV**
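One possible answer, assuming `results` is a list of `(server, status)` pairs produced by the check run; the filename is arbitrary:

```python
import csv
from datetime import datetime, timezone

def append_results(results, path="fleet_results.csv"):
    """Append one timestamped line per server: timestamp,server,status."""
    ts = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(path, "a", newline="") as f:  # "a" preserves earlier runs
        writer = csv.writer(f)
        for server, status in results:
            writer.writerow([ts, server, status])
```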
"a" mode appends to the file rather than overwriting it, so successive runs build a history.
Quick Recap
| Concept | What It Does |
|---|---|
| `subprocess.run([...], capture_output=True)` | Run a command and capture stdout/stderr |
| `timeout=N` in `subprocess.run` | Kill the process if it hangs longer than N seconds |
| `result.returncode` | 0 = success, non-zero = failure |
| List comprehension with `if` | Filter blank lines from inventory file |
| `ThreadPoolExecutor` | Run checks in parallel |
What's Next
- My Bash Script Is Getting Out of Hand — When you need to run a more complex sequence of shell commands per server, not just one
Further Reading
Official Documentation
- `subprocess` module — Full subprocess docs
- `concurrent.futures` — Thread and process pool executors
- OpenSSH client options — `ConnectTimeout`, `StrictHostKeyChecking`, and other SSH options
Deep Dives
- `fabric` library — A higher-level library for SSH automation in Python, worth knowing once fleet operations become a regular part of your work
- `paramiko` — Direct SSH from Python without calling the `ssh` binary, useful when you need more control over the connection
Exploring Linux
- Processes — `systemctl`, service states, and what your fleet checks are actually verifying at the OS level
- Users and Groups — SSH access, user permissions, and why your automation might be refused on certain hosts