Is It Still Up?
This is part of Day One: Python for Platform Engineers.
You're deploying. Traffic is cut over to the new pods, the old ones are terminating, and you need to know the moment your API is healthy again before you proceed. Your options: sit there hitting refresh, write a curl loop in bash, or do it properly.
This is where Python earns its place. Not because curl can't poll — it can. Because you need to know why the check failed, not just that it did.
The Bash Version and Its Problem
| Bash health poller | |
|---|---|
This works for interactive use. In a deployment pipeline, it has problems:
- No timeout — it'll run forever if the API never comes back
- No distinction between "connection refused" (server not started yet) and "HTTP 503" (server started, app not ready)
- Exit code is from curl, not from your intent, which makes it harder to integrate with pipeline logic
- No useful output about how long it waited
Here's what the Python version does at each step:
flowchart TD
A([Start polling]) --> B[Send HTTP GET]
B --> C{Response?}
C -->|HTTP 200| D([✓ Healthy — continue deploy])
C -->|HTTP 503| E[Print status, wait 5s]
C -->|Connection refused| E
C -->|Request timeout| E
E --> F{Elapsed ≥ timeout?}
F -->|No| B
F -->|Yes| G([✗ Did not recover — exit 1])
style A fill:#1a202c,stroke:#cbd5e0,stroke-width:2px,color:#fff
style B fill:#2d3748,stroke:#cbd5e0,stroke-width:2px,color:#fff
style C fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#fff
style D fill:#2f855a,stroke:#cbd5e0,stroke-width:2px,color:#fff
style E fill:#2d3748,stroke:#cbd5e0,stroke-width:2px,color:#fff
style F fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#fff
style G fill:#c53030,stroke:#cbd5e0,stroke-width:2px,color:#fff
The Python Version
- timeout=3 is the per-request timeout: how long to wait for a single HTTP response. It is separate from the outer timeout=120 loop timeout.
- Connection refused means the server isn't listening yet. This is different from an HTTP error: the application hasn't started.
- A request timeout means the server accepted the connection but didn't respond in time, which usually means the app is starting but not ready.
- sys.exit(1) signals failure to whatever called this script: your CI/CD pipeline, a Makefile, a parent script.
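The notes above describe a poller along the following lines. This is a minimal sketch reconstructed from the flowchart and the notes, not the original listing: the function name wait_for_health(), the 3-second request timeout, the 120-second loop timeout, and the 5-second wait come from the text, while the main() helper, the parameter name interval, and the exact log wording are assumptions made for illustration.

```python
import sys
import time

import requests


def wait_for_health(url, timeout=120, interval=5):
    """Poll url until it answers HTTP 200 or `timeout` seconds have passed."""
    start = time.time()
    while time.time() - start < timeout:
        elapsed = int(time.time() - start)
        try:
            # timeout=3 bounds this single request, not the whole loop.
            resp = requests.get(url, timeout=3)
            if resp.status_code == 200:
                print(f"[{elapsed}s] healthy")
                return True
            # The server answered, but the app is not ready (for example 503).
            print(f"[{elapsed}s] HTTP {resp.status_code}, waiting")
        except requests.exceptions.ConnectionError:
            # Could not connect, typically because nothing is listening yet.
            print(f"[{elapsed}s] connection failed, waiting")
        except requests.exceptions.Timeout:
            # Connection accepted, but no response within 3 seconds.
            print(f"[{elapsed}s] request timed out, waiting")
        time.sleep(interval)
    return False


def main(url):
    """Exit non-zero so a pipeline, Makefile, or parent script sees the failure."""
    if not wait_for_health(url):
        print("service did not recover", file=sys.stderr)
        sys.exit(1)
```

In a script, the last line would be a call such as main("http://api.internal/health"). Returning a boolean from wait_for_health() and leaving sys.exit(1) to main() is a design choice in this sketch: it keeps the polling function importable from other scripts without terminating them.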
Running It
| Running the health check | |
|---|---|
The output tells you exactly what the server was doing during the wait. In a CI log, that's useful information. "It spent 10 seconds failing to connect, then 10 seconds returning 503s before coming healthy" is a different story than "it came up immediately."
Checking a JSON Field, Not Just Status Code
Your /health endpoint might return 200 with a body that indicates partial readiness:
A status-code-only check would pass this. Look at the body itself:
| Checking the response body | |
|---|---|
This snippet replaces the if resp.status_code == 200: block inside the wait_for_health() loop.
This is where Python genuinely beats a curl loop — parsing JSON inline without calling jq or juggling subshells.
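One way to express the body check as a small helper is sketched below. The field name "status" and the expected value "ok" are hypothetical placeholders, since the real names depend on what your /health endpoint returns. The helper name is_fully_ready() is also an assumption. It accepts any response object that exposes status_code and json(), which is what requests returns.

```python
def is_fully_ready(resp, field="status", expected="ok"):
    """Return True only for HTTP 200 whose JSON body has body[field] == expected.

    `field` and `expected` are hypothetical defaults; adjust them to match
    the payload your /health endpoint actually returns.
    """
    if resp.status_code != 200:
        return False
    try:
        body = resp.json()
    except ValueError:
        # The body was not valid JSON, so treat the service as not ready.
        return False
    if not isinstance(body, dict):
        # A JSON list or scalar cannot carry the readiness field.
        return False
    return body.get(field) == expected
```

Inside the polling loop, the status check would then read if is_fully_ready(resp): instead of comparing the status code alone. Treating an unparseable body as "not ready" rather than raising keeps the poller retrying while the app is still starting.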
Making It Reusable Across a Deploy Script
A health check that lives in a function can be called from a larger deployment script:
Bash functions exist, but sharing logic across files and integrating cleanly with exit codes gets awkward fast. The moment a health check needs to live inside a deploy pipeline — not a terminal — you've outgrown the curl loop.
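As an illustration of that point, here is a hedged sketch of a deploy script that gates on the health check before proceeding. The function run_deploy() and its three callable parameters are hypothetical names chosen for this example. The steps are passed in as callables so the sketch stays self-contained.

```python
import sys


def run_deploy(cut_over, check_health, finish):
    """Run cut_over, gate on check_health, and only then run finish.

    Returns an exit code: 0 when the service became healthy, 1 otherwise.
    All three arguments are plain callables supplied by the caller.
    """
    cut_over()
    if not check_health():
        # Stop here so the remaining steps never run against a broken service.
        print("service did not recover, aborting deploy", file=sys.stderr)
        return 1
    finish()
    print("deploy complete")
    return 0
```

In a real script, check_health would typically wrap the poller, for example lambda: wait_for_health("http://api.internal/health"), and the return value would be handed to sys.exit() so the pipeline sees the result.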
Practice Exercises
Exercise 1: Add exponential backoff
The current poller waits exactly 5 seconds between each attempt. Modify it so the interval doubles after each failed attempt, up to a maximum of 30 seconds. (This reduces load on a recovering service while still catching a fast recovery.)
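As a starting point, the wait schedule can be worked out separately from the polling loop. The generator below is one possible sketch. The name backoff_intervals() and its parameters are assumptions, and wiring it into the poller is left as the exercise.

```python
def backoff_intervals(initial=5, maximum=30):
    """Yield wait times that double after each attempt, capped at `maximum`."""
    interval = initial
    while True:
        yield min(interval, maximum)
        interval = min(interval * 2, maximum)


# With the defaults, the schedule is 5, 10, 20, 30, 30, ... seconds.
```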
Exercise 2: Accept the URL as a command-line argument
Hardcoding the URL makes the script less reusable. Modify health_check.py so the URL is passed as the first argument: python health_check.py http://api.internal/health
Quick Recap
| Concept | What It Does |
|---|---|
| requests.get(url, timeout=3) | HTTP GET with per-request timeout |
| ConnectionError | Server isn't listening (process not started) |
| Timeout | Server accepted connection but didn't respond |
| resp.status_code | HTTP status (200, 503, etc.) |
| resp.json() | Parse response body as JSON |
| sys.exit(1) | Signal failure to calling process |
What's Next
- What Just Broke? — When the API came back but something still isn't right and you need to read the logs
Further Reading
Official Documentation
- requests library: the HTTP library used here
- time module: time.sleep(), time.time()
- sys.exit(): exit codes and pipeline integration
Exploring Kubernetes
- kubectl Commands: when health checking is part of a larger deploy (kubectl rollout status and related commands)