The "Don't Do This" Guide

Part of Day One

This is part of Day One: Python for Platform Engineers.

Automation is force multiplication. That's the point. It's also the risk. A bash one-liner that has a bug affects one run. A Python script that loops over your fleet and has a bug affects every server it touches before you catch it.

These are the rules that keep automation bugs from becoming incidents.


The Golden Rule

Automation magnifies mistakes. Test on one before you run on all.

No exceptions. Before you run a new script against production:

  1. Run it with --dry-run to see what it would do
  2. Run it against one non-critical target
  3. Verify the result manually
  4. Then run it at scale

This sounds like it slows you down. It doesn't. Cleaning up a fleet-wide mistake takes far longer than testing on one server.
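The four steps above can be sketched as a small helper that refuses to touch the rest of the fleet until one target has been changed and checked. This is a minimal sketch, not a prescribed implementation: `staged_rollout`, `action`, and `verify` are hypothetical names, and `verify` stands in for whatever manual or automated check you trust.

```python
from typing import Callable, Iterable, List


def staged_rollout(
    targets: Iterable[str],
    action: Callable[[str], None],
    verify: Callable[[str], bool],
) -> List[str]:
    """Apply `action` to one canary target, verify it, then do the rest."""
    remaining = list(targets)
    if not remaining:
        return []

    # Step 2: run against a single target first.
    canary = remaining.pop(0)
    action(canary)

    # Step 3: stop here unless the canary checks out.
    if not verify(canary):
        raise RuntimeError(
            f"Canary {canary} failed verification; stopping before the rest of the fleet"
        )

    # Step 4: only now run at scale.
    done = [canary]
    for target in remaining:
        action(target)
        done.append(target)
    return done
```

In practice `verify` could be as simple as a function that prompts you to confirm after you have inspected the canary by hand.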


Don't Put Credentials in Your Script

❌ Never do this
db_password = "hunter2"
api_key = "sk-a8f3c..."

Scripts get committed to git. Git history is forever. Even if you delete the file, the credential is in the history. Even if the repo is private, it may not always be.

Your script should read from the environment. The environment should be populated from a secrets manager at runtime — not by typing the value into your shell:

✅ Retrieve from a secrets manager at runtime
# HashiCorp Vault
export DB_PASSWORD=$(vault kv get -field=password secret/myapp/db)

# AWS Secrets Manager
export DB_PASSWORD=$(aws secretsmanager get-secret-value \
  --secret-id myapp/db-password --query SecretString --output text)

python deploy.py

For the full pattern — reading from os.environ, fail-fast validation, .env files for local dev — see Environment Variables and Secrets.
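On the Python side, the fail-fast part might look roughly like the sketch below. The helper name `require_env` is an assumption made for illustration; the only real APIs used are `os.environ` and `sys.exit`.

```python
import os
import sys


def require_env(*names: str) -> dict:
    """Return the named environment variables, exiting if any are missing or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        # Report variable names only, never their values.
        print(
            f"Missing required environment variables: {', '.join(missing)}",
            file=sys.stderr,
        )
        sys.exit(1)
    return {name: os.environ[name] for name in names}
```

Calling `require_env("DB_PASSWORD")` at the top of deploy.py would stop the script before it does any work if the export step was skipped.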


Don't Use shell=True With Variables You Didn't Control

❌ Shell injection risk
namespace = input("Enter namespace: ")
subprocess.run(f"kubectl get pods -n {namespace}", shell=True)

If namespace is myapp; rm -rf /, that rm -rf / runs. Shell injection in automation tools is real.

✅ Always use list form
import subprocess

namespace = input("Enter namespace: ")
subprocess.run(["kubectl", "get", "pods", "-n", namespace])

In list form, each argument is passed directly to the process. No shell is involved. Shell metacharacters are treated as literal text.
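A quick way to convince yourself of this is to pass a hostile string to a harmless child process and look at what it receives. In the sketch below, a tiny Python child that prints its first argument stands in for kubectl; that stand-in is an assumption made so the example runs anywhere.

```python
import subprocess
import sys

hostile = "myapp; rm -rf /"

# A child process that prints its first argument, standing in for kubectl.
echo_arg = [sys.executable, "-c", "import sys; print(sys.argv[1])"]

# List form: no shell parses the string, so ';' is just another character.
result = subprocess.run(
    echo_arg + [hostile], capture_output=True, text=True, check=True
)
received = result.stdout.rstrip("\n")
print(repr(received))
```

The child sees the whole string as one argument. List form removes the shell from the picture, but the value still reaches the real command, so validating user input remains worthwhile.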


Don't Ignore Return Codes

❌ Ignoring failure
subprocess.run(["kubectl", "apply", "-f", "manifests/"])
subprocess.run(["kubectl", "rollout", "status", "deployment/myapp"])
# If apply fails, rollout status still runs and may report the previous rollout as healthy
✅ Check return codes
import subprocess
import sys

result = subprocess.run(
    ["kubectl", "apply", "-f", "manifests/"],
)
if result.returncode != 0:
    print("✗ Apply failed, stopping deploy")
    sys.exit(result.returncode)

Or use the run() wrapper from My Bash Script Is Getting Out of Hand which handles this for you.
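If you don't have that wrapper to hand, `subprocess.run(..., check=True)` raises `CalledProcessError` on a non-zero exit, which makes a failure hard to ignore. The helper below is a hypothetical sketch named `run_checked`; it is not the run() wrapper referenced above, and the exit code 127 for a missing binary simply borrows the shell convention.

```python
import subprocess
import sys
from typing import Sequence


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command in list form and exit the script if it fails."""
    try:
        subprocess.run(cmd, check=True)
    except subprocess.CalledProcessError as exc:
        # The command ran but returned a non-zero exit code.
        print(f"FAILED: {' '.join(cmd)} exited with {exc.returncode}", file=sys.stderr)
        sys.exit(exc.returncode)
    except FileNotFoundError:
        # The executable itself could not be found.
        print(f"FAILED: {cmd[0]} not found on PATH", file=sys.stderr)
        sys.exit(127)
```

With this in place, calling `run_checked` for the apply step and then for the rollout step means the second call is never reached when the first one fails.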


Don't Print Credentials to stdout

❌ Leaking credentials to logs
print(f"Connecting to database with password: {db_password}")

CI/CD logs are stored, shared, and searchable. Log what you're connecting to — not the credential itself. This applies to passwords, API keys, tokens, and anything that came from a secrets store.
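One way to log what you're connecting to without the credential is to strip the user and password from a connection URL before printing it. This is a sketch using the standard library's `urllib.parse`; the function name `describe_target` and the example URL are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def describe_target(url: str) -> str:
    """Return the URL with any username and password removed, safe to log."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    netloc = f"{host}:{parts.port}" if parts.port else host
    # Drop query and fragment too, since tokens sometimes hide there.
    return urlunsplit((parts.scheme, netloc, parts.path, "", ""))


db_url = "postgres://deploy:s3cret@db.example.internal:5432/appdb"
print(f"Connecting to {describe_target(db_url)}")
```

The log line then names the host, port, and database, which is usually all you need when reading CI output later.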


Always Build the --dry-run Flag First

Before you write the code that does the thing, write the code that prints what it would do:

Dry-run pattern
import click

# `servers` and `restart_service` are placeholders for your own inventory and restart logic.

@click.command()
@click.option("--dry-run", is_flag=True, help="Print actions without executing them")
def main(dry_run):
    for server in servers:
        if dry_run:
            print(f"[DRY RUN] Would restart myapp on {server}")
        else:
            restart_service(server, "myapp")

if __name__ == "__main__":
    main()

If you're not sure a script is correct, --dry-run is how you check. Make it the first thing you add to any automation script that modifies state.
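As a script grows, one option is to route every state-changing command through a single function so the dry-run check lives in one place rather than in every loop. The sketch below assumes a hypothetical helper called `execute`; it uses `shlex.join` (Python 3.8+) only to print the command readably.

```python
import shlex
import subprocess
from typing import Sequence


def execute(cmd: Sequence[str], dry_run: bool) -> bool:
    """Run `cmd`, or only print it when dry_run is set.

    Returns True if the command actually ran, False if it was only printed.
    """
    if dry_run:
        print(f"[DRY RUN] Would run: {shlex.join(cmd)}")
        return False
    subprocess.run(cmd, check=True)
    return True
```

The `dry_run` value from the click option can be passed straight through, so no new code path can forget to honor the flag.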


Don't Swallow Exceptions Silently

❌ Silent exception
try:
    result = requests.get(url)
    data = result.json()
except Exception:
    pass  # Everything is fine (it isn't)

This hides failures. Your script continues as if nothing happened, and you debug for an hour trying to figure out why downstream steps produced wrong results.

✅ At minimum, log what failed
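import sys
import requests

# `url` is whatever endpoint your script is checking.
try:
    result = requests.get(url, timeout=5)
    result.raise_for_status()
    data = result.json()
except requests.exceptions.Timeout:
    # timeout=5 can raise this; without a handler it would surface as a traceback
    print(f"✗ {url} timed out")
    sys.exit(1)
except requests.exceptions.ConnectionError as e:
    print(f"✗ Could not connect to {url}: {e}")
    sys.exit(1)
except requests.exceptions.HTTPError as e:
    print(f"✗ {url} returned {e.response.status_code}")
    sys.exit(1)
except ValueError:
    # result.json() raises a ValueError subclass when the body is not valid JSON
    print(f"✗ {url} did not return valid JSON")
    sys.exit(1)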

Catch specific exceptions. Handle each one explicitly. If you can't recover, exit with a non-zero code and a useful message.


Quick Reference

Rule | Why
No hardcoded credentials | Git history is forever; use environment variables and a secrets manager
No shell=True with variables | Shell injection is a real attack vector
Always check return codes | Silent failures cascade into bigger failures
Never log credentials | CI logs are stored and searchable
--dry-run before production | Automation magnifies mistakes
Never swallow exceptions | Silent failures are the hardest to debug

Practice Exercises

Exercise 1: Audit a script for security problems

Identify all the security problems in this script and describe how to fix each one.

audit_this.py — find the problems
import subprocess

API_KEY = "sk-9f3a21b..."
DB_URL = "postgres://admin:password123@db.prod.internal:5432/mydb"

def deploy(env):
    subprocess.run(
        f"kubectl set env deployment/myapp API_KEY={API_KEY} -n {env}",
        shell=True
    )
    print(f"Deployed to {env} with key: {API_KEY}")

deploy(input("Enter environment: "))
Answer

Five problems:

  1. Hardcoded API_KEY — use os.environ.get("API_KEY") instead
  2. Hardcoded DB_URL with credentials — use environment variables or a secrets manager
  3. shell=True with string interpolation: API_KEY and env are injected into a shell string; injection risk if either contains shell metacharacters
  4. Credential printed to stdout: print(f"...key: {API_KEY}") leaks the key to CI/CD logs
  5. env comes from input() — user-controlled string passed directly into a shell command
Exercise 2: Add dry-run to an existing function

Add --dry-run support to this script using click.

restart.py — add dry-run
import subprocess

def restart_on_server(server, service):
    subprocess.run(["ssh", server, "systemctl", "restart", service])

servers = ["web-01.prod", "web-02.prod", "web-03.prod"]
for s in servers:
    restart_on_server(s, "nginx")
Answer
restart.py — with dry-run
import subprocess
import click

def restart_on_server(server, service):
    subprocess.run(["ssh", server, "systemctl", "restart", service])

@click.command()
@click.option("--dry-run", is_flag=True, help="Print actions without executing them")
def main(dry_run):
    servers = ["web-01.prod", "web-02.prod", "web-03.prod"]
    for server in servers:
        if dry_run:
            print(f"[DRY RUN] Would restart nginx on {server}")
        else:
            restart_on_server(server, "nginx")

if __name__ == "__main__":
    main()

Further Reading

Tools

  • python-dotenv — Load environment variables from a .env file during development; never commit the .env file itself
  • click — Building CLI tools with --dry-run and other flags (covered in depth in the Efficiency section)

Exploring Linux

  • Linux Safety Guide — The same safety mindset applied to Linux commands: read before you write, understand before you run

What's Next

Day One gave you working scripts. Essentials makes them maintainable.

The gap between "it works on my machine" and "my team can run this in production" comes down to a few patterns you haven't needed yet: loading credentials cleanly without .env files scattered everywhere, reading and modifying the YAML that describes your infrastructure, handling failures in a way that gives you useful output instead of a traceback.
