Initial commit: _deploy_app skill

Deploy new apps or push updates to existing deployments via
Docker Compose + Caddy + Gitea webhooks. Multi-server profiles,
auto-detection of deployment status, full infrastructure provisioning.

- SKILL.md: 715-line workflow documentation
- scripts/detect_deployment.py: deployment status detection
- scripts/validate_compose.py: compose file validation
- references/: infrastructure, compose patterns, Caddy patterns
- assets/: Makefile and compose templates
- config.json: mew server profile
Author: Darren Neese
Date: 2026-03-25 21:12:30 -04:00
Commit: 994332a3f0
11 changed files, 3006 insertions, 0 deletions

.gitignore (vendored) -- new file, 3 lines
__pycache__/
*.pyc
.DS_Store

SKILL.md -- new file, 715 lines
---
name: _deploy_app
description: Deploy a new app or push updates to an existing deployment. Detects deployment status, provisions infrastructure (Gitea repo, DNS, Caddy, webhook), and deploys Docker Compose stacks. Supports multiple servers via profiles. Use when the user says "deploy this", "push to production", "deploy to mew", or invokes "/_deploy_app".
---
# /_deploy_app
Deploy a new application to a production server or push updates to an existing deployment. Detect deployment status automatically, provision all required infrastructure (Gitea repo, Cloudflare DNS, Caddy reverse proxy, webhook auto-deploy), and bring Docker Compose stacks online.
## When to Use
- Deploy a new app to a production server for the first time
- Push updates to an existing deployment (triggers auto-deploy via webhook)
- Check deployment status of the current project
- Trigger phrases: "deploy this", "push to production", "deploy to mew", `/_deploy_app`
## Prerequisites
| Requirement | Details |
|-------------|---------|
| SSH access | `ssh mew` (or target server alias) must work without password prompt |
| Gitea secrets | `~/.claude/secrets/gitea.json` with `token`, `url`, `owner`, `ssh_host` |
| Cloudflare secrets | `~/.claude/secrets/cloudflare.json` with `token`, `zones` (domain-to-zone_id map) |
| Docker Compose | Project must have `docker-compose.yaml` (or this skill helps generate one) |
| Webhook secret | Stored in `~/.claude/secrets/gitea.json` as `webhook_secret` |
If any secrets file is missing, stop and tell the user which file is needed and what keys it must contain.
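The prerequisite check can be sketched as a small preflight helper. This is a sketch only: the file names and required keys are taken from the table above, and `missing_secrets` is a hypothetical helper name, not part of the skill's scripts.

```python
# Preflight check for the secrets files listed in the Prerequisites table.
import json
import os

REQUIRED = {
    "gitea.json": ["token", "url", "owner", "ssh_host", "webhook_secret"],
    "cloudflare.json": ["token", "zones"],
}

def missing_secrets(secrets_dir):
    """Return {filename: problem} for every secrets file that is absent
    or lacks a required key."""
    problems = {}
    for fname, keys in REQUIRED.items():
        path = os.path.join(secrets_dir, fname)
        if not os.path.exists(path):
            problems[fname] = "file missing"
            continue
        with open(path) as f:
            data = json.load(f)
        absent = [k for k in keys if k not in data]
        if absent:
            problems[fname] = "missing keys: " + ", ".join(absent)
    return problems
```

If the returned dict is non-empty, report exactly which file and keys are needed, then stop.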
## Workflow Overview
```mermaid
flowchart TD
A[📋 Load Server Profile] --> B[🔍 Detect Project]
B --> C{🐍 Run detect_deployment.py}
C -->|Not Deployed| D[🆕 New Deploy Workflow]
C -->|Already Deployed| E[🔄 Redeploy Workflow]
D --> D1[Ensure Compose + .env] --> D2[Create Gitea Repo]
D2 --> D3[Git Init + Push] --> D4[Create DNS Record]
D4 --> D5[Clone on Server] --> D6[Add to Deploy Map]
D6 --> D7[Add Caddy Site Block] --> D8[Docker Compose Up]
D8 --> D9[Add Webhook] --> F
E --> E1[Commit Changes] --> E2[Git Push]
E2 --> E3[Wait for Webhook] --> F
F[✅ Verify & Report]
style D fill:#c8e6c9
style E fill:#bbdefb
style F fill:#fff9c4
```
---
## Step 1: Load Server Profile
Read `~/.claude/skills/_deploy_app/config.json` to determine the target server.
### If `config.json` exists
1. Read the file and parse the `active_profile` key.
2. If `--profile=name` was passed, use that profile instead. Error if it does not exist.
3. Display a summary:
> **🎯 Deployment target:** {profile.name}
> **🌐 Domain:** *.{profile.domain}
> **🖥️ Server:** {profile.ssh_host} ({profile.ssh_user}@{profile.ssh_host})
> **🔀 Proxy:** {profile.proxy_type}
> **📁 Deploy path:** {profile.deploy_path}
4. Ask: "Proceed with this profile, or switch to another?"
### If `config.json` is missing
Walk the user through creating their first profile:
| # | Question | Default |
|---|----------|---------|
| 1 | Profile name (short ID, e.g. `mew`) | _(required)_ |
| 2 | Description (human-readable) | _(required)_ |
| 3 | SSH host alias (e.g. `mew`) | _(required)_ |
| 4 | SSH user | `darren` |
| 5 | Server hostname (for local detection) | _(required)_ |
| 6 | Server IP (for DNS A records) | _(required)_ |
| 7 | Wildcard domain (e.g. `lavender.spl.tech`) | _(required)_ |
| 8 | Deploy path on server | `/srv/git` |
| 9 | Proxy type (`caddy` or `none`) | `caddy` |
| 10 | Caddy compose path _(skip if none)_ | `/data/docker/caddy` |
| 11 | Caddy container name _(skip if none)_ | `caddy` |
| 12 | Docker proxy network name | `proxy` |
| 13 | Gitea host (e.g. `git.lavender-daydream.com`) | _(from secrets)_ |
Write `config.json`, confirm, then proceed.
### Profile variables reference
| Variable | Description | Example |
|----------|-------------|---------|
| `{profile.name}` | Human-readable name | `Mew Server` |
| `{profile.ssh_host}` | SSH alias or hostname | `mew` |
| `{profile.ssh_user}` | SSH login user | `darren` |
| `{profile.server_hostname}` | Actual hostname (for local detection) | `mew` |
| `{profile.server_ip}` | Public IP address | `155.94.170.136` |
| `{profile.domain}` | Wildcard domain | `lavender.spl.tech` |
| `{profile.deploy_path}` | Root path for deployed repos | `/srv/git` |
| `{profile.proxy_type}` | `caddy` or `none` | `caddy` |
| `{profile.caddy_compose_path}` | Caddy's docker-compose directory | `/data/docker/caddy` |
| `{profile.caddy_container}` | Caddy container name | `caddy` |
| `{profile.proxy_network}` | Docker network for proxy traffic | `proxy` |
### Execution context detection
Determine whether commands run locally or remotely:
```bash
current_host=$(hostname)
```
- If `current_host` matches `{profile.server_hostname}` --> **local execution** (run commands directly)
- If no match --> **remote execution** (wrap commands in `ssh {profile.ssh_user}@{profile.ssh_host} "command"`)
Store this decision as `run_on_server` for use throughout the workflow.
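A minimal sketch of this decision (the profile fields are the ones from the profile variables reference above; `build_command` is a hypothetical helper name):

```python
# Decide local vs remote execution and build the final command string.
def build_command(command, hostname, profile):
    """Run locally when the hostname matches the profile's server_hostname,
    otherwise wrap the command in ssh."""
    if hostname == profile["server_hostname"]:
        return command                      # local execution
    escaped = command.replace('"', '\\"')   # escape double quotes for ssh
    return f'ssh {profile["ssh_user"]}@{profile["ssh_host"]} "{escaped}"'
```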
---
## Step 2: Detect Project
Scan the current working directory to gather project metadata.
1. **Detect app name** -- use the directory basename, lowercased and hyphenated.
2. **Check for git remote** -- if `origin` exists, extract the repo name from the URL.
3. **Detect project type** -- look for these files (in order):
- `docker-compose.yaml` / `docker-compose.yml` / `compose.yaml`
- `Dockerfile`
- `package.json`
- `requirements.txt` / `pyproject.toml`
- `go.mod`
4. **Determine container port** -- parse the compose file for `ports:` mapping or `EXPOSE` in Dockerfile. Default to `3000` if not detectable.
5. **Determine container name** -- from the compose file's main service, or `{app_name}` as fallback.
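The detection order above can be sketched as follows (a sketch only; `detect_project` and the type labels are illustrative, not part of the skill's scripts):

```python
# Project detection: lowercased/hyphenated app name plus first-match type.
import os

MARKERS = [
    (("docker-compose.yaml", "docker-compose.yml", "compose.yaml"), "compose"),
    (("Dockerfile",), "dockerfile"),
    (("package.json",), "node"),
    (("requirements.txt", "pyproject.toml"), "python"),
    (("go.mod",), "go"),
]

def detect_project(path):
    """Return (app_name, project_type) for the given directory."""
    base = os.path.basename(os.path.abspath(path))
    app_name = base.lower().replace(" ", "-").replace("_", "-")
    for files, ptype in MARKERS:
        if any(os.path.exists(os.path.join(path, f)) for f in files):
            return app_name, ptype
    return app_name, "unknown"
```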
---
## Step 3: Check Deployment Status
Run the detection script to determine if this app is already deployed:
```bash
python3 ~/.claude/skills/_deploy_app/scripts/detect_deployment.py \
--repo-name {owner}/{app_name} \
--config ~/.claude/skills/_deploy_app/config.json
```
Parse the JSON output:
| Field | Meaning |
|-------|---------|
| `deployed` | `true` if the app exists on the server |
| `gitea_repo_exists` | `true` if the Gitea repo exists |
| `dns_exists` | `true` if the DNS record exists |
| `caddy_configured` | `true` if Caddy has a site block |
| `webhook_exists` | `true` if the Gitea webhook is configured |
| `container_running` | `true` if the Docker container is up |
**If `deployed` is true** --> go to [Step 4b: Redeploy Workflow](#step-4b-redeploy-workflow).
**If `deployed` is false** --> go to [Step 4a: New Deploy Workflow](#step-4a-new-deploy-workflow).
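Parsing and branching on the script's output might look like this (sketch only; the field names come from the table above, the function names are hypothetical):

```python
# Branch on detect_deployment.py output and list what still needs provisioning.
import json

CHECKS = ["gitea_repo_exists", "dns_exists", "caddy_configured",
          "webhook_exists", "container_running"]

def choose_workflow(raw_json):
    """Return 'redeploy' or 'new-deploy' based on the `deployed` field."""
    status = json.loads(raw_json)
    return "redeploy" if status.get("deployed") else "new-deploy"

def missing_resources(status):
    """List which resources the new-deploy workflow still needs to create."""
    return [c for c in CHECKS if not status.get(c)]
```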
---
## Step 4a: New Deploy Workflow
Execute all substeps in order. Each substep is idempotent -- skip if the resource already exists.
### 4a.1: Confirm Domain
Propose a default domain and ask the user to confirm or override:
> **Proposed domain:** `{app_name}.{profile.domain}`
> Accept this domain, or provide a different one?
Store the confirmed domain as `{domain}`.
### 4a.2: Ensure docker-compose.yaml
If no compose file exists in the project:
1. Read `~/.claude/skills/_compose/references/proxy-patterns.md` for Caddy patterns.
2. Generate a compose file appropriate for the detected project type.
3. **MUST** include the proxy network as an external network:
```yaml
networks:
  proxy:
    external: true

services:
  {app_name}:
    # ... service config ...
    networks:
      - proxy
      - default
```
4. Pin all image versions -- never use `latest`.
5. Set `restart: unless-stopped` on all services.
### 4a.3: Ensure .env
If `.env` does not exist:
1. Generate with real random secrets:
```bash
openssl rand -hex 16
```
2. Include all required environment variables with sensible defaults.
3. Group by section with comments (e.g. `# === Database ===`).
### 4a.4: Ensure .env.example
Copy `.env` and replace all secret values with descriptive placeholders:
```
DB_PASSWORD=changeme-use-a-strong-password
SECRET_KEY=generate-with-openssl-rand-hex-32
```
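Deriving `.env.example` from `.env` can be sketched with a name-based heuristic (an assumption, not the skill's actual implementation; variables that look secret-like get placeholder values, and the result should still be reviewed manually):

```python
# Blank secret-like values while preserving comments, blanks, and layout.
SECRET_HINTS = ("PASSWORD", "SECRET", "TOKEN", "KEY")

def to_example(env_text):
    out = []
    for line in env_text.splitlines():
        if "=" in line and not line.lstrip().startswith("#"):
            name, _, value = line.partition("=")
            if any(h in name.upper() for h in SECRET_HINTS):
                value = "changeme-see-deployment-docs"
            out.append(f"{name}={value}")
        else:
            out.append(line)
    return "\n".join(out)
```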
### 4a.5: Ensure Makefile
Copy from `~/.claude/skills/_deploy_app/assets/Makefile.template` if it exists, otherwise generate:
```makefile
.PHONY: up down logs pull restart ps
up:
	docker compose up -d
down:
	docker compose down
logs:
	docker compose logs -f
pull:
	docker compose pull
restart:
	docker compose restart
ps:
	docker compose ps
```
### 4a.6: Ensure .gitignore
At minimum, include:
```
.env
*.log
```
**⚠️ IMPORTANT:** Do NOT modify an existing `.gitignore` without explicit user permission (per global guardrails).
### 4a.7: Validate Compose File
Run the compose validation script:
```bash
python3 ~/.claude/skills/_compose/scripts/validate-compose.py \
./docker-compose.yaml --strict
```
Fix all errors before proceeding. Review warnings and fix where appropriate.
### 4a.8: Create Gitea Repository
Read the Gitea token from secrets and create the repo:
```bash
GITEA_TOKEN=$(python3 -c "import json; print(json.load(open('$HOME/.claude/secrets/gitea.json'))['token'])")
GITEA_URL=$(python3 -c "import json; print(json.load(open('$HOME/.claude/secrets/gitea.json'))['url'])")
GITEA_OWNER=$(python3 -c "import json; print(json.load(open('$HOME/.claude/secrets/gitea.json'))['owner'])")
curl -s -X POST -H "Authorization: token $GITEA_TOKEN" \
  "$GITEA_URL/api/v1/user/repos" \
  -H "Content-Type: application/json" \
  -d '{"name": "{app_name}", "private": false, "auto_init": false}'
```
If the repo already exists (HTTP 409), skip this step.
### 4a.9: Git Init and Push
```bash
git init -b main
git add -A
git commit -m "Initial commit"
git remote add origin git@{gitea_ssh_host}:{owner}/{app_name}.git
git push -u origin main
```
If git is already initialized, add the remote (if missing) and push.
### 4a.10: Create Cloudflare DNS Record
Read the Cloudflare token and zone ID, then create an A record:
```bash
CF_TOKEN=$(python3 -c "import json; print(json.load(open('$HOME/.claude/secrets/cloudflare.json'))['token'])")
```
Determine the zone ID by matching the domain's root against the `zones` map in `cloudflare.json`.
**Check if record already exists:**
```bash
curl -s -H "Authorization: Bearer $CF_TOKEN" \
"https://api.cloudflare.com/client/v4/zones/{zone_id}/dns_records?type=A&name={domain}"
```
**If no record exists, create one:**
```bash
curl -s -X POST "https://api.cloudflare.com/client/v4/zones/{zone_id}/dns_records" \
-H "Authorization: Bearer $CF_TOKEN" \
-H "Content-Type: application/json" \
-d '{"type":"A","name":"{domain}","content":"{profile.server_ip}","ttl":1,"proxied":false}'
```
- `proxied: false` -- Caddy handles TLS via Let's Encrypt; Cloudflare proxy would interfere.
- `ttl: 1` -- automatic TTL.
**If record exists but points to the wrong IP**, update it with PUT.
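The zone lookup mentioned above (matching the domain's root against the `zones` map) can be sketched as a longest-suffix match, so `app.lavender.spl.tech` prefers the `lavender.spl.tech` zone over `spl.tech` (`find_zone_id` is an illustrative name):

```python
# Longest-suffix match of a domain against the zones map in cloudflare.json.
def find_zone_id(domain, zones):
    best = None
    for zone_name, zone_id in zones.items():
        if domain == zone_name or domain.endswith("." + zone_name):
            if best is None or len(zone_name) > len(best[0]):
                best = (zone_name, zone_id)
    if best is None:
        raise ValueError(f"no zone in cloudflare.json matches {domain}")
    return best[1]
```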
### 4a.11: Clone Repository on Server
Use the `run_on_server` helper:
```bash
# On server:
cd {profile.deploy_path} && git clone git@{gitea_ssh_host}:{owner}/{app_name}.git
```
If the directory already exists, pull instead:
```bash
# On server:
cd {profile.deploy_path}/{app_name} && git pull origin main
```
### 4a.12: Add to Deploy Map
Read the current deploy map, add the new entry, and write back:
```bash
# On server:
jq '. + {"{owner}/{app_name}": "{profile.deploy_path}/{app_name}"}' \
  /etc/deploy-listener/deploy-map.json \
  | sudo tee /etc/deploy-listener/deploy-map.json.tmp \
  && sudo mv /etc/deploy-listener/deploy-map.json.tmp /etc/deploy-listener/deploy-map.json
```
If `/etc/deploy-listener/deploy-map.json` does not exist, create it with just this entry.
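An equivalent sketch of the update in Python, run on the server (it mirrors the `jq`/`tee`/`mv` pipeline above, starting from an empty map when the file is missing; `add_to_deploy_map` is an illustrative name):

```python
# Add a repo -> deploy-path entry to the deploy map, writing atomically.
import json
import os

def add_to_deploy_map(map_path, repo, deploy_dir):
    data = {}
    if os.path.exists(map_path):
        with open(map_path) as f:
            data = json.load(f)
    data[repo] = deploy_dir
    tmp = map_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(data, f, indent=2)
    os.replace(tmp, map_path)   # atomic rename on POSIX
```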
### 4a.13: Add Caddy Site Block
**⚠️ Skip this step if `{profile.proxy_type}` is `none`.** Note to the user: "Reverse proxy is set to `none` -- configure your own proxy to point to this stack."
Append a new site block to the Caddyfile on the server:
```
# === {App Name} ===
{domain} {
    encode zstd gzip
    reverse_proxy {container_name}:{port}
}
```
Where `{container_name}` is the main app container and `{port}` is its internal port.
**Caddyfile location:** `{profile.caddy_compose_path}/Caddyfile`
After appending, restart Caddy:
```bash
# On server:
cd {profile.caddy_compose_path} && docker compose restart {profile.caddy_container}
```
### 4a.14: Deploy the Stack
On the server, bring up the Docker Compose stack:
```bash
# On server:
cd {profile.deploy_path}/{app_name}
```
**If a Dockerfile exists** (custom build):
```bash
docker compose up -d --build
```
**If only pre-built images** (no Dockerfile):
```bash
docker compose pull && docker compose up -d
```
### 4a.15: Add Gitea Webhook
Create a push webhook so future `git push` events trigger auto-deploy:
```bash
WEBHOOK_SECRET=$(python3 -c "import json; print(json.load(open('$HOME/.claude/secrets/gitea.json'))['webhook_secret'])")
curl -s -X POST -H "Authorization: token $GITEA_TOKEN" \
  "$GITEA_URL/api/v1/repos/{owner}/{app_name}/hooks" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "gitea",
    "active": true,
    "branch_filter": "main master",
    "config": {
      "url": "https://deploy.{profile.domain}/webhook",
      "content_type": "json",
      "secret": "'"$WEBHOOK_SECRET"'"
    },
    "events": ["push"]
  }'
```
### 4a.16: Verify Deployment
Wait at least 10 seconds for TLS certificate provisioning (first-time issuance via Let's Encrypt can take longer), then verify:
**HTTP check:**
```bash
curl -s -o /dev/null -w "%{http_code}" https://{domain}/
```
| Code | Meaning |
|------|---------|
| `200` | ✅ Deployment verified |
| `301`/`302` | ✅ Redirect -- likely working (SSL or app redirect) |
| `502` | ❌ Caddy cannot reach container -- check proxy network and container status |
| `0` / timeout | ❌ DNS not propagated or Caddy not restarted |
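The table above can be folded into a small classifier plus retry loop (a sketch; `fetch` stands in for the actual `curl` invocation, and the labels are illustrative):

```python
# Classify the HTTP status per the table, retrying before giving up.
import time

def classify_http(code):
    if code == 200:
        return "ok"
    if code in (301, 302):
        return "redirect-likely-ok"
    if code == 502:
        return "caddy-cannot-reach-container"
    if code == 0:
        return "dns-or-caddy-not-ready"
    return "unexpected"

def verify(fetch, retries=3, delay=10):
    """Return True once the domain answers with a passing status."""
    for _ in range(retries):
        if classify_http(fetch()) in ("ok", "redirect-likely-ok"):
            return True
        time.sleep(delay)
    return False
```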
**Container check:**
```bash
# On server:
docker ps --filter name={container_name} --format '{{.Status}}'
```
### 4a.17: Report
Display a deployment summary:
> **✅ Deployment complete!**
>
> | Item | Status |
> |------|--------|
> | 🌐 URL | https://{domain} |
> | 🐳 Container | {status from docker ps} |
> | 📦 Gitea repo | {gitea_url}/{owner}/{app_name} |
> | 🪝 Webhook | Active (auto-deploy on push to main) |
> | 📡 DNS | {domain} → {profile.server_ip} |
---
## Step 4b: Redeploy Workflow
For apps that are already deployed, the workflow is simplified: commit, push, and let the webhook handle the rest.
### 4b.1: Check for Uncommitted Changes
```bash
git status --porcelain
```
If output is non-empty, display the changes and ask:
> "There are uncommitted changes. Commit and deploy, or abort?"
### 4b.2: Commit Changes
If the user confirms:
```bash
git add -A
git commit -m "{descriptive message based on changed files}"
```
### 4b.3: Push to Remote
```bash
git push origin main
```
If no remote named `origin` exists pointing to the Gitea host, add it first:
```bash
git remote add origin git@{gitea_ssh_host}:{owner}/{app_name}.git
git push -u origin main
```
### 4b.4: Wait for Webhook
Wait 5 seconds for the webhook to fire and the deploy listener to process:
```bash
sleep 5
```
### 4b.5: Verify
Check the deploy listener logs on the server:
```bash
# On server:
journalctl -u deploy-listener -n 10 --no-pager
```
Curl the live domain:
```bash
curl -s -o /dev/null -w "%{http_code}" https://{domain}/
```
Check container status:
```bash
# On server:
docker ps --filter name={container_name} --format '{{.Status}}'
```
### 4b.6: Report
> **🔄 Redeployment complete!**
> **🌐 URL:** https://{domain}
> **🐳 Container:** {status}
> **🪝 Triggered via:** webhook (push to main)
---
## Helper: run_on_server
All server-side commands use this pattern for execution context:
```python
def run_on_server(command):
    if is_local:
        # hostname matches profile.server_hostname
        run(command)
    else:
        # wrap in SSH
        run(f'ssh {profile.ssh_user}@{profile.ssh_host} "{command}"')
```
When constructing SSH commands:
- Escape double quotes inside the command.
- For multi-line commands, use `ssh ... 'bash -s' << 'EOF'` heredoc syntax.
- For commands requiring sudo, ensure the SSH user has passwordless sudo configured for the needed commands.
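The heredoc form for multi-line commands can be sketched as (`ssh_heredoc` is a hypothetical helper, not part of the skill):

```python
# Build the `ssh ... 'bash -s' << 'EOF'` invocation for a multi-line script.
# The quoted 'EOF' delimiter prevents local variable expansion.
def ssh_heredoc(user, host, script):
    return f"ssh {user}@{host} 'bash -s' << 'EOF'\n{script}\nEOF"
```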
---
## Configuration
### config.json Structure
```json
{
  "active_profile": "mew",
  "profiles": {
    "mew": {
      "name": "Mew Server",
      "ssh_host": "mew",
      "ssh_user": "darren",
      "server_hostname": "mew",
      "server_ip": "155.94.170.136",
      "domain": "lavender.spl.tech",
      "deploy_path": "/srv/git",
      "proxy_type": "caddy",
      "caddy_compose_path": "/data/docker/caddy",
      "caddy_container": "caddy",
      "proxy_network": "proxy"
    }
  }
}
```
### Secrets Files
**`~/.claude/secrets/gitea.json`:**
```json
{
  "token": "gitea-api-token",
  "url": "https://git.lavender-daydream.com",
  "ssh_host": "git.lavender-daydream.com",
  "owner": "darren",
  "webhook_secret": "shared-secret-for-webhooks"
}
```
**`~/.claude/secrets/cloudflare.json`:**
```json
{
  "token": "cloudflare-api-bearer-token",
  "zones": {
    "lavender.spl.tech": "zone-id-here",
    "spl.tech": "zone-id-here"
  }
}
```
---
## Adding a New Server Profile
Follow these steps to add a second (or third) deployment target.
### 1. Prepare the Server
On the new server:
1. Install Docker and Docker Compose.
2. Set up the deploy listener (`deploy-listener.py`) as a systemd service.
3. Create the deploy map: `sudo mkdir -p /etc/deploy-listener && echo '{}' | sudo tee /etc/deploy-listener/deploy-map.json`
4. Set up Caddy (or chosen reverse proxy) with Docker Compose.
5. Create the deploy path: `sudo mkdir -p /srv/git`
6. Ensure SSH key access from the workstation (`ssh new-server` must work).
7. Ensure the server can clone from Gitea (add SSH key to Gitea if needed).
### 2. Add the Profile
Edit `~/.claude/skills/_deploy_app/config.json` and add a new entry under `profiles`:
```json
{
  "active_profile": "mew",
  "profiles": {
    "mew": { "...existing..." },
    "new-server": {
      "name": "New Server Description",
      "ssh_host": "new-server",
      "ssh_user": "darren",
      "server_hostname": "new-server",
      "server_ip": "1.2.3.4",
      "domain": "new.example.com",
      "deploy_path": "/srv/git",
      "proxy_type": "caddy",
      "caddy_compose_path": "/data/docker/caddy",
      "caddy_container": "caddy",
      "proxy_network": "proxy"
    }
  }
}
```
### 3. Deploy to the New Server
Use the `--profile` flag:
```
/_deploy_app --profile=new-server
```
Or set `active_profile` to the new server name in `config.json`.
---
## Troubleshooting
| Problem | Cause | Fix |
|---------|-------|-----|
| **Caddy 502 Bad Gateway** | Container not on the `proxy` network, or container not started | Verify: `docker network inspect proxy` -- check the app container is listed. Run `docker network connect proxy {container_name}` if missing. |
| **Caddy 502 after restart** | Caddy restarted before container was ready | Wait for container healthcheck, then restart Caddy: `docker compose restart caddy` |
| **Webhook not firing** | Webhook misconfigured or deploy listener down | Check Gitea webhook delivery history: Gitea UI → Repo → Settings → Webhooks → Recent Deliveries. Check deploy listener: `systemctl status deploy-listener` |
| **DNS not resolving** | Cloudflare propagation delay or wrong zone | Verify with `dig {domain}`. Check Cloudflare dashboard. Propagation is usually instant but can take up to 5 minutes. |
| **Git push rejected** | Remote URL incorrect or SSH key not authorized | Verify remote: `git remote -v`. Test SSH: `ssh -T git@{gitea_ssh_host}`. Check Gitea deploy keys. |
| **Deploy listener not running** | Service crashed or not enabled | Check: `systemctl status deploy-listener`. Restart: `sudo systemctl restart deploy-listener`. Enable: `sudo systemctl enable deploy-listener`. |
| **Container exits immediately** | Missing .env, bad config, or port conflict | Check logs: `docker compose logs {service}`. Verify `.env` exists on server. Check port conflicts: `ss -tlnp \| grep {port}`. |
| **TLS cert not provisioned** | DNS not pointed, or rate limited | Caddy auto-provisions via Let's Encrypt. Verify DNS resolves first. Check Caddy logs: `docker compose logs caddy`. Let's Encrypt rate limits: 50 certs per domain per week. |
| **Permission denied on server** | SSH user lacks sudo or file ownership wrong | Verify user is in the `git` group: `groups darren`. Check file ownership: `ls -la {deploy_path}/{app_name}`. |
---
## Important Rules
- Always load and confirm the server profile before doing anything else.
- Always run `detect_deployment.py` before choosing the new-deploy or redeploy path.
- Never start Docker containers without explicit user confirmation on first deploy.
- Always create real random secrets in `.env` -- never use placeholder passwords.
- Always pin Docker image versions -- never use `latest`.
- Always include the proxy network in generated compose files.
- Always verify the deployment with both an HTTP check and a container status check.
- Never modify `.gitignore` without explicit user permission.
- Check all git remotes for public providers (github.com, gitlab.com, etc.) before pushing -- warn the user if found.
- If `config.json` is modified, write it back immediately.
- Prefer the `_compose` skill's `validate-compose.py` script for compose file validation.
## Resources
### scripts/
- **`detect_deployment.py`** -- Check Gitea API, Cloudflare DNS, server filesystem, and Docker status to determine if an app is already deployed. Return structured JSON. _(To be created.)_
- **`validate_compose.py`** -- Delegate to `~/.claude/skills/_compose/scripts/validate-compose.py`.
### references/
- **`compose-patterns.md`** -- Common Docker Compose patterns for different app types (Node.js, Python, Go, static sites). _(To be created.)_
- Reuse `~/.claude/skills/_compose/references/proxy-patterns.md` for proxy configuration guidance.
- Reuse `~/.claude/skills/_compose/references/troubleshooting.md` for Docker troubleshooting.
### assets/
- **`Makefile.template`** -- Standard Makefile for deployed apps. _(To be created.)_
## Cross-Platform Notes
- All API calls use `curl`, available on both Windows and Linux.
- Python scripts use `#!/usr/bin/env python3` for portability.
- SSH commands work from both WSL and native Linux.
- Use `$HOME` (not `~`) in scripts for compatibility.
- Path separators: always forward slashes.

assets/Makefile.template -- new file, 22 lines
.PHONY: up down logs pull restart ps build
up:
	docker compose up -d
down:
	docker compose down
logs:
	docker compose logs -f
pull:
	docker compose pull
restart:
	docker compose restart
ps:
	docker compose ps
build:
	docker compose up -d --build

Compose template (file name missing in source) -- new file, 21 lines
# Template: Replace {app_name}, {image}, and {port} with actual values
services:
  {app_name}:
    image: {image}
    container_name: {app_name}
    restart: unless-stopped
    # Uncomment if building from Dockerfile:
    # build: .
    env_file:
      - .env
    networks:
      - proxy
      - default
    # Uncomment and set the internal port:
    # expose:
    #   - "{port}"

networks:
  proxy:
    name: proxy
    external: true

config.example.json -- new file, 42 lines
{
  "active_profile": "my-server",
  "profiles": {
    "my-server": {
      "server": {
        "name": "my-server",
        "ip": "1.2.3.4",
        "ssh_user": "deploy-user",
        "ssh_host": "my-server",
        "deploy_map": "/etc/deploy-listener/deploy-map.json",
        "deploy_env": "/etc/deploy-listener/deploy-listener.env",
        "caddyfile": "/data/docker/caddy/Caddyfile",
        "caddy_container": "caddy",
        "caddy_compose_dir": "/data/docker/caddy",
        "compose_dir": "/srv/git",
        "proxy_network": "proxy",
        "proxy_gateway": "10.0.12.1"
      },
      "gitea": {
        "external_url": "https://git.example.com",
        "internal_url": "http://gitea-container-ip:3000",
        "api_path": "/api/v1",
        "default_owner": "your-username",
        "ssh_host": "git.example.com",
        "ssh_port": 2222
      },
      "webhook": {
        "url": "https://deploy.example.com/webhook",
        "events": ["push"],
        "branch_filter": "main master"
      },
      "domains": {
        "default_pattern": "{app}.example.com",
        "available": ["example.com"]
      },
      "secrets": {
        "gitea_token": "~/.claude/secrets/gitea.json",
        "cloudflare_token": "~/.claude/secrets/cloudflare.json"
      }
    }
  }
}

config.json -- new file, 42 lines
{
  "active_profile": "mew",
  "profiles": {
    "mew": {
      "server": {
        "name": "mew",
        "ip": "155.94.170.136",
        "ssh_user": "darren",
        "ssh_host": "mew",
        "deploy_map": "/etc/deploy-listener/deploy-map.json",
        "deploy_env": "/etc/deploy-listener/deploy-listener.env",
        "caddyfile": "/data/docker/caddy/Caddyfile",
        "caddy_container": "caddy",
        "caddy_compose_dir": "/data/docker/caddy",
        "compose_dir": "/srv/git",
        "proxy_network": "proxy",
        "proxy_gateway": "10.0.12.1"
      },
      "gitea": {
        "external_url": "https://git.lavender-daydream.com",
        "internal_url": "http://10.0.12.5:3000",
        "api_path": "/api/v1",
        "default_owner": "darren",
        "ssh_host": "git.lavender-daydream.com",
        "ssh_port": 2222
      },
      "webhook": {
        "url": "https://deploy.lavender.spl.tech/webhook",
        "events": ["push"],
        "branch_filter": "main master"
      },
      "domains": {
        "default_pattern": "{app}.lavender.spl.tech",
        "available": ["lavender-daydream.com", "spl.tech"]
      },
      "secrets": {
        "gitea_token": "~/.claude/secrets/gitea.json",
        "cloudflare_token": "~/.claude/secrets/cloudflare.json"
      }
    }
  }
}

Caddyfile patterns reference (file name missing in source) -- new file, 252 lines
# Caddyfile Patterns Reference
Reusable Caddyfile site block patterns for the mew server. All blocks go in `/data/docker/caddy/Caddyfile`. After editing, reload or restart Caddy (see infrastructure.md for details).
---
## 1. Standard Reverse Proxy
The most common pattern. Terminate TLS, compress responses, and forward to a container.
```
# === My App ===
myapp.lavender-daydream.com {
    encode zstd gzip
    reverse_proxy myapp:3000
}
```
### Breakdown
- **Domain line**: Caddy automatically provisions a Let's Encrypt certificate for this domain.
- **`encode zstd gzip`**: Compress responses with zstd (preferred) or gzip (fallback). Include this in every site block.
- **`reverse_proxy myapp:3000`**: Forward requests to the container named `myapp` on port 3000. Caddy resolves the container name via the shared `proxy` Docker network.
### Prerequisites
- DNS A record pointing the domain to `155.94.170.136`.
- The target container is running and joined to the `proxy` network.
- The container name and port match what is specified in the `reverse_proxy` directive.
---
## 2. WebSocket Support
For applications that use WebSocket connections (chat apps, real-time dashboards, collaborative editors, etc.).
```
# === Real-time App ===
realtime.lavender-daydream.com {
    encode zstd gzip
    reverse_proxy realtime-app:3000 {
        header_up X-Real-IP {remote_host}
        header_up X-Forwarded-For {remote_host}
        header_up X-Forwarded-Proto {scheme}
    }
}
```
### Notes
- Caddy 2 handles WebSocket upgrades transparently. There is no special `websocket` directive needed — `reverse_proxy` detects the `Upgrade: websocket` header and handles the protocol switch automatically.
- The `header_up` directives forward the real client IP and protocol to the backend, which is important for applications that log connections or enforce security based on client IP.
- If the application uses a non-standard WebSocket path (e.g., `/ws` or `/socket.io`), this pattern still works without changes — Caddy proxies all paths by default.
---
## 3. Multiple Domains
Serve the same application from multiple domains (e.g., bare domain and `www` subdomain, or a vanity domain alongside the primary).
```
# === My App (multi-domain) ===
myapp.lavender-daydream.com, www.myapp.lavender-daydream.com {
    encode zstd gzip
    reverse_proxy myapp:3000
}
```
### With Redirect
Redirect one domain to the canonical domain instead of serving from both:
```
# === My App (canonical redirect) ===
www.myapp.lavender-daydream.com {
    redir https://myapp.lavender-daydream.com{uri} permanent
}

myapp.lavender-daydream.com {
    encode zstd gzip
    reverse_proxy myapp:3000
}
```
### Notes
- Caddy provisions separate TLS certificates for each domain listed.
- Ensure DNS A records exist for every domain in the site block.
- Use `permanent` (301) redirects for SEO-friendly canonical domain enforcement.
- The `{uri}` placeholder preserves the request path and query string during the redirect.
---
## 4. HTTPS Upstream
For services that speak HTTPS internally (e.g., Cockpit, some management UIs). Caddy must be told to connect to the upstream over TLS.
```
# === Cockpit ===
cockpit.lavender-daydream.com {
    encode zstd gzip
    reverse_proxy https://cockpit:9090 {
        transport http {
            tls_insecure_skip_verify
        }
    }
}
```
### Notes
- Prefix the upstream address with `https://` to instruct Caddy to connect over TLS.
- `tls_insecure_skip_verify` disables certificate verification for the upstream connection. Use this when the upstream uses a self-signed certificate, which is common for management interfaces like Cockpit.
- Do NOT use `tls_insecure_skip_verify` if the upstream has a valid, trusted certificate — remove the entire `transport` block in that case.
- This pattern is uncommon. Most containers speak plain HTTP internally, and Caddy handles TLS termination on the frontend only.
---
## 5. Rate Limiting
Protect sensitive endpoints (login forms, APIs, webhooks) from abuse with rate limiting.
```
# === Rate-Limited App ===
myapp.lavender-daydream.com {
    encode zstd gzip

    # Rate limit login endpoint: 10 requests per minute per IP
    @login {
        path /api/auth/login
    }
    rate_limit @login {
        zone login_zone {
            key {remote_host}
            events 10
            window 1m
        }
    }

    # Rate limit API endpoints: 60 requests per minute per IP
    @api {
        path /api/*
    }
    rate_limit @api {
        zone api_zone {
            key {remote_host}
            events 60
            window 1m
        }
    }

    reverse_proxy myapp:3000
}
```
### Notes
- Rate limiting requires the `caddy-ratelimit` plugin. Verify it is included in the Caddy build before using these directives. If it is not available, implement rate limiting at the application level instead.
- The `@name` syntax defines a named matcher that scopes the rate limit to specific paths.
- `key {remote_host}` rate-limits per client IP address.
- `events` is the maximum number of requests allowed within the `window` period.
- Clients that exceed the limit receive a `429 Too Many Requests` response.
- Apply stricter limits to authentication endpoints and more generous limits to general API usage.
### Alternative: Application-Level Rate Limiting
If the Caddy rate-limit plugin is not installed, skip the `rate_limit` directives and use the standard reverse proxy pattern. Configure rate limiting within the application instead (e.g., `express-rate-limit` for Node.js, `slowapi` for FastAPI).
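As a concrete illustration of the fixed-window idea those libraries implement, here is a minimal sketch (not production code; real deployments should use a maintained library such as the ones named above):

```python
# Fixed-window rate limiter: allow `events` requests per `window_seconds`
# per client key (e.g. the remote IP). Exceeding the limit maps to a
# 429 Too Many Requests response at the application layer.
import time

class FixedWindowLimiter:
    def __init__(self, events, window_seconds):
        self.events = events
        self.window = window_seconds
        self.counts = {}   # key -> (window_start, count)

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        start, count = self.counts.get(key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0           # window expired: reset
        if count >= self.events:
            self.counts[key] = (start, count)
            return False                    # over limit -> reject (429)
        self.counts[key] = (start, count + 1)
        return True
```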
---
## 6. Path-Based Routing
Route different URL paths to different backend services. Common for monorepo deployments where `/api` goes to a backend service and `/` goes to a frontend.
```
# === Full-Stack App (path-based) ===
myapp.lavender-daydream.com {
    encode zstd gzip

    # API requests → backend container
    handle /api/* {
        reverse_proxy myapp-api:8000
    }

    # WebSocket endpoint → backend container
    handle /ws/* {
        reverse_proxy myapp-api:8000
    }

    # Everything else → frontend container
    handle {
        reverse_proxy myapp-frontend:80
    }
}
```
### Notes
- `handle` blocks are evaluated in the order they appear. More specific paths must come before the catch-all.
- The final `handle` (with no path argument) is the catch-all — it matches everything not matched above.
- Use `handle_path` instead of `handle` if you need to strip the path prefix before forwarding. For example:
```
handle_path /api/* {
    reverse_proxy myapp-api:8000
}
```
This strips `/api` from the request path, so `/api/users` becomes `/users` when it reaches the backend. Only use this if the backend does not expect the `/api` prefix.
- Ensure all referenced containers (`myapp-api`, `myapp-frontend`) are on the `proxy` network.
### Variation: Static Files + API
Serve static files directly from Caddy for the frontend, with API requests proxied to a backend:
```
# === Static Frontend + API Backend ===
myapp.lavender-daydream.com {
encode zstd gzip
handle /api/* {
reverse_proxy myapp-api:8000
}
handle {
root * /srv/myapp/dist
try_files {path} /index.html
file_server
}
}
```
This requires the static files to be accessible from within the Caddy container (via a volume mount).
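A sketch of the volume entry that makes this work, assuming the Caddy compose layout described in the infrastructure reference (paths are illustrative):

```yaml
# In /data/docker/caddy/docker-compose.yaml (illustrative — adjust to the real file)
services:
  caddy:
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - /srv/myapp/dist:/srv/myapp/dist:ro   # static build output, read-only
```

The container-side path must match the `root` directive in the site block.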
---
## Universal Conventions
Apply these conventions to every site block:
1. **Comment header**: Place `# === App Name ===` above each site block.
2. **Compression**: Always include `encode zstd gzip` as the first directive.
3. **Container names**: Use container names, not IP addresses, in `reverse_proxy`.
4. **One domain per block** unless intentionally serving multiple domains (pattern 3).
5. **Order matters**: Place more specific `handle` blocks before less specific ones.
6. **Test after changes**: After modifying the Caddyfile, reload Caddy and verify the site responds:
```bash
docker exec caddy caddy reload --config /etc/caddy/Caddyfile
curl -I https://myapp.lavender-daydream.com
```
If reload fails, check Caddy logs:
```bash
docker logs caddy --tail 50
```

# Docker Compose Patterns Reference
Reusable `docker-compose.yaml` templates for common application types deployed on mew. Every template includes the external `proxy` network required for Caddy reverse proxying.
---
## 1. Node.js / Express with Dockerfile Build
Build a Node.js app from a local Dockerfile. The container exposes an internal port that Caddy proxies to.
```yaml
version: "3.8"
services:
app:
build:
context: .
dockerfile: Dockerfile
container_name: myapp
restart: unless-stopped
expose:
- "3000"
environment:
- NODE_ENV=production
- PORT=3000
env_file:
- .env
networks:
- proxy
networks:
proxy:
name: proxy
external: true
```
### Companion Dockerfile
```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]
```
### Notes
- Use `expose` (not `ports`) to keep the port internal to Docker networks only.
- Set `container_name` to a unique, descriptive name — Caddy uses this name in its `reverse_proxy` directive.
- The app listens on port 3000 inside the container. Caddy reaches it via `myapp:3000`.
---
## 2. Python / FastAPI with Dockerfile Build
Build a Python FastAPI app from a local Dockerfile. Uses Uvicorn as the ASGI server.
```yaml
version: "3.8"
services:
app:
build:
context: .
dockerfile: Dockerfile
container_name: myapi
restart: unless-stopped
expose:
- "8000"
environment:
- PYTHONUNBUFFERED=1
env_file:
- .env
networks:
- proxy
networks:
proxy:
name: proxy
external: true
```
### Companion Dockerfile
```dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```
### Notes
- `PYTHONUNBUFFERED=1` ensures log output appears immediately in `docker compose logs`.
- For production, consider adding `--workers 4` to the Uvicorn command or switching to Gunicorn with Uvicorn workers.
- Caddy reaches this via `myapi:8000`.
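Both scaling options from the notes above can be sketched as Dockerfile `CMD` lines. The worker count of 4 is an assumption — tune it to the host's CPU count — and option B requires adding `gunicorn` to `requirements.txt`:

```dockerfile
# Option A: Uvicorn with multiple worker processes
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]

# Option B: Gunicorn supervising Uvicorn workers (add gunicorn to requirements.txt)
# CMD ["gunicorn", "main:app", "-k", "uvicorn.workers.UvicornWorker", "-w", "4", "-b", "0.0.0.0:8000"]
```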
---
## 3. Static Site (nginx)
Serve pre-built static files (HTML, CSS, JS) via nginx.
```yaml
version: "3.8"
services:
app:
image: nginx:alpine
container_name: mysite
restart: unless-stopped
expose:
- "80"
volumes:
- ./dist:/usr/share/nginx/html:ro
- ./nginx.conf:/etc/nginx/conf.d/default.conf:ro
networks:
- proxy
networks:
proxy:
name: proxy
external: true
```
### Companion nginx.conf
```nginx
server {
listen 80;
server_name _;
root /usr/share/nginx/html;
index index.html;
location / {
try_files $uri $uri/ /index.html;
}
# Cache static assets
location ~* \.(js|css|png|jpg|jpeg|gif|ico|svg|woff2?)$ {
expires 30d;
add_header Cache-Control "public, immutable";
}
}
```
### Notes
- Mount the build output directory (e.g., `./dist`) into the nginx html root.
- The `try_files` fallback to `/index.html` supports client-side routing (React Router, Vue Router, etc.).
- Mount the nginx config as read-only (`:ro`).
- Caddy reaches this via `mysite:80`.
---
## 4. Pre-built Image Only
Pull and run a published Docker image with no local build. Suitable for off-the-shelf applications like wikis, dashboards, and link pages.
```yaml
version: "3.8"
services:
app:
image: lscr.io/linuxserver/bookstack:latest
container_name: bookstack
restart: unless-stopped
expose:
- "6875"
env_file:
- .env
volumes:
- ./data:/config
networks:
- proxy
networks:
proxy:
name: proxy
external: true
```
### Notes
- Replace the `image` and `expose` port with whatever the application requires.
- Check the image documentation for required environment variables and volume mount paths.
- Persist application data by mounting a local `./data` directory.
- Caddy reaches this via `bookstack:6875`.
---
## 5. App with PostgreSQL Database
A two-service stack with an application and a PostgreSQL database. The database is on an internal-only network. The app joins both the internal and proxy networks.
```yaml
version: "3.8"
services:
app:
build:
context: .
dockerfile: Dockerfile
container_name: myapp
restart: unless-stopped
expose:
- "3000"
env_file:
- .env
depends_on:
db:
condition: service_healthy
networks:
- proxy
- internal
db:
image: postgres:16-alpine
container_name: myapp-db
restart: unless-stopped
environment:
POSTGRES_DB: ${POSTGRES_DB:-myapp}
POSTGRES_USER: ${POSTGRES_USER:-myapp}
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:?Set POSTGRES_PASSWORD in .env}
volumes:
- pgdata:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-myapp}"]
interval: 10s
timeout: 5s
retries: 5
networks:
- internal
volumes:
pgdata:
networks:
proxy:
name: proxy
external: true
internal:
driver: bridge
```
### Notes
- The database is **only** on the `internal` network — it is not reachable from Caddy or any other container outside this stack.
- The app is on **both** `proxy` (so Caddy can reach it) and `internal` (so it can reach the database).
- `depends_on` with `condition: service_healthy` ensures the app waits for PostgreSQL to be ready before starting.
- The `${POSTGRES_PASSWORD:?...}` syntax causes compose to fail with an error if the variable is not set, preventing accidental deploys with no database password.
- Use a named volume (`pgdata`) for database persistence.
- In the app's `.env`, set the database URL:
```
DATABASE_URL=postgresql://myapp:secretpassword@myapp-db:5432/myapp
```
Note the hostname is the database container name (`myapp-db`), not `localhost`.
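The URL's parts can be checked with the standard library; a quick illustrative parse of the example value:

```python
from urllib.parse import urlsplit

# Illustrative: confirm the hostname in DATABASE_URL is the container name.
parts = urlsplit("postgresql://myapp:secretpassword@myapp-db:5432/myapp")
print(parts.hostname)          # myapp-db  (container name, not localhost)
print(parts.port)              # 5432
print(parts.path.lstrip("/"))  # myapp  (database name)
```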
---
## 6. App with Environment File
Pattern for managing configuration through `.env` files with a `.env.example` template checked into version control.
```yaml
version: "3.8"
services:
app:
build:
context: .
dockerfile: Dockerfile
container_name: myapp
restart: unless-stopped
expose:
- "3000"
env_file:
- .env
networks:
- proxy
networks:
proxy:
name: proxy
external: true
```
### Companion .env.example
Check this file into version control as a template. The actual `.env` file contains secrets and is listed in `.gitignore` on public repos only (on private Gitea repos, `.env` is committed per project conventions).
```env
# Application
NODE_ENV=production
PORT=3000
APP_URL=https://myapp.lavender-daydream.com
# Database (if applicable)
DATABASE_URL=postgresql://user:password@myapp-db:5432/myapp
# Secrets
SESSION_SECRET=generate-a-random-string-here
API_KEY=your-api-key-here
# Email (Mailgun)
MAILGUN_API_KEY=
MAILGUN_DOMAIN=
MAILGUN_FROM=noreply@lavender-daydream.com
# Deploy listener webhook secret (must match /etc/deploy-listener/deploy-listener.env)
WEBHOOK_SECRET=must-match-deploy-listener
```
### Notes
- The `env_file` directive in compose loads all variables from `.env` into the container environment.
- `env_file` variables are injected into the container environment only; compose variable interpolation (`${VAR}` syntax in the compose file itself) instead reads the shell environment plus a `.env` file in the project directory. Because this stack's `env_file` is that same `.env`, the values end up available to both.
- Always provide a `.env.example` with placeholder values and comments explaining each variable.
- For the deploy listener to work, the repo's webhook secret must match the value in `/etc/deploy-listener/deploy-listener.env`.
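Because `.env.example` is the template, it is easy for the real `.env` to fall behind it. A hedged sketch of a drift check (hypothetical helper, stdlib only):

```python
from pathlib import Path

def env_keys(path: Path) -> set[str]:
    """Collect variable names from a dotenv-style file, skipping comments and blanks."""
    keys: set[str] = set()
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            keys.add(line.split("=", 1)[0].strip())
    return keys

def missing_vars(example: Path, actual: Path) -> set[str]:
    """Variables declared in .env.example but absent from .env."""
    return env_keys(example) - env_keys(actual)
```

Running it before `docker compose up` catches variables that were added to the template but never set.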
---
## Universal Compose Conventions
These conventions apply to ALL stacks on mew:
1. **Always include the proxy network** if Caddy needs to reach the container:
```yaml
networks:
proxy:
name: proxy
external: true
```
2. **Use `expose`, not `ports`**: Keep ports internal to Docker networks. Never bind to the host unless absolutely necessary.
3. **Set `container_name`** explicitly: Caddy resolves containers by name. Avoid auto-generated names.
4. **Set `restart: unless-stopped`**: Containers restart automatically after crashes or server reboots, but stay stopped if manually stopped.
5. **Use `env_file` for secrets**: Do not hardcode secrets in the compose file.
6. **Use health checks** for databases and critical dependencies to ensure proper startup ordering.
7. **Persist data with named volumes or bind mounts**: Never rely on container-internal storage for important data.

# Infrastructure Reference — mew Server (155.94.170.136)
This document describes every infrastructure component on the mew server relevant to deploying Docker Compose applications behind Caddy with automated Gitea-triggered deployments.
---
## 1. Deploy Listener
### Overview
A Python webhook listener that receives push events from Gitea/Forgejo and automatically deploys the corresponding Docker Compose stack.
### Filesystem Locations
| Item | Path |
|------|------|
| Script | `/usr/local/bin/deploy-listener.py` |
| Systemd unit | `deploy-listener.service` |
| Deploy map | `/etc/deploy-listener/deploy-map.json` |
| Environment file | `/etc/deploy-listener/deploy-listener.env` |
| Service user home | `/var/lib/deploy` |
### Service User
- **User**: `deploy`
- **Groups**: `docker`, `git`
- **Home directory**: `/var/lib/deploy`
The `deploy` user has Docker socket access through the `docker` group and repository access through the `git` group.
### Network Binding
- **Port**: 50500
- **Bind address**: 0.0.0.0
- **Firewall**: UFW blocks external access to port 50500. Only Docker's internal 10.0.0.0/8 range is allowed. Caddy reaches the listener at `10.0.12.1:50500` (the proxy network gateway).
### Deploy Map
Location: `/etc/deploy-listener/deploy-map.json`
Format — a JSON object mapping `owner/repo` to the absolute path of the compose directory:
```json
{
"darren/compose-bookstack": "/srv/git/compose-bookstack",
"darren/compose-linkstack": "/srv/git/compose-linkstack",
"darren/my-app": "/srv/git/my-app"
}
```
Add a new entry to this file for every application that should be auto-deployed on push.
### Environment File
Location: `/etc/deploy-listener/deploy-listener.env`
```env
WEBHOOK_SECRET=<the-shared-secret>
LISTEN_PORT=50500
```
The `WEBHOOK_SECRET` value must match the secret configured in each Gitea/Forgejo webhook.
### Request Validation & Behavior
1. **HMAC-SHA256 validation**: The listener reads the `X-Gitea-Signature` or `X-Forgejo-Signature` header and validates the request body against the `WEBHOOK_SECRET` using HMAC-SHA256. Requests that fail validation are rejected.
2. **Branch filter**: Only pushes to `main` or `master` (checked via the `ref` field) trigger a deploy. All other branches are ignored.
3. **Deploy map lookup**: The `repository.full_name` field (e.g., `darren/my-app`) is looked up in the deploy map. If not found, the request is ignored.
4. **Deploy sequence**: On a valid push, the listener executes:
```bash
cd /srv/git/my-app
git pull
docker compose pull
docker compose up -d
```
5. **Concurrency control**: A file lock prevents concurrent deploys. If a deploy is already running, the incoming request is queued or rejected.
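The signature check in step 1 can be sketched with the standard library (illustrative, not the listener's actual code; the header value is the hex HMAC-SHA256 of the raw request body):

```python
import hashlib
import hmac

def verify_signature(body: bytes, secret: str, signature_header: str) -> bool:
    """Check a Gitea/Forgejo push payload against the shared webhook secret."""
    expected = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    # compare_digest performs a constant-time comparison, avoiding a timing side channel.
    return hmac.compare_digest(expected, signature_header)
```

Note that the HMAC is computed over the exact bytes as sent, so the raw body must be read before any JSON parsing.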
### Health Check
Verify the listener is running:
```bash
curl https://deploy.lavender.spl.tech/health
```
A successful response confirms the listener is reachable through Caddy and functioning.
### Systemd Management
```bash
# Check status
sudo systemctl status deploy-listener
# Restart
sudo systemctl restart deploy-listener
# View logs
sudo journalctl -u deploy-listener -f
```
---
## 2. Caddy Reverse Proxy
### Overview
Caddy serves as the TLS-terminating reverse proxy for all applications on mew. It automatically provisions and renews certificates via Let's Encrypt.
### Filesystem Locations
| Item | Path |
|------|------|
| Caddyfile | `/data/docker/caddy/Caddyfile` |
| Compose file | `/data/docker/caddy/docker-compose.yaml` |
| Container name | `caddy` |
| Image | `caddy:2-alpine` |
### Network
- **Network name**: `proxy`
- **Type**: external Docker network
- **Subnet**: 10.0.12.0/24
- **Gateway**: 10.0.12.1
- All application containers MUST join the `proxy` network for Caddy to reach them by container name.
### TLS
- **Method**: Automatic via Let's Encrypt
- **Email**: `postmaster@lavender-daydream.com`
- No manual certificate management required. Caddy handles provisioning, renewal, and OCSP stapling automatically.
### Deploy Endpoint
The deploy listener is exposed externally through Caddy:
```
deploy.lavender.spl.tech → 10.0.12.1:50500
```
This routes through the proxy network gateway to the host-bound deploy listener.
### Reloading the Caddyfile
**Standard reload** (when Caddyfile content changed but inode is the same):
```bash
docker exec caddy caddy reload --config /etc/caddy/Caddyfile
```
**Full restart** (required when the Caddyfile inode changed, e.g., after replacing the file rather than editing in-place):
```bash
cd /data/docker/caddy && docker compose restart caddy
```
Always check whether the file was edited in-place or replaced. If replaced, you MUST restart rather than reload.
### Site Block Format
Follow this exact format when adding new site blocks to the Caddyfile:
```
# === App Name ===
domain.example.com {
encode zstd gzip
reverse_proxy container_name:port
}
```
- Place the comment header (`# === App Name ===`) above each block for readability.
- Always include `encode zstd gzip` for compression.
- Use the container name (not IP) in the `reverse_proxy` directive — Caddy resolves container names on the proxy network.
---
## 3. Gitea API
### Connection Details
| Item | Value |
|------|-------|
| Internal URL (from mew host) | `http://10.0.12.5:3000` |
| External URL | `https://git.lavender-daydream.com` |
| API base path | `/api/v1` |
| Token location | `~/.claude/secrets/gitea.json` |
### Authentication
Include the token as a header on every API request:
```
Authorization: token {GITEA_TOKEN}
```
### Key Endpoints
#### Check if a repo exists
```
GET /api/v1/repos/{owner}/{repo}
```
- **200**: Repo exists (response includes repo details).
- **404**: Repo does not exist.
#### Create a new repo
```
POST /api/v1/user/repos
Content-Type: application/json
{
"name": "my-app",
"private": false,
"auto_init": false
}
```
Set `auto_init` to `false` when pushing an existing local repo. Set to `true` if you want Gitea to create an initial commit.
#### Add a webhook
```
POST /api/v1/repos/{owner}/{repo}/hooks
Content-Type: application/json
{
"type": "gitea",
"active": true,
"branch_filter": "main master",
"config": {
"url": "https://deploy.lavender.spl.tech/webhook",
"content_type": "json",
"secret": "<WEBHOOK_SECRET>"
},
"events": ["push"]
}
```
The `secret` in the webhook config MUST match the `WEBHOOK_SECRET` in `/etc/deploy-listener/deploy-listener.env`.
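A sketch of registering this webhook with stdlib `urllib` (the function name is illustrative; token and secret values are placeholders):

```python
import json
import urllib.request

def build_webhook_request(base_url: str, token: str, owner: str, repo: str, secret: str) -> urllib.request.Request:
    """Build (but do not send) the POST that registers the push webhook."""
    payload = {
        "type": "gitea",
        "active": True,
        "branch_filter": "{main,master}",  # glob matching main or master
        "config": {
            "url": "https://deploy.lavender.spl.tech/webhook",
            "content_type": "json",
            "secret": secret,
        },
        "events": ["push"],
    }
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/repos/{owner}/{repo}/hooks",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
    )
    req.add_header("Authorization", f"token {token}")
    req.add_header("Content-Type", "application/json")
    return req
```

Send the request with `urllib.request.urlopen(req)`; a 201 response indicates the hook was created.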
#### List repos
```
GET /api/v1/repos/search?limit=50
```
Returns up to 50 repositories per page; use the `page` query parameter to paginate.
---
## 4. Forgejo API
### Connection Details
| Item | Value |
|------|-------|
| Container name | `forgejo` |
| Internal port | 3000 |
| External URL | `https://forgejo.lavender-daydream.com` |
| SSH port | 2223 |
### API Compatibility
Forgejo is a fork of Gitea. The API format, endpoints, authentication, and request/response structures are identical to those documented in the Gitea section above. Use the same patterns — just substitute the Forgejo base URL.
### SSH Access
```bash
git remote add forgejo ssh://git@forgejo.lavender-daydream.com:2223/owner/repo.git
```
---
## 5. Cloudflare DNS
### Token & Zone Configuration
Location: `~/.claude/secrets/cloudflare.json`
Format:
```json
{
"CLOUDFLARE_API_TOKEN": "your-api-token-here",
"zones": {
"lavender-daydream.com": "zone_id_for_lavender_daydream",
"spl.tech": "zone_id_for_spl_tech"
}
}
```
### Authentication
Include the token as a Bearer header:
```
Authorization: Bearer {CLOUDFLARE_API_TOKEN}
```
### Create an A Record
```
POST https://api.cloudflare.com/client/v4/zones/{zone_id}/dns_records
Content-Type: application/json
{
"type": "A",
"name": "{subdomain}",
"content": "155.94.170.136",
"ttl": 1,
"proxied": false
}
```
- **`name`**: The subdomain portion (e.g., `myapp` for `myapp.lavender-daydream.com`, or the full FQDN).
- **`content`**: Always `155.94.170.136` (mew's public IP).
- **`ttl`**: `1` means automatic TTL.
- **`proxied`**: Set to `false` so Caddy handles TLS directly. Setting to `true` would route through Cloudflare's proxy and interfere with Let's Encrypt.
### Choosing the Zone
Pick the zone based on the desired domain suffix:
- `*.lavender-daydream.com` → use the `lavender-daydream.com` zone ID
- `*.spl.tech` → use the `spl.tech` zone ID
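Zone selection is a suffix match against the configured zones; a minimal sketch (function name is illustrative, zone IDs are placeholders):

```python
def pick_zone(fqdn: str, zones: dict[str, str]) -> tuple[str, str]:
    """Return (zone_name, zone_id) for the configured zone whose name suffixes fqdn."""
    matches = [z for z in zones if fqdn == z or fqdn.endswith("." + z)]
    if not matches:
        raise ValueError(f"no configured zone matches {fqdn!r}")
    zone = max(matches, key=len)  # longest suffix wins if zones ever overlap
    return zone, zones[zone]

zones = {"lavender-daydream.com": "zone-a", "spl.tech": "zone-b"}
print(pick_zone("myapp.lavender-daydream.com", zones))  # ('lavender-daydream.com', 'zone-a')
```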
---
## 6. Docker Networking
### The `proxy` Network
| Property | Value |
|----------|-------|
| Name | `proxy` |
| Subnet | 10.0.12.0/24 |
| Gateway | 10.0.12.1 |
| Type | External (created once, referenced by all stacks) |
### Requirements
- **Every application container** that Caddy must reach MUST join the `proxy` network.
- Caddy resolves container names to IPs on this network — use container names (not IPs) in `reverse_proxy` directives.
- The network is created externally (not by any single compose file). If it does not exist, create it:
```bash
docker network create --subnet=10.0.12.0/24 --gateway=10.0.12.1 proxy
```
### Compose Configuration
Every compose file that needs Caddy access must include:
```yaml
networks:
proxy:
name: proxy
external: true
```
And each service that Caddy proxies to must list `proxy` in its `networks` key:
```yaml
services:
app:
# ...
networks:
- proxy
```
If the stack also has internal-only services (e.g., a database), create an additional internal network:
```yaml
networks:
proxy:
name: proxy
external: true
internal:
driver: bridge
```
---
## 7. Compose Stack Locations
### Core Infrastructure Stacks
Location: `/data/docker/`
These are foundational services that support the entire server:
| Directory | Service |
|-----------|---------|
| `/data/docker/caddy/` | Caddy reverse proxy |
| `/data/docker/gitea/` | Gitea git forge |
| `/data/docker/forgejo/` | Forgejo git forge |
| `/data/docker/email/` | Email services |
| `/data/docker/website/` | Main website |
| `/data/docker/linkstack-berlyn/` | Berlyn's linkstack |
### Application Stacks
Location: `/srv/git/`
These are deployed applications managed by the deploy listener:
| Directory | Application |
|-----------|-------------|
| `/srv/git/compose-bookstack/` | BookStack wiki |
| `/srv/git/compose-linkstack/` | LinkStack |
| `/srv/git/compose-portainer/` | Portainer |
| `/srv/git/compose-wishthis/` | WishThis |
| `/srv/git/compose-anythingllm/` | AnythingLLM |
### Ownership & Permissions
- **Owner**: `root:git`
- **Permissions**: `2775` (setgid)
- The setgid bit ensures new files and directories inherit the `git` group, so both `root` and members of the `git` group (including `deploy` and `darren`) can read/write.
### Standard Stack Contents
Each compose stack directory should contain:
| File | Purpose |
|------|---------|
| `docker-compose.yaml` | Service definitions |
| `.env` | Environment variables (secrets, config) |
| `Makefile` | Convenience targets (`make up`, `make down`, `make logs`) |
| `README.md` | Stack documentation |

#!/usr/bin/env python3
"""
detect_deployment.py — Detect whether an app is already deployed.
Checks three signals:
1. Deploy map — repo entry in /etc/deploy-listener/deploy-map.json
2. Gitea — repo exists on the Gitea instance
3. Caddy — a Caddyfile reverse_proxy maps to the app's container name
Outputs JSON to stdout and uses exit code to indicate status:
0 — deployed (at least deploy_map + gitea match)
1 — not deployed
Usage:
python3 detect_deployment.py --repo-name darren/my-app [--config path/to/config.json]
Cross-platform: works on Linux and Windows (WSL / Git Bash).
Stdlib only — no pip dependencies.
"""
from __future__ import annotations
import argparse
import json
import os
import platform
import re
import socket
import subprocess
import sys
import urllib.error
import urllib.request
from pathlib import Path
from typing import Any
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _load_json(path: str | Path) -> Any:
"""Load and return parsed JSON from *path*."""
with open(path, encoding="utf-8") as fh:
return json.load(fh)
def _resolve_config_path(explicit: str | None, script_dir: Path) -> Path:
"""Return the config.json path — explicit arg wins, otherwise default."""
if explicit:
return Path(explicit).resolve()
return (script_dir.parent / "config.json").resolve()
def _is_local(server_hostname: str) -> bool:
"""Return True when we appear to be running *on* the target server."""
local_name = socket.gethostname().lower()
if local_name == server_hostname.lower():
return True
# Also check common aliases
try:
server_ip = socket.gethostbyname(server_hostname)
local_ips = socket.gethostbyname_ex(socket.gethostname())[2]
if server_ip in local_ips:
return True
except (socket.gaierror, socket.herror, OSError):
pass
return False
def _run_local(cmd: list[str], timeout: int = 15) -> tuple[int, str, str]:
"""Run *cmd* locally and return (returncode, stdout, stderr)."""
try:
proc = subprocess.run(
cmd,
capture_output=True,
text=True,
timeout=timeout,
)
return proc.returncode, proc.stdout, proc.stderr
except FileNotFoundError:
return 127, "", f"Command not found: {cmd[0]}"
except subprocess.TimeoutExpired:
return 124, "", f"Command timed out after {timeout}s"
def _run_ssh(ssh_user: str, ssh_host: str, remote_cmd: str, timeout: int = 15) -> tuple[int, str, str]:
"""Execute *remote_cmd* over SSH and return (returncode, stdout, stderr)."""
ssh_target = f"{ssh_user}@{ssh_host}" if ssh_user else ssh_host
cmd = [
"ssh",
"-o", "BatchMode=yes",
"-o", "ConnectTimeout=10",
"-o", "StrictHostKeyChecking=accept-new",
ssh_target,
remote_cmd,
]
return _run_local(cmd, timeout=timeout)
def _read_remote_file(ssh_user: str, ssh_host: str, remote_path: str) -> str | None:
"""Read a file on the remote server via SSH. Returns contents or None on failure."""
rc, stdout, stderr = _run_ssh(ssh_user, ssh_host, f"cat {remote_path}")
if rc == 0:
return stdout
return None
def _read_file_local_or_remote(
path: str,
is_local: bool,
ssh_user: str,
ssh_host: str,
) -> str | None:
"""Read a file — locally if we are on the server, otherwise via SSH."""
if is_local:
try:
return Path(path).read_text(encoding="utf-8")
except (OSError, UnicodeDecodeError):
return None
return _read_remote_file(ssh_user, ssh_host, path)
def _read_gitea_token(token_path: str) -> str | None:
"""Read the Gitea API token from a file path."""
try:
return Path(token_path).expanduser().read_text(encoding="utf-8").strip()
except (OSError, UnicodeDecodeError):
return None
# ---------------------------------------------------------------------------
# Signal 1: Deploy Map
# ---------------------------------------------------------------------------
def check_deploy_map(
repo_name: str,
deploy_map_path: str,
is_local: bool,
ssh_user: str,
ssh_host: str,
) -> tuple[bool, dict[str, Any]]:
"""
Check if *repo_name* exists in the deploy map JSON file.
Returns (found: bool, details: dict).
"""
details: dict[str, Any] = {}
raw = _read_file_local_or_remote(deploy_map_path, is_local, ssh_user, ssh_host)
if raw is None:
details["error"] = "Could not read deploy map"
return False, details
try:
deploy_map = json.loads(raw)
except json.JSONDecodeError as exc:
details["error"] = f"Invalid JSON in deploy map: {exc}"
return False, details
# The deploy map may use different key formats — check common patterns:
# "owner/repo", "repo", or nested by owner.
repo_lower = repo_name.lower()
repo_short = repo_name.split("/")[-1].lower() if "/" in repo_name else repo_name.lower()
# Flat dict keyed by "owner/repo"
if isinstance(deploy_map, dict):
for key, value in deploy_map.items():
if key.lower() == repo_lower or key.lower() == repo_short:
details["matched_key"] = key
if isinstance(value, dict):
details["stack_dir"] = value.get("stack_dir", value.get("path", ""))
else:
details["stack_dir"] = str(value)
return True, details
# List of entries with a "repo" field
if isinstance(deploy_map, list):
for entry in deploy_map:
if not isinstance(entry, dict):
continue
entry_repo = (entry.get("repo") or entry.get("repository") or "").lower()
if entry_repo == repo_lower or entry_repo == repo_short:
details["matched_key"] = entry_repo
details["stack_dir"] = entry.get("stack_dir", entry.get("path", ""))
return True, details
return False, details
# ---------------------------------------------------------------------------
# Signal 2: Gitea
# ---------------------------------------------------------------------------
def check_gitea(
repo_name: str,
gitea_url: str,
gitea_token: str | None,
) -> tuple[bool, dict[str, Any]]:
"""
Check if *repo_name* (owner/repo) exists on Gitea.
Returns (exists: bool, details: dict).
"""
details: dict[str, Any] = {}
# Ensure owner/repo format
if "/" not in repo_name:
details["error"] = "repo_name must be in owner/repo format for Gitea check"
return False, details
    owner, repo = repo_name.split("/", 1)
    if not gitea_url:
        details["error"] = "Gitea URL not configured"
        return False, details
    api_url = f"{gitea_url.rstrip('/')}/api/v1/repos/{owner}/{repo}"
    details["gitea_url"] = api_url
req = urllib.request.Request(api_url, method="GET")
req.add_header("Accept", "application/json")
if gitea_token:
req.add_header("Authorization", f"token {gitea_token}")
try:
with urllib.request.urlopen(req, timeout=15) as resp:
if resp.status == 200:
body = json.loads(resp.read().decode("utf-8"))
details["full_name"] = body.get("full_name", "")
details["html_url"] = body.get("html_url", "")
details["description"] = body.get("description", "")
return True, details
except urllib.error.HTTPError as exc:
if exc.code == 404:
details["status"] = 404
return False, details
details["error"] = f"HTTP {exc.code}: {exc.reason}"
return False, details
except urllib.error.URLError as exc:
details["error"] = f"URL error: {exc.reason}"
return False, details
except OSError as exc:
details["error"] = f"Connection error: {exc}"
return False, details
return False, details
# ---------------------------------------------------------------------------
# Signal 3: Caddy
# ---------------------------------------------------------------------------
def check_caddy(
repo_name: str,
caddyfile_path: str,
is_local: bool,
ssh_user: str,
ssh_host: str,
) -> tuple[bool, dict[str, Any]]:
"""
Check if the Caddyfile has a reverse_proxy pointing to this app's container.
Heuristic: look for the container name (short repo name) in reverse_proxy
directives or upstream blocks.
Returns (found: bool, details: dict).
"""
details: dict[str, Any] = {}
container_name = repo_name.split("/")[-1].lower() if "/" in repo_name else repo_name.lower()
raw = _read_file_local_or_remote(caddyfile_path, is_local, ssh_user, ssh_host)
if raw is None:
details["error"] = "Could not read Caddyfile"
return False, details
# Parse the Caddyfile looking for:
# reverse_proxy <container_name>:<port>
# reverse_proxy http://<container_name>:<port>
# Also capture the domain (site block header) associated with the match.
lines = raw.splitlines()
current_domain = ""
# Pattern: matches a Caddy site-block header (domain line) — simplified heuristic
domain_pattern = re.compile(r"^(\S+\.\S+)\s*\{?\s*$")
proxy_pattern = re.compile(
r"reverse_proxy\s+(?:https?://)?" + re.escape(container_name) + r"[:\s]",
re.IGNORECASE,
)
for line in lines:
stripped = line.strip()
domain_match = domain_pattern.match(stripped)
if domain_match:
current_domain = domain_match.group(1)
if proxy_pattern.search(stripped):
details["domain"] = current_domain
details["matched_line"] = stripped
details["container_name"] = container_name
return True, details
return False, details
# ---------------------------------------------------------------------------
# Main orchestration
# ---------------------------------------------------------------------------
def detect(repo_name: str, config: dict[str, Any]) -> dict[str, Any]:
"""Run all three detection signals and return the combined result dict."""
server = config.get("server", {})
ssh_host = server.get("ssh_host", "")
ssh_user = server.get("ssh_user", "")
server_hostname = server.get("hostname", ssh_host)
deploy_map_path = server.get("deploy_map_path", "/etc/deploy-listener/deploy-map.json")
caddyfile_path = server.get("caddyfile_path", "/etc/caddy/Caddyfile")
gitea_cfg = config.get("gitea", {})
gitea_url = gitea_cfg.get("url", "")
gitea_token_path = config.get("secrets", {}).get("gitea_token", "")
gitea_token = _read_gitea_token(gitea_token_path) if gitea_token_path else None
local = _is_local(server_hostname)
# --- Signal 1: Deploy Map ---
dm_found, dm_details = check_deploy_map(
repo_name, deploy_map_path, local, ssh_user, ssh_host,
)
# --- Signal 2: Gitea ---
gt_found, gt_details = check_gitea(repo_name, gitea_url, gitea_token)
# --- Signal 3: Caddy ---
cd_found, cd_details = check_caddy(
repo_name, caddyfile_path, local, ssh_user, ssh_host,
)
# Merge details
all_details: dict[str, Any] = {}
if dm_details.get("stack_dir"):
all_details["stack_dir"] = dm_details["stack_dir"]
if cd_details.get("domain"):
all_details["domain"] = cd_details["domain"]
if gt_details.get("html_url"):
all_details["gitea_url"] = gt_details["html_url"]
deployed = dm_found and gt_found # primary condition
return {
"deployed": deployed,
"signals": {
"deploy_map": dm_found,
"gitea": gt_found,
"caddy": cd_found,
},
"details": all_details,
}
# ---------------------------------------------------------------------------
# CLI entry point
# ---------------------------------------------------------------------------
def main() -> None:
parser = argparse.ArgumentParser(
description="Detect whether an app is already deployed.",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=__doc__,
)
parser.add_argument(
"--repo-name",
required=True,
help="Repository name in owner/repo format (e.g. darren/my-app).",
)
parser.add_argument(
"--config",
default=None,
help="Path to config.json. Default: <script_dir>/../config.json",
)
args = parser.parse_args()
script_dir = Path(__file__).resolve().parent
config_path = _resolve_config_path(args.config, script_dir)
if not config_path.exists():
print(
json.dumps({"error": f"Config file not found: {config_path}"}),
file=sys.stderr,
)
sys.exit(1)
try:
config = _load_json(config_path)
except (json.JSONDecodeError, OSError) as exc:
print(
json.dumps({"error": f"Failed to load config: {exc}"}),
file=sys.stderr,
)
sys.exit(1)
result = detect(args.repo_name, config)
# JSON to stdout
print(json.dumps(result, indent=2))
# Exit code: 0 = deployed, 1 = not deployed
sys.exit(0 if result["deployed"] else 1)
if __name__ == "__main__":
main()

#!/usr/bin/env python3
"""
validate_compose.py — Production-readiness validator for Docker Compose files.
Usage:
python3 validate_compose.py <path/to/docker-compose.yaml> [--strict]
Exit codes:
0 — passed (no errors; warnings may exist)
1 — failed (one or more errors found)
On Windows, use `python` instead of `python3` if needed.
"""
import argparse
import re
import sys
from pathlib import Path
from typing import Any
try:
import yaml # type: ignore[import-untyped]
except ImportError:
    # PyYAML is a third-party dependency, not part of the stdlib — fail with
    # a clear message rather than a bare traceback.
print("❌ ERROR: PyYAML is required. Install with: pip install pyyaml")
sys.exit(1)
# ---------------------------------------------------------------------------
# Result accumulator
# ---------------------------------------------------------------------------
class ValidationResult:
"""Accumulates errors, warnings, and info messages from all checks."""
def __init__(self) -> None:
self.errors: list[str] = []
self.warnings: list[str] = []
self.infos: list[str] = []
def error(self, msg: str) -> None:
self.errors.append(msg)
def warn(self, msg: str) -> None:
self.warnings.append(msg)
def info(self, msg: str) -> None:
self.infos.append(msg)
@property
def passed(self) -> bool:
return len(self.errors) == 0
def print_report(self) -> None:
"""Print a formatted validation report to stdout."""
total = len(self.errors) + len(self.warnings) + len(self.infos)
if total == 0:
print("✅ No issues found.")
return
        if self.errors:
            print(f"\n🔴 ERRORS ({len(self.errors)})")
            for e in self.errors:
                print(f"  {e}")
        if self.warnings:
            print(f"\n🟡 WARNINGS ({len(self.warnings)})")
            for w in self.warnings:
                print(f"  {w}")
if self.infos:
print(f"\n🔵 INFO ({len(self.infos)})")
for i in self.infos:
print(f" {i}")
print()
if self.passed:
print(f"✅ Passed ({len(self.warnings)} warning(s), {len(self.infos)} info(s))")
else:
print(f"❌ Failed ({len(self.errors)} error(s), {len(self.warnings)} warning(s))")
# ---------------------------------------------------------------------------
# Helper utilities
# ---------------------------------------------------------------------------
_SECRET_PATTERNS = [
re.compile(r"(password|passwd|secret|token|key|api_key|apikey|auth|credential)", re.I),
]
# Reserved for future value-based secret heuristics (currently unused): matches
# values of 8+ chars that are not ${VAR} references or obvious placeholders.
_HARDCODED_VALUE_PATTERN = re.compile(
    r"^(?!.*\$\{)(?!changeme)(?!placeholder)(?!your-).{8,}$"
)
_ENV_VAR_REF_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")
_PREFERRED_PORT_MIN = 50000
_PREFERRED_PORT_MAX = 60000
_DB_CACHE_IMAGES = [
"postgres", "postgresql",
"mariadb", "mysql",
"mongo", "mongodb",
"redis", "valkey",
"memcached",
"cassandra",
"couchdb",
"influxdb",
]
def _iter_services(compose: dict[str, Any]):
"""Yield (name, service_dict) for every service in the compose file."""
for name, svc in (compose.get("services") or {}).items():
yield name, (svc or {})
def _get_depends_on_names(depends_on: Any) -> list[str]:
"""Normalise depends_on to a flat list of service name strings."""
if isinstance(depends_on, list):
return depends_on
if isinstance(depends_on, dict):
return list(depends_on.keys())
return []
def _image_name_and_tag(image: str) -> tuple[str, str]:
    """Split 'image:tag' into (image_name, tag). Tag defaults to '' if absent."""
    # Only treat a ':' that appears after the last '/' as the tag separator,
    # so registry ports (e.g. 'registry:5000/app') are not mistaken for tags.
    slash = image.rfind("/")
    colon = image.rfind(":")
    if colon > slash:
        return image[:colon], image[colon + 1:]
    return image, ""
def _is_db_cache_image(image: str) -> bool:
name, _ = _image_name_and_tag(image)
base = name.split("/")[-1].lower()
return any(base == db or base.startswith(db) for db in _DB_CACHE_IMAGES)
def _collect_all_string_values(obj: Any, result: list[str]) -> None:
"""Recursively collect all string leaf values from a nested structure."""
if isinstance(obj, str):
result.append(obj)
elif isinstance(obj, dict):
for v in obj.values():
_collect_all_string_values(v, result)
elif isinstance(obj, list):
for item in obj:
_collect_all_string_values(item, result)
def _parse_host_port(port_spec: Any) -> int | None:
"""
Extract the host (published) port from a port mapping.
Supports:
- "8080:80"
- "127.0.0.1:8080:80"
- {"published": 8080, "target": 80}
- 8080 (short form — interpreted as host==container)
"""
if isinstance(port_spec, dict):
published = port_spec.get("published")
if published is not None:
try:
return int(published)
except (ValueError, TypeError):
pass
return None
spec = str(port_spec)
parts = spec.split(":")
# "hostip:hostport:containerport" → parts[-2] is host port
# "hostport:containerport" → parts[0] is host port
# "containerport" → no explicit host port mapping
if len(parts) >= 2:
try:
return int(parts[-2].split("/")[0])
except (ValueError, IndexError):
pass
elif len(parts) == 1:
try:
return int(parts[0].split("/")[0])
except (ValueError, IndexError):
pass
return None
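# A minimal sketch of the string-form parsing rules above, inlined so the
# example is self-contained; the function name here is hypothetical and the
# dict form is deliberately omitted for brevity.
def _demo_host_port(spec):
    parts = str(spec).split(":")
    idx = -2 if len(parts) >= 2 else 0
    try:
        return int(parts[idx].split("/")[0])
    except (ValueError, IndexError):
        return None

assert _demo_host_port("8080:80") == 8080            # "hostport:containerport"
assert _demo_host_port("127.0.0.1:8080:80") == 8080  # "hostip:hostport:containerport"
assert _demo_host_port(8080) == 8080                 # short form
assert _demo_host_port("80/tcp") == 80               # protocol suffix stripped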
# ---------------------------------------------------------------------------
# Individual checks (original set)
# ---------------------------------------------------------------------------
def validate_image_tags(compose: dict[str, Any], result: ValidationResult) -> None:
"""Warn on :latest or untagged images."""
for name, svc in _iter_services(compose):
image = svc.get("image", "")
if not image:
continue
img_name, tag = _image_name_and_tag(image)
if not tag:
result.warn(f"[{name}] Image '{img_name}' has no tag — pin to a specific version.")
elif tag == "latest":
result.warn(f"[{name}] Image '{img_name}:latest' — never use :latest in production.")
def validate_restart_policy(compose: dict[str, Any], result: ValidationResult) -> None:
"""Check that all services have a restart policy."""
for name, svc in _iter_services(compose):
restart = svc.get("restart")
if not restart:
result.warn(f"[{name}] No restart policy — add 'restart: unless-stopped'.")
def validate_healthchecks(compose: dict[str, Any], result: ValidationResult) -> None:
"""Check that all services define or inherit a healthcheck."""
for name, svc in _iter_services(compose):
hc = svc.get("healthcheck")
if hc is None:
result.info(f"[{name}] No healthcheck defined — add one if the image supports it.")
elif isinstance(hc, dict) and hc.get("disable"):
result.info(f"[{name}] Healthcheck explicitly disabled.")
def validate_no_hardcoded_secrets(compose: dict[str, Any], result: ValidationResult) -> None:
"""Detect hardcoded secrets in environment and labels."""
for name, svc in _iter_services(compose):
env = svc.get("environment") or {}
items: list[tuple[str, str]] = []
if isinstance(env, dict):
items = list(env.items())
elif isinstance(env, list):
for entry in env:
if "=" in str(entry):
k, v = str(entry).split("=", 1)
items.append((k, v))
for key, value in items:
if not value or str(value).startswith("${"):
continue
for pat in _SECRET_PATTERNS:
if pat.search(key):
                    result.error(
                        f"[{name}] Possible hardcoded secret in env var '{key}' — "
                        "use ${VAR_NAME} references and store values in .env."
                    )
break
def validate_resource_limits(compose: dict[str, Any], result: ValidationResult, strict: bool) -> None:
"""In strict mode, require resource limits on all services."""
if not strict:
return
for name, svc in _iter_services(compose):
deploy = svc.get("deploy") or {}
resources = deploy.get("resources") or {}
limits = resources.get("limits") or {}
mem = limits.get("memory") or svc.get("mem_limit")
cpus = limits.get("cpus") or svc.get("cpus")
if not mem:
result.error(f"[{name}] No memory limit set (strict mode) — add deploy.resources.limits.memory.")
if not cpus:
result.warn(f"[{name}] No CPU limit set — consider adding deploy.resources.limits.cpus.")
def validate_logging(compose: dict[str, Any], result: ValidationResult) -> None:
"""Warn when no logging config is specified."""
for name, svc in _iter_services(compose):
if not svc.get("logging"):
result.info(
f"[{name}] No logging config — consider adding logging.driver and options "
"(e.g. json-file with max-size/max-file)."
)
def validate_privileged_mode(compose: dict[str, Any], result: ValidationResult) -> None:
"""Warn on privileged containers."""
for name, svc in _iter_services(compose):
if svc.get("privileged"):
result.warn(f"[{name}] Running in privileged mode — grant only if strictly required.")
def validate_host_network(compose: dict[str, Any], result: ValidationResult) -> None:
"""Warn on host network mode."""
for name, svc in _iter_services(compose):
network_mode = svc.get("network_mode", "")
if network_mode == "host":
result.warn(f"[{name}] Using host network mode — isolate with a bridge network if possible.")
def validate_sensitive_volumes(compose: dict[str, Any], result: ValidationResult) -> None:
"""Warn on sensitive host paths mounted into containers."""
sensitive_paths = ["/etc", "/var/run/docker.sock", "/proc", "/sys", "/root", "/home"]
for name, svc in _iter_services(compose):
volumes = svc.get("volumes") or []
for vol in volumes:
if isinstance(vol, str):
host_part = vol.split(":")[0]
elif isinstance(vol, dict):
host_part = str(vol.get("source", ""))
else:
continue
for sensitive in sensitive_paths:
if host_part == sensitive or host_part.startswith(sensitive + "/"):
                    result.warn(
                        f"[{name}] Sensitive host path mounted: '{host_part}' — "
                        "verify this is intentional."
                    )
def validate_traefik_network_consistency(compose: dict[str, Any], result: ValidationResult) -> None:
"""Ensure services with Traefik labels are joined to the Traefik network."""
traefik_network_names: set[str] = set()
# Heuristic: networks named 'traefik*' or 'proxy*' are Traefik-facing
for net_name in (compose.get("networks") or {}).keys():
if "traefik" in net_name.lower() or "proxy" in net_name.lower():
traefik_network_names.add(net_name)
for name, svc in _iter_services(compose):
labels = svc.get("labels") or {}
label_items: list[str] = []
if isinstance(labels, dict):
label_items = list(labels.keys())
elif isinstance(labels, list):
label_items = [str(l).split("=")[0] for l in labels]
has_traefik_label = any("traefik" in lbl.lower() for lbl in label_items)
if not has_traefik_label:
continue
svc_networks = set()
svc_net_section = svc.get("networks") or {}
if isinstance(svc_net_section, list):
svc_networks = set(svc_net_section)
elif isinstance(svc_net_section, dict):
svc_networks = set(svc_net_section.keys())
if traefik_network_names and not svc_networks.intersection(traefik_network_names):
result.warn(
f"[{name}] Has Traefik labels but is not on a Traefik-facing network "
f"({', '.join(traefik_network_names)})."
)
def validate_traefik_router_uniqueness(compose: dict[str, Any], result: ValidationResult) -> None:
"""Error on duplicate Traefik router names across services."""
seen_routers: dict[str, str] = {}
router_pattern = re.compile(r"traefik\.http\.routers\.([^.]+)\.", re.I)
for name, svc in _iter_services(compose):
labels = svc.get("labels") or {}
label_keys: list[str] = []
if isinstance(labels, dict):
label_keys = list(labels.keys())
elif isinstance(labels, list):
label_keys = [str(l).split("=")[0] for l in labels]
for key in label_keys:
m = router_pattern.match(key)
if m:
router_name = m.group(1).lower()
if router_name in seen_routers:
result.error(
f"[{name}] Duplicate Traefik router name '{router_name}' "
f"(also used in service '{seen_routers[router_name]}')."
)
else:
seen_routers[router_name] = name
def validate_container_name_uniqueness(compose: dict[str, Any], result: ValidationResult) -> None:
"""Error on duplicate container_name values."""
seen: dict[str, str] = {}
for name, svc in _iter_services(compose):
container_name = svc.get("container_name")
if not container_name:
continue
if container_name in seen:
result.error(
f"[{name}] Duplicate container_name '{container_name}' "
f"(also used by service '{seen[container_name]}')."
)
else:
seen[container_name] = name
def validate_depends_on(compose: dict[str, Any], result: ValidationResult) -> None:
"""Check that depends_on references valid service names."""
service_names = set((compose.get("services") or {}).keys())
for name, svc in _iter_services(compose):
deps = _get_depends_on_names(svc.get("depends_on") or [])
for dep in deps:
if dep not in service_names:
result.error(
f"[{name}] depends_on references unknown service '{dep}'."
)
def validate_networks(compose: dict[str, Any], result: ValidationResult) -> None:
"""Check that service networks are declared at the top level."""
declared = set((compose.get("networks") or {}).keys())
for name, svc in _iter_services(compose):
svc_nets = svc.get("networks") or {}
if isinstance(svc_nets, list):
used = set(svc_nets)
elif isinstance(svc_nets, dict):
used = set(svc_nets.keys())
else:
used = set()
for net in used:
if net not in declared:
result.error(
f"[{name}] Uses network '{net}' which is not declared in the "
"top-level 'networks' section."
)
def validate_volumes(compose: dict[str, Any], result: ValidationResult) -> None:
"""Check for undefined named volumes and orphaned top-level volume declarations."""
declared_volumes = set((compose.get("volumes") or {}).keys())
used_volumes: set[str] = set()
for name, svc in _iter_services(compose):
for vol in (svc.get("volumes") or []):
if isinstance(vol, str):
parts = vol.split(":")
ref = parts[0]
elif isinstance(vol, dict):
ref = str(vol.get("source", ""))
else:
continue
# Named volumes don't start with . / ~ or a drive letter pattern
if ref and not re.match(r"^[./~]|^[A-Za-z]:[/\\]", ref):
used_volumes.add(ref)
if declared_volumes and ref not in declared_volumes:
result.error(
f"[{name}] Uses named volume '{ref}' which is not declared "
"in the top-level 'volumes' section."
)
for vol in declared_volumes:
if vol not in used_volumes:
result.warn(f"Top-level volume '{vol}' is declared but never used by any service.")
def validate_port_conflicts(compose: dict[str, Any], result: ValidationResult) -> None:
"""Error on duplicate host port bindings."""
seen_ports: dict[int, str] = {}
for name, svc in _iter_services(compose):
for port_spec in (svc.get("ports") or []):
host_port = _parse_host_port(port_spec)
if host_port is None:
continue
if host_port in seen_ports:
result.error(
f"[{name}] Host port {host_port} conflicts with service "
f"'{seen_ports[host_port]}'."
)
else:
seen_ports[host_port] = name
# ---------------------------------------------------------------------------
# NEW checks
# ---------------------------------------------------------------------------
def validate_circular_dependencies(compose: dict[str, Any], result: ValidationResult) -> None:
"""Detect circular dependencies in the depends_on graph using DFS."""
services = compose.get("services") or {}
# Build adjacency list: service_name -> list of dependencies
graph: dict[str, list[str]] = {}
for name, svc in services.items():
graph[name] = _get_depends_on_names((svc or {}).get("depends_on") or [])
visited: set[str] = set()
in_stack: set[str] = set()
def dfs(node: str, path: list[str]) -> bool:
"""Return True if a cycle is detected."""
visited.add(node)
in_stack.add(node)
for neighbour in graph.get(node, []):
if neighbour not in graph:
# Unknown dependency — already caught by validate_depends_on
continue
if neighbour not in visited:
if dfs(neighbour, path + [neighbour]):
return True
elif neighbour in in_stack:
cycle_path = "".join(path + [neighbour])
result.error(
f"Circular dependency detected: {cycle_path}"
)
return True
in_stack.discard(node)
return False
for service_name in graph:
if service_name not in visited:
dfs(service_name, [service_name])
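# Illustrative sketch of the DFS cycle detection above, run on a toy
# dependency graph; a standalone re-implementation for demonstration only
# (the function name and the arrow-joined path format are assumptions).
def _demo_find_cycles(graph):
    visited, in_stack, found = set(), set(), []
    def dfs(node, path):
        visited.add(node)
        in_stack.add(node)
        for nb in graph.get(node, []):
            if nb not in visited:
                if dfs(nb, path + [nb]):
                    return True
            elif nb in in_stack:
                found.append(" → ".join(path + [nb]))
                return True
        in_stack.discard(node)
        return False
    for node in graph:
        if node not in visited:
            dfs(node, [node])
    return found

assert _demo_find_cycles({"a": ["b"], "b": ["c"], "c": ["a"]}) == ["a → b → c → a"]
assert _demo_find_cycles({"web": ["db"], "db": []}) == []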
def validate_port_range(compose: dict[str, Any], result: ValidationResult) -> None:
"""Warn if host ports are outside the preferred 50000-60000 range."""
for name, svc in _iter_services(compose):
for port_spec in (svc.get("ports") or []):
host_port = _parse_host_port(port_spec)
if host_port is None:
continue
if not (_PREFERRED_PORT_MIN <= host_port <= _PREFERRED_PORT_MAX):
result.warn(
f"[{name}] Host port {host_port} is outside the preferred range "
f"{_PREFERRED_PORT_MIN}-{_PREFERRED_PORT_MAX}."
)
def validate_network_isolation(compose: dict[str, Any], result: ValidationResult) -> None:
"""Warn if database/cache services are exposed on external networks."""
top_level_networks = compose.get("networks") or {}
for name, svc in _iter_services(compose):
image = svc.get("image", "")
if not image or not _is_db_cache_image(image):
continue
svc_nets = svc.get("networks") or {}
if isinstance(svc_nets, list):
net_names = svc_nets
elif isinstance(svc_nets, dict):
net_names = list(svc_nets.keys())
else:
net_names = []
for net_name in net_names:
net_config = top_level_networks.get(net_name) or {}
# A network is considered "external" if it has external: true
# or if it is named in a way that suggests it is the proxy/public network.
is_external = net_config.get("external", False)
is_proxy_net = any(
kw in net_name.lower() for kw in ("traefik", "proxy", "public", "frontend")
)
if is_external or is_proxy_net:
result.warn(
f"[{name}] Database/cache service is connected to external or proxy "
f"network '{net_name}' — use an internal network for isolation."
)
def validate_version_tags(compose: dict[str, Any], result: ValidationResult) -> None:
"""Check image version tag quality beyond just the :latest check."""
semver_full = re.compile(r"^\d+\.\d+\.\d+") # major.minor.patch
semver_minor = re.compile(r"^\d+\.\d+$") # major.minor only
semver_major = re.compile(r"^\d+$") # major only
for name, svc in _iter_services(compose):
image = svc.get("image", "")
if not image:
continue
img_name, tag = _image_name_and_tag(image)
if not tag:
# Already caught by validate_image_tags — skip to avoid duplicate noise
continue
if tag == "latest":
# Also caught above — error level comes from validate_image_tags
result.error(
f"[{name}] Image '{img_name}:latest' — :latest is forbidden in production."
)
continue
if semver_full.match(tag):
# Fully pinned — great
pass
elif semver_minor.match(tag):
result.warn(
f"[{name}] Image '{img_name}:{tag}' uses major.minor only — "
"pin to a full major.minor.patch tag for reproducible builds."
)
elif semver_major.match(tag):
result.warn(
f"[{name}] Image '{img_name}:{tag}' uses major version only — "
"pin to at least major.minor.patch."
)
else:
# Non-semver tags (sha digests, named releases, etc.) — accept as info
result.info(
f"[{name}] Image '{img_name}:{tag}' uses a non-semver tag — "
"verify this is a pinned, stable release."
)
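# Illustrative sketch of the tag-quality tiers checked above, as a standalone
# classifier (the function name and tier labels are hypothetical).
def _demo_tag_tier(tag):
    if re.match(r"^\d+\.\d+\.\d+", tag):
        return "pinned"          # full major.minor.patch — ideal
    if re.match(r"^\d+\.\d+$", tag):
        return "major.minor"     # warned: not fully reproducible
    if re.match(r"^\d+$", tag):
        return "major"           # warned: floats across minor releases
    return "non-semver"          # info: verify it is a stable release

assert _demo_tag_tier("16.4.1") == "pinned"
assert _demo_tag_tier("16.4") == "major.minor"
assert _demo_tag_tier("16") == "major"
assert _demo_tag_tier("bookworm") == "non-semver"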
def validate_env_references(compose: dict[str, Any], result: ValidationResult) -> None:
"""
Check that ${VAR} references in service configs have matching definitions.
Scans all string values in each service's config for ${VAR} patterns, then
checks whether those variables appear in the service's `environment` block
or are referenced via `env_file`. Cannot validate the contents of .env files
— only structural consistency within the compose file itself is checked.
"""
for name, svc in _iter_services(compose):
# Collect all ${VAR} references from the service's values
all_values: list[str] = []
_collect_all_string_values(svc, all_values)
referenced_vars: set[str] = set()
for val in all_values:
for match in _ENV_VAR_REF_PATTERN.finditer(val):
referenced_vars.add(match.group(1))
if not referenced_vars:
continue
# Collect defined variable names from the environment block
env_section = svc.get("environment") or {}
defined_vars: set[str] = set()
if isinstance(env_section, dict):
defined_vars = set(env_section.keys())
elif isinstance(env_section, list):
for entry in env_section:
key = str(entry).split("=")[0]
defined_vars.add(key)
has_env_file = bool(svc.get("env_file"))
for var in sorted(referenced_vars):
if var in defined_vars:
continue # Explicitly defined — fine
if has_env_file:
# Likely in the .env file — we can't verify, so just note it
result.info(
f"[{name}] ${{{var}}} is referenced but not defined inline — "
"ensure it is present in the env_file."
)
else:
result.warn(
f"[{name}] ${{{var}}} is referenced but has no inline definition "
"and no env_file is configured — ensure it is in your .env file."
)
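# Illustrative sketch of the ${VAR} extraction used above, run against a
# typical connection-string value (the URL is a made-up example).
_demo_refs = {
    m.group(1)
    for m in re.finditer(
        r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}",
        "postgres://${DB_USER}:${DB_PASSWORD}@db:5432/${DB_NAME}",
    )
}
assert _demo_refs == {"DB_USER", "DB_PASSWORD", "DB_NAME"}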
# ---------------------------------------------------------------------------
# Orchestrator
# ---------------------------------------------------------------------------
def run_all_checks(compose: dict[str, Any], strict: bool) -> ValidationResult:
"""Run every registered check and return the aggregated result."""
result = ValidationResult()
# Original checks
validate_image_tags(compose, result)
validate_restart_policy(compose, result)
validate_healthchecks(compose, result)
validate_no_hardcoded_secrets(compose, result)
validate_resource_limits(compose, result, strict)
validate_logging(compose, result)
validate_privileged_mode(compose, result)
validate_host_network(compose, result)
validate_sensitive_volumes(compose, result)
validate_traefik_network_consistency(compose, result)
validate_traefik_router_uniqueness(compose, result)
validate_container_name_uniqueness(compose, result)
validate_depends_on(compose, result)
validate_networks(compose, result)
validate_volumes(compose, result)
validate_port_conflicts(compose, result)
# New checks
validate_circular_dependencies(compose, result)
validate_port_range(compose, result)
validate_network_isolation(compose, result)
validate_version_tags(compose, result)
validate_env_references(compose, result)
return result
# ---------------------------------------------------------------------------
# Entry point
# ---------------------------------------------------------------------------
def main() -> None:
parser = argparse.ArgumentParser(
description="Validate a Docker Compose file for production readiness.",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=__doc__,
)
parser.add_argument("compose_file", help="Path to the docker-compose.yaml file.")
parser.add_argument(
"--strict",
action="store_true",
help="Enable strict mode: resource limits become errors, not warnings.",
)
args = parser.parse_args()
compose_path = Path(args.compose_file)
if not compose_path.exists():
print(f"❌ File not found: {compose_path}")
sys.exit(1)
try:
with compose_path.open(encoding="utf-8") as fh:
compose = yaml.safe_load(fh)
except yaml.YAMLError as exc:
print(f"❌ YAML parse error: {exc}")
sys.exit(1)
if not isinstance(compose, dict):
print("❌ Compose file did not parse to a mapping — is it a valid YAML file?")
sys.exit(1)
mode_label = " [STRICT]" if args.strict else ""
print(f"🐳 Validating: {compose_path}{mode_label}")
print("-" * 60)
result = run_all_checks(compose, strict=args.strict)
result.print_report()
sys.exit(0 if result.passed else 1)
if __name__ == "__main__":
main()