Skilllibrary self-hosting-ops
Provision and operate self-hosted infrastructure — configure VPS instances, write systemd service units, set up Nginx or Caddy reverse proxies, automate backups, configure firewall rules, and wire monitoring with Prometheus or similar. Use when deploying to self-managed servers, VPS providers, or bare metal instead of managed cloud services. Do not use for managed cloud platform deployments (prefer aws, gcp, vercel skills).
install
source · Clone the upstream repo
git clone https://github.com/merceralex397-collab/skilllibrary
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/merceralex397-collab/skilllibrary "$T" && mkdir -p ~/.claude/skills && cp -r "$T/14-cloud-platform-devops/self-hosting-ops" ~/.claude/skills/merceralex397-collab-skilllibrary-self-hosting-ops && rm -rf "$T"
manifest: 14-cloud-platform-devops/self-hosting-ops/SKILL.md
Purpose
Provision and operate self-hosted infrastructure on bare metal or VPS — write systemd service units, configure Nginx or Caddy reverse proxies, automate backups and restore procedures, harden firewall rules, and wire monitoring and alerting with Prometheus, Grafana, or similar tooling.
When to use this skill
- Provisioning a new VPS instance (Hetzner, DigitalOcean, Linode, OVH, or similar)
- Writing or editing systemd unit files for application services
- Configuring Nginx or Caddy as a reverse proxy with TLS termination
- Setting up automated backup schedules (database dumps, filesystem snapshots, off-site sync)
- Writing firewall rules with `ufw`, `iptables`, or `nftables`
- Configuring Prometheus, node_exporter, and Grafana for server monitoring
- Performing OS-level hardening (SSH config, fail2ban, unattended upgrades)
- Deploying applications via `rsync`, `git pull`, or CI/CD push to a self-managed server
Do not use this skill when
- The target is a managed cloud platform; use the `aws`, `gcp`, or `vercel` skills instead
- The task is container orchestration on Kubernetes; use a Kubernetes-specific skill
- The task is Tailscale networking overlay configuration; use `tailscale-private-networking`
- The task is writing Terraform to provision cloud resources; use `terraform-iac`
- The task is application code changes with no infrastructure impact
Operating procedure
- Inventory the target server. SSH into the server and record: OS version (`cat /etc/os-release`), available RAM/disk (`free -h`, `df -h`), running services (`systemctl list-units --type=service --state=running`), and open ports (`ss -tlnp`).
- Harden SSH access. Edit `/etc/ssh/sshd_config`: set `PasswordAuthentication no` and `PermitRootLogin no` so that only key-based auth is accepted. Check the syntax with `sshd -t`, then restart the SSH daemon (the unit is `ssh` on Debian/Ubuntu and `sshd` on Fedora/RHEL). Verify you can still connect from a second terminal before closing the current session.
- Configure the firewall. Enable `ufw` (or `nftables`). Allow only required ports: 22/tcp (SSH), 80/tcp, 443/tcp, and any application-specific ports; allow SSH before enabling the firewall so the current session is not cut off. Deny all other inbound traffic. Run `ufw status verbose` to confirm.
- Install runtime dependencies. Install the application runtime (Node.js, Python, Go binary, etc.) using the OS package manager or a version manager. Pin the version explicitly; do not use `latest`.
- Write the systemd service unit. Create `/etc/systemd/system/{service-name}.service` with `ExecStart` pointing to the application binary, `User` set to a non-root service account, `Restart=on-failure`, `RestartSec=5`, `WorkingDirectory`, and an environment file reference (`EnvironmentFile=/etc/{service-name}/env`). Run `systemctl daemon-reload && systemctl enable --now {service-name}`.
- Configure the reverse proxy. For Nginx: write a server block in `/etc/nginx/sites-available/{domain}` with `proxy_pass` to the local app port, symlink it into `sites-enabled` with `ln -s`, test with `nginx -t`, reload, and then enable TLS via Certbot (`certbot --nginx -d {domain}`), which edits the server block and reloads Nginx. For Caddy: write a Caddyfile block with the domain and `reverse_proxy localhost:{port}`; Caddy handles TLS automatically. Validate with `caddy validate --config /etc/caddy/Caddyfile` and reload.
- Set up automated backups. Write a backup script that dumps databases (`pg_dump`, `mysqldump`, or `sqlite3 .backup`), tars application data directories, and syncs to off-site storage (`rsync` to a backup server, `rclone` to S3/B2, or `restic` to a repository). Schedule it via cron (`crontab -e`) or a systemd timer. Run a manual test backup and verify that it restores.
- Wire monitoring. Install `node_exporter` and the application's metrics exporter, and configure Prometheus to scrape both. Set up Grafana dashboards for CPU, memory, disk, and application-specific metrics. Create alerting rules for disk usage > 85%, memory usage > 90%, more than 3 service restarts in 5 minutes, and TLS certificate expiry < 14 days (the certificate check needs a prober such as `blackbox_exporter`, because `node_exporter` does not inspect certificates).
- Configure unattended security updates. Enable `unattended-upgrades` (Debian/Ubuntu) or `dnf-automatic` (Fedora/RHEL). Confirm that only security patches are auto-applied. Schedule a reboot window for kernel updates that require one.
- Validate the full deployment. Curl the public endpoint and verify a 200 response with the expected content. Check `journalctl -u {service-name} --since '5 min ago'` for errors. Confirm the backup cron job or timer is scheduled (`systemctl list-timers` or `crontab -l`). Confirm Prometheus targets are UP in the Prometheus UI.
- Document the server runbook. Record the server IP/hostname, SSH access instructions, service names, backup schedule and restore procedure, monitoring dashboard URL, and escalation contacts.
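The inventory in the first step of the procedure can be scripted so the result lands in a file you can paste into the runbook. This is a minimal bash sketch; `run_probe` and `server_inventory` are illustrative helper names, and a probe whose tool is missing on a given distro prints a placeholder instead of aborting.

```shell
#!/usr/bin/env bash
# Sketch: capture a server inventory snapshot for the runbook.
# Helper names are hypothetical; each probe degrades to a placeholder if its tool is missing.
set -uo pipefail

# run_probe TITLE CMD...: print a section header, then the command's output (stderr included).
run_probe() {
  local title=$1
  shift
  printf '== %s ==\n' "$title"
  if command -v "$1" >/dev/null 2>&1; then
    "$@" 2>&1 || printf '(command failed: %s)\n' "$*"
  else
    printf '(unavailable: %s not installed)\n' "$1"
  fi
  printf '\n'
}

# server_inventory: OS, memory, disk, running services, and listening TCP ports.
server_inventory() {
  printf '# Inventory for %s, %s\n\n' "$(uname -n)" "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  run_probe "OS" cat /etc/os-release
  run_probe "Memory" free -h
  run_probe "Disk" df -h
  run_probe "Running services" systemctl list-units --type=service --state=running --no-pager
  # Without root, ss -p only shows the owning process for your own sockets.
  run_probe "Listening TCP ports" ss -tlnp
}
```

Source the file and run `server_inventory | tee inventory.txt` as root to get process names for every listening socket.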
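The SSH hardening step can also be applied as a drop-in file rather than by editing `sshd_config` in place. This sketch assumes the distribution's `sshd_config` contains `Include /etc/ssh/sshd_config.d/*.conf` (the default on current Debian and Ubuntu) and uses a hypothetical file name, `10-hardening.conf`. Because sshd keeps the first value it reads for most keywords, the low number lets these settings win over later drop-ins such as one written by cloud-init.

```shell
#!/usr/bin/env bash
# Sketch: SSH hardening via a sshd_config drop-in.
# ASSUMPTION: sshd_config includes /etc/ssh/sshd_config.d/*.conf; the file name is hypothetical.

# write_sshd_hardening [DIR]: write the drop-in into DIR (default: the real sshd drop-in dir).
write_sshd_hardening() {
  local dir=${1:-/etc/ssh/sshd_config.d}
  mkdir -p "$dir"
  cat > "$dir/10-hardening.conf" <<'EOF'
PasswordAuthentication no
PermitRootLogin no
PubkeyAuthentication yes
EOF
}

# apply_sshd_hardening: write, validate, then restart (root only).
apply_sshd_hardening() {
  write_sshd_hardening /etc/ssh/sshd_config.d
  # Validate first so a typo cannot take sshd down on restart.
  sshd -t || { echo "sshd config invalid; not restarting" >&2; return 1; }
  # The unit is "ssh" on Debian/Ubuntu and "sshd" on Fedora/RHEL.
  systemctl restart ssh 2>/dev/null || systemctl restart sshd
}
```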
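A sketch of the firewall step with `ufw`. `configure_firewall` is a hypothetical helper: extra application TCP ports are passed as arguments, and setting `UFW="echo ufw"` turns it into a dry run that only prints the commands it would execute.

```shell
#!/usr/bin/env bash
# Sketch: default-deny inbound firewall with ufw. Helper name is hypothetical.

# configure_firewall [PORT...]: deny inbound by default, then allow SSH, HTTP(S), and extra TCP ports.
# Set UFW="echo ufw" for a dry run.
configure_firewall() {
  local ufw=${UFW:-ufw} port
  $ufw default deny incoming
  $ufw default allow outgoing
  # Allow SSH before enabling so the current session is not cut off.
  $ufw allow 22/tcp
  $ufw allow 80/tcp
  $ufw allow 443/tcp
  for port in "$@"; do
    $ufw allow "${port}/tcp"
  done
  # --force skips ufw's interactive "may disrupt existing ssh connections" prompt.
  $ufw --force enable
  $ufw status verbose
}
```

Run it as root, for example `configure_firewall 8080` if the app must be reachable on 8080 directly; pass no arguments when all traffic goes through the reverse proxy.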
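The systemd unit from the procedure can be generated from a template. This sketch assumes the app lives in `/opt/{service-name}` and reads `/etc/{service-name}/env`; `render_unit` and `install_unit` are hypothetical helper names, and the service account defaults to the service name.

```shell
#!/usr/bin/env bash
# Sketch: render and install a systemd unit for a long-running app.
# Paths (/opt/NAME, /etc/NAME/env) and helper names are hypothetical conventions.

# render_unit NAME EXEC [USER]: print the .service file to stdout.
render_unit() {
  local name=$1 exec=$2 user=${3:-$1}
  cat <<EOF
[Unit]
Description=${name} application service
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=${user}
WorkingDirectory=/opt/${name}
EnvironmentFile=/etc/${name}/env
ExecStart=${exec}
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
}

# install_unit NAME EXEC: create the service account if needed, install the unit, start it (root only).
install_unit() {
  local name=$1 exec=$2
  id -u "$name" >/dev/null 2>&1 ||
    useradd --system --no-create-home --shell /usr/sbin/nologin "$name"
  render_unit "$name" "$exec" > "/etc/systemd/system/${name}.service"
  systemctl daemon-reload
  systemctl enable --now "$name"
}
```

`systemd-analyze verify /etc/systemd/system/myapp.service` can lint the installed file. `WorkingDirectory` must already exist, or the service fails at startup.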
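For the unattended-updates step on Debian/Ubuntu, the sketch below writes the two `APT::Periodic` settings that `dpkg-reconfigure -plow unattended-upgrades` produces and sets an automatic reboot time. The override file name `52unattended-upgrades-local` and the 03:30 default are assumptions. Which origins are upgraded is still governed by `Unattended-Upgrade::Allowed-Origins` (Ubuntu) or `Unattended-Upgrade::Origins-Pattern` (Debian) in `50unattended-upgrades`, so check that file to confirm that only security updates are included.

```shell
#!/usr/bin/env bash
# Sketch: unattended security upgrades on Debian/Ubuntu.
# The override file name and default reboot time are assumptions.

# write_auto_upgrade_conf [DIR] [REBOOT_TIME]: write APT config fragments into DIR.
write_auto_upgrade_conf() {
  local dir=${1:-/etc/apt/apt.conf.d} reboot_time=${2:-03:30}
  mkdir -p "$dir"
  cat > "$dir/20auto-upgrades" <<'EOF'
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
EOF
  # Reboots only happen when an upgraded package (typically the kernel) requests one.
  cat > "$dir/52unattended-upgrades-local" <<EOF
Unattended-Upgrade::Automatic-Reboot "true";
Unattended-Upgrade::Automatic-Reboot-Time "${reboot_time}";
EOF
}

# enable_unattended_upgrades [REBOOT_TIME]: install, configure, and dry-run (root only).
enable_unattended_upgrades() {
  apt-get update && apt-get install -y unattended-upgrades
  write_auto_upgrade_conf /etc/apt/apt.conf.d "${1:-03:30}"
  # The dry run lists the origins it would pull from, e.g. *-security.
  unattended-upgrade --dry-run --debug
}
```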
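The validation step can be bundled into a smoke-test script to run after every deploy. This is a sketch: the URL, expected text, and service name are supplied by the caller, the Prometheus address `http://localhost:9090` is an assumed default, and `http_ok` is split out only so the pass/fail rule can be tested offline.

```shell
#!/usr/bin/env bash
# Sketch: post-deploy smoke checks (curl, journalctl, and systemctl assumed installed).

# http_ok STATUS BODY EXPECT: pass only for HTTP 200 with EXPECT somewhere in BODY.
http_ok() {
  [[ $1 == 200 && $2 == *"$3"* ]]
}

# check_http URL EXPECT: fetch URL once and apply http_ok.
check_http() {
  local url=$1 expect=$2 out status body
  # -w appends "\n<status>" after the body so both come back from one request.
  out=$(curl -sS --max-time 10 -w '\n%{http_code}' "$url") || return 1
  status=${out##*$'\n'}
  body=${out%$'\n'*}
  if http_ok "$status" "$body" "$expect"; then
    echo "OK   $url"
  else
    echo "FAIL $url (HTTP $status)" >&2
    return 1
  fi
}

# check_service NAME: error-level log lines from the last 5 minutes, then all timers.
check_service() {
  journalctl -u "$1" --since '5 min ago' -p err --no-pager
  systemctl list-timers --all --no-pager
}

# check_prometheus_targets [BASE_URL]: fail if any active target reports "health":"down".
check_prometheus_targets() {
  local json
  json=$(curl -fsS --max-time 10 "${1:-http://localhost:9090}/api/v1/targets?state=active") || return 1
  [[ $json != *'"health":"down"'* ]]
}
```

A run might look like `check_http https://app.example.com/ "Welcome" && check_service myapp && check_prometheus_targets`, where the domain, text, and service name are placeholders.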
Decision rules
- If the application needs zero-downtime deploys, use a blue-green strategy with two systemd units behind the reverse proxy — do not use in-place restart.
- If the server has < 1 GB RAM, skip Prometheus/Grafana on-box and push metrics to an external monitoring service.
- If the VPS provider offers automated snapshots, enable them as a supplement to application-level backups — but never as the sole backup.
- If TLS is required, prefer Caddy (automatic HTTPS) for simplicity. Use Nginx + Certbot when Nginx-specific features (complex rewrites, rate limiting) are needed.
- If multiple services share one server, isolate them with separate service accounts and systemd units — never run everything as root.
- Always test the backup restore procedure before considering backups "configured."
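When the Caddy rule applies, a complete site can be very small. This sketch assumes a single-site `/etc/caddy/Caddyfile` (the install helper overwrites it), that DNS for the domain already points at the server, and that ports 80 and 443 are open so Caddy can obtain and renew the certificate; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: minimal Caddy reverse proxy with automatic HTTPS. DOMAIN and PORT are placeholders.

# render_caddy_site DOMAIN PORT: print a Caddyfile site block.
render_caddy_site() {
  cat <<EOF
$1 {
    reverse_proxy localhost:$2
}
EOF
}

# install_caddy_site DOMAIN PORT: overwrite the Caddyfile, validate, reload (root only).
install_caddy_site() {
  render_caddy_site "$1" "$2" > /etc/caddy/Caddyfile
  caddy validate --config /etc/caddy/Caddyfile --adapter caddyfile &&
    systemctl reload caddy
}
```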
Output requirements
- Server inventory: OS, resources, IP, SSH access details
- systemd unit file(s): complete `.service` files ready to install
- Reverse proxy config: Nginx server block or Caddyfile with TLS
- Backup configuration: backup script, cron/timer schedule, and restore procedure
- Firewall rules: `ufw` commands or `nftables` rules applied
- Monitoring setup: Prometheus scrape config, alerting rules, and dashboard reference
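For the monitoring deliverable, a minimal scrape config and rule file covering the thresholds from the operating procedure might look like the sketch below. Assumptions: `node_exporter` on its default port 9100, application metrics on a caller-supplied port, a hypothetical rule path `/etc/prometheus/rules/self-hosting.yml`, `node_exporter` started with `--collector.systemd --collector.systemd.enable-restarts-metrics` for the restart alert, and a `blackbox_exporter` probe job (not shown) supplying `probe_ssl_earliest_cert_expiry` for the TLS alert.

```shell
#!/usr/bin/env bash
# Sketch: Prometheus scrape config plus alert rules for the procedure's thresholds.
# ASSUMPTIONS: see the lead-in (ports, rule path, systemd collector flags, blackbox job).

# render_scrape_config APP_PORT: print prometheus.yml.
render_scrape_config() {
  cat <<EOF
global:
  scrape_interval: 30s
rule_files:
  - /etc/prometheus/rules/self-hosting.yml
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']
  - job_name: app
    static_configs:
      - targets: ['localhost:${1}']
EOF
}

# render_alert_rules: print the rule file; the quoted heredoc keeps $labels literal.
render_alert_rules() {
  cat <<'EOF'
groups:
  - name: self-hosting
    rules:
      - alert: DiskUsageHigh
        expr: (1 - node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) * 100 > 85
        for: 10m
        annotations:
          summary: "{{ $labels.mountpoint }} on {{ $labels.instance }} is over 85% full"
      - alert: MemoryUsageHigh
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
        for: 10m
      - alert: ServiceRestartLoop
        expr: increase(node_systemd_service_restart_total[5m]) > 3
      - alert: TLSCertExpiringSoon
        expr: probe_ssl_earliest_cert_expiry - time() < 14 * 86400
        for: 1h
EOF
}
```

After writing the files, `promtool check config /etc/prometheus/prometheus.yml` validates the config together with the rule files it references; reload Prometheus only if it passes.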
References
- systemd unit file documentation: https://www.freedesktop.org/software/systemd/man/systemd.service.html
- Nginx reverse proxy guide: https://nginx.org/en/docs/http/ngx_http_proxy_module.html
- Caddy reverse proxy: https://caddyserver.com/docs/caddyfile/directives/reverse_proxy
- Prometheus node_exporter: https://github.com/prometheus/node_exporter
- Restic backup tool: https://restic.readthedocs.io/
- UFW firewall: https://help.ubuntu.com/community/UFW
Related skills
- `docker-containers`: for containerized deployments on self-hosted servers
- `terraform-iac`: for provisioning VPS instances via infrastructure-as-code
- `secret-management`: for injecting secrets into self-hosted services
- `tailscale-private-networking`: for private networking between self-hosted nodes
Anti-patterns
- Running application processes as root instead of a dedicated service account
- Using `nohup` or `screen` instead of systemd for long-running services
- Skipping firewall configuration because "it's behind a NAT"
- Relying solely on VPS provider snapshots without application-level backups
- Editing Nginx config in `sites-enabled` directly instead of `sites-available` with symlinks
- Hardcoding server IPs in application config instead of using DNS or service discovery
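To avoid the `sites-enabled` anti-pattern, keep the real file in `sites-available` and only symlink it. This sketch assumes the Debian/Ubuntu Nginx layout; `install_nginx_site` and `activate_nginx_site` are hypothetical helpers, and the `proxy_set_header` lines are common defaults rather than something the procedure requires. In the unquoted heredoc, `\$host` and similar are escaped so they reach Nginx as Nginx variables instead of being expanded by the shell.

```shell
#!/usr/bin/env bash
# Sketch: Nginx site kept in sites-available and enabled via symlink. DOMAIN and PORT are placeholders.

# render_nginx_site DOMAIN PORT: print an HTTP server block proxying to the local app.
render_nginx_site() {
  cat <<EOF
server {
    listen 80;
    listen [::]:80;
    server_name $1;

    location / {
        proxy_pass http://127.0.0.1:$2;
        proxy_set_header Host \$host;
        proxy_set_header X-Real-IP \$remote_addr;
        proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto \$scheme;
    }
}
EOF
}

# install_nginx_site DOMAIN PORT [NGINX_DIR]: write the real file, then (re)point the symlink.
install_nginx_site() {
  local domain=$1 port=$2 dir=${3:-/etc/nginx}
  render_nginx_site "$domain" "$port" > "$dir/sites-available/$domain"
  ln -sfn "$dir/sites-available/$domain" "$dir/sites-enabled/$domain"
}

# activate_nginx_site DOMAIN: test, reload, then let Certbot add the TLS listener (root only).
activate_nginx_site() {
  nginx -t && systemctl reload nginx && certbot --nginx -d "$1"
}
```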
Failure handling
- If a systemd service fails to start, check `journalctl -u {service-name} -e` for the exact error. Fix the unit file or application config, then run `systemctl daemon-reload && systemctl restart {service-name}`.
- If the reverse proxy returns 502, verify the upstream application is running and listening on the expected port. Check `ss -tlnp | grep {port}`.
- If backups fail silently, add error handling to the backup script (`set -euo pipefail`) and send a failure notification (email, webhook, or monitoring alert).
- If disk fills up, identify the largest directories with `du -sh /* | sort -rh | head`, clean log files or old backups, and add a disk-usage alert to prevent recurrence.
- If SSH access is lost after config changes, use the VPS provider's console/VNC access to recover.
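The backup failure-handling advice, sketched as a script: `set -Eeuo pipefail` stops at the first failing command, and an `ERR` trap reports which line failed. The database name `app`, the `/srv/app-data` directory, the `BACKUP_DIR` default, and the `ALERT_WEBHOOK` JSON payload shape are hypothetical. `restic` reads its repository and password from the `RESTIC_REPOSITORY` and `RESTIC_PASSWORD_FILE` environment variables.

```shell
#!/usr/bin/env bash
# Sketch: fail-loud nightly backup. Database name, data path, BACKUP_DIR, and webhook
# payload are HYPOTHETICAL. -E lets the ERR trap fire inside functions; -e stops on failure.
set -Eeuo pipefail

BACKUP_DIR=${BACKUP_DIR:-/var/backups/app}
KEEP_DAYS=${KEEP_DAYS:-7}

# notify_failure LINE: log to stderr and, if ALERT_WEBHOOK is set, POST a JSON message.
notify_failure() {
  local msg="backup failed on $(uname -n) at line $1"
  echo "$msg" >&2
  if [[ -n ${ALERT_WEBHOOK:-} ]]; then
    # "|| true": a failed notification must not hide the original error.
    curl -fsS --max-time 10 -H 'Content-Type: application/json' \
      -d "{\"text\": \"$msg\"}" "$ALERT_WEBHOOK" || true
  fi
}

# prune_old DIR DAYS: delete local backup files older than DAYS days.
prune_old() {
  find "$1" -maxdepth 1 -type f -mtime +"$2" -delete
}

# run_backup: dump, check, archive, ship off-site, and apply retention.
run_backup() {
  trap 'notify_failure $LINENO' ERR
  local stamp
  stamp=$(date +%F)
  mkdir -p "$BACKUP_DIR"
  pg_dump --format=custom --file="$BACKUP_DIR/db-$stamp.dump" "${PGDATABASE:-app}"
  # Cheap integrity check: the archive's table of contents must be readable.
  pg_restore --list "$BACKUP_DIR/db-$stamp.dump" > /dev/null
  tar -czf "$BACKUP_DIR/data-$stamp.tar.gz" -C /srv app-data
  restic backup "$BACKUP_DIR"
  restic forget --keep-daily 7 --keep-weekly 4 --prune
  prune_old "$BACKUP_DIR" "$KEEP_DAYS"
}
```

`run_backup` could be scheduled from a systemd timer (for example `OnCalendar=*-*-* 02:30:00` with `Persistent=true`). `pg_restore --list` only proves the archive is readable, so periodic full restore tests are still needed.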