ADR 0004: VPS System Diagnostic & Traefik Routing Fixes
Status
Accepted (Established February 2026)
Context
Following a scheduled system reboot (or Docker cleanup) of the Strato VPS, critical services in the cfs-infra stack failed even though their containers reported as up and healthy:
- Cockpit daemon: returned a `502 Bad Gateway` error through Traefik. The systemd socket unit (`cockpit.socket`), configured to listen on the Docker bridge address on the host, failed to start because the Docker bridge interface (`docker0` at `172.17.0.1`) did not yet exist during systemd's early startup sequence.
- Open-WebUI: Traefik could not obtain a Let's Encrypt SSL certificate for the domain `openwebui-ls.cfscfs.com`, and the UI returned 404s and proxy errors. The cause was an outdated router rule in `docker-compose.yml` that still referenced the old hostname `ai-ls.cfscfs.com`.
Decision
To make the stack recover automatically, without manual intervention, from unscheduled host reboots and Docker network rebuilds, we implemented two permanent infrastructure fixes:
Systemd Socket Race-Condition Fix (Cockpit):
- Implemented a systemd drop-in override for `cockpit.socket` (`/etc/systemd/system/cockpit.socket.d/listen.conf`).
- Added the `FreeBind=yes` socket option, which makes systemd set `IP_FREEBIND` on the listening socket.
- Why? This lets systemd bind the listening port `9090` to the requested IP address (`172.17.0.1`) even if the Docker bridge interface is not yet up at boot. The race condition is sidestepped instead of depending on startup order.
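The ADR does not reproduce the contents of `listen.conf`. The Python helper below is a minimal sketch that renders such a drop-in. It assumes the file follows the usual systemd pattern for replacing a socket's listen address: an empty `ListenStream=` resets the listeners inherited from the unit, and then the specific address is added. Only the path, address, port, and `FreeBind=yes` come from this ADR. The function names `render_cockpit_dropin` and `write_cockpit_dropin` are hypothetical.

```python
from pathlib import Path

# Drop-in location named in this ADR; writing there requires root.
DROPIN_PATH = Path("/etc/systemd/system/cockpit.socket.d/listen.conf")


def render_cockpit_dropin(address: str = "172.17.0.1", port: int = 9090) -> str:
    """Return drop-in text that makes cockpit.socket listen only on address:port."""
    lines = [
        "[Socket]",
        "# An empty ListenStream= clears the listeners inherited from cockpit.socket",
        "ListenStream=",
        f"ListenStream={address}:{port}",
        "# Bind even if the address is not on any interface yet (sets IP_FREEBIND)",
        "FreeBind=yes",
    ]
    return "\n".join(lines) + "\n"


def write_cockpit_dropin(path: Path = DROPIN_PATH, address: str = "172.17.0.1",
                         port: int = 9090) -> Path:
    """Write the drop-in, creating the cockpit.socket.d directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_cockpit_dropin(address, port))
    return path


if __name__ == "__main__":
    # Print instead of writing, so the sketch is safe to run unprivileged.
    print(render_cockpit_dropin(), end="")
```

After the file is written as root, run `systemctl daemon-reload` and then `systemctl restart cockpit.socket` so that systemd picks up the override.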
Traefik DNS Label Alignment (Open-WebUI):
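- Updated the Traefik router rule in the `open-webui` container labels to point explicitly to `openwebui-ls.cfscfs.com`, matching the A record that points to the VPS.
- Added `extra_hosts: ["host.docker.internal:host-gateway"]` to the main Traefik container.
- Why? Traefik requests certificates for exactly the hostnames named in a router's rule. The ACME challenge with Let's Encrypt therefore succeeds only when that rule names the domain whose DNS record points at the VPS. With the stale `ai-ls.cfscfs.com` rule, requests for `openwebui-ls.cfscfs.com` also matched no router, which produced the 404s. The `host-gateway` mapping lets Traefik reach services listening directly on the host, such as Cockpit on port `9090`, from inside its container despite Docker's network isolation.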
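The Python sketch below illustrates the label change and the DNS synchronization it requires. It builds the Open-WebUI router labels for a hostname and flags any `Host(...)` hostname that has no matching DNS record. The label keys follow the Traefik Docker provider syntax. The router name `openwebui`, the certificate resolver `letsencrypt`, the container port `8080`, and all function names are assumptions for illustration; the real values live in the stack's `docker-compose.yml`.

```python
import re

# Hypothetical names: substitute the router and resolver names that the
# real docker-compose.yml uses.
ROUTER = "openwebui"
CERT_RESOLVER = "letsencrypt"

# Captures the argument list of each Host(...) matcher in a router rule.
HOST_MATCHER = re.compile(r"Host\(([^)]*)\)")
BACKTICKED = re.compile(r"`([^`]+)`")


def openwebui_labels(hostname: str, port: int = 8080) -> dict[str, str]:
    """Build Traefik Docker-provider labels that route `hostname` to Open WebUI."""
    router = f"traefik.http.routers.{ROUTER}"
    return {
        "traefik.enable": "true",
        f"{router}.rule": f"Host(`{hostname}`)",
        f"{router}.tls.certresolver": CERT_RESOLVER,
        f"traefik.http.services.{ROUTER}.loadbalancer.server.port": str(port),
    }


def rule_hosts(rule: str) -> set[str]:
    """Return every hostname named in the Host(...) matchers of a rule."""
    hosts: set[str] = set()
    for args in HOST_MATCHER.findall(rule):
        hosts.update(BACKTICKED.findall(args))
    return hosts


def hosts_without_dns(labels: dict[str, str], dns_names: set[str]) -> set[str]:
    """Return router hostnames that have no matching record in `dns_names`."""
    hosts: set[str] = set()
    for key, value in labels.items():
        if key.startswith("traefik.http.routers.") and key.endswith(".rule"):
            hosts |= rule_hosts(value)
    return hosts - dns_names


if __name__ == "__main__":
    dns = {"openwebui-ls.cfscfs.com"}  # A records known to point at the VPS
    for name in ("ai-ls.cfscfs.com", "openwebui-ls.cfscfs.com"):
        print(name, "->", hosts_without_dns(openwebui_labels(name), dns) or "ok")
```

When run against the old rule, the check reports `ai-ls.cfscfs.com` as unmatched, which is the drift described in Context. The `extra_hosts` entry belongs to the Traefik service rather than to Open WebUI's labels, so it is not generated here.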
Consequences
- Positive:
  - The VPS can survive hard reboots and complete Docker prune cycles without losing its routes.
  - Cockpit recovers automatically, with no manual `systemctl restart cockpit.socket` intervention.
  - Host-to-Docker reverse-proxy routing is standardized on natively available domain endpoints.
- Negative:
  - `FreeBind` masks legitimate binding failures. Because the bind succeeds even when the address is missing, no error appears in the system log, and a complete failure of the Docker network engine requires deeper inspection to diagnose.
  - Keeping Traefik labels in `docker-compose.yml` requires manually synchronizing them with the DNS provider's records.
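The FreeBind trade-off can be observed directly with a socket.

- A plain `bind()` to an address that no interface carries fails with `EADDRNOTAVAIL`. This is the error `cockpit.socket` hit before `docker0` existed.
- With `IP_FREEBIND` set, which is what `FreeBind=yes` does, the same bind succeeds, so nothing is logged.

The Python sketch below shows this; the FreeBind path is Linux-specific. `try_bind` and `address_is_local` are hypothetical helpers. `address_is_local` is one possible form of the "deeper inspection" mentioned above: it performs the plain bind that FreeBind skips.

```python
import errno
import socket
import sys

# IP_FREEBIND is 15 in <linux/in.h>; fall back to that value if this Python
# build does not export socket.IP_FREEBIND.
IP_FREEBIND = getattr(socket, "IP_FREEBIND", 15)


def try_bind(address: str, port: int = 0, freebind: bool = False) -> bool:
    """Bind a TCP socket to (address, port) and close it; True on success.

    Returns False only for EADDRNOTAVAIL (address not assigned locally);
    any other error, such as EADDRINUSE, is re-raised.
    """
    if freebind and not sys.platform.startswith("linux"):
        raise NotImplementedError("IP_FREEBIND is Linux-specific")
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        if freebind:
            # The option systemd sets on IPv4 sockets when FreeBind=yes.
            sock.setsockopt(socket.IPPROTO_IP, IP_FREEBIND, 1)
        try:
            sock.bind((address, port))
        except OSError as exc:
            if exc.errno == errno.EADDRNOTAVAIL:
                return False
            raise
        return True


def address_is_local(address: str) -> bool:
    """True if `address` is currently assigned to a local interface.

    Uses a plain bind to an ephemeral port: exactly the check that
    FreeBind=yes turns off.
    """
    return try_bind(address, 0, freebind=False)


if __name__ == "__main__":
    bridge = "172.17.0.1"  # docker0 address used in this ADR
    print(f"{bridge} assigned locally: {address_is_local(bridge)}")
    if sys.platform.startswith("linux"):
        print(f"bind with IP_FREEBIND: {try_bind(bridge, 0, freebind=True)}")
```

Suppose `address_is_local("172.17.0.1")` returns `False` while `cockpit.socket` reports as active. That combination would indicate that the Docker bridge never came up, even though systemd shows no error.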