Proxmox - I just got in trouble
Claude AI just gave me a boot in the backside.
Apparently I am trying to do things too fast.
While I was doing the daughter's laundry, I noticed that some of my services had gone offline: offline on the internal LAN and offline on the WAN (Internet).
The hostname changes depending on whether you are inside the network or outside, but you know that, right?
Offline? No, they're not: the applications are up and the logs are clean.
Problem - the short of it is I am starting the proxy server too soon after the application server.
- Yes, the proxy is its own LXC - it contains the proxy and the Cloudflare tunnel
- The application LXC is responsible for serving applications after they pass development (where I break stuff - sometimes deliberately).
I was not giving the application server time to start properly, and my nftables rules were then dropping any new attempt by the proxy to reconnect to the applications on the server.
Since I only rebooted baden and am now back in the green, that tells me the nftables rules need reviewing on the proxy and the application server. Yeah, the nftables rules are slightly different between the server, the proxy and the application server - the latter two being obvious.
Claude's response
Yes, almost certainly. Here's what happened:
The sequence of events at boot:
- Proxmox starts → waits 15s → starts baden
- Baden's LXC boots, systemd starts, Podman begins launching containers
- 5 seconds later → alex starts
- Alex's Zoraxy comes up, tries to reach your applications etc...
- Baden is still spinning up ~24 containers (Immich alone with ML, postgres, redis takes a while)
- Zoraxy marks them offline, and the "ct state established,related" firewall rule means subsequent connection attempts may be getting dropped or mishandled
The 20 second delay on alex is not enough: you're giving baden only a 5 second head start, and that's nowhere near enough time to bring up 25+ containers including heavy ones like Immich ML, Jellyfin, Ollama, and authentik.
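Rather than guessing at a longer delay, one option is to have the proxy side wait until the applications actually answer before it starts. The sketch below is only an illustration of that idea, not what I am running: the host name "baden.lan", the port numbers and the function names are all made up for the example, and a port accepting a TCP connection does not prove the application behind it is fully ready.

```python
"""Poll a list of TCP backends until they accept connections, or give up."""
import socket
import time

# Hypothetical backends: swap in the real host/port pairs the proxy forwards to.
BACKENDS = [("baden.lan", 2283), ("baden.lan", 8096)]


def port_is_open(host, port, connect_timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (e.g. dropped packets)
        # and name resolution failures.
        return False


def wait_for_backends(backends, deadline_s=300.0, interval_s=5.0):
    """Poll until every backend accepts a connection.

    Returns the list of backends that were still down when the
    deadline passed (an empty list means everything answered).
    """
    give_up_at = time.monotonic() + deadline_s
    pending = list(backends)
    while pending:
        # Keep only the backends that still refuse or time out.
        pending = [(h, p) for h, p in pending if not port_is_open(h, p)]
        if not pending or time.monotonic() >= give_up_at:
            break
        time.sleep(interval_s)
    return pending
```

Calling wait_for_backends(BACKENDS) before the proxy is launched, and refusing to start it while the returned list is non-empty, would turn the fixed 5 second head start into "however long baden actually needs", up to the deadline.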
Nobody ever discusses this, and I have watched mountains of YouTube videos and followed guides by people who do this for a living, unlike me, who drives mining/civil equipment at the moment (it pays better than information technology - what can I say).
I have pulled a lot of code off this site lately because I found bugs or issues that I was not aware of at the time the code was published. It includes:
- The GPU LXC code for nvidia
- The Proxmox LXC setup code
- The Zoraxy forensic code
- Now I may have unintentionally over hardened SSH.
All of this code has been rectified and rebuilt. But damn if I will publish it until I get through a few kernel upgrades and see how everything responds.
I have switched to quadlets on Podman for Portainer and the agent. With Claude AI I have totally rebuilt the GPU LXC setup code (that was a nightmare), along with tweaking and running upgrades on containers deployed via Portainer (that was an upgrade in itself requiring another rewrite).
This little problem was due to starting the proxy and the associated tunnel too quickly, plus what I thought was a responsible nftables policy (one that I am still not happy with). At least it was simple to fix - reboot. I wish everything was that easy.
So much to learn.
#enoughsaid