Hello everyone. I have a small service that I have been developing as a side gig for the last year, and I want to self-host it, since I got some new hardware and want to dive deeper into the world of self-hosting and DevOps (I'm a dev).
For now, I have it set up on a couple of EC2 instances, but things are getting expensive. A friend was decommissioning some workstations (they are moving to a new place and have a bigger budget for a data center), so I got 2 Lenovo workstations, each with a Ryzen 9 5900X, 32GB RAM, and a 1TB NVMe.
I want to set up HA for this service and maybe add another computer as a NAS. So my question is: how should I go about doing this? I was thinking Proxmox on both nodes with HA set up, but I think that needs a common data store (I will probably use the NAS for this). And how should I go about setting up the HTTP server and monitoring? I also have some stuff that runs in Docker containers, so I was thinking of having one HTTP server to manage all that traffic, with DNS and so on. Any help is much appreciated.
True. In my current setup on EC2 I have two Postgres DBs, one of which is just replicating. But I had an incident where a bug I wrote in my Spring app ate all the available RAM and the VM got stuck, and I lost about 6 hours' worth of user data. That's why I'm thinking HA could help: if a VM on one node is stuck or blocked or something of that kind, the hypervisor will spin up another one on a different node. Or am I wrong here?
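For what it's worth, a VM that is stuck but still technically "running" is something an application-level health check can catch, whether or not the hypervisor's HA notices it. Here is a rough Python sketch of the idea, not a finished tool: poll a health URL and only declare the service down after several failures in a row. The threshold of 3 and the `HealthTracker` and `probe` names are made up for illustration, and it assumes the app exposes some HTTP health endpoint (for a Spring Boot app that could be Actuator's `/actuator/health`, if it is enabled).

```python
import urllib.error
import urllib.request
from dataclasses import dataclass


def probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the health URL answers with HTTP 200 in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status all count as a failed check.
        return False


@dataclass
class HealthTracker:
    """Counts consecutive failed checks for one service."""
    failure_threshold: int = 3  # hypothetical value; tune for your setup
    consecutive_failures: int = 0

    def record(self, check_ok: bool) -> bool:
        """Record one check result; return True once the service counts as down."""
        if check_ok:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.failure_threshold
```

Something like `tracker.record(probe("http://my-app.local:8080/actuator/health"))` in a loop (the hostname is a placeholder) would then decide when to restart the VM or page someone. Requiring several failures in a row avoids reacting to a single slow response.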
If you had monitoring, you wouldn’t have taken 6 hours to catch it.
I'd say learn HA anyway because it's a good skill, but that doesn't prevent you from having the other parts I mentioned. I say this because, again, unless you are experienced with HA, there will be edge cases where it's not going to do what you thought it would do, and your service will be down all the same. Monitoring/alerting and a one-click/shell-script install will be much more valuable in the short to mid term.
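To make the monitoring/alerting point concrete for the RAM incident, here is a minimal Python sketch of a low-memory check, assuming a Linux guest where `/proc/meminfo` is available. The 10% threshold and the function names are illustrative assumptions, and how the alert is actually delivered (email, push, etc.) is left out.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_fraction(meminfo: dict[str, int]) -> float:
    """Fraction of total RAM the kernel reports as available."""
    return meminfo["MemAvailable"] / meminfo["MemTotal"]


def should_alert(meminfo: dict[str, int], min_available: float = 0.10) -> bool:
    """True when available memory drops below the (assumed) 10% threshold."""
    return available_fraction(meminfo) < min_available


def check_local_memory() -> bool:
    """Read this machine's /proc/meminfo and evaluate the alert rule."""
    with open("/proc/meminfo") as f:
        return should_alert(parse_meminfo(f.read()))
```

Run from cron or a systemd timer, `check_local_memory()` would flag a leak while the VM is still responsive, which leaves time to restart the app before the machine locks up.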
True that. I did have Uptime Kuma for simple monitoring, but the 6-hour downtime was on me: I was unreachable. I was inside a factory doing some troubleshooting with no service and forgot to ask for the Wi-Fi creds 🤦♂️ (I was stressed that day). As for rollback and install, I run those through GitHub Actions for CI/CD.