So today I discovered that there’s a cron job that holds non-reproducible state that died, and now our system is fucked.
The cron job doesn’t live inside any source control. This morning it entered a terminal state, and because it overwrites its state there’s no way to revert it.
I’m currently waiting for the database rollback and have rewritten it in a reproducible/idempotent way.
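For anyone wondering what "reproducible/idempotent" might look like in practice, here is a minimal sketch. It is not the actual rewrite, just one way to avoid the overwrite-in-place problem: each run writes its own snapshot keyed by a `run_id` (a hypothetical key, e.g. the date the job covers), and the write goes through a temp file so a crash never leaves half-written state. The function names and file layout are assumptions for illustration.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def write_state(state_dir: Path, run_id: str, state: dict) -> Path:
    """Write the state for one run without destroying earlier runs.

    run_id is a hypothetical key such as an ISO date. Re-running with the
    same run_id and the same state produces the same file (idempotent).
    """
    state_dir.mkdir(parents=True, exist_ok=True)
    target = state_dir / f"state-{run_id}.json"
    payload = json.dumps(state, sort_keys=True, indent=2)

    # Write to a temp file in the same directory, then rename over the
    # target, so a crash mid-write never leaves a truncated state file.
    fd, tmp_name = tempfile.mkstemp(dir=state_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        tmp.write(payload)
    os.replace(tmp_name, target)
    return target


def latest_state(state_dir: Path) -> Optional[dict]:
    """Return the newest snapshot, or None if no run has completed yet."""
    # ISO-style run_ids sort lexicographically in chronological order.
    snapshots = sorted(state_dir.glob("state-*.json"))
    if not snapshots:
        return None
    return json.loads(snapshots[-1].read_text())
```

Because old snapshots stick around, "reverting" is just deleting the newest file rather than waiting on a database rollback.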
What’s extra frustrating is the previous guy did create a git repo of these types of hacks, but this one doesn’t live in it for no discernible reason.
Job security
He does charge a consulting fee to “fix” these issues
Almost all of them are dumb shit like this, where something is built in super hacky and dumbass ways.
It’s his kill switch and he forgot to check in.
Smart man. This is how we fight being replaced by AI.
Judgement day postponed indefinitely due to “Object reference not set to an instance of an object”
Super hacky and dumb? Sign me up 😂
Me running all my services in tmux
That might be a stupid question, but why would running all your services in tmux be a bad idea? A co-worker of mine is doing exactly that right now, which is why I’m asking.
- They’re all gone when you restart
- It doesn’t properly deal with logging
- You can’t set up dependencies between services but that doesn’t matter due to point 1
I recommend using systemd services and/or docker compose instead. systemd services are files that describe which program / script to run and when (like after networking is active or after a certain other service is loaded).
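To give a rough idea of what such a file contains, here is a small sketch that renders a minimal unit as text. The `render_unit` helper, the description, and the script path are all made up for illustration; the directives themselves (`After=`, `Wants=`, `ExecStart=`, `Restart=`, `WantedBy=`) are standard systemd ones.

```python
from typing import Sequence


def render_unit(
    description: str,
    exec_start: str,
    after: Sequence[str] = ("network-online.target",),
) -> str:
    """Return the text of a minimal systemd service unit."""
    deps = " ".join(after)
    lines = [
        "[Unit]",
        f"Description={description}",
        # After= only orders startup; Wants= also pulls the listed units in.
        f"After={deps}",
        f"Wants={deps}",
        "",
        "[Service]",
        # The program or script to run (must be an absolute path).
        f"ExecStart={exec_start}",
        # Bring the service back up if it exits with an error.
        "Restart=on-failure",
        "",
        "[Install]",
        # Start at boot as part of the normal multi-user target.
        "WantedBy=multi-user.target",
    ]
    return "\n".join(lines) + "\n"
```

Something like `print(render_unit("Example game server", "/opt/example/start.sh"))` gives you text you could save as a `.service` file under `/etc/systemd/system/` and turn on with `systemctl enable --now`. That covers the restart and dependency points above, and the service's output ends up in the journal instead of a tmux scrollback.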
It’s not horrible, like it’ll do the job just fine, it’s just probably a better idea to use systemd and like, containers and whatnot, but I couldn’t be arsed to fiddle with all that for Jellyfin, caddy reverse proxy, and two modded Minecraft servers, so shell scripts and tmux won the day. It takes a little extra time to restart everything after an update, and maybe I’ll get the motivation to do things “correctly™” one day, but today is not that day.
Use the tmux resurrect plugin. It will restore your tmux session to its previous state after a restart, including programs if you like.
You can put off doing things “correctly™” even longer.
This is almost exactly what happened to me on Monday, resulting in a fifteen hour day.
My particular jenga piece was an Access query that none of my predecessors had deigned to document or even tell me about… but was critical to run monthly or you had obsolete data embedded deep within multi-million dollar reports.
Thank god I don’t work on salary anymore, or I’d be really upset.
I stopped reading at “Access” and just wept a silent tear for you.
So do you work for Spotify or Zoom?
Probably DeepSeek.
I have also mixed up `crontab -l` with `crontab -r`. 😔 Let this be a lesson to start versioning your crontabs.
We never had our crons in source control, but I always saved it somewhere (usually on my machine and the target machine) so we had some history just in case of typing r instead of l for some reason. You can also create an alias called backupCrontab or something that runs the command for you and puts the output somewhere safe.
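A backup helper along those lines could look roughly like this. It is only a sketch: `backup_crontab` and the destination directory are hypothetical names, and the `list_cmd` parameter exists only so the command can be swapped out when testing.

```python
import subprocess
from datetime import datetime
from pathlib import Path
from typing import Sequence


def backup_crontab(
    dest_dir: Path,
    list_cmd: Sequence[str] = ("crontab", "-l"),
) -> Path:
    """Save the output of `crontab -l` to a timestamped file and return its path."""
    # check=True raises if the command fails (for example when the user has
    # no crontab), so an empty file is never mistaken for a real backup.
    result = subprocess.run(list_cmd, capture_output=True, text=True, check=True)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%dT%H%M%S")
    target = dest_dir / f"crontab-{stamp}.bak"
    target.write_text(result.stdout)
    return target
```

Calling `backup_crontab(Path.home() / "crontab-backups")` before every edit leaves a trail of timestamped copies. Pointing `dest_dir` at a git checkout and committing the result would get you actual versioning.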
Cron job that evals some base64 encoded string which is actually downloading a script from a personal GitHub repo of an IT guy who left…
And just started cleaning up their GitHub account…
Only tangentially related, but “What an elegant house of cards” is an insult I’m going to use someday.
Ah yes, good old dependency.
Had a similar thing once. Somehow, some way, the DBA copied and pasted something wrong. Oracle DB had some odd extra syntax for left and right joins that other DBs didn’t (or at least that I’d never seen). My best guess is that he auto-formatted out of habit and maybe it took those symbols out.
It took a long time to find that, because the only evidence something was wrong was that ONE of our customers wasn’t being billed for ONE product. Everyone else was fine. Basically they were using it in a very atypical way. The left joins made sure to include them in the billing even though they didn’t have whatever was on the right of that join. Everyone else did.
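To see why only one customer dropped out, here is a tiny reproduction using SQLite and made-up tables (`customers`, `discounts`). The odd Oracle syntax was presumably the legacy `(+)` outer-join operator, though that is a guess; SQLite doesn't support it, so this sketch uses the ANSI `LEFT JOIN` spelling. Losing the outer-join marker silently turns the query into an inner join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE discounts (customer_id INTEGER, percent INTEGER);
    INSERT INTO customers VALUES (1, 'typical'), (2, 'atypical');
    -- Only the typical customer has a row on the right side of the join.
    INSERT INTO discounts VALUES (1, 10);
""")

# What the query effectively became once the outer-join marker was lost.
INNER = """
    SELECT c.name FROM customers c
    JOIN discounts d ON d.customer_id = c.id
    ORDER BY c.id
"""
# What it was supposed to be: keep every customer, matched or not.
LEFT = """
    SELECT c.name FROM customers c
    LEFT JOIN discounts d ON d.customer_id = c.id
    ORDER BY c.id
"""

inner_rows = [row[0] for row in conn.execute(INNER)]
left_rows = [row[0] for row in conn.execute(LEFT)]
print(inner_rows)  # ['typical']
print(left_rows)   # ['typical', 'atypical']
```

Both queries are valid and both return the "normal" customers, which is why nothing looked broken until someone noticed the missing bill.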
Time to restore a whole machine backup to a VM with no network connectivity, and manually pull the command?
I was able to do that
Turns out there was a second bug which triggered this one, and a bug I found in this script that I thought was responsible had been happening silently for months.
Now three bugs are squashed
What’s a cron job?
Cron is a scheduler to run a program at a set frequency