Ask HN: How do you make sure all your batch jobs ran?

1 point by deanebarker over 2 years ago

I have a bunch of command line jobs that run overnight for various hobby projects. They're not connected -- there are about 8-9 that all run independently.

How do you keep track of these? Every once in a while, one of them throws a persistent error. I don't realize it's not running until I see something off about a week later, then I go through the logs and realize it hasn't run since last week.

Is there a best practice to track command line jobs and make sure they ran when they were supposed to run?

I could rig something up, but I thought I'd check to see if this was a solved problem. And, again, these are hobby projects, so an enterprise solution is not what I'm looking for.

2 comments

linsomniac over 2 years ago

At work we use Icinga, and batch jobs submit passive results. If a job fails or doesn't check in within a reasonable time, it triggers an alert. Probably more than you want to set up, but it might give you some ideas. We have a wrapper script that sends the job status and output.

Alternatively, you could make the jobs quiet on success and noisy on failure, and just have cron mail you about them? Or make a wrapper script that saves off the status, plus a status job that mails you if any failed?
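The last idea (a wrapper that saves each job's status, plus a separate status job) could look roughly like the sketch below. It is only an illustration under assumptions, not anything the commenter posted: the names `run_and_record`, `find_problems`, and the `~/.job-status` directory are all made up here. It also flags jobs that have not reported recently, which covers the "hasn't run since last week" case.

```python
import json
import subprocess
import time
from pathlib import Path

# Hypothetical location for the status records; any writable directory works.
STATUS_DIR = Path.home() / ".job-status"


def run_and_record(name, command, status_dir=STATUS_DIR):
    """Run `command` (a list of arguments) and save its outcome as JSON."""
    status_dir = Path(status_dir)
    status_dir.mkdir(parents=True, exist_ok=True)
    try:
        result = subprocess.run(command, capture_output=True, text=True)
        exit_code = result.returncode
        output = result.stdout + result.stderr
    except OSError as exc:
        # The command could not be started at all (e.g. missing binary).
        # 127 mirrors the shell's "command not found" convention.
        exit_code = 127
        output = str(exc)
    record = {
        "name": name,
        "exit_code": exit_code,
        "finished_at": time.time(),
        # Keep only the tail of the output so the record stays small.
        "output_tail": output[-2000:],
    }
    (status_dir / f"{name}.json").write_text(json.dumps(record))
    return record


def find_problems(expected_jobs, max_age_seconds, status_dir=STATUS_DIR, now=None):
    """Return one message per job that failed, is stale, or never reported."""
    status_dir = Path(status_dir)
    now = time.time() if now is None else now
    problems = []
    for name in expected_jobs:
        path = status_dir / f"{name}.json"
        if not path.exists():
            problems.append(f"{name}: no status recorded")
            continue
        record = json.loads(path.read_text())
        if record["exit_code"] != 0:
            problems.append(f"{name}: exited with code {record['exit_code']}")
        elif now - record["finished_at"] > max_age_seconds:
            problems.append(f"{name}: last success is too old")
    return problems
```

Each cron entry would call `run_and_record` for its job, and one extra daily entry would print whatever `find_problems` returns. If that status job prints nothing when all is well, cron's usual mail-on-output behaviour (with `MAILTO` set) only sends a message when something needs attention.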

criticas over 2 years ago

If the batch jobs have few or simple interdependencies, scheduling is the easy part. Are your tasks too complex for cron/at/batch? For example, do they require coordination across machines? That might suggest looking at slurm/lsf or another distributed job scheduler, or implementing them on Kubernetes. Sounds like that would be overkill in your case.

It doesn't sound like a scheduling problem - it sounds like a noticing problem. You have to figure out what to do on failure - email, text, retry, log, etc. (Hence the suggestion for Kubernetes, or another declarative automation system like Ansible or Puppet.) If the requirement is "daemon X should be running", checking for it and sending an email is the easiest and most useless response.
