Ask HN: How do you make sure all your batch jobs ran?

1 point by deanebarker over 2 years ago

I have a bunch of command line jobs that run overnight for various hobby projects. They're not connected -- there's about 8-9 that all run independently.

How do you keep track of these? Every once in a while, one of them throws a persistent error. I don't realize it's not running until I see something off about a week later, then I go through the logs and realize it hasn't run since last week.

Is there a best practice to track command line jobs and make sure they ran when they were supposed to run?

I could rig something up, but I thought I'd check to see if this was a solved problem. And, again, these are hobby projects, so an enterprise solution is not what I'm looking for.

2 comments

linsomniac over 2 years ago

At work we use Icinga, and batch jobs submit passive results; if they fail or don't check in within a reasonable time, it triggers an alert. Probably more than you want to set up, but it might give you some ideas. We have a wrapper script that sends the job status and output.

Alternatively, you could make the jobs quiet on success and noisy on failure, and just have cron mail you about them? Or make a wrapper script that saves off the status, plus a status job that mails you if any failed?
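To make the last idea concrete, here is a rough sketch of a wrapper that saves each job's status and a status check that reports jobs that failed or never checked in. Everything in it is an assumption for illustration, not anything from the Icinga setup described above: the `~/.job-status` directory, the JSON record layout, and the function names `run_and_record` and `find_problems` are all hypothetical.

```python
import json
import subprocess
import time
from pathlib import Path

# Hypothetical location for per-job status files.
STATUS_DIR = Path.home() / ".job-status"


def run_and_record(name, command, status_dir=STATUS_DIR):
    """Run `command` (a list of arguments) and save its status under `name`."""
    status_dir = Path(status_dir)
    status_dir.mkdir(parents=True, exist_ok=True)
    try:
        result = subprocess.run(command, capture_output=True, text=True)
        exit_code = result.returncode
        output = result.stdout + result.stderr
    except OSError as exc:
        # The command could not be started at all (missing binary, permissions).
        exit_code = 127
        output = str(exc)
    record = {
        "name": name,
        "exit_code": exit_code,
        "finished_at": time.time(),
        "output": output[-2000:],  # keep only the tail of the output
    }
    (status_dir / f"{name}.json").write_text(json.dumps(record))
    return record


def find_problems(expected, max_age_seconds, status_dir=STATUS_DIR, now=None):
    """Return one message per expected job that failed, is stale, or never reported."""
    now = time.time() if now is None else now
    problems = []
    for name in expected:
        path = Path(status_dir) / f"{name}.json"
        if not path.exists():
            problems.append(f"{name}: never reported")
            continue
        record = json.loads(path.read_text())
        if record["exit_code"] != 0:
            problems.append(f"{name}: exited with {record['exit_code']}")
        elif now - record["finished_at"] > max_age_seconds:
            problems.append(f"{name}: last success is too old")
    return problems
```

Each nightly job would be launched through `run_and_record`, and a separate morning job would call `find_problems` with the list of expected job names and print whatever it returns. If that status job is itself run from cron with `MAILTO` set and local mail working, any printed problems end up in your inbox, and a clean night stays silent. The staleness check is what catches the "it hasn't run since last week" case, since a job that never starts cannot report its own failure.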

criticas over 2 years ago

If the batch jobs have few or simple interdependencies, scheduling is the easy part. Are your tasks too complex for cron/at/batch? For example, do they require coordination across machines? That might suggest looking at Slurm/LSF or another distributed job scheduler, or implementing them on Kubernetes. Sounds like that would be overkill in your case.

It doesn't sound like a scheduling problem - it sounds like a noticing problem. You have to figure out what to do on failure - email, text, retry, log, etc. (Hence the suggestion for Kubernetes, or another declarative automation system like Ansible or Puppet). If "daemon X should be running", checking for it and sending an email is the easiest and most useless response.
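For a hobby setup, "what to do on failure" can be as small as a retry followed by a single notification. The sketch below is one possible shape under assumptions of my own: the function name `run_with_policy`, its parameters, and the `notify` callback (which defaults to `print` and could be swapped for something that sends an email or text) are all hypothetical.

```python
import subprocess
import time


def run_with_policy(command, retries=2, delay_seconds=0, notify=print):
    """Run `command`; retry on failure, and notify once if every attempt fails."""
    attempts = retries + 1
    last = None
    for attempt in range(1, attempts + 1):
        last = subprocess.run(command, capture_output=True, text=True)
        if last.returncode == 0:
            return True  # success: stay quiet
        if attempt < attempts:
            time.sleep(delay_seconds)  # wait before the next attempt
    notify(
        f"{' '.join(command)} failed {attempts} times, "
        f"last exit code {last.returncode}"
    )
    return False
```

Retrying first keeps transient errors out of your inbox, so that a message, when it does arrive, means the job needs attention.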