Ask HN: How do you make sure all your batch jobs ran?

1 point by deanebarker, over 2 years ago
I have a bunch of command line jobs that run overnight for various hobby projects. They're not connected -- there are about 8-9 that all run independently.

How do you keep track of these? Every once in a while, one of them throws a persistent error. I don't realize it's not running until I see something off about a week later, then I go through the logs and realize it hasn't run since last week.

Is there a best practice to track command line jobs and make sure they ran when they were supposed to run?

I could rig something up, but I thought I'd check to see if this was a solved problem. And, again, these are hobby projects, so an enterprise solution is not what I'm looking for.

2 comments

linsomniac, over 2 years ago
At work we use Icinga, and batch jobs submit passive results; if they fail or don't check in within a reasonable time, it triggers an alert. Probably more than you want to set up, but it might give you some ideas. We have a wrapper script that sends the job status and output.

Alternatively, you could make the jobs quiet on success and noisy on failure, and just have cron mail you about them? Or make a wrapper script that saves off the status and a status job that mails you if any failed?
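
A minimal sketch of the last idea above (a wrapper that saves each job's status, plus a status job that reports failures), assuming one JSON file per job in a status directory. The names `run_and_record` and `find_problems`, the file layout, and the staleness threshold are all hypothetical, not something the commenter described:

```python
import json
import subprocess
import time
from pathlib import Path


def run_and_record(name, cmd, status_dir):
    """Run cmd, then save its exit code, finish time, and output tail."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    record = {
        "job": name,
        "exit_code": proc.returncode,
        "finished_at": time.time(),
        # Keep only the end of the output so status files stay small.
        "output_tail": (proc.stdout + proc.stderr)[-2000:],
    }
    status_dir = Path(status_dir)
    status_dir.mkdir(parents=True, exist_ok=True)
    (status_dir / f"{name}.json").write_text(json.dumps(record))
    return record


def find_problems(expected_jobs, status_dir, max_age_seconds, now=None):
    """Return one message per job that failed, is stale, or never reported."""
    now = time.time() if now is None else now
    problems = []
    for name in expected_jobs:
        path = Path(status_dir) / f"{name}.json"
        if not path.exists():
            problems.append(f"{name}: no status recorded")
            continue
        record = json.loads(path.read_text())
        if record["exit_code"] != 0:
            problems.append(f"{name}: exited with code {record['exit_code']}")
        elif now - record["finished_at"] > max_age_seconds:
            # Catches the "hasn't run since last week" case from the question.
            problems.append(f"{name}: last success is too old")
    return problems
```

A daily status job could print whatever `find_problems` returns and stay silent otherwise; cron typically mails any output a job produces, so a quiet status job would mean every expected job checked in recently and exited cleanly.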
criticas, over 2 years ago
If the batch jobs have few or simple interdependencies, scheduling is the easy part. Are your tasks too complex for cron/at/batch? For example, do they require coordination across machines? That might suggest looking at slurm/lsf or another distributed job scheduler or implementing them on Kubernetes. Sounds like that would be overkill in your case.

It doesn't sound like a scheduling problem - it sounds like a noticing problem. You have to figure out what to do on failure - email, text, retry, log, etc. (Hence the suggestion for Kubernetes, or another declarative automation system like Ansible or Puppet). If "daemon X should be running", checking for it and sending an email is the easiest and most useless response.
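
One way to read "figure out what to do on failure" is as an explicit policy in code. The sketch below assumes a simple policy of log, retry, then notify; `run_with_policy` and the `notify` callback are hypothetical names, and the callback could stand in for email, text, or anything else:

```python
import logging
import subprocess
import time

log = logging.getLogger("batch")


def run_with_policy(cmd, notify, retries=2, delay_seconds=0.0):
    """Try cmd up to retries + 1 times; call notify(message) if all fail."""
    attempts = retries + 1
    for attempt in range(1, attempts + 1):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode == 0:
            log.info("succeeded on attempt %d", attempt)
            return True
        log.warning("attempt %d failed with code %d", attempt, proc.returncode)
        if attempt < attempts:
            # Wait before retrying; no sleep after the final attempt.
            time.sleep(delay_seconds)
    # Every attempt failed: escalate exactly once.
    notify(f"{cmd!r} failed {attempts} times; last exit code {proc.returncode}")
    return False
```

Retrying before notifying means a transient failure never reaches a human, while a persistent error produces a single message per run instead of silence.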