
Ask HN: What is the error alerting stack at your startup?

6 points by aiunboxed, over 1 year ago
If you are working at a company with an engineering team of < 30 people, what alerting stack are you using?

We tend to miss a lot of critical alerts that come in, simply because the alerting is not set up properly.

8 comments

slap_shot, over 1 year ago
I'm surprised how often I speak to technical teams that do not utilize PagerDuty (or an equivalent alternative). As PagerDuty integrates with nearly any external system, it separates the collection of telemetry from the incident response lifecycle, i.e. what is wrong? who should be or is looking into this? what did we learn from this? how often is this happening?

Personally, I find notifications in Slack to be an anti-pattern: a lot of teams expect someone to just "pick up" the incident based on their availability or expertise and _maybe_ the resolution is documented. Assigning direct responsibility by component and on-call schedule, and appending the RCA, reduces the time-to-resolution and overall toil of the process.
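To make the split between telemetry and incident response concrete, here is a minimal sketch of a monitoring script handing an alert to PagerDuty through its Events API v2. The component names, the placeholder routing keys, and the `ROUTING_KEYS` mapping are hypothetical; routing one key per component is just one way to assign responsibility, not something the comment prescribes.

```python
import json
import urllib.request

PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"

# Hypothetical mapping from a component to the integration (routing) key of
# the PagerDuty service whose on-call schedule owns that component.
ROUTING_KEYS = {
    "checkout-api": "ROUTING_KEY_FOR_CHECKOUT",
    "database": "ROUTING_KEY_FOR_DATABASE",
}

ALLOWED_SEVERITIES = ("critical", "error", "warning", "info")


def build_trigger_event(component, summary, severity="critical"):
    """Build an Events API v2 'trigger' event for the given component."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity}")
    return {
        "routing_key": ROUTING_KEYS[component],
        "event_action": "trigger",
        # Repeated alerts with the same dedup_key are grouped into one incident.
        "dedup_key": f"{component}:{summary}",
        "payload": {
            "summary": summary,
            "source": component,
            "severity": severity,
        },
    }


def send_event(event):
    """POST the event to PagerDuty and return the HTTP status code."""
    request = urllib.request.Request(
        PAGERDUTY_EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

With real routing keys in place, `send_event(build_trigger_event("database", "replica lag above 60s"))` would open (or update) an incident on whichever service owns the database, and that service's escalation policy decides who gets paged.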
nip, over 1 year ago
Custom-built monitoring on top of CloudWatch logs: we subscribe to the log groups and parse the logs.

Errors are reported in dedicated Slack channels.

The “MVP” was built in 1 week after we were faced with an outrageous bill from an observability vendor and decided to take a shot at implementing it ourselves.

In total I’d say that we invested 2 additional weeks of man-hours to get to where we are today.

It has worked extremely well for us and has needed little maintenance (granted, we pay AWS to not have to do that maintenance).
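A setup like this might look roughly like the sketch below, assuming the log groups are subscribed to a Lambda function. CloudWatch Logs delivers each batch as base64-encoded, gzip-compressed JSON under `event["awslogs"]["data"]`. The error keywords and the message format are assumptions for illustration, not details from the comment, and posting the text to Slack is left out.

```python
import base64
import gzip
import json


def decode_subscription_event(event):
    """Decode the payload CloudWatch Logs delivers to a subscribed Lambda."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def extract_errors(payload, keywords=("ERROR", "Traceback")):
    """Return (log_group, message) pairs for log events that look like errors.

    The keyword list is a hypothetical heuristic; real parsing rules depend
    on the application's log format.
    """
    return [
        (payload["logGroup"], log_event["message"])
        for log_event in payload.get("logEvents", [])
        if any(keyword in log_event["message"] for keyword in keywords)
    ]


def format_alert_text(errors):
    """Render one line per error, prefixed with the originating log group."""
    return "\n".join(f"[{group}] {message.strip()}" for group, message in errors)


def handler(event, context):
    """Lambda entry point: returns the alert text (empty if nothing to report)."""
    payload = decode_subscription_event(event)
    # CloudWatch occasionally sends control messages to verify the destination.
    if payload.get("messageType") == "CONTROL_MESSAGE":
        return ""
    return format_alert_text(extract_errors(payload))
```

In a real deployment the returned text would be sent to the Slack channel dedicated to that log group rather than returned from the handler.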
mtmail, over 1 year ago
StatusCake has a feature to call me. It's a horrible artificial voice saying "your website $name is down", but I'm fine with anybody shouting at me at 3am. The phone number is from the United States and I don't need to add it to my phone book because that's the only US number calling me. (People inside the US might think it's another robocall.)
guybedo, over 1 year ago
I keep it simple with an uptime monitoring service that monitors all the elements of my stack and runs tests every minute:

- regular HTTP monitoring for websites
- run test queries on my SQL & Mongo databases
- check that RabbitMQ queues are not overflowing
- check that Docker containers are up

If something goes wrong, email & Telegram alerts.

fwiw I'm using https://uptimefunk.com
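The kind of per-minute checks listed above can be sketched as a small runner. This is a generic illustration, not how the linked service works; `http_check`, `queue_depth_check`, `run_checks`, and the `get_depth` callable are hypothetical names, and the database and container checks would follow the same "raise on failure" shape.

```python
import urllib.request


def http_check(url, timeout=5):
    """Return a check that fails if the URL cannot be fetched successfully."""
    def check():
        # urlopen raises on connection errors and on 4xx/5xx responses.
        with urllib.request.urlopen(url, timeout=timeout):
            pass
    return check


def queue_depth_check(get_depth, max_depth):
    """Return a check that fails when a queue holds more than max_depth items.

    get_depth is a caller-supplied function, e.g. one that asks the broker
    for the current message count.
    """
    def check():
        depth = get_depth()
        if depth > max_depth:
            raise RuntimeError(f"queue depth {depth} exceeds {max_depth}")
    return check


def run_checks(checks):
    """Run every check and return {name: error message} for the failures."""
    failures = {}
    for name, check in checks.items():
        try:
            check()
        except Exception as exc:  # any exception counts as a failed check
            failures[name] = str(exc)
    return failures
```

A scheduler (cron, or a loop with a one-minute sleep) would call `run_checks` and send the email or Telegram alert whenever the returned dictionary is non-empty.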
rozenmd, over 1 year ago
Uptime monitoring + cron job monitoring via OnlineOrNot (dogfooding my own product), with alerts going to PagerDuty (set up to email -> SMS -> call me if I don't acknowledge), and a "public" alert in a Slack channel.
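The email -> SMS -> call escalation described above is normally configured inside PagerDuty itself, but the logic can be sketched in a few lines. The delays in `ESCALATION_STEPS` are hypothetical; the comment does not say how long each step waits for an acknowledgement.

```python
from datetime import timedelta

# Hypothetical schedule: how long an alert may stay unacknowledged before
# each channel is used.
ESCALATION_STEPS = [
    (timedelta(minutes=0), "email"),
    (timedelta(minutes=5), "sms"),
    (timedelta(minutes=10), "call"),
]


def current_channel(elapsed, acknowledged):
    """Return the most escalated channel reached so far, or None once acknowledged."""
    if acknowledged:
        return None
    channel = None
    for delay, name in ESCALATION_STEPS:
        if elapsed >= delay:
            channel = name
    return channel
```

For example, an alert that has gone unacknowledged for seven minutes would be at the SMS step under this schedule, and acknowledging it stops any further escalation.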
girishso, over 1 year ago
Nothing fancy. Alerts are posted in a Slack channel.
Cicero22, over 1 year ago
We have someone check Grafana a few times a day and alert us if there's an issue. Not great, but it works.
0xebo, over 1 year ago
Webhooks to Slack.
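For readers unfamiliar with the approach, posting an alert through a Slack incoming webhook comes down to an HTTP POST with a JSON body containing a `text` field. The sketch below assumes a webhook URL has already been created for the target channel; the function names are hypothetical.

```python
import json
import urllib.request


def build_slack_request(webhook_url, text):
    """Build the POST request for a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_alert(webhook_url, text):
    """Send the alert to Slack and return the HTTP status code."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling `post_alert(webhook_url, "checkout-api: 5xx rate above 2%")` from an error handler or a monitoring job is enough to get a message into the channel; who picks it up is a separate question, as discussed earlier in the thread.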