A Friday Email Incident

30 points by Medea almost 2 years ago

7 comments

danpalmer almost 2 years ago
A good writeup, but quite shocking that this managed to happen in the first place. I'd have expected that an email service provider would have very good monitoring on deliverability and failure reasons on both sending and receiving, and that something like a cloud migration would be done very incrementally to ensure no loss of service.

For this particular issue I would have expected some or all internal email at HEY! to be moved before any customers so that the new system could be tested.

Email is notoriously finicky when it comes to networks, IPs, the cryptography involved, and all sorts of details that are in flux during a cloud migration, and it's also notorious for being difficult to recover from if you accidentally get your email listed in denylists.
dpcx almost 2 years ago
I'm glad that they posted a "miss", but this reads over and over like a sales pitch:

- I created a card in <X> Basecamp
- Someone posted a message in Campfire
- We have our own encryption
- Another message posted in a different Campfire
- Oh, this one uses custom categories!
- To-dos in a Basecamp project

I get it, 37signals dogfoods their system. What we don't normally see from other posts is that person/company X posted in Slack and made a ticket in Jira and then created a to-do on their Trello board.

Maybe I'm being too cynical...
llm_nerd almost 2 years ago
I'm a little surprised this was published. It is hard to sound charitable when writing something like this, but it was such a trivial, obvious fault (moving an email system and then SPF starts failing) that normally things like this are embarrassingly swept under the rug. Generally that is probably the best path.

While I appreciate the transparency and it's a great write-up, at the same time somehow I leave the post with a worse opinion of 37signals.
LeonM almost 2 years ago
> Senior SRE Paul Shuvashish first noticed that these emails weren’t failing DKIM but SPF. [...] This pointed out a flaw in our application-level analysis system: we were assimilating DMARC errors – which can be either because of SPF or DKIM – to DKIM errors. So while the app was doing the right thing nevertheless – marking the email as spam – the insight it was collecting internally was misleading.

I don't agree with 'the app was doing the right thing' here: for DMARC alignment (a DMARC pass) you need SPF *or* DKIM alignment. One of the two is enough.

So an email from a domain with DMARC enabled that passes DKIM, but fails SPF, should pass. The application should not have rejected the email based on SPF when it was actually DKIM aligned.
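The OR rule above can be made concrete. Under DMARC (RFC 7489), a message passes if SPF passes for a domain aligned with the From: domain, or if DKIM passes with an aligned signing domain (the `d=` value). This is a minimal sketch, not HEY's code: the domain names are hypothetical, and it uses a naive "last two labels" rule for the organizational domain. Real implementations consult the Public Suffix List, so this shortcut is wrong for suffixes like `co.uk`. The `strict_*` flags stand in for DMARC's `aspf`/`adkim` strict mode.

```python
from dataclasses import dataclass


def org_domain(domain: str) -> str:
    """Naive organizational domain: the last two DNS labels.

    Assumption: real DMARC uses the Public Suffix List instead.
    """
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def aligned(auth_domain: str, from_domain: str, strict: bool = False) -> bool:
    """Strict mode needs an exact match; relaxed mode compares org domains."""
    if strict:
        return auth_domain.lower().rstrip(".") == from_domain.lower().rstrip(".")
    return org_domain(auth_domain) == org_domain(from_domain)


@dataclass
class AuthResults:
    from_domain: str  # domain in the visible From: header
    spf_pass: bool    # did SPF pass for the envelope sender?
    spf_domain: str   # envelope MAIL FROM domain that SPF checked
    dkim_pass: bool   # did a DKIM signature verify?
    dkim_domain: str  # d= domain of that signature


def dmarc_pass(r: AuthResults, strict_spf: bool = False,
               strict_dkim: bool = False) -> bool:
    """DMARC passes if EITHER aligned SPF OR aligned DKIM passes."""
    spf_ok = r.spf_pass and aligned(r.spf_domain, r.from_domain, strict_spf)
    dkim_ok = r.dkim_pass and aligned(r.dkim_domain, r.from_domain, strict_dkim)
    return spf_ok or dkim_ok


# Hypothetical message: SPF fails, but DKIM passes with an aligned domain.
msg = AuthResults(from_domain="example.com", spf_pass=False,
                  spf_domain="bounce.example.com", dkim_pass=True,
                  dkim_domain="mail.example.com")
print(dmarc_pass(msg))  # True: aligned DKIM alone is enough
```

Treating every DMARC failure as a DKIM failure (or every SPF failure as a DMARC failure) loses exactly the distinction this function keeps separate.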
lijok almost 2 years ago
Fixed an SPF issue by mucking around with SNAT rules. I think this is not the last time we'll see HEY's emails going to spam.
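Why SNAT rules matter here: SPF (RFC 7208) checks whether the connecting IP address is authorized by the sender domain's published record. If outbound mail starts leaving through a translated address that the record does not list, SPF fails even though the message itself is unchanged. A minimal sketch, assuming a simplified evaluator that only looks at `ip4:` mechanisms; the record and addresses are hypothetical (RFC 5737 documentation ranges), not HEY's actual configuration.

```python
import ipaddress


def spf_ip4_permits(client_ip: str, spf_record: str) -> bool:
    """Return True if a matching ip4: mechanism has the '+' (pass) qualifier.

    Simplified sketch: only ip4: mechanisms are evaluated. Real SPF also
    handles include:, a, mx, redirect=, DNS lookups and lookup limits;
    anything not matched here is treated as "not permitted".
    """
    ip = ipaddress.ip_address(client_ip)
    for term in spf_record.split()[1:]:  # skip the "v=spf1" version tag
        qualifier = "+"  # default qualifier when none is written
        if term[0] in "+-~?":
            qualifier, term = term[0], term[1:]
        if term.startswith("ip4:") and ip.version == 4:
            network = ipaddress.ip_network(term[4:], strict=False)
            if ip in network:
                return qualifier == "+"
    return False


# Hypothetical record: only the old egress range is authorized.
RECORD = "v=spf1 ip4:203.0.113.0/24 -all"
print(spf_ip4_permits("203.0.113.10", RECORD))  # True: old egress IP is listed
print(spf_ip4_permits("198.51.100.7", RECORD))  # False: post-SNAT IP is not
```

In this model, the fix is either to route mail back through an authorized address or to publish the new egress range in the record before cutting over.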
wordyskeleton almost 2 years ago
As someone that works in a team with minimal collaboration software overhead—is there a ton of bloat in their process (Basecamp this, Campfire that, etc.) or is that just the reality of modern software development?
AJRF almost 2 years ago
> And this is not just a guideline; we built a new encryption technology

But... the old adage!