Tarsnap performance issues in late March, most of April

172 points by pndmnm about 10 years ago

9 comments

mtsmith85 about 10 years ago
This line: *I would have sent out an email to the mailing lists earlier; but since at each point I thought I was "one change away" from fixing the problems, I kept on delaying said email until it was clear that the problems were finally fixed* describes such a common situation for most people, but I tend to see it with engineers especially. I find I struggle with it an incredible amount. In some ways, I guess it seems healthy or reassuring that incredibly smart people like Colin Percival suffer from similar challenges around fully understanding the scope of the problem and the solution.

All that being said, I really respect the detailed response from a technical perspective, as well as the owning up to a spell of degraded performance (and to the decisions that went into it).

Later edit because I don't want to spam the comments: I'd love some context (maybe from cperciva himself?) around the performance enhancement of integrating new Intel AESNI instructions. This is well beyond my depth, and while Colin mentions that it didn't necessarily increase performance, I'm wondering if the hope is that it would in the long term? Or were there other benefits to such an integration?
patio11 about 10 years ago
In case any other customer is wondering "Wait, I didn't hear anything from my monitoring about that and I'm retroactively worried. How worried should I be?" like I was: I just pulled our logs and reconstructed them, and they show that over the last ~30 days the worst-case performance of our daily backup (~150 MB per day delta, ~45 GB total post-deduplication) was about 40% longer than our typical case. This didn't trip our monitoring at the time because the backups all completed successfully.

n.b. Our backups run outside of the hotspot times for Tarsnap, so we may have had less performance impact than many customers. I have an old habit of "Schedule all cron jobs to start predictably but at a random offset from the hour to avoid stampeding any previously undiscovered SPOFs." That's one of the Old Wizened Graybeard habits that I picked up from one of the senior engineers at my last real job, which I impart onto y'all for the same reason he imparted it onto me: it costs you nothing and *will* save you grief some day far in the future.
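One way to get "predictable but at a random offset" is to derive the minute from a hash of the host and job name, so every machine gets a stable, different slot. This is only a minimal sketch of that idea, not anything described in the thread: the function names, the host name, and the script path are all hypothetical.

```python
import hashlib


def cron_minute(host: str, job: str) -> int:
    """Map a (host, job) pair to a stable minute in the range 0..59.

    The same inputs always produce the same minute, so the schedule is
    predictable, while different hosts or jobs spread across the hour.
    """
    digest = hashlib.sha256(f"{host}:{job}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 60


def crontab_line(host: str, job: str, hour: int, command: str) -> str:
    """Build a daily crontab entry that runs at the derived minute."""
    minute = cron_minute(host, job)
    return f"{minute} {hour} * * * {command}"


# Hypothetical host, job name, and script path, for illustration only.
print(crontab_line("web1.example.com", "daily-backup", 3,
                   "/usr/local/bin/run-backup.sh"))
```

Hashing rather than calling a random number generator keeps the offset reproducible when the crontab is regenerated, which is the "predictably" half of the habit.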
cperciva about 10 years ago
I suppose I should have known that this would end up at the top of Hacker News...
Osiris about 10 years ago
For those who want to run a similar service using their own systems, I found that Attic [1] is a great open source backup tool that works in a very similar way, including deduplication and compression.

I back up some VPS servers to my NAS at home using attic over an SSH tunnel. Incremental backups are quite small and it's easy to automate with a simple cron job.

[1] https://attic-backup.org/
k1w1 about 10 years ago
As an AWS user this type of thing gives me cause for concern:

*At 2015-04-01 00:00 UTC, the Amazon EC2 "provisioned I/O" volume on which most of this metadata was stored suddenly changed from an average latency of 1.2 ms per request to an average latency of 2.2 ms per request. I have no idea why this happened -- indeed, I was so surprised by it that I didn't believe Amazon's monitoring systems at first -- but this immediately resulted in the service being I/O limited.*

A sudden near-doubling of latency can have dire consequences for any system. Knowing that such unexpected changes are possible makes it hard to build trust in your environment, even if it is running fine today.
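To put rough numbers on why that latency change matters: if a workload keeps a fixed number of requests outstanding, its throughput ceiling is inversely proportional to per-request latency. The sketch below works that out for the two latencies quoted above. The single outstanding request is an assumption for illustration; the actual queue depth of the Tarsnap volume is not stated in the thread.

```python
def max_requests_per_second(latency_ms: float, in_flight: int = 1) -> float:
    """Upper bound on throughput with a fixed number of outstanding requests.

    Each slot completes one request every latency_ms milliseconds, so the
    ceiling is in_flight * (1000 / latency_ms) requests per second.
    """
    return in_flight * 1000.0 / latency_ms


# Latencies quoted from the Tarsnap post; in_flight=1 is an assumption.
before = max_requests_per_second(1.2)
after = max_requests_per_second(2.2)

print(f"before: {before:.0f} req/s")
print(f"after:  {after:.0f} req/s")
print(f"capacity lost: {1 - after / before:.0%}")
```

Under that assumption the ceiling drops from about 833 to about 455 requests per second, roughly a 45% loss, which is enough to push a service that was comfortably within its budget into being I/O limited.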
ac29 about 10 years ago
Sorry if this is off-topic, but can anybody explain the value proposition of tarsnap to me? It seems like a nice service and all, but the pricing is an order of magnitude higher than S3. If you are storing a few GB, this might not matter ("over half of Tarsnap users spend under $1 per month on storing their backups"), but if you have that little data, why not just dump it on a free Dropbox/Gdrive/etc. account?

For more data, why not just use one of the many compressed, deduplicated, encrypted, incremental backup systems (attic comes to mind, I'm sure there are others) and then sync to S3 at a tenth of the cost?
appsonify about 10 years ago
what the fu.... Colin Percival used to be my cello teacher 12 years ago.... and he is running tarsnap. My mind is blown.
btmorex about 10 years ago
Why are you reinventing a scheduler when the OS (at least Linux) already provides a good one?
Someone about 10 years ago
Good description, but I'm missing lesson learned #0: Do not wait too long before informing your users, even if only to tell them "we know about it and are working on it."