
Warn HN: Resize of Digital Ocean instance corrupted filesystem beyond repair

17 points by oskarpearson, about 10 years ago
Careful with resizing on Digital Ocean...

A recent resize of a Digital Ocean instance corrupted the filesystem beyond repair. Support have offered me an $80 credit and offered to restore from a backup.

You should probably snapshot your instances before resizing, or migrate to a new instance manually.

(Thankfully I've built a replacement box from Ansible in the meantime.)

The upgrade process appears to have involved copying to different hardware across the network, since the resize operation took over 15 minutes. That copy seems to be incomplete, which led to a completely unrecoverable filesystem.

On coming back up, the system displayed "DOROOT does not exist after resize". Running a filesystem check scrolled tens of thousands of e2fsck "fixes" for a period of 12 minutes.

As expected, the end result of that is that all "files" on the filesystem were in /mnt/lost+found with random names, and the data in them no doubt corrupt too.

Digital Ocean support does not appear to be able to re-copy or review the previous block device to determine the source of the problem. They also don't appear to have logs of the resize operation.

Sure, it's always possible the filesystem was irretrievably corrupted before reboot, but I think it's pretty unlikely. Given that I've not been doing things like 'dd if=/dev/urandom of=/dev/vda1' on there, that would probably indicate a hardware fault on their side anyway.

It's worth noting that I rebooted the box successfully a few minutes before the resize, so a (journaled) e2fsck ran at that point. The filesystem was at least usable a few minutes before the resize.

(Ticket #633210 in case anyone from Digital Ocean wants to investigate.)

1 comment

cat9, about 10 years ago
Nice of them to give you a credit, and a good decision from a customer service standpoint, but I doubt they're at fault in any real way. It could be any number of things, many of which are completely out of their control to do more than mitigate and minimize, and thus part of working with real computing systems at scale.

Cultivate healthy paranoia that systems will fail - because eventually, they will, particularly if you run 100 of them or run them for several years or any other "you have to survive 1000 coin tosses to miss the error" combinatoric series. And always make a backup before doing system-changing events like resizing a partition or reprovisioning a VM.
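
The "survive 1000 coin tosses" point can be made concrete with a little arithmetic. The sketch below assumes each operation (a resize, a reboot, a month of uptime) fails independently with the same small probability; the function name and the 0.1% figure are hypothetical and chosen only for illustration, not measured from any provider.

```python
def prob_at_least_one_failure(p_single: float, trials: int) -> float:
    """Chance that at least one of `trials` independent events fails.

    Assumes every event fails with the same probability `p_single`
    and that events are independent (a simplification).
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if trials < 0:
        raise ValueError("trials must be non-negative")
    # P(no failure in n trials) = (1 - p)^n, so P(at least one) = 1 - (1 - p)^n
    return 1.0 - (1.0 - p_single) ** trials


if __name__ == "__main__":
    p = 0.001  # hypothetical 0.1% failure chance per operation
    for n in (1, 100, 1000):
        print(f"{n:>5} operations: {prob_at_least_one_failure(p, n):.1%}")
```

Under these assumptions, a failure mode that looks negligible for a single operation becomes more likely than not (roughly 63%) once you have rolled the dice 1000 times, which is the case for backing up before every system-changing event.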