
Ask HN: How common is data loss due to migration errors?

8 points by entelechy about 7 years ago
I'm currently researching this very question. Do you have stories or statistics to share?

4 comments

davismwfl about 7 years ago
I have worked on very large datasets of different types, and honestly, true data loss was extremely rare for us. From my experience, I'd say your question is missing two key components: the permanency of the data loss, and in-place versus shadow (copied) migrations.

We had many projects where we would do a data migration, find something missing, and go back and reprocess; this was possible because we generally used shadow migrations (not in-place). Or sometimes it was a calculated value, so we could recalculate it. But a true "oh crap, we lost this data forever" was super rare in my experience. It has happened, but if I had to put a guess on it, I'd say it's a very small (low single digit) percentage of the time. This all matters especially when your migrations might take a week or more to complete and you have to keep the business running.

Migrations in place (i.e. replacing existing data during the migration) are exceedingly dangerous and far more likely to result in data loss. Yet I have still seen companies do this just to save on costs, and in those cases the likelihood obviously goes up significantly.
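A minimal sketch of the shadow-migration pattern described above, in Python with sqlite3; the "users" and "users_v2" table names and the transform are hypothetical. The key property is that the source table is only ever read, so a buggy transform can be fixed and the whole pass re-run:

    import sqlite3

    def shadow_migrate(conn):
        # Copy-and-transform into a new table; the source table stays
        # read-only, so a buggy transform can be fixed and re-run.
        conn.execute("CREATE TABLE IF NOT EXISTS users_v2 "
                     "(id INTEGER PRIMARY KEY, full_name TEXT)")
        for uid, first, last in conn.execute(
                "SELECT id, first, last FROM users"):
            conn.execute("INSERT OR REPLACE INTO users_v2 VALUES (?, ?)",
                         (uid, f"{first} {last}".strip()))
        conn.commit()
        # Verify counts/checksums against "users", then cut over by
        # renaming tables; only at that point is anything replaced.

Nothing is replaced until the shadow table has been verified, which is what makes the go-back-and-reprocess loop described above safe.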
xstartup about 7 years ago
Data loss in what sense? If you are using NoSQL, you'll have multiple replicas, and lost data can often be reconstructed from the write-ahead log. Even then, you might lose some data to network or memory corruption. But most startups that adopt an experimental database purely for performance or ease-of-use reasons often run Kafka alongside it and replay events when they lose data.

Big companies like Google use a replicated filesystem (GFS) and write-ahead logs to make migrations and rebalancing painless.
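A minimal sketch of the replay idea, assuming the kafka-python client; the "user-events" topic, broker address, and apply_event callback are hypothetical. The consumer is rewound to the start of the log and every event is re-applied to rebuild the lost state:

    from kafka import KafkaConsumer, TopicPartition  # pip install kafka-python

    def rebuild_from_log(apply_event):
        # Attach to the event log without a consumer group, rewind to
        # the start, and re-apply every event to reconstruct lost state.
        consumer = KafkaConsumer(
            bootstrap_servers="localhost:9092",
            consumer_timeout_ms=5000,          # stop when the log runs dry
        )
        tp = TopicPartition("user-events", 0)  # hypothetical topic/partition
        consumer.assign([tp])
        consumer.seek_to_beginning(tp)
        for msg in consumer:
            apply_event(msg.value)             # must be idempotent
        consumer.close()

For this to be safe, apply_event has to be idempotent, since events already reflected in the surviving data will be applied again during the replay.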
kureikain about 7 years ago
Usually all migrations are prepared and tested/experimented with a lot, so the risks are reduced. Sadly, I would say a lot of my data loss has been due to bugs in code :(. We were sometimes able to recover using point-in-time recovery.

That's why nowadays I opt for soft-delete and some kind of audit log, so I can rebuild the data.
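A minimal sketch of the soft-delete-plus-audit-log pattern, in Python with sqlite3; the table layout is hypothetical. Deletes only set a flag, and every change is journaled, so rows can be restored or rebuilt later:

    import json, sqlite3, time

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE items (id INTEGER PRIMARY KEY, body TEXT,
                            deleted_at REAL);        -- NULL means live
        CREATE TABLE audit_log (ts REAL, op TEXT, row TEXT);
    """)

    def soft_delete(item_id):
        # Mark the row instead of removing it, and journal the change.
        row = conn.execute("SELECT id, body FROM items WHERE id = ?",
                           (item_id,)).fetchone()
        conn.execute("UPDATE items SET deleted_at = ? WHERE id = ?",
                     (time.time(), item_id))
        conn.execute("INSERT INTO audit_log VALUES (?, 'delete', ?)",
                     (time.time(), json.dumps(row)))
        conn.commit()

    def undelete(item_id):
        # Recovery is just clearing the flag; nothing was destroyed.
        conn.execute("UPDATE items SET deleted_at = NULL WHERE id = ?",
                     (item_id,))
        conn.commit()

Live queries then filter on deleted_at IS NULL, and the audit log gives a second path to rebuilding rows even if the flag is cleared by mistake.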
jtchang about 7 years ago
Just a random thought, but maybe you are more interested in data loss from the customer's perspective? If a value is wrong, it might be considered "loss".