I designed a similar system 10 years ago at Bytemark which worked for a few thousand VMs and ran for about 12 years. It was called BigV [1]. It might still be running (any customers here still?). I think the new owners tried to shut it down, but customers kept protesting when offered a less-featureful platform :-)

The two architectural differences from fly:

* the VM clusters were split into "head" and "tail" machines, linked on a dedicated 10Gbps LAN. So each customer VM needed its corresponding head & tail machine to be alive in order to run, but qemu could do all of that natively;

* we built our own network storage layer based on NBD, called flexnbd [2]. It served local discs to the heads, managed access control and so on. It could also be put into a "mirror" mode where a VM's disc would start writing its blocks out to another server while continuing to serve, keeping track of "dirty" blocks etc., exactly as described here (a rough sketch of that bookkeeping is at the end of this comment).

It was very handy to be able to sell and directly attach discs with different performance characteristics without having to migrate machines. But I suspect the network (even at 10Gbps) was too much of a bottleneck.

I can't remember whether Linux supported the kind of fancy disc migration we wanted to do back in 2011. If it did, it was hard enough that spending a year getting our own server right seemed worth it.

It *is* a particularly sweet trick to have a suspicion about a server and just say "flush it!", and in 12-24 hours it's no longer in service. We had tools that most of our support team could use to act on a slight suspicion. You do notice a performance dip while migrations are going on, but the decision to use network storage (which reduced performance overall, lol) might have masked that.

Having our discs served from userspace reduced the administration that we needed to do. But it came with the terror of maintaining a piece of C that shuffled our customers' data around. Also - because I was a masochist - customers' discs were files stored on btrfs, and we became reluctant experts. *Overall* the system was reliable, but it took a good 12-18 months of customers tolerating fscks (& us being careful not to anger the filesystem).

I did miss this kind of work in 2022 and interviewed for a support role at fly. I'm not sure how to take being rejected at the screener stage; I'm sure some of my former staff could explain it :)

[1] https://blog.bytemark.co.uk/wp-content/uploads/2012/04/DesignAndImplementationOfBigV.pdf

[2] https://github.com/BytemarkHosting/flexnbd-c
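For the curious, here's roughly what the dirty-block side of mirror mode boils down to. This is a toy sketch, not flexnbd's actual code: the block size, the bitmap layout and the send_block() stub are all invented for illustration, and the real thing also has to cope with writes racing the copy pass, plus a final quiesced pass before cutting over to the destination.

    /* Toy sketch of dirty-block mirroring bookkeeping: a live write path
     * that flags blocks, and a copy pass that sweeps the bitmap and
     * re-sends whatever changed since the last sweep. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define BLOCK_SIZE    4096
    #define NUM_BLOCKS    1024
    #define BITS_PER_WORD 64

    static uint8_t  disc[NUM_BLOCKS][BLOCK_SIZE];       /* the live disc  */
    static uint64_t dirty[NUM_BLOCKS / BITS_PER_WORD];  /* 1 bit / block  */

    static void mark_dirty(size_t block) {
        dirty[block / BITS_PER_WORD] |= 1ULL << (block % BITS_PER_WORD);
    }

    static int test_and_clear_dirty(size_t block) {
        uint64_t mask = 1ULL << (block % BITS_PER_WORD);
        int was = (dirty[block / BITS_PER_WORD] & mask) != 0;
        dirty[block / BITS_PER_WORD] &= ~mask;
        return was;
    }

    /* Guest write path: store the data, then flag the block so the
     * next copy pass knows to re-send it. */
    static void guest_write(size_t block, const uint8_t *data) {
        memcpy(disc[block], data, BLOCK_SIZE);
        mark_dirty(block);
    }

    /* Stand-in for pushing one block to the destination server; real
     * code would write disc[block] down an NBD connection. */
    static void send_block(size_t block) {
        (void)block;
    }

    /* One copy pass over the bitmap. Migration converges when a pass
     * sends (almost) nothing; then the source pauses I/O for one final
     * clean pass and hands the disc over. */
    static size_t mirror_pass(void) {
        size_t sent = 0;
        for (size_t b = 0; b < NUM_BLOCKS; b++)
            if (test_and_clear_dirty(b)) { send_block(b); sent++; }
        return sent;
    }

    int main(void) {
        uint8_t buf[BLOCK_SIZE] = {0};

        /* Start of migration: everything counts as dirty, so the
         * first pass is a full copy of the disc. */
        for (size_t b = 0; b < NUM_BLOCKS; b++)
            mark_dirty(b);
        printf("first pass sent %zu blocks\n", mirror_pass());

        /* The VM keeps writing while the mirror runs; only the blocks
         * it touched need re-sending on the next pass. */
        guest_write(3, buf);
        guest_write(700, buf);
        printf("second pass sent %zu blocks\n", mirror_pass());
        return 0;
    }

The whole point is that second pass: the VM never stops serving, and each sweep only carries the blocks written since the previous one, so the amount left to copy shrinks until the final handover is cheap.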