LiteFS

648 points by danielskogly over 2 years ago

22 comments

no_wizard over 2 years ago

This is distributed SQLite 3, running (I assume at least partially managed?) LiteFS[5] for you. Which is pretty cool!

What I'd like to have seen is how this compares to things like rqlite[1] or Cloudflare's D1[2], addressed directly in the article.

That said, I think this is pretty good for things like read replicas. I know the sales pitch here is as a full database, and I don't disagree with it, and if I was starting from scratch today and could use this, I totally would give it a try and benchmark / test accordingly; however, I can't speak to that use case directly.

What I find, however, and what I can speak to, is that most workloads already have a database of some kind set up, typically not SQLite as their main database (MySQL or PostgreSQL seem most common). This is a great way to make very (insanely, really) fast read replicas of your data across regions. You can use an independent Raft[3][4] implementation to do this on write. If your database supports it, you can even trigger a replication directly from a write to the database itself (I think Aurora has this ability, and I think, don't quote me, PostgreSQL can do this natively via an extension to kick off a background job).

To that point, in my experience one thing SQLite is actually really good at is storing JSON blobs. I have used it for replicating JSON representations of read-only data in the past to great success, cutting down on read times significantly for APIs, as the data is "pre-baked" and the lightweight nature of SQLite allows you to (if you wanted to naively do this) just spawn a new database for each customer and transform their data accordingly ahead of time. It's like AOT compilation for your data.

If you want to avoid some complexity with sharding (you can't always avoid it outright, but this can help cap its complexity), this approach helps enormously in my experience. Do try before you buy!

EDIT: Looks like it's running LiteFS[5], not Litestream[0]. This is my error of understanding.

[0]: https://litestream.io/
[1]: https://github.com/rqlite/rqlite
[2]: https://blog.cloudflare.com/introducing-d1/
[3]: https://raft.github.io/
[4]: https://raft.github.io/#implementations
[5]: https://github.com/superfly/litefs
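
To make the "pre-baked JSON, one database per customer" idea above concrete, here is a minimal sketch using only Python's standard library. The file layout, the `docs` table, and the function names are assumptions made up for illustration; none of them come from the article or from LiteFS.

```python
import json
import sqlite3
from pathlib import Path
from typing import Optional


def customer_db_path(base_dir: Path, customer_id: str) -> Path:
    # Hypothetical layout: one SQLite file per customer.
    return base_dir / f"customer_{customer_id}.db"


def prebake(base_dir: Path, customer_id: str, documents: dict) -> None:
    """Store already-transformed JSON documents in the customer's own database."""
    conn = sqlite3.connect(customer_db_path(base_dir, customer_id))
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS docs "
                "(key TEXT PRIMARY KEY, body TEXT NOT NULL)"
            )
            conn.executemany(
                "INSERT OR REPLACE INTO docs (key, body) VALUES (?, ?)",
                [(key, json.dumps(doc)) for key, doc in documents.items()],
            )
    finally:
        conn.close()


def read_doc(base_dir: Path, customer_id: str, key: str) -> Optional[dict]:
    """Serve a pre-baked document; assumes prebake() already ran for this customer."""
    conn = sqlite3.connect(customer_db_path(base_dir, customer_id))
    try:
        row = conn.execute(
            "SELECT body FROM docs WHERE key = ?", (key,)
        ).fetchone()
    finally:
        conn.close()
    return json.loads(row[0]) if row else None
```

The read path is a single primary-key lookup on a local file, which is where the read-time savings described above would come from. Whether that beats your existing cache is something to benchmark rather than assume.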

lijogdfljk over 2 years ago

This is really cool! Unfortunately, I'm primarily interested in offline databases, so perhaps I'm just not the target audience. However, I have to ask, on that note: does this have any application in the offline space?

I.e., I wonder if there's a way to write your applications such that they have less/minimal contention, and then allow the databases to merge when back online? Of course, what happens when there inevitably _is_ contention? Etc.

Not sure that idea would have a benefit over many SQLite DBs with userland schemas mirroring CRDT principles, though. But a boy can dream.

Regardless, very cool work being done here.

pphysch over 2 years ago

> Developing against a relational database requires devs to watch out for "N+1" query patterns, where a query leads to a loop that leads to more queries. N+1 queries against Postgres and MySQL can be lethal to performance. Not so much for SQLite.

This is misleading AFAICT. The article(s) is actually comparing remote RDBMS to local RDBMS, not Postgres to SQLite.

Postgres can also be served over a UNIX socket, removing the individual query overhead due to TCP roundtrip.

SQLite is a great technology, but keep in mind that you can also deploy Postgres right next to your app as well. If your app is something like a company backend that could evolve a lot and benefit from Postgres's advanced features, this may be the right choice.
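
For readers unfamiliar with the "N+1" pattern in the quote, here is a small sketch of it next to the single-query alternative, run against an in-memory SQLite database. The `authors`/`posts` schema is invented for illustration. The point under debate above is that each extra query costs a network round trip on a remote database but only a function call on a local one.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            author_id INTEGER NOT NULL REFERENCES authors(id),
            title TEXT NOT NULL
        );
        INSERT INTO authors (id, name) VALUES (1, 'ann'), (2, 'bob');
        INSERT INTO posts (author_id, title) VALUES
            (1, 'first'), (1, 'second'), (2, 'third');
        """
    )
    return conn


def titles_n_plus_one(conn):
    """One query for the authors, then one more query per author (N+1)."""
    result = {}
    queries = 0
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    queries += 1
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id",
            (author_id,),
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def titles_joined(conn):
    """The same data fetched with a single JOIN."""
    result = {}
    rows = conn.execute(
        """
        SELECT a.name, p.title
        FROM authors AS a
        LEFT JOIN posts AS p ON p.author_id = a.id
        ORDER BY a.id, p.id
        """
    ).fetchall()
    for name, title in rows:
        result.setdefault(name, [])
        if title is not None:  # authors without posts yield a NULL title
            result[name].append(title)
    return result, 1
```

Both functions return the same mapping; only the number of statements issued differs (three versus one for this toy data).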

vcryan over 2 years ago

This approach is very appealing to me :) Curious about how people handle schema migrations when using this approach.

I segment SQLite files (databases) that have the same schema into the same folder. I haven't really had a case where migrations were a concern, but I could see it happening soon.

Seems like in my deployment, I'm going to need an approach to loop over DBs to apply a change... I currently have a step of app deployment that attempts to apply migrations... but it is more simplistic, because the primary RDBMS (PostgreSQL) just appears to the application as a single entity, which is the normative use case for db-migrate-runners.
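
One way the "loop over DBs" step could look, as a rough sketch: track each file's schema version in SQLite's built-in `PRAGMA user_version` and apply whatever migrations are missing. The `MIGRATIONS` list, the `*.db` naming, and the function names are hypothetical; this is not how any particular migration runner works.

```python
import sqlite3
from pathlib import Path

# Hypothetical migrations, applied in order: entry i upgrades version i to i + 1.
MIGRATIONS = [
    "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE items ADD COLUMN created_at TEXT",
]


def migrate_one(db_path: Path) -> int:
    """Bring a single database up to date and return its final schema version."""
    # isolation_level=None puts the connection in autocommit mode so that
    # the explicit BEGIN/COMMIT statements below control each transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        version = conn.execute("PRAGMA user_version").fetchone()[0]
        for target in range(version, len(MIGRATIONS)):
            conn.execute("BEGIN IMMEDIATE")
            try:
                conn.execute(MIGRATIONS[target])
                # The version bump commits together with the schema change.
                conn.execute(f"PRAGMA user_version = {target + 1}")
                conn.execute("COMMIT")
            except Exception:
                conn.execute("ROLLBACK")
                raise
        return conn.execute("PRAGMA user_version").fetchone()[0]
    finally:
        conn.close()


def migrate_folder(folder: Path) -> dict:
    """Apply pending migrations to every *.db file in a folder."""
    return {p.name: migrate_one(p) for p in sorted(folder.glob("*.db"))}
```

Because each file records its own version, a run that fails halfway can simply be repeated: databases that are already current are skipped.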

infogulch over 2 years ago

> To improve latency, we're aiming at a scale-out model that works similarly to Fly Postgres. That's to say: writes get forwarded to the primary and all read requests get served from their local copies.

How can you ensure that a client that just performed a forwarded write will be able to read that back on their local replica on subsequent reads?
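
One common answer to this kind of question, in general, is to hand the client a replication position with each write and have replicas refuse (or wait) to serve reads until they have caught up to it. The toy model below illustrates only that idea; the `Primary` and `Replica` classes are invented here, and nothing in the article says this is what LiteFS or Fly Postgres actually does.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Toy primary: every write gets the next position in a replication log."""
    log: list = field(default_factory=list)

    def write(self, key, value) -> int:
        self.log.append((key, value))
        # Position a replica must reach before it can serve this write.
        return len(self.log)


@dataclass
class Replica:
    """Toy replica that applies the primary's log some time later."""
    data: dict = field(default_factory=dict)
    applied: int = 0

    def catch_up(self, primary: Primary, upto=None) -> None:
        end = len(primary.log) if upto is None else min(upto, len(primary.log))
        for key, value in primary.log[self.applied:end]:
            self.data[key] = value
        self.applied = max(self.applied, end)

    def read(self, key, min_position: int = 0):
        # Refuse to answer from a copy older than the client's last write.
        if self.applied < min_position:
            raise LookupError("replica is behind; wait or read from the primary")
        return self.data.get(key)
```

In a web app, the position returned by `write` would typically travel with the client (for example in a cookie) so that any replica can check it on the next request.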

asim over 2 years ago

10 years ago fly.io is the company I wanted to build. Something with massive technical depth that becomes a developer product. They're doing an incredible job and part of that comes down to how they evangelise the product outside of all the technical hackery. This requires so much continued effort. AND THEN to actually run a business on top of all that. Kudos to you guys. I struggled so much with this. Wish you nothing but continued success.

mwcampbell over 2 years ago

I wonder if using FUSE has had any appreciable impact on performance, particularly read performance. I ask because FUSE has historically had a reputation for being slow, e.g. with the old FUSE port of ZFS.

Existenceblinks over 2 years ago

The pain is that this approach is suitable on VPS/IaaS, where disk volumes are supported. As a solo dev, I only use PaaS-style infra, and there are just a few PaaS I'm aware of that support attachable disks. Fly, Render... nothing else?

nicoburns over 2 years ago

Where is the data actually being stored in this setup? A copy on each machine running the application? If so, is there another copy somewhere else (e.g. S3) in case all nodes go down?

Also, what happens if the Consul instance goes down?

If my application nodes can't be ephemeral then this seems like it would be harder to operate than Postgres or MySQL in practice. If it completely abstracts that away somehow then I suppose that'd be pretty cool.

Currently finding it hard to get on board with the idea that adding a distributed system here actually makes things simpler.

hinkley over 2 years ago

> Second, your application can only serve requests from that one server. If you fired up your server in Dallas then that'll be snappy for Texans. But your users in Chennai will be cursing your sluggish response times since there's a 250ms ping time between Texas & India.

> To improve availability, it uses leases to determine the primary node in your cluster. By default, it uses Hashicorp's Consul.

Having a satellite office become leader of a cluster is one of the classic blunders in distributed computing.

There are variants of Raft where you can have quorum members that won't nominate themselves for election, but out of the box this is a bad plan.

If you have a Dallas, Chennai, Chicago, and Cleveland office and Dallas goes dark (i.e., the tunnel gets fucked up for the fifth time this year), you want Chicago to become the leader, Cleveland if you're desperate. But if Chennai gets elected then everyone has a bad time, including Dallas when it comes back online.
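
The eligibility rule described above (some members participate but never stand for leader) can be sketched in a few lines. This is only the selection policy, not an election protocol like Raft or a lease mechanism like Consul's, and the `Node` fields and priorities are made up to match the office example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    name: str
    candidate: bool   # may this node ever become primary?
    priority: int     # lower is preferred; a hypothetical tie-breaker
    healthy: bool = True


def pick_primary(nodes: list) -> Optional[Node]:
    """Choose the preferred healthy node among those allowed to lead."""
    eligible = [n for n in nodes if n.candidate and n.healthy]
    return min(eligible, key=lambda n: n.priority, default=None)
```

With Dallas down, Chicago and Cleveland as candidates, and Chennai marked as a non-candidate, `pick_primary` returns Chicago; if no candidate is healthy it returns `None` rather than promoting the satellite office.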

clord over 2 years ago

I can imagine a database which tries to solve both of these domains.

A centralized database handles consistency, and vends data closures to distributed applications for in-process querying (and those closures reconcile via something like CRDT back to the core db).

Does this exist?

azlyrics over 2 years ago

I actually gave fly.io a whirl over the weekend. It was not fun. I spent a lot of time on the forums, and it's pretty clear it has some way to go before it can give AWS or Linode a run for their money.

For instance, we run Kubernetes on multiple VPS providers, including public clouds with serverless on-ramps/off-ramps deployed at edge locations. Anything under 15 minutes is processed by serverless. Anything longer is offloaded to one of the VPS containers available in every part of the world.

I have some more feedback ready if you are interested. It's a neat idea, but not exactly as seamless and easy as proposed, since public clouds already offer a way to do this.

vcryan over 2 years ago

Currently, I am running multiple application servers (and lambda functions) using AWS Fargate that access sqlite files (databases) on an EFS share. So far so good, although my use cases are fairly simple.

hobo_mark over 2 years ago

Well that was fast... [1]

Are the readable replicas supposed to be long-lived (as in, I don't know, hours)? Or does Consul happily converge even with ephemeral instances coming and going every few minutes (thinking of something like Cloud Run and the like, not sure if Fly works the same way)? And do they need to make a copy of the entire DB when they "boot" or do they stream pages in on demand?

[1] https://news.ycombinator.com/item?id=32240230

jensneuse over 2 years ago

Sounds like a drop-in solution to add high availability to WunderBase (https://github.com/wundergraph/wunderbase). Can we combine LiteFS with Litestream for backups, or how would you do HA + backups together?

endisneigh over 2 years ago

Seems neat, until you try to do schema migrations. Unless they can guarantee that all containers’ SQLite instances have the same schema without locking, I’m not sure how this doesn’t run into the same issues as many NoSQL databases.

CouchDB had this same issue with its database-per-user model and eventually consistent writes.

whoisjuan over 2 years ago

Unrelated, but why does the map on their homepage show a region in Cuba? That must be wrong.

ranger_danger over 2 years ago

ELI5?

theomega over 2 years ago

Does anyone else bump into the issue that the fly.io website does not load if requested via IPv6 on Mac? I tried Safari, Chrome and curl, and none of them work:

<pre><code>$ curl -v https://fly.io/blog/introducing-litefs/
*   Trying 2a09:8280:1::a:791:443...
* Connected to fly.io (2a09:8280:1::a:791) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (OUT), TLS handshake, Client hello (1):
curl: (35) error:02FFF036:system library:func(4095):Connection reset by peer
</code></pre>

Requesting via IPv4 works:

<pre><code>$ curl -4v https://fly.io/blog/introducing-litefs/
*   Trying 37.16.18.81:443...
* Connected to fly.io (37.16.18.81) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (OUT), TLS handshake, Client hello (1):
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-CHACHA20-POLY1305-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=fly.io
*  start date: Jul 25 11:20:01 2022 GMT
*  expire date: Oct 23 11:20:00 2022 GMT
*  subjectAltName: host "fly.io" matched cert's "fly.io"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x135011c00)
> GET /blog/introducing-litefs/ HTTP/2
> Host: fly.io
> user-agent: curl/7.79.1
> accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS == 32)!
< HTTP/2 200
< accept-ranges: bytes
< cache-control: max-age=0, private, must-revalidate
< content-type: text/html
< date: Wed, 21 Sep 2022 16:50:16 GMT
< etag: "632b20f0-1bdc1"
< fly-request-id: 01GDGFA3RPZPRDV9M3AQ3159ZK-fra
< last-modified: Wed, 21 Sep 2022 14:34:24 GMT
< server: Fly/51ee4ef9 (2022-09-20)
< via: 1.1 fly.io, 2 fly.io
<
<!doctype html>
...</code></pre>

morenoh149 over 2 years ago

still wondering why projects like https://www.reactivated.io/ are using fly.io

presentation over 2 years ago

My dream would be if this supported geo-partitioning. In my field people are pretty sensitive about GDPR so would love to box in EU PII in EU servers.

fny over 2 years ago

This reads like a professor who's so steeped in research that he's forgotten how to communicate to his students!

What exactly are we talking about here? A WebSQL that's actually synced to a proper RDBMS? Synced across devices? I'm not clear about an end-to-end use case.

Edit: Honestly, this line from the LiteFS docs[0] needs to be added to the top of the article:

> LiteFS is a distributed file system that transparently replicates SQLite databases. This lets you run your application like it's running against a local on-disk SQLite database but behind the scenes the database is replicated to all the nodes in your cluster. This lets you run your database right next to your application on the edge.

I had no idea what was being talked about otherwise.

[0]: https://fly.io/docs/litefs/
