An Unlikely Database Migration

364 points by ifcologne, over 4 years ago

31 comments

judofyr, over 4 years ago
Interesting choice of technology, but you didn't completely convince me why this is better than just using SQLite or PostgreSQL with a lagging replica. (You could probably start with either one and easily migrate to the other if needed.)

In particular, you've designed a very complicated system: operationally you need an etcd cluster and a tailetc cluster. Code-wise you now have to maintain your own transaction-aware caching layer on top of etcd (https://github.com/tailscale/tailetc/blob/main/tailetc.go). That's quite a brave task considering how many databases fail at Jepsen. Have you tried running Jepsen tests on tailetc yourself? You also mentioned a secondary index system, which I assume is built on top of tailetc again? How does that interact with tailetc?

Considering that high availability was *not* a requirement, and that the main problem with the previous solution was performance ("writes went from nearly a second (sometimes worse!) to milliseconds"), it *looks* like a simple server with SQLite + some indexes could have gotten you quite far.

We don't really get the full overview from a short blog post like this, though, so maybe it turns out to be a great solution for you. The code quality itself looks great, and it seems that you have thought about all of the hard problems.
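A minimal sketch of that SQLite baseline in Go (the driver choice, schema, and file name are assumptions for illustration, not anything Tailscale evaluated):

    package main

    import (
        "database/sql"
        "log"

        _ "modernc.org/sqlite" // pure-Go driver; mattn/go-sqlite3 (cgo) also works
    )

    func main() {
        db, err := sql.Open("sqlite", "control.db")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // WAL mode allows concurrent readers alongside the single writer,
        // a good fit for a read-heavy control plane.
        if _, err := db.Exec(`PRAGMA journal_mode=WAL`); err != nil {
            log.Fatal(err)
        }
        if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS nodes (
            id  TEXT PRIMARY KEY,
            doc TEXT NOT NULL -- JSON blob; promote fields to columns as indexes are needed
        )`); err != nil {
            log.Fatal(err)
        }
        if _, err := db.Exec(`INSERT OR REPLACE INTO nodes (id, doc) VALUES (?, ?)`,
            "node-1", `{"hostname":"node-1"}`); err != nil {
            log.Fatal(err)
        }
    }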
jeff-davis, over 4 years ago
This post illustrates the difference between persistence and a database.

If you are expecting to simply persist one instance of one application's state across different runs and failures, a database can be frustrating.

But if you want to manage your data across different versions of an app, different apps accessing the same data, or concurrent access, then a database will save you a lot of headaches.

The trick is knowing which one you want. Persistence is tempting, so a lot of people fool themselves into going that direction, and it can be pretty painful.

I like to say that rollback is the killer feature of SQL. A single request fails (e.g. a unique violation), and the overall app keeps going, handling other requests. Your application code can be pretty bad and you can still have a good service. That's why PHP was awesome despite being bad: SQL made it good (except for all the security pitfalls of PHP, which the DB couldn't help with).
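A minimal sketch of that rollback behavior with Go's database/sql (the table, column, and function name are illustrative, and the usual database/sql import is assumed):

    // createUser inserts one row inside a transaction. If the INSERT hits a
    // UNIQUE violation, the deferred Rollback undoes it and only this one
    // request fails; the process keeps serving everything else.
    func createUser(db *sql.DB, email string) error {
        tx, err := db.Begin()
        if err != nil {
            return err
        }
        defer tx.Rollback() // no-op after a successful Commit

        if _, err := tx.Exec(`INSERT INTO users (email) VALUES (?)`, email); err != nil {
            return err
        }
        return tx.Commit()
    }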
petters, over 4 years ago
> (Attempts to avoid this with ORMs usually replace an annoying amount of typing with an annoying amount of magic and loss of efficiency.)

Loss of efficiency? Come on, you were using a file before! :-)

The article makes me glad I'm using Django. Just set up a managed Postgres instance in AWS and be done with it. SQLite for testing locally. It just works, and very little engineering time is spent on persistent storage.

Note: I do realize Brad is a very, very good engineer.
0xbadcafebee, over 4 years ago
I am missing a lot of context from this post, because this just sounds nonsensical.

First, they're conflating storage with transport. SQL databases are a storage and query system. They're intended to be slow but efficient, like a bodybuilder. You don't ask a bodybuilder to run the 500m dash.

Second, they had a 150MB dataset, and they moved to... a distributed, decentralized key-value store? They went from the simplest thing imaginable to the most complicated thing imaginable. I guess SQL is just complex in a direct way, and etcd is complex in an indirect way. But the end results of both are drastically different. And doesn't etcd have a whole lot of functional limitations SQL databases don't? Not to mention its dependence on gRPC makes it a PITA to work with REST APIs. Consul has a much better general-purpose design, imo.

And more of it doesn't make sense. Is this a backend component? Client side, server side? Why was it using JSON if resources mattered (you could have saved like 20% of that 150MB with something less bloated)? Why a single process? Why global locks? Like, I really don't understand the implementation at all. It seems like they threw away a common-sense solution to make a weird toy.
malisper, over 4 years ago
The post touches upon it, but I didn't really understand the point. Why doesn't synchronous replication in Postgres work for this use case? With synchronous replication you have a primary and secondary. Your queries go to the primary, and the secondary is guaranteed to be at least as up to date as the primary. That way, if the primary goes down, you can query the secondary instead and not lose any data.
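For reference, that setup is mostly two settings on the primary (a sketch; the standby names are invented):

    # postgresql.conf on the primary
    synchronous_standby_names = 'ANY 1 (standby_a, standby_b)'
    synchronous_commit = on  # or remote_apply, if reads on the standby must see the write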
lrossi, over 4 years ago
This reminds me of this post from the hostifi founder, sharing the code they used for the first 3 years:

https://twitter.com/_rchase_/status/1334619345935355905

It's just 2 files.

Sometimes it's better to focus on getting the product working, and handle tech debt later.
bob1029, over 4 years ago
I do like putting .json files on disk when it makes sense, as this is a one-liner to serialize both ways in .NET/C#. But once you hit that wall of wanting to select subsets of data because the total dataset got larger than your CPU cache (or some other step-wise NUMA constraint)... it's time for a little bit more structure. I would have just gone with SQLite to start. If I am not serializing a singleton out to disk, I reach for SQLite by default.
miki123211, over 4 years ago
I use the same system (a JSON file protected with a mutex) for an internal tool I wrote, and it works great. For us, file size and request count are not a concern: it's serving a couple of (internal) users per minute at peak loads, the JSON is about 150 kB after half a year, and old data could easily be deleted/archived if need be.

This tool needs to insert data in the middle of (pretty short) lists, using a pretty complicated algorithm to calculate the position to insert at. If I had used an RDBMS, I'd probably have to implement fractional indexes, or at least change the IDs of all the entries following the newly inserted one, and that would be a lot of code to write. This way, I just copy part of the old slice, insert the new item, copy the other part (which are very easy operations in Go), and then write the whole thing out to JSON.

I kept it simple, stupid, and I'm very happy I went with that decision. Sometimes you don't need a database after all.
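A sketch of that insert-in-the-middle in Go (store, Entry, and positionFor are illustrative names, not the tool's actual code; encoding/json and os imports assumed):

    func (s *store) insert(item Entry) error {
        s.mu.Lock()
        defer s.mu.Unlock()

        // Find the slot, then splice: the inner append builds a fresh slice
        // from item plus a copy of the tail, so nothing is clobbered.
        i := positionFor(item, s.list) // the "pretty complicated algorithm"
        s.list = append(s.list[:i], append([]Entry{item}, s.list[i:]...)...)

        buf, err := json.Marshal(s.list)
        if err != nil {
            return err
        }
        return os.WriteFile("data.json", buf, 0o600)
    }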
gfody, over 4 years ago
> The file reached a peak size of 150MB

Is this a typo? 150MB is such a minuscule amount of data that you could do pretty much anything and be OK.
cbushko, over 4 years ago
I think a lot of people are missing the point that a traditional DB (MySQL/Postgres) is not a good fit for this scenario. This isn't a CRUD application but is instead a distributed control plane with a lot of reads and a small dataset. Joins and complex queries are not needed in this case, as the data is simple.

I am also going to go out on a limb and guess that this is all running in Kubernetes. Running etcd there is dead simple compared to even running something like Postgres.

Congrats on a well-engineered solution that you can easily test on a dev machine. Running a DB in a Docker container isn't difficult, but it is just one more dev-environment nuance that needs to be maintained.
dekhn, over 4 years ago
I never took a course in databases. At some point I was expected to store some data for a webserver, looked at the BSDDB API, and went straight to MySQL (this was in ~2000). I spent the time to read the manual on how to do CRUD but didn't really look at indices or anything exotic. The webserver just wrote raw SQL queries against an ultra-simple schema, storing lunch orders. It's worked for a good 20 years and only needed minor data updates when the vendor changed, and small Python syntax changes to move to Python 3.

At that point I thought "hmm, I guess I know databases," and a few years later attempted to store some slightly larger, more complicated data in MySQL and query it. My query was basically "join every record in this table against itself, returning only rows that satisfy some filter". It ran incredibly slowly, but it turned out our lab secretary was actually an ex-IBM database engineer, and she said "did you try sorting the data first?" One call to strace showed that MySQL was doing a very inefficient full table scan for each row, but by inserting the data in sorted order, the query ran much faster. Uh, OK. I can't repeat the result, so I expect MySQL fixed it at some point. She showed me the sorts of DBs "real professionals" designed: it was a third normal form menu-ordering system for an early meal-delivery website (wayyyyy ahead of its time: food.com). At that point I realized that there was obviously something I didn't know about databases, in particular that there was an entire schema theory on how to structure knowledge to take advantage of the features that databases have.

My next real experience with databases came when I was hired to help run Google's MySQL databases. Google's Ads DB was implemented as a collection of MySQL primaries with many local and remote replicas. It was a beast to run, required many trained engineers, and never used any truly clever techniques, since the database was sharded so nobody could really do any interesting joins.

I gained a ton of appreciation for MySQL's capabilities from that experience, but I can't say I really enjoy MySQL as a system. I like PostgreSQL much better; it feels like a grownup database.

What I can say is that all this experience, and some recent work with ORMs, has led me to believe that while the SQL query model is very powerful, and RDBMSes are very powerful, you basically have to fully buy into the mental model and retain some serious engineering talent: folks who understand database index disk structures, multithreading, etc., etc.

For everybody else, a simple single-machine on-disk key-value store with no schema is probably the best thing you can do.
JacobiX, over 4 years ago
After reading the comments and the blog post, I think the requirements boil down to fast persistence to disk, minimal dependencies, and fast test runs. Fortunately the data is very small (150MB) and fits very easily in memory. According to the post, the data changes often, so they need to write it many times a second. But I'm not sure why they need to flush the entire 150MB every time. Why not structure the files/indexes so that only the modified data gets written?
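One common shape for writing only the delta: keep the full state in memory, but persist an append-only log of mutations and snapshot occasionally. A sketch under that assumption (the package and Mutation type are illustrative; this is not what Tailscale built):

    package statelog

    import (
        "encoding/json"
        "os"
    )

    // Mutation records one change to the in-memory state.
    type Mutation struct {
        Key   string `json:"key"`
        Value string `json:"value"`
    }

    type Log struct {
        f   *os.File
        enc *json.Encoder
    }

    func Open(path string) (*Log, error) {
        f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o600)
        if err != nil {
            return nil, err
        }
        return &Log{f: f, enc: json.NewEncoder(f)}, nil
    }

    // Append writes one JSON object per line and fsyncs, so each acknowledged
    // write costs bytes proportional to the change, not to the whole 150MB.
    // Startup replays the log over the last snapshot to rebuild the state.
    func (l *Log) Append(m Mutation) error {
        if err := l.enc.Encode(m); err != nil {
            return err
        }
        return l.f.Sync()
    }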
shapiro92, over 4 years ago
I find the article a bit hard to follow. What were the actual requirements? I probably didn't understand all of this, but was the time spent thinking about this more valuable than using a KV store?
256dpi, over 4 years ago
I found myself in a similar situation some time ago with MongoDB. In one project, my unit tests started slowing me down too much to be productive. In another, I had so little data that running a server alongside it was a waste of resources. I invested a couple of weeks in developing a SQLite-type library [1] for Go that implemented the official Go driver's API, with a small wrapper to select between the two. Up until now, it has paid huge dividends in both projects' ongoing simplicity and was totally worth the investment.

[1]: https://github.com/256dpi/lungo
cavisne, over 4 years ago
Honestly, this feels like engineers who spent too long in FANG and got completely burnt out on dealing with SRE and HA requirements... so they decide to build a setup so prone to 2AM pages that even a PHP webshop would frown at it.

Curiously, though, it's a pattern I've seen twice in the last 12 months; there was that guide on the good bits of AWS that also recommended starting with a single host with everything running on it.

Maybe we should all move that host back under our desks and really get back to basics!
dekhn, over 4 years ago
"""Even with fast NVMe drives and splitting the database into two halves (important data vs. ephemeral data that we could lose on a tmpfs), things got slower and slower. We knew the day would come. The file reached a peak size of 150MB and we were writing it as quickly as the disk I/O would let us. Ain't that just peachy?"""

Uh, you compressed it first, right? Because CPUs can compress data faster than disk I/O.
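In Go, compressing on the way out is a small wrapper (a sketch; gzip at the default level is an assumption, and the actual win depends on the data):

    package storage

    import (
        "compress/gzip"
        "encoding/json"
        "os"
    )

    // writeCompressed gzips the JSON on its way to disk. Default-level gzip
    // typically shrinks JSON severalfold and compresses faster than most
    // disks write, so the I/O bottleneck shrinks accordingly.
    func writeCompressed(path string, state any) error {
        f, err := os.Create(path)
        if err != nil {
            return err
        }
        defer f.Close()

        zw := gzip.NewWriter(f)
        if err := json.NewEncoder(zw).Encode(state); err != nil {
            return err
        }
        if err := zw.Close(); err != nil { // flushes the compressed tail
            return err
        }
        return f.Sync()
    }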
hankchinaski, over 4 years ago
Seems like the reason for not going the MySQL/PostgreSQL/DBMS route is a lack of good Go libraries for handling relational databases (ORM/migration/testing)? From the story it seems more like a solution in search of a problem.
bullen, over 4 years ago
I also use JSON files... but one per value! It has its ups and downs: pro, it's incredibly fast and scales like a monster; con, it uses a lot of space and inodes, so better to use the "small" type with ext4!

The only feature it misses is compressing the data that is not actively in use; that way there would really not be much of a downside.

http://root.rupy.se
manigandham, over 4 years ago
> "Attempts to avoid this with ORMs usually replace an annoying amount of typing with an annoying amount of magic and loss of efficiency."

People seem to keep using poorly designed ORMs or are stuck with some strange anti-ORM ideology.

Modern ORMs are fast, efficient, and very productive. If you're working with relational databases, then you're using an ORM. It's a question of whether you use something prebuilt or write it yourself (since those objects have to be mapped to the database somehow). 99% of the time, ORMs generate perfectly fine SQL (if not exactly what you'd type anyway) while handling connections, security, mapping, batching, transactions, and other issues inherent in database calls.

The 1% of the time you need something more complex, you can always switch to manual SQL (and ORMs will even let you run that SQL while handling the rest as usual). The overall advantages massively outweigh any negatives, if they even apply to your project.
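For what it's worth, here is what the prebuilt path can look like in Go with GORM (one ORM among several; the model and queries are illustrative):

    package main

    import (
        "log"

        "gorm.io/driver/sqlite"
        "gorm.io/gorm"
    )

    type User struct {
        ID    uint
        Email string `gorm:"uniqueIndex"`
    }

    func main() {
        db, err := gorm.Open(sqlite.Open("app.db"), &gorm.Config{})
        if err != nil {
            log.Fatal(err)
        }
        if err := db.AutoMigrate(&User{}); err != nil { // schema from the struct
            log.Fatal(err)
        }
        db.Create(&User{Email: "a@example.com"}) // ORM-generated INSERT

        // The 1% escape hatch: drop to raw SQL, keep the mapping.
        var emails []string
        db.Raw(`SELECT email FROM users WHERE id > ?`, 100).Scan(&emails)
    }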
jamescun, over 4 years ago
This post touches on "innovation tokens". While I agree with the premise of "choose boring technology", it feels like a process smell, particularly at a startup whose goal is to innovate on a technology. It feels demotivating as an engineer if management says our team can only innovate an arbitrary N times.
5ersi, over 4 years ago
Ahh, a classic case of "if you don't understand it, you are bound to reimplement it".
euske, over 4 years ago
My takeaway from the OP:

> Never underestimate how long your "temporary" hack will stay in production!
ropable, over 4 years ago
Tangentially, this article makes me so very glad that our own work projects are all making use of the Django ORM. Database migrations and general usage have been a non-issue for literally years.
ed25519FUUU, over 4 years ago
> "Yeah, whenever something changes, we grab a lock in our single process and rewrite out the file!"

Is this actually easier than using SQLite?
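The quoted pattern, roughly, in Go (a sketch; ServerState is an illustrative stand-in for "everything", with encoding/json, os, and sync imports assumed):

    type ServerState struct {
        Nodes map[string]string // illustrative
    }

    var (
        mu    sync.Mutex
        state ServerState
    )

    func mutate(change func(*ServerState)) error {
        mu.Lock()
        defer mu.Unlock()

        change(&state)
        buf, err := json.Marshal(&state)
        if err != nil {
            return err
        }
        // Write-then-rename so a crash mid-write can't leave a torn file;
        // this is the durability bookkeeping SQLite would otherwise do for you.
        if err := os.WriteFile("state.json.tmp", buf, 0o600); err != nil {
            return err
        }
        return os.Rename("state.json.tmp", "state.json")
    }

Whether it stays easier than SQLite depends on how many details like that rename you end up maintaining yourself.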
darren0, over 4 years ago
Without knowing it, they reinvented the Kubernetes informer, which I've proven can scale way past their current scale.
breck, over 4 years ago
If you liked this, I highly recommend "Clean Architecture" by Uncle Bob. He has a great story about how they kept putting off switching to a SQL DB on a project, never ended up needing it, and it was a big success anyway.
francoisp, over 4 years ago
Looks like a clear case for a migration to Postgres: a single table with jsonb. You can put indexes on jsonb and write a few plpgsql functions for partial updates, then forget about it for a while.
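A sketch of that setup driven from Go (lib/pq, the connection string, and the schema are assumptions; jsonb_set and GIN with jsonb_path_ops are standard Postgres features):

    package main

    import (
        "database/sql"
        "log"

        _ "github.com/lib/pq"
    )

    func main() {
        db, err := sql.Open("postgres", "postgres://localhost/app?sslmode=disable")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // One table, one jsonb column, one GIN index for containment queries.
        mustExec(db, `CREATE TABLE IF NOT EXISTS state (
            id  text PRIMARY KEY,
            doc jsonb NOT NULL
        )`)
        mustExec(db, `CREATE INDEX IF NOT EXISTS state_doc_idx
            ON state USING GIN (doc jsonb_path_ops)`)

        // Partial update: rewrite one field, not the whole document.
        mustExec(db, `UPDATE state
            SET doc = jsonb_set(doc, '{hostname}', to_jsonb($1::text))
            WHERE id = $2`, "node-1.example", "node-1")
    }

    func mustExec(db *sql.DB, q string, args ...any) {
        if _, err := db.Exec(q, args...); err != nil {
            log.Fatal(err)
        }
    }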
toddh, over 4 years ago
So what happens when the computer fails in the middle of a long write to disk?
jbverschoor, over 4 years ago
Doesn't sound like a smart thing to do; it sounds more like a JS dev/student discovering, step by step, why SQL databases are so popular.

Probably not so, because Tailscale is a decent product, but this post did not change my view in a good way.
worik, over 4 years ago
This jumped out at me: "The obvious next step to take was to move to SQL."

No. Not unless your data is relational. This is a common problem: relational databases have a lot of overhead. They are worth it when dealing with relational data, not so much with non-relational data.
nautilus12, over 4 years ago
The premise of articles like this annoys me. It reeks of "we are too smart to use databases" and "JSON is good enough for us", when anyone who works with data to any large extent knows that JSON is just a pain, and we only have to deal with it because the front end is enamored with it because "readable" and "JavaScript".