Show HN: FaunaDB, a strongly consistent, globally distributed cloud database

131 points by evanweaver about 8 years ago

23 comments

gregwebs about 8 years ago

I think Fauna is not very good at docs and communication yet, at least judging by the confusion in some of the comments and by reading their docs. But launching will probably make them a lot better at it. Here are my notes, which may add clarity for some:

Similar to RethinkDB/MongoDB:
* Designed to be great for storing application data. Fields can be added dynamically (schemaless) and their values can be arrays, so it is easy to maintain data locality according to your application's usage patterns.
* Uses a non-SQL query language.
* Probably not great for ad-hoc reporting (arguably SQL is a requirement for that).

Unlike MongoDB: supports joins.

Unlike RethinkDB: great support for transactions, just not SQL transactions with an open session (which are unnecessary for an application).

Unlike most databases:
* Cloud-hosted and pay-for-use (on-premises is on their roadmap).
* Claims support for graph data by storing arrays of references.
* QoS built in, so you could run a slow analytics query without disrupting your application.

Cons:
* Unfortunately, just like MongoDB/RethinkDB, they have no real database-level integrity for schema and foreign keys, but at least foreign keys are on their roadmap.

I am a huge fan of the cloud-hosted, pay-for-use aspect: I wonder why anyone would design a DB today without this in mind. You can transfer your data from a pay-for-use application DB (FaunaDB or Google Datastore) to a data warehouse (Snowflake or Google BigQuery) that is also pay-for-use and gives you SQL reporting abilities.

ccleve about 8 years ago

You might consider removing the badge on the home page that says "Global Latency 2.8 ms". Unless you really can give me latency across the globe of 2.8 ms, in which case your solution to the speed-of-light problem is quite impressive :)

akerl_ about 8 years ago

It seems like one of the big issues with the marketing copy here is some of the word tricks being played.

Any first reading of "The first serverless database" implies that the database itself is serverless. Comments from FaunaDB folks on this page clearly indicate that what they mean is that it's the first database for serverless, which is a pretty bold claim, given that Google, AWS, and any number of other providers offer databases that are accessible from serverless code. So it essentially boils down to "the first database that's marketed specifically to serverless use cases", which may be true but is also kind of a useless trophy to put on the mantle?

This is further muddled by the blog post linked from the launch announcement (https://fauna.com/blog/escape-the-cloud-database-trap-with-serverless), which says "FaunaDB Serverless Cloud is an adaptive, serverless database". Nobody reads that and thinks "ah, an adaptive database for serverless apps".

Describing it as "The first active-active multi-cloud database" is possibly true if you mean "the first time a single company has sold a publicly available database-as-a-service running on multiple cloud providers". But the text says "database" where "public database-as-a-service" would be the accurate term, leaving the reader with the impression that no existing database can be set up across multiple cloud providers in an active-active HA configuration, which is absurd. Fixing the copy here should be pretty easy, and they're already headed in the right direction with the next bullet point, although it too says "database" where it means "database-as-a-service".

It feels like somebody in marketing really wanted a list of firsts, so they toyed with the definitions of words until they thought they could flex them into being technically accurate.
I get the same feeling from the closing argument in the linked blog post: "The query language, data model (including graphs and change feeds), security features, strong consistency, scalability and performance are best in class. There is no downside." I don't think I want to trust a database if the folks designing it couldn't think of any downsides.

evanweaver about 8 years ago

Hey everybody, today we launched FaunaDB Serverless Cloud, 4 years in the making. FaunaDB is a strongly consistent, globally distributed operational database. It’s relational, but not SQL.

We're excited to open our doors and explain more of our design decisions. Our team is from Twitter, and that experience has deeply informed our interface and architecture. Try it out and let us know what you think.

An on-premises release is coming later this year.

wsh91 about 8 years ago

(Disclosure: I work on Google's Cloud Datastore.)

This looks super neat, and I can't wait to learn more about it, but just for the record: I'm pretty sure this isn't the first serverless cloud database. Both Firebase's Realtime Database and Cloud Datastore (which powers Snapchat and Pokemon Go) are serverless; you pay only for your ops and storage. They've been publicly available for several years.

mdasen about 8 years ago

I feel like I need to note the pricing: $0.01 per 1,000 queries. That doesn't sound like much, but it adds up. Let's say you make 1,000 queries per second, i.e. one $0.01 block every second: $0.01 * 60 seconds in a minute * 60 minutes in an hour * 24 hours in a day * 30 days in a month = $25,920 per month.

Is that a lot? I think it is. Google Cloud Spanner costs $0.90/hour per node, or around $650/month. Each Cloud Spanner node can do around 10,000 queries per second [1]. So $650 to Google gets you 10x the queries that $25,920 to Fauna gets you. For $25,920, you could get a Spanner cluster with 40 servers, and each of those servers would only have to handle 25 queries per second to get you 1,000 queries per second.

I'm sure people are going to question whether FaunaDB can actually do what it claims. At this pricing, I can't imagine anyone actually testing whether they live up to their claims. They have a graph showing linear scaling to 2M reads per second. Based on their pricing, that would be $630M per year. For comparison, Snapchat committed to spending $400M per year on Google Cloud and another $100M on AWS (and people thought that spend was outrageous even for a company valued at tens of billions of dollars). This is more money for the database alone.

Heck, it looks like one can get 5-20k queries per second out of Google's Cloud SQL MySQL on a highmem-16 costing $1k/month [2]. That would cost $130k-$500k per month on FaunaDB. It seems like FaunaDB's pricing is off by a couple of orders of magnitude.

Ultimately, Spanner was built by people who published a notable research paper, and it is used by Google. Reading the paper, you can understand how Spanner works and be saddened that you don't have TrueTime servers powered by GPS and atomic clocks. FaunaDB has some marketing speak about how I'll never have to worry about things ever again, without telling me how it will achieve that.

It's also implemented in Scala. This isn't a dig on Scala or the JVM, but I use three datastores on the JVM, and the only one that doesn't suffer for it is Kafka.
But Kafka does very little in the JVM: it basically just leans on sendfile to handle things, which means you don't get bad GC cycles or lots of allocations and copying.

FaunaDB is a datastore without much information beyond "it's great for everything and scales perfectly". Well, at their pricing, they might be able to make it happen. Most customers would simply move to something cheaper once they got beyond small amounts of traffic. 60,000 queries per second? That'll be roughly $19M per year from FaunaDB or about $50k per year from Google. It's not even in the same ballpark. If you really need to scale to 2M reads per second, $630M seems like a lot more than $1.6M for Spanner.

Maybe it's an easy way to get some money off people who "need a web scale database" but are actually going to serve something like 10 queries per second and are willing to spend $260/month to do it. If they hit it big, it wouldn't be crazy to scale them to 10,000 queries per second and milk $260k out of them each month for a workload that a single machine can handle. That money also pays for decent ops people to run a big box and to consult with the customer if they're heading toward 100k queries per second at $2.6M a month.

EDIT: Looking over Fauna's blog and some of their comments here, they seem to understand more than their marketing lets on. Daniel Abadi is one of those people whose name carries weight in the database world (having been involved with C-Store/Vertica, H-Store/VoltDB, and others). While I haven't read the Calvin paper, it looks like a good read. I can see that they are using logical clocks, and, though I can't find it right now, I thought I saw that they don't let you keep transaction sessions checked out: all the operations must be specified up front. So it seems like there's some decent stuff in there that's currently being obscured by marketing-speak.
Still, the pricing seems really curious.

[1] https://cloud.google.com/spanner/docs/instance-configuration
[2] https://www.pythian.com/blog/benchmarking-google-cloud-sql-instances/
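The arithmetic in the comment above is easy to re-check with a short script. This is a minimal sketch that plugs in the figures quoted in the comment (2017 numbers: $0.01 per 1,000 FaunaDB queries, $0.90 per Spanner node-hour, roughly 10,000 queries per second per Spanner node); they are the commenter's numbers, not current list prices, and the function names are hypothetical. It assumes 30-day months, as the comment does, and 365-day years.

```python
import math

# Figures as quoted in the comment (2017); assumptions, not current prices.
FAUNA_USD_PER_1K_QUERIES = 0.01
SPANNER_USD_PER_NODE_HOUR = 0.90
SPANNER_QPS_PER_NODE = 10_000

SECONDS_PER_MONTH = 60 * 60 * 24 * 30  # 30-day month, as in the comment
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
HOURS_PER_MONTH = 24 * 30
HOURS_PER_YEAR = 24 * 365


def fauna_cost(qps: float, seconds: int) -> float:
    """Usage-based cost of sustaining `qps` queries/second for `seconds`."""
    thousands_of_queries = qps * seconds / 1_000
    return thousands_of_queries * FAUNA_USD_PER_1K_QUERIES


def spanner_nodes(qps: float) -> int:
    """Smallest node count whose quoted throughput covers `qps` (min. 1)."""
    return max(1, math.ceil(qps / SPANNER_QPS_PER_NODE))


def spanner_cost(qps: float, hours: int) -> float:
    """Node-hour cost of a Spanner cluster sized for `qps`."""
    return spanner_nodes(qps) * SPANNER_USD_PER_NODE_HOUR * hours


if __name__ == "__main__":
    for qps in (10, 1_000, 10_000, 60_000, 2_000_000):
        print(
            f"{qps:>9,} qps: "
            f"Fauna ${fauna_cost(qps, SECONDS_PER_MONTH):>14,.0f}/month, "
            f"${fauna_cost(qps, SECONDS_PER_YEAR):>16,.0f}/year | "
            f"Spanner {spanner_nodes(qps):>3} node(s) "
            f"${spanner_cost(qps, HOURS_PER_YEAR):>12,.0f}/year"
        )
```

With these inputs the script reproduces the comment's headline figures: about $25,920/month for 1,000 qps, and about $630M/year for 2M qps versus roughly $1.6M/year for a 200-node Spanner cluster. The Spanner side is idealized: it ignores storage, replication topology and regional pricing, and assumes the 10,000 qps per node figure holds for the workload.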

Efrim-Lipkin about 8 years ago

Hi guys. Can I ask a simple question?

I understand that we are talking about a globally distributed, serverless, and yet consistent relational database.

My question is about latency. How long does it take for an atomic transaction to become visible as a consistent read across a globally distributed database? (1) And what measures are taken between entry nodes to prevent clients from receiving inconsistent data? (2)

As I ponder this, what strikes me is not the consistency problem, as that is solvable, but the latency problem of ensuring that all global queries are consistent for some (any) time quantum. What sort of latency should be expected?

Both questions (1) and (2) are interesting, but (1) is critical while (2) is academic.

Thanks, and very interesting work, guys.

EL

danthemanvsqz about 8 years ago

I think they missed Google's launch of Spanner, their distributed, strongly consistent DB.

jazoom about 8 years ago

$0.01 per simple operation sounds very expensive to me. This would add up very quickly.

Edit: I misread it. Perhaps instead of inventing your own point system that you have to explain, and hoping silly people (like me) don't mix it up, you could take a lesson from Google Cloud and just lay out the pricing in a table. If you ever add another service, you'll also have to integrate it into your made-up point system.

doublerebel about 8 years ago

That pricing model and serverless model are why I've always chosen CouchDB/Cloudant. If I'm doing the MB/hour to GB/month conversion correctly, Fauna cloud is significantly cheaper.

I see Fauna has temporal queries, but is receiving events strictly pull? Is there no push or single feed?

jchrisa about 8 years ago

There is a related technical blog post [1] and discussion [2]. I've also got a companion blog post on the Serverless.com blog [3].

[1] https://fauna.com/blog/escape-the-cloud-database-trap-with-serverless
[2] https://news.ycombinator.com/item?id=13877223
[3] https://serverless.com/blog/faunadb-serverless-authentication/

snackai about 8 years ago

Serverless Database, Global Latency 2.8 ms, Relational but no SQL (whatever sense that makes): BULLSHIT BINGO at its very best.

z3t4 about 8 years ago

You should explain how it works. It's not like I'm going to steal your ideas and spend five years implementing them ... or maybe I will if it's good ;)

anamoulous about 8 years ago

I have been a fan of Evan going way back to the early Rails days. Congrats on the launch.

elvinyung about 8 years ago

I'm curious about the relational-ness of FaunaDB. For example, how do you efficiently maintain the integrity of foreign key constraints across the entire system? How fast and consistent are secondary indexes?

sushisource about 8 years ago

So... where does the data go? Maybe a simpleton question but I couldn't easily find an answer in the about section. If it's all function-based, where does the data actually get persisted?

mring33621 about 8 years ago

If my calculations are correct, that's about $87+ million USD to store 1 PB of data for one year?
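Storage prices quoted per MB-hour and totals quoted per PB-year are easy to mix up, which is what both the MB/hour to GB/month conversion mentioned earlier in the thread and the petabyte estimate above are wrestling with. The sketch below converts between those units. It assumes decimal units (1 GB = 1,000 MB, 1 PB = 1,000,000 GB) and a 730-hour month, and it does not use FaunaDB's actual storage rate, which the thread does not state; it only backs out the per-GB-month rate that the $87M-per-PB-year estimate implies. The function names are hypothetical.

```python
# Unit conventions are assumptions for this sketch: decimal units and a
# 730-hour month (8,760 hours per year / 12).
MB_PER_GB = 1_000
GB_PER_PB = 1_000_000
HOURS_PER_MONTH = 730
MONTHS_PER_YEAR = 12


def mb_hour_to_gb_month(usd_per_mb_hour: float) -> float:
    """Convert a $/MB-hour storage rate into $/GB-month."""
    return usd_per_mb_hour * MB_PER_GB * HOURS_PER_MONTH


def gb_month_to_mb_hour(usd_per_gb_month: float) -> float:
    """Convert a $/GB-month storage rate into $/MB-hour."""
    return usd_per_gb_month / (MB_PER_GB * HOURS_PER_MONTH)


def pb_year_cost(usd_per_gb_month: float, petabytes: float = 1.0) -> float:
    """Cost of keeping `petabytes` stored for a full year."""
    return usd_per_gb_month * GB_PER_PB * petabytes * MONTHS_PER_YEAR


def implied_gb_month_rate(usd_per_pb_year: float) -> float:
    """Back out the $/GB-month rate implied by a $/PB-year total."""
    return usd_per_pb_year / (GB_PER_PB * MONTHS_PER_YEAR)


if __name__ == "__main__":
    rate = implied_gb_month_rate(87_000_000)
    print(f"$87M per PB-year implies ${rate:.2f} per GB-month")
    print(f"which is ${gb_month_to_mb_hour(rate):.3e} per MB-hour")
```

Running it shows that the estimate corresponds to $7.25 per GB-month under these unit conventions; whether that matches FaunaDB's real storage price depends on the rate and units on their pricing page, which this sketch does not assume.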

dsl about 8 years ago

How are you running this serverless? Is it a thin application in front of AWS or Google BigQuery?

jimreedia about 8 years ago

Cool. My team has been looking for something like this.

pbgiese about 8 years ago

This sounds a lot like Google Spanner. I'm no expert, though. What's the difference?

nunzi about 8 years ago

can't wait to learn more

hubert123 about 8 years ago

Do you support a hard limit on money spent? I would like to be able to say 30 bucks a month max or something

marknadal about 8 years ago

If your database is not Open Source, then your marketing lingo needs to be more open, or else you'll make the same mistake as FoundationDB (which looked like vaporware).

As a proprietary service, you are now competing against Cloud Spanner, which (while people love the underdog) means you're toast, because they have Eric Brewer to hand-wave away their marketing lingo.

On the flip side, you are competing against Cockroach, but they are Open Source, so that puts you between a rock and a hard place. From previous comments of mine, you may know I don't think Cockroach has much of a future either, because globally consistent databases aren't going to cater to the necessary P2P future of the web (5B+ new people coming online, 100B+ IoT devices, a graph-enabled social web, machine learning, etc.). That is what we, http://gun.js.org/ , cater to, and we just successfully ran load tests on low-end hardware doing 1.7K table inserts/sec across a federated system; we plan to get this up to 10K inserts/second on cheap [if not free] hardware.

Why are these systems going to fail to pick up the market? Because the best of the best, both in engineering and as an Open Source community, RethinkDB (which I praise highly), couldn't. At the end of the day, the few companies that need globally consistent transactions will trust (for better or for worse) Cloud Spanner, and the others who want to roll their own infrastructure will try Cockroach but ultimately switch to RethinkDB.

So on that note, as others have noted, don't use your /fantastic/ marketing opportunity (top of HN) to make false claims about being "industry first"; it won't help you gather a developer community.
Use this time to win developers over like Firebase did. (Firebase's own community is now worried about when or whether Google will shut it down, and those developers are now flooding to RethinkDB and to us, despite Firebase being one of the best; high praise for them as well, like Rethink.)
