Congratulations to the Spanner team for becoming part of the Google public cloud!<p>And for those wondering, this is why Oracle wants billions of dollars from Google for "Java Copyright Infringement": the only growth market for Oracle right now is their hosted database service, and whoops, Google has a better one now.<p>It will be interesting to see whether Amazon and Microsoft choose to compete with Google on this service. If we get to the point where you have database, compute, storage, and connectivity services from those three at equal scale, well, that would be a lot of choice for developers!
Really a CP system, but with availability of five 9s or better (less than one failure in 10^5)<p>How:
1) Hardware: gobs and gobs of hardware and SRE experience<p>"Spanner is not running over the public Internet — in fact, every Spanner packet flows only over Google-controlled routers and links (excluding any edge links to remote clients). Furthermore, each data center typically has at least three independent fibers connecting it to the private global network, thus ensuring path diversity for every pair of data centers. Similarly, there is redundancy of equipment and paths within a datacenter. Thus normally catastrophic events, such as cut fiber lines, do not lead to partitions or to outages."<p>2) Ninja 2PC<p>"Spanner uses two-phase commit (2PC) and strict two-phase locking to ensure isolation and strong consistency. 2PC has been called the “anti-availability” protocol [Hel16] because all members must be up for it to work. Spanner mitigates this by having each member be a Paxos group, thus ensuring each 2PC “member” is highly available even if some of its Paxos participants are down."
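A toy sketch of that second point (plain Python; this is my own illustration, not Google's implementation): if each 2PC "member" is a replica group that only needs a majority of its replicas alive, then a single machine failure no longer blocks the commit, which is exactly what plain 2PC gets wrong.

```python
# Toy model: two-phase commit where each "member" is a replica group.
# A group can participate as long as a majority of its replicas is up,
# so losing a minority of replicas in any group does not block commit.

def group_prepared(replicas_up, replicas_total):
    """A Paxos-style group can prepare if a majority of replicas is alive."""
    return replicas_up > replicas_total // 2

def two_phase_commit(groups):
    """groups: list of (replicas_up, replicas_total), one per 2PC member.
    Phase 1: every member must prepare; Phase 2: commit, else abort."""
    if all(group_prepared(up, total) for up, total in groups):
        return "commit"
    return "abort"

# Plain 2PC with single-machine members: one crash aborts everything.
print(two_phase_commit([(1, 1), (0, 1)]))          # abort
# Members as 3-replica groups: one dead replica per group is tolerated.
print(two_phase_commit([(2, 3), (3, 3), (2, 3)]))  # commit
# But a group that loses its majority still blocks the transaction.
print(two_phase_commit([(1, 3), (3, 3)]))          # abort
```

The point of the sketch: Spanner doesn't avoid 2PC's "all members must be up" rule, it makes each member very unlikely to be down.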
The team here at Quizlet did a lot of performance testing on Spanner with one of our MySQL workloads to see if it's an option for us. Here are the test results: <a href="https://quizlet.com/blog/quizlet-cloud-spanner" rel="nofollow">https://quizlet.com/blog/quizlet-cloud-spanner</a>
This release shows the different philosophies of Google vs Amazon in an interesting way.<p>Google prefers building advanced systems that let you do things "the old way" while making them horizontally scalable.<p>Amazon prefers to acknowledge that network partitions exist and tries to get you to do things "the new way", dealing with that failure case in the software instead of trying to hide it.<p>I'm not saying either system is better than the other, but doing it Google's way is certainly easier for enterprises that want to make the move, which is also why Amazon is starting to break with tradition and release products that let you do things "the old way" while hiding the details in an abstraction.<p>I've always said that Google is technically better than AWS, but no one will ever know because they don't have a strong sales team to go and show people.<p>This release only solidifies that point.
Some interesting stuff in <a href="https://cloud.google.com/spanner/docs/whitepapers/SpannerAndCap.pdf" rel="nofollow">https://cloud.google.com/spanner/docs/whitepapers/SpannerAnd...</a> about the social aspects of high availability.<p>1. Defining high availability in terms of how a system is used: "In turn, the real litmus test is whether or not users (that want their own service to be highly available) write the code to handle outage exceptions: if they haven’t written that code, then they are assuming high availability. Based on a large number of internal users of Spanner, we know that they assume Spanner is highly available."<p>2. Ensuring that people don't become too dependent on high availability: "Starting in 2009, due to “excess” availability, Chubby’s Site Reliability Engineers (SREs) started forcing periodic outages to ensure we continue to understand dependencies and the impact of Chubby failures."<p>I think 2 is really interesting. Netflix has Chaos Monkey to help address this (<a href="https://github.com/Netflix/SimianArmy/wiki/Chaos-Monkey" rel="nofollow">https://github.com/Netflix/SimianArmy/wiki/Chaos-Monkey</a>). There's also a book called Foolproof (<a href="https://www.theguardian.com/books/2015/oct/12/foolproof-greg-ip-review-biggest-risk-is-safety" rel="nofollow">https://www.theguardian.com/books/2015/oct/12/foolproof-greg...</a>) which talks about how perceived safety can lead to bigger disasters in lots of different areas: finance, driving, natural disasters, etc.
I wonder how this will affect adoption of CockroachDB [1], which was inspired by Spanner and is supposedly an open-source equivalent. I'd imagine that Spanner is a rather compelling choice, since users don't have to host it themselves. As far as I know, Cockroach Labs does not currently offer CockroachDB as a service (but it is on their roadmap) [2].<p>[1] <a href="https://www.cockroachlabs.com/docs/frequently-asked-questions.html" rel="nofollow">https://www.cockroachlabs.com/docs/frequently-asked-question...</a><p>[2] <a href="https://www.cockroachlabs.com/docs/frequently-asked-questions.html#does-cockroach-labs-offer-a-cloud-database-as-a-service" rel="nofollow">https://www.cockroachlabs.com/docs/frequently-asked-question...</a>
For those trying to compare this with AWS Aurora: Aurora is more a regular database (MySQL / Postgres) engine with a custom data storage layer that's AWS/EBS/SSD/EFS-aware. Because of this the database engine can make AWS-specific decisions and optimizations that greatly boost performance. It supports master-master replication in the same region, master-slave across regions.<p>Global Spanner looks like a different beast, though. It looks like Google has configured a database for master-master(-master?) replication, across regions and even continents. They seem to be pulling it off by running only their own fiber, each master being a Paxos cluster itself, GPS, atomic clocks and a lot of other whiz-bangery.
The white paper is available here: <a href="http://static.googleusercontent.com/media/research.google.com/en/us/archive/spanner-osdi2012.pdf" rel="nofollow">http://static.googleusercontent.com/media/research.google.co...</a><p>for anyone interested
I wonder why they charge a minimum of $0.90 per node-hour when they offer VMs for as little as $0.008/hr. This is hugely useful even for single-person startups, so why charge a minimum of ~$8,000 per year?
Amazon likes to respond to Google with its own price drops and product launches. It's telling that their announcements are orthogonal to Spanner rather than in direct competition with it.<p>When Google announced Spanner back in 2012, I'm sure Amazon and Microsoft started teams to reproduce their own versions.<p>Spanner is not just software. The private network reduces partitions. GPS and atomic clocks for every machine help synchronize time globally. There won't be a Hadoop equivalent for Spanner unless it includes the hardware spec.
Thomas Watson in 1943 and his famous quote: “I think there is a world market for about five computers.”<p>If he were alive, he could say these computers are Google, Apple, Microsoft, Amazon and Facebook.
How does this compare to AWS Aurora in terms of pricing and performance?<p>With Aurora the basic instance is $48/month and they recommend at least two in separate zones for availability, so it's about $96/month minimum. Storage is $.10/GB and IO is $.20 per million requests. Data transfer starts at $.09/GB and the first GB is free.[1]<p>Spanner is a minimum of $650/mo (~6.8X the Aurora minimum), storage is $.30/GB (3X), and data transfer starts at $.12/GB (1.3X).<p>Of course with Aurora you have to pick your instance size and bigger, faster instances will cost more. Also there's the matter of multi-region replication, although it appears that aspect of Spanner is not priced out yet. So maybe as you scale the gap narrows, but it's interesting to price out the entry point for startups.<p>[1] <a href="https://aws.amazon.com/rds/aurora/" rel="nofollow">https://aws.amazon.com/rds/aurora/</a>
Forgive my ignorance, but could someone explain in layman's terms in which situation this would be helpful? E.g. if I have 1TB of data would I use this? If I have 1GB with a growth rate of 25GB/daily would I use this?
> Today, we’re excited to announce the public beta for Cloud Spanner, a globally distributed relational database service that lets customers have their cake and eat it too: ACID transactions and SQL semantics, without giving up horizontal scaling and high availability.<p>This sounds too good to be true. But it's Google, so maybe not. Time to start reading whitepapers...
Link to the actual OSDI paper (not the simpler whitepaper) <a href="https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf" rel="nofollow">https://static.googleusercontent.com/media/research.google.c...</a>
Looks cool, but the pricing seems a bit non-cloud-native (or at least non-GCP-native).<p>"You are charged each hour for the maximum number of nodes that exist during that hour."<p>We've been educated by Google to consider per-minute, per-instance/node billing normal - and presumably all the arguments about why this is the right, pro-customer way to price GCE apply equally to Cloud Spanner.
While everyone is puzzling over how Spanner seems to be claiming to be CA, I would like to take this opportunity to bring up PACELC[1].<p>The idea is that the A-or-C choice in CAP only applies during network partitions, so it's not sufficient to describe a distributed system as either CP or AP. When the network is fine, the choice is between low latency and consistency.<p>In the case of Spanner, it chooses consistency over availability during network partitions, and consistency over low latency in the absence of partitions.<p>1: <a href="http://cs-www.cs.yale.edu/homes/dna/papers/abadi-pacelc.pdf" rel="nofollow">http://cs-www.cs.yale.edu/homes/dna/papers/abadi-pacelc.pdf</a>
> clients can do globally consistent reads across the entire database without locking<p>How is this possible across data centres? Does it send data everywhere at once?<p>Seems too good to be true of course, but if it works and scales it might be worthwhile just not having to worry about your database scaling. Still, I don't believe it ;-)<p>EDIT: further info...<p>> Spanner mitigates this by having each member be a Paxos group, thus ensuring each 2PC “member” is highly available even if some of its Paxos participants are down. Data is divided into groups that form the basic unit of placement and replication.<p>So it's SQL with Paxos that presumably never gets confused, but during a partition it will presumably sacrifice availability rather than consistency.
I was like YES!!! Then I read that a single node is 90 cents per hour, and then I was like NO!!! So the absolute minimum cost for me is $648/month? I was hoping there was a dev version. Maybe I didn't read the fine print?
One thing to note is that Spanner's transactions are different compared to what you get with a traditional RDBMS. See <a href="https://cloud.google.com/spanner/docs/transactions#ro_transaction_example" rel="nofollow">https://cloud.google.com/spanner/docs/transactions#ro_transa...</a><p>For example, the rows you get back from a query like "select * from T where x=a" can't be part of a RW transaction. I believe that's because they don't have a timestamp associated with them. So you have to re-read those rows via primary key inside a RW transaction to update them. This can be a surprise if you are coming from a traditional RDBMS background. If you are thinking about porting your app from MySQL/PostgreSQL to Spanner, it will be more than just updating query syntax.<p>Disclaimer: I used F1 (built on top of Spanner, <a href="https://research.google.com/pubs/pub41344.html" rel="nofollow">https://research.google.com/pubs/pub41344.html</a>) a few years ago.
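A toy illustration of that re-read pattern (plain Python with an in-memory dict standing in for Spanner; not the real client API): keys are found with a non-key query first, then each row is re-read by primary key inside the "transaction" before being written, so the update is based on the row's current state rather than a stale query result.

```python
# In-memory "table" standing in for Spanner: primary key -> row.
table = {
    1: {"x": "a", "score": 10},
    2: {"x": "a", "score": 20},
    3: {"x": "b", "score": 30},
}

def query_keys_where_x(value):
    """Stand-in for: SELECT id FROM T WHERE x = value (outside the txn).
    Returns only the primary keys, not rows you can write back directly."""
    return [pk for pk, row in table.items() if row["x"] == value]

def rw_transaction_bump_scores(pks, delta):
    """Stand-in for a RW transaction: re-read each row by primary key,
    mutate the fresh copy, then write it back."""
    for pk in pks:
        row = dict(table[pk])   # re-read by primary key inside the txn
        row["score"] += delta   # update based on the current row state
        table[pk] = row         # buffered write "commits" here

keys = query_keys_where_x("a")      # primary keys 1 and 2
rw_transaction_bump_scores(keys, 5)
print(table[1]["score"], table[2]["score"])  # 15 25
```

The extra hop (query for keys, then read by key inside the transaction) is the part that doesn't exist when porting straight from MySQL/PostgreSQL.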
>> Remarkably, Cloud Spanner achieves this combination of features without violating the CAP Theorem.<p>This is the best weasel PR language I have seen in a long time.<p>Note that the sentence does not actually proclaim that they solved (the previously "unsolvable") problem of achieving distributed consensus with unreliable communication while maintaining partition tolerance and availability.<p>The blog only says they don't "violate" the CAP theorem -- whatever that means. So the statement is technically correct. Still the intention is obviously to mislead the casual reader (why else would you start the sentence with "Remarkably"?).<p>A litmus test: The same statement is true for MySQL - or _any other_ database in fact:<p><pre><code> >> "Remarkably, MySQL achieves this combination of features without violating the CAP theorem"
</code></pre>
It's a bit like saying<p><pre><code> >> "Remarkably, MySQL is not a perpetuum mobile"</code></pre>
Related, I wrote a blog post on the network latency between Google Compute Engine zones and regions. I'm assuming Cloud Spanner will still have these latencies once multi-region is deployed. Cross-zone latency on GCE is very good though.<p><a href="https://blog.elasticbyte.net/comparing-bandwidth-prices-and-network-latency-between-google-compute-zones-and-regions/" rel="nofollow">https://blog.elasticbyte.net/comparing-bandwidth-prices-and-...</a>
Oh this looks really compelling! Though I'm guessing this is targeted to companies? I'd love to use this for some personal projects but the pricing seems really high. Am I reading it right that a single node being used at least a tiny bit every hour is about $670 a month?<p>Maybe I'm misunderstanding how the pricing works here. Any clarification would be highly welcomed :)
> This leads to three kinds of systems: CA, CP and AP,<p>What is a distributed system that is CA? Can you build a distributed system which will never have a partition?
A few questions from reading the docs:<p>1) How big can all the colocated data for a single primary key get before it no longer fits within a split? Can I implement a Gmail-like product where all the data for a single user resides within one split?<p>2) Is there a way to turn off external consistency and fall back to serializability? In return you would get better write latencies. This is similar to what CockroachDB provides?
Here is a very interesting video from 2013 of Martin Schoenert explaining the Google Spanner white paper (in German, though): <a href="https://www.youtube.com/watch?v=2QKewyoOSL0" rel="nofollow">https://www.youtube.com/watch?v=2QKewyoOSL0</a><p>He now works for Google as an Engineering Manager.
Doesn't seem possible to use this yet. No client libraries and no samples: <a href="https://cloud.google.com/spanner/docs/tutorials" rel="nofollow">https://cloud.google.com/spanner/docs/tutorials</a><p>Have they documented the wire protocol? I couldn't find it.
> <i>If you have a MySQL or PostgreSQL system that's bursting at the seams</i><p>PostgreSQL? How does this work for people <i>migrating</i> from traditional SQL databases? Typically people use an ORM. How would this fit in with, say, Rails or SQLAlchemy?
Very interesting. How does this pricing compare to AWS Aurora? <a href="https://aws.amazon.com/rds/aurora/pricing/" rel="nofollow">https://aws.amazon.com/rds/aurora/pricing/</a>
So does Cloud Spanner replace the existing Google Cloud SQL offering [1]? What are the pros/cons of each?<p>[1] <a href="https://cloud.google.com/sql/" rel="nofollow">https://cloud.google.com/sql/</a>
Interesting but without INSERT and UPDATE it just isn't worth it for me. When can we expect it to handle data manipulation language (DML) statements?
"What if you could have a fully managed database service that's consistent, scales horizontally across data centers and speaks SQL?"<p>Looks like Google forgot to mention one central requirement: latency.<p>This is a hosted version of Spanner and F1. Since both systems are published, we know a lot about their trade-offs:<p>Spanner (see the OSDI'12 and TODS'13 papers) evolved from the observation that Megastore guarantees - though useful - come at a performance penalty that is prohibitive for some applications. Spanner is a multi-version database system that, unlike Megastore (the system behind the Google Cloud Datastore), provides general-purpose transactions. The authors argue: "We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions." Spanner automatically groups data into partitions (tablets) that are synchronously replicated across sites via Paxos and stored in Colossus, the successor of the Google File System (GFS). Transactions in Spanner are based on two-phase locking (2PL) and two-phase commit (2PC) executed over the leaders for each partition involved in the transaction. In order for transactions to be serialized according to their global commit times, Spanner introduces TrueTime, an API for high-precision timestamps with uncertainty bounds based on atomic clocks and GPS. Each transaction is assigned a commit timestamp from TrueTime and, using the uncertainty bounds, the leader can wait until the transaction is guaranteed to be visible at all sites before releasing locks. This also enables efficient read-only transactions that can read a consistent snapshot for a certain timestamp across all data centers without any locking.<p>F1 (see the VLDB'13 paper) builds on Spanner to support SQL-based access for Google's advertising business. 
To this end, F1 introduces a hierarchical schema based on Protobuf, a rich data encoding format similar to Avro and Thrift. To support both OLTP and OLAP queries, it uses Spanner's abstractions to provide consistent indexing. A lazy protocol for schema changes allows non-blocking schema evolution. Besides pessimistic Spanner transactions, F1 supports optimistic transactions. Each row bears a version timestamp that is used at commit time to perform a short-lived pessimistic transaction validating the transaction's read set. Optimistic transactions in F1 suffer from the abort-rate problem of optimistic concurrency control, as the read phase is latency-bound and the commit requires slow, distributed Spanner transactions, increasing the vulnerability window for potential conflicts.<p>While Spanner and F1 are highly influential system designs, they come at a cost that Google's marketing does not mention: high latency. Consistent geo-replication is expensive even for single operations. Both optimistic and pessimistic transactions increase these latencies further.<p>It will be very interesting to see first benchmarks. My guess is that operation latencies will be on the order of 80-120ms and therefore much slower than what can be achieved on database clusters distributed only over local replicas.
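The commit-wait idea from the TrueTime description above can be sketched in a few lines (plain Python with a made-up uncertainty bound; TrueTime's real epsilon is a few milliseconds and its API is not this one): the leader picks a commit timestamp at the upper end of the clock's uncertainty interval and then waits until that timestamp is guaranteed to be in the past on every clock before releasing locks.

```python
import time

EPSILON = 0.007  # assumed clock uncertainty in seconds (illustrative only)

def tt_now():
    """TrueTime-style interval [earliest, latest] around the local clock."""
    t = time.monotonic()
    return (t - EPSILON, t + EPSILON)

def commit_wait():
    """Pick a commit timestamp, then wait until it is definitely past.
    After this returns, every clock's earliest possible reading exceeds
    the commit timestamp, so transactions become visible in timestamp
    order without coordination at read time."""
    _, latest = tt_now()
    commit_ts = latest                  # latest possible "now"
    while tt_now()[0] <= commit_ts:     # until earliest possible now > commit_ts
        time.sleep(EPSILON / 4)
    return commit_ts

start = time.monotonic()
ts = commit_wait()
elapsed = time.monotonic() - start
print(elapsed >= EPSILON)  # True: the lock-hold time grows with the uncertainty
```

This is also where the latency cost shows up: the commit wait is proportional to the clock uncertainty, and it stacks on top of Paxos replication and 2PC round trips.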
> Unlike most wide-area networks, and especially the public internet, Google controls the entire network and thus can ensure redundancy of hardware and paths, and can also control upgrades and operations in general<p>I know this is a single system, but I'll still say it. This seems like another step in a scary trend for our internet.
> Today, we’re excited to announce the public beta for Cloud Spanner, a globally distributed relational database service that lets customers have their cake and eat it too: ACID transactions and SQL semantics, without giving up horizontal scaling and high availability.<p>This is a bold claim. What do they know about the CAP theorem that I don't?<p>Separately, (emphasis mine):<p>> If you have a MySQL <i>or PostgreSQL</i> system that's bursting at the seams, or are struggling with hand-rolled transactions on top of an eventually-consistent database, Cloud Spanner could be the solution you're looking for. Visit the Cloud Spanner page to learn more and get started building applications on our next-generation database service.<p>From the rest of the article it seems like the wire protocol for accessing it is MySQL. I wonder if they mean to add a PostgreSQL compatibility layer at some point.
While the product is compelling (an ACID-compliant, horizontally scaling DB), it does seem expensive.<p>If you use 2 nodes:
Cost = 2 × 0.9 × 24 × 31 ≈ $1,339/month, not accounting for storage and network charges.
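A quick sanity check of that arithmetic (plain Python; the $0.90/node-hour price is from the announcement, storage and network excluded):

```python
# Back-of-the-envelope Cloud Spanner node cost using the thread's numbers.
NODE_PRICE_PER_HOUR = 0.90  # USD per node-hour, per the announcement

def monthly_node_cost(nodes, hours_per_day=24, days=31):
    """Node cost for one month; ignores storage and network charges."""
    return nodes * NODE_PRICE_PER_HOUR * hours_per_day * days

print(round(monthly_node_cost(1), 2))           # 669.6  (the ~$670/month single-node figure)
print(round(monthly_node_cost(2), 2))           # 1339.2 (two nodes: ~$1,339, not $1,400)
print(round(monthly_node_cost(1, days=30), 2))  # 648.0  (the ~$650/month figure quoted elsewhere)
```

The $650 vs $670 discrepancy in this thread is just 30-day vs 31-day months.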
I see there's "data layer encryption" but the data is still readable by Google. Why would anyone want to keep feeding the Google beast with more data?<p>Software is about separating concerns, and decentralizing authority. Responsible engineers shouldn't be using this service.
Amazing! But why does this feel like such déjà vu all over again... (surely I'm missing something). They've spent 5 years telling us that we just CAN'T scale SQL. Now they'll tell us that actually, they've figured it out! :)
Given the CAP theorem I wonder what trade-offs they make and how much visibility they give you into these trade-offs.<p>In any case this is much better than Amazon's offerings... when they actually ship it. :)
I wonder how many people will get a seizure from that red-blue blinking rectangle in the video :(<p>Upd: Downvoting this warning will only increase that number.
> Does this mean that Spanner is a CA system as defined by CAP? The short answer is “no” technically, but “yes” in effect and its users can and do assume CA.<p>It's somewhat ironic that Brewer, the original author of the CAP theorem, is making this sort of marketing-led bending of the CAP theorem terminology. I think what he really should be saying is something in more nuanced language like this: <a href="https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html" rel="nofollow">https://martin.kleppmann.com/2015/05/11/please-stop-calling-...</a><p>But perhaps Google's marketing department needed something in the more popular "CP or AP?" terminology. I don't see what would be wrong with "CP with extremely high availability" though.<p>It's certainly wacky to be claiming that a system is "CA", since as the post admits it's technically false; to me this makes it clear that CP vs. AP (vs. CA now?) does not convey enough information. I'd prefer "a linearizably-consistent data store, with ACID semantics, with a 99.999% uptime SLA". Not as snappy as "CA" (I will never have a career in marketing I suppose), but it makes the technical claims more clear.