Switch Your Databases To Flash Storage

187 points by jpmc over 12 years ago

16 comments

Wear patterns and flash are an issue, although rotational drives fail too. There are several answers. When a flash drive fails, you can still read the data. With a clustered database and multiple copies of the data, you gain reliability: RAID at the server level. As drives fail, you replace them.

Unlike magnetic disks, SSDs have a tendency to fail at a really predictable rate. So predictably that if you've got two drives of the same model, put them into commission at the same time, and subject them to the same usage patterns, they will probably fail at about the same time. That's a real problem if you're using SSDs in a RAID array, since RAID's increased reliability relies on the assumption that it's very unlikely for two drives to fail at about the same time.

With an SSD, though, once one drive goes there's a decent (perhaps small, but far from negligible) chance that a second drive will go out before you've had a chance to replace the first one. That makes things complicated, but it is much better than the similarly likely scenario in which a second SSD fails shortly after you replace the first one. Then the failure may well happen during the rebuild, and if it does, it really will bring down the whole RAID array.

That said, if you're careful then that predictability should be a good thing. A good SSD will keep track of wear for you. So all you've got to do is monitor the status of the drives, and replace them before they get too close to their rated lifespan. If you add that extra step you're probably actually improving your RAID's reliability. But if you treat your RAID as if SSDs are just fast HDDs, you're asking for trouble.
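
The "monitor wear and replace early" step above could be sketched roughly as follows. This is only an illustration: the `DriveWear` type, the drive names, and the 80% threshold are all assumptions, and the wear figures would in practice come from whatever the drive reports (SMART attributes, for instance), which this sketch does not read.

```python
from dataclasses import dataclass


@dataclass
class DriveWear:
    # Hypothetical record; real values would be read from the drive itself.
    name: str
    wear_used_pct: float  # share of rated write endurance consumed, 0-100


def drives_to_replace(drives, threshold_pct=80.0):
    """Return names of drives at or past the wear threshold, most worn first."""
    worn = [d for d in drives if d.wear_used_pct >= threshold_pct]
    worn.sort(key=lambda d: d.wear_used_pct, reverse=True)
    return [d.name for d in worn]


# Two drives commissioned together wear out together; one is much newer.
array = [
    DriveWear("ssd0", 83.0),
    DriveWear("ssd1", 82.5),
    DriveWear("ssd2", 40.0),
]
print(drives_to_replace(array))
```

Replacing the flagged drives one at a time, well before the threshold is reached on all of them, also staggers their ages, so the array no longer consists of drives that will wear out together.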

ghshephard over 12 years ago

I'm surprised that the author didn't capture what I consider to be the most important component of HDD/Flash/Memory balancing: frequency of access.

The rule of thumb that I've heard thrown about is, "If you touch it more than once a day, move to flash. If you touch it more than once an hour, move to memory."

While we can debate where that actual line falls based on both the price and performance of the various media (and, as the price of flash drops, it may be more like "once every couple of days"), it's important to note that frequency of access is critical when determining which media to put your data on.

We have some 50 TB+ data sets that are queried weekly for analytics, and they don't make a heckuva lot of sense on flash storage. Contrariwise, our core device files are queried multiple times a second, so we make certain those database servers always have enough memory to keep the dataset in memory cache, even if that means dropping 256 GB onto those database servers for larger customers.
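
As a rough sketch, the rule of thumb quoted above can be written as a small function. The function name and the default cutoffs (1 access per day, 24 accesses per day for "once an hour") are assumptions taken straight from the quote, not tuned values, and as the comment notes, the real lines move with price and performance.

```python
def suggest_tier(accesses_per_day, flash_min_per_day=1.0, memory_min_per_day=24.0):
    """Suggest a storage tier from access frequency.

    More than once an hour (24 per day) suggests memory, more than once
    a day suggests flash, and anything colder stays on rotational disk.
    """
    if accesses_per_day > memory_min_per_day:
        return "memory"
    if accesses_per_day > flash_min_per_day:
        return "flash"
    return "disk"


# A data set queried weekly versus one queried twice a second.
print(suggest_tier(1 / 7))
print(suggest_tier(2 * 86400))
```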

pjungwir over 12 years ago

The PostgreSQL mailing list is having a conversation right now about using SSDs. This seems like a very important comment for anyone considering them:

http://archives.postgresql.org/pgsql-general/2012-12/msg00202.php

Basically, you need to make sure that you buy SSDs with a capacitor that allows the drive to flush what it needs in the event of abrupt power loss.

EDIT: Looks like the list archives didn't preserve the thread very well, so here is the original question for anyone interested:

http://archives.postgresql.org/pgsql-general/2012-11/msg00427.php

bcoates over 12 years ago

I love my consumer SSD backed database, but don't get visions of 380,000 IOPS on a real workload quite yet. Like any radical performance increase on just one component, it's more likely to just reveal a non-disk latency bottleneck somewhere else in your system.

Be aware that the performance characteristics of flash are very unlike spinning disks, and vary widely between models. You will see things like weird stalls, wide latency variance, and write performance being all over the place during sustained operations and depending on disk fullness. I chose Intel 520s because they performed better on MySqlPerformanceBlog benchmarks than the then-current Samsung offering [1] and because of OCZ's awful rep [2]. I hit about 5K write IOPS spread across two SSDs before my load becomes CPU-bound, which is nowhere near benchmark numbers but pretty sweet for a sub-$1k disk investment.

It's also my understanding that non-server flash drives like those recommended by the article do not obey fsync and are suspect from an ACID standpoint. RAID mirroring does not fix this--if integrity across sudden power loss is critical you might not be able to use these at all and will have to find a more expensive server SSD.

[1] http://www.mysqlperformanceblog.com/2012/04/25/testing-samsung-ssd-sata-256gb-830-not-all-ssd-created-equal/

[2] http://www.behardware.com/articles/881-7/components-returns-rates-7.html

p.s. the RAM benefits the article mentions are real and potentially huge. My query and insert performance has gone from having heavy RAM scalability issues to it hardly mattering at all. This is all on MariaDB on a non-virtualized server; I'm looking forward to better SSD-tuned databases in the future doing even better.
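
For anyone who wants a first look at how their own drive handles fsync, here is a minimal timing sketch. The function name, the 4 KiB write size, and the iteration count are arbitrary assumptions. It is only a rough signal: a very low number cannot tell a drive with proper power-loss protection apart from one that acknowledges flushes out of volatile cache, so an actual pull-the-plug test remains the real check.

```python
import os
import tempfile
import time


def time_fsync_writes(directory, count=200, size=4096):
    """Return the average seconds per write+fsync of `size` bytes in `directory`."""
    payload = b"\0" * size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to push the data down to the device
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return elapsed / count


if __name__ == "__main__":
    # Point this at a directory on the drive under test instead of the temp dir.
    avg = time_fsync_writes(tempfile.gettempdir())
    print(f"{avg * 1e6:.0f} microseconds per fsync'd write")
```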

paulsutter over 12 years ago

My favorite quote:

"Flash is 10x more expensive than rotational disk. However, you’ll make up the few thousand dollars you’re spending simply by saving the cost of the meetings to discuss the schema optimizations you’ll need to try to keep your database together."

Lots of great technical details presented in a commonsense style, well worth a read.

jiggy2011 over 12 years ago

"Switch Your Databases To Flash Storage. Now. Or You're Doing It Wrong."

Unless, you know, you're storing a lot of stuff, are quite happy with your current level of performance, and don't want to shell out a load on new hardware that will fail quicker.

staunch over 12 years ago

Shameless plug alert. At Uptano [1], this is one of the neatest things we've seen with our very inexpensive SSD machines. It's amazing what you can do with 8 GB RAM + 100 GB RAID1 SSD. It's probably the best price:performance DB you can run, and is sufficient for ~95% of projects.

[1] https://uptano.com

buro9 over 12 years ago

I would love it if cloud providers offered SSD options for their full range of boxes.

For example, to be able to get a Linode at only a fraction more of the cost (say, a 10% premium) with the disk being SSD (and obviously reduced capacity compared to HDD).

I have seen the current offerings but found them to be either too costly (AWS, only one of the largest instances) or too onerous (ssdnodes.com, whose base products aren't aligned with the costs elsewhere, and moving all of your hosts to be near your SSD-powered database is a big task when I only seek a little task).

I was even considering co-locating as the most cost-effective way to get SSDs when providers still massively overprice them. It all feels a bit like the RAM scam a decade ago, when they'd charge you near the cost of the RAM every 2 months. Again though... co-location fell into the onerous class of actions.

Right now, pragmatically, I stay with HDD and Linode.

But Linode should look at my $500 per month account and be well aware that as soon as I see a competitor offer SSD nodes at a cost-competitive point that offsets the burden to move... I'll be gone.

knappador over 12 years ago

Those caught in the middle on DB size needs and performance would do well to take a look at Bcache: http://bcache.evilpiepirate.org/

It's a block write-back cache and seems to perform really nicely. Here are some benchmarks: http://www.accelcloud.com/2012/04/18/linux-flashcache-and-bcache-performance-testing/

cioc over 12 years ago

Does anyone else find the section "Don’t use someone else’s file system" a bit confusing? It starts off by convincingly saying O_DIRECT shouldn't be used and then goes on to say O_DIRECT works very well.

stephenpiment over 12 years ago

Clearly, there are different usage regimes where different solutions will make sense. Nonetheless, there's a really strong case to be made that SSDs have entered a sweet spot in terms of price/performance for databases, and this trend is only accelerating. Here's one discussion of the rationale: http://www.foundationdb.com/#SSDs

leif over 12 years ago

"Use large block writes and small block reads"

Yep. Write amplification is a big deal on SSDs and gets worse due to their internal garbage collection if you give them a high-entropy write pattern. This is not a problem with TokuDB, though. See our "advantage 3" here: http://www.tokutek.com/2012/09/three-ways-that-fractal-tree-indexes-improve-ssd-for-mysql/
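
To put a number on why large writes help, here is a deliberately pessimistic toy model of write amplification. It assumes aligned writes and that every write forces a rewrite of each erase block it touches. Real drives do much better than this thanks to their flash translation layer, and the 256 KiB erase block size is an illustrative assumption, not a figure from the article.

```python
def worst_case_write_amplification(write_bytes, erase_block_bytes):
    """Flash bytes rewritten per host byte if each write rewrites whole erase blocks."""
    blocks_touched = -(-write_bytes // erase_block_bytes)  # ceiling division
    return blocks_touched * erase_block_bytes / write_bytes


ERASE_BLOCK = 256 * 1024  # assumed erase block size

# A 4 KiB random write versus a 1 MiB sequential write.
print(worst_case_write_amplification(4 * 1024, ERASE_BLOCK))
print(worst_case_write_amplification(1024 * 1024, ERASE_BLOCK))
```

In this model a small write pays for a whole erase block, while a write that is a multiple of the erase block size pays only for itself.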

dutchbrit over 12 years ago

Funny, and it's a no-brainer really. There was a thread about SSDs about 2 years back, regarding good ways to use them. My conclusion was pretty much the same when it came to DBs, yet nobody agreed with me back then and I received 3 downvotes. Odd!!

Good article!

trotsky over 12 years ago

I don't disagree with the conclusions, but don't you have to short stroke those SSDs pretty significantly in a high transaction environment to avoid write amplification?

It's too bad longevity worries are keeping them out of the no-commitment market.