Talk about an inaccurate title! The improvement is a combination of off-heap storage and sharing storage amongst processes. I'm surprised they didn't look at Redis for this problem.

These tricks have been used for a while in the JVM world. Here's a JVM equivalent of Hammerspace: http://www.mapdb.org/ And here are some slides on off-heap optimisations in Cassandra: http://www.slideshare.net/jbellis/dealing-with-jvm-limitations-in-apache-cassandra-fosdem-2012

On the JVM, GC time is usually only an issue once the heap gets over 2GB or so. MRI's GC is not in the same league as the JVM's, but even so, 80MB should be easily handled. So I'm guessing the memory consumption of multiple processes is the real issue, which would be solved if Ruby had real threads. JRuby has real threads, and so do many other language runtimes. It seems like a lot of engineering effort is going into working around the deficiencies of MRI, a problem that could be solved simply by switching to something better.
I wonder if they would need this at all if they used a single Ruby process with many threads instead of many Ruby processes.

Their problems mainly stem from needing to access 80 megabytes of slowly changing translation data. Because they run many Ruby processes and have memory-growth issues (which force restarts), that translation data kept having to be reloaded, and each reload took a while.

If they had a single, stable Ruby process running on each box, they might not have had these issues at all.
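To illustrate the memory argument: threads share one address space, so a table loaded once is visible to every worker, whereas each forked process pays for (and reloads) its own copy. A toy C sketch — the three-entry table is obviously a stand-in for their 80MB:

```c
/* Toy sketch of the one-process/many-threads layout: the table is
 * loaded once and every worker thread reads the same copy, instead
 * of each worker process holding (and reloading) its own.
 * Build with: cc -o workers workers.c -lpthread */
#include <pthread.h>
#include <stdio.h>

static const char *translations[] = { "Welcome", "Bienvenue", "Willkommen" };

static void *worker(void *arg) {
    long id = (long)arg;
    /* Every thread indexes into the same in-memory table. */
    printf("worker %ld sees: %s\n", id, translations[id % 3]);
    return NULL;
}

int main(void) {
    pthread_t threads[4];
    for (long i = 0; i < 4; i++)
        pthread_create(&threads[i], NULL, worker, (void *)i);
    for (int i = 0; i < 4; i++)
        pthread_join(threads[i], NULL);
    return 0;
}
```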
These guys never heard of shared memory, apparently?

Does Ruby not provide a facility for using shared memory? I guess you don't get it by default in a GC'd language, because the GC thinks it owns the world.
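For the record, POSIX shared memory is easy enough to use from C. A minimal sketch — the segment name and payload are made up, and the writer and reader are collapsed into one program for brevity (normally they'd be separate processes):

```c
/* Minimal POSIX shared-memory sketch: one process publishes a
 * read-only blob (e.g. translation data), others map it.
 * Build with: cc -o shm shm.c  (add -lrt on older glibc) */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    const char *payload = "hello.world=Hello, world!";
    size_t len = strlen(payload) + 1;

    /* Writer: create the segment and copy the data in. */
    int fd = shm_open("/translations", O_CREAT | O_RDWR, 0644);
    ftruncate(fd, len);
    char *wr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    memcpy(wr, payload, len);
    munmap(wr, len);
    close(fd);

    /* Reader (normally a different process): map it read-only.
     * The kernel keeps one physical copy for all mappers. */
    fd = shm_open("/translations", O_RDONLY, 0);
    const char *rd = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    printf("%s\n", rd);
    munmap((void *)rd, len);
    close(fd);
    shm_unlink("/translations");
    return 0;
}
```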
Man, there is a lot of negative snark at the top of this thread.

I'm not sure whether this system is a good idea or not, but I wish some commenters would spend more time comparing their proposed solutions (shared mem, local db, memmap...) to Hammerspace instead of dismissing it without content.
The original HN thread title ("How Airbnb Improved Response Time by 17% By Moving Objects From Memory To Disk") is misleading compared to the actual article contents. But speaking to the title rather than the article: I find it pretty common for developers to blanket-assume that memory-based caching is always the way to go, because, well, memory is fast and disks are slow.

This sort of thinking ignores the fact that filesystems already have their own (often very well-tuned) caching, and in some cases (e.g. sendfile(2) on Linux) the kernel can do zero-copy writes from files to the network that, along with decent fs caching, will easily outperform app-level memory caching. Of course, this only applies to data that stays relatively static, but often your best option is to mostly get out of the way and let the OS do the heavy lifting, unless you've measured actual loads and are sure your solution is better.
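For example, serving a file via sendfile(2) looks roughly like this in C — a sketch with error handling trimmed, assuming `sock` is a connected TCP socket:

```c
/* Sketch of zero-copy file-to-socket transfer with sendfile(2) on Linux. */
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Send an entire file over `sock` without copying it through userspace. */
ssize_t send_file(int sock, const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    fstat(fd, &st);

    off_t offset = 0;
    ssize_t sent = 0;
    /* The kernel moves pages straight from the page cache to the
     * socket buffers; the data never enters this process's memory. */
    while (offset < st.st_size) {
        ssize_t n = sendfile(sock, fd, &offset, st.st_size - offset);
        if (n <= 0) break;
        sent += n;
    }
    close(fd);
    return sent;
}
```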
Armchair quarterbacking:

* Dedicate Ruby processes to a particular subset of locales
* Parallelize your memcache queries (a multi-get sketch follows below)
* Break up locale files into MRU/LRU strings to reduce size
* Denormalize locales (in memory, cache, whatever) into single values for the most common pages (use with MRU/LRU above)

As an aside, I still don't understand how process -> kernelspace driver -> platter is faster than process -> kernelspace socket -> process -> RAM, especially for random access patterns. I suspect a memcache misconfiguration?
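On the parallelizing point: libmemcached's multi-get batches many keys into a single round trip instead of one round trip per key. A sketch, with made-up keys and a local server:

```c
/* Sketch: fetching several translation keys in one memcached round
 * trip via libmemcached's multi-get. Keys and server are made up.
 * Build with: cc -o mget mget.c -lmemcached */
#include <libmemcached/memcached.h>
#include <stdio.h>

int main(void) {
    memcached_return_t rc;
    memcached_st *memc = memcached_create(NULL);
    memcached_server_st *servers =
        memcached_server_list_append(NULL, "localhost", 11211, &rc);
    memcached_server_push(memc, servers);

    const char *keys[] = { "en.home.title", "en.home.body", "en.footer" };
    size_t key_lengths[] = { 13, 12, 9 };

    /* Issue all three lookups as a single request... */
    memcached_mget(memc, keys, key_lengths, 3);

    /* ...then stream the results back. */
    memcached_result_st *result;
    while ((result = memcached_fetch_result(memc, NULL, &rc)) != NULL) {
        printf("%.*s = %.*s\n",
               (int)memcached_result_key_length(result),
               memcached_result_key_value(result),
               (int)memcached_result_length(result),
               memcached_result_value(result));
        memcached_result_free(result);
    }

    memcached_server_list_free(servers);
    memcached_free(memc);
    return 0;
}
```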
You can dynamically load/unload shared libraries, so only one copy of the data is shared between all processes. One win is that you can also optimize the memory layout of the translation tables (they can be plain C), for which a hash is probably not optimal. This can all be automated in the build process, using the database as the source. During software upgrades, processes must be aware enough to know when to reload. And since all shared-memory schemes use virtual memory, you still have potential latency issues because of paging; not sure if a .so can be pinned (though mlock(2) works on any mapped range). Another win is that it's read-only, so you don't have to worry about corruption.
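A sketch of the idea, assuming a build step that generates the table as C and a loader that dlopens it (file names and symbols are hypothetical). First the generated table — a sorted array is compact, cache-friendly, and binary-searchable, which is part of why a hash may not be optimal here:

```c
/* translations.c -- hypothetical generated file.
 * Build with: cc -shared -fPIC -o translations.so translations.c
 * The const data lands in .rodata, so the page cache shares one
 * physical copy of it across every process that maps the .so. */
typedef struct { const char *key; const char *value; } entry_t;

const entry_t translations[] = {
    { "en.home.title", "Welcome" },
    { "fr.home.title", "Bienvenue" },
};
const unsigned long translations_count = 2;
```

And the loader side:

```c
/* Loader sketch: map the generated table into the process.
 * Build with: cc -o loader loader.c -ldl */
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    void *h = dlopen("./translations.so", RTLD_NOW);
    if (!h) { fprintf(stderr, "%s\n", dlerror()); return 1; }

    typedef struct { const char *key; const char *value; } entry_t;
    const entry_t *table = dlsym(h, "translations");
    const unsigned long *count = dlsym(h, "translations_count");

    for (unsigned long i = 0; i < *count; i++)
        printf("%s = %s\n", table[i].key, table[i].value);

    dlclose(h);  /* dlclose + dlopen again is the reload point on upgrade */
    return 0;
}
```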
Sounds like they reinvented .mo files from gettext.

https://www.gnu.org/software/gettext/manual/html_node/MO-Files.html
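For comparison, the classic gettext flow in C looks like this — the "airbnb" domain and ./locale directory here are made up for illustration:

```c
/* Sketch of classic gettext usage in C. Expects compiled catalogs at
 * ./locale/<lang>/LC_MESSAGES/airbnb.mo (built with msgfmt). */
#include <libintl.h>
#include <locale.h>
#include <stdio.h>

int main(void) {
    setlocale(LC_ALL, "");                 /* honor $LANG / $LC_MESSAGES */
    bindtextdomain("airbnb", "./locale");
    textdomain("airbnb");

    /* GNU gettext mmaps the .mo file and looks the string up in its
     * built-in hash table -- one shared, page-cached copy no matter
     * how many processes are running. */
    printf("%s\n", gettext("Welcome to our site"));
    return 0;
}
```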
I don't get it.

Did they actually benchmark all the options, like shared memory, or SQLite, or the MySQL memory engine (periodically backed)?

They say memcache (or Redis) would have been slower because of network latency, even over localhost. But did they benchmark it?
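A benchmark along those lines is only a few lines of C with the SQLite API — a sketch with a made-up schema and key:

```c
/* Micro-benchmark sketch: time N lookups against a local SQLite table
 * of translations. Schema and key are made up; error checks trimmed.
 * Build with: cc -o bench bench.c -lsqlite3 */
#include <sqlite3.h>
#include <stdio.h>
#include <time.h>

int main(void) {
    sqlite3 *db;
    sqlite3_stmt *stmt;
    sqlite3_open("translations.db", &db);
    sqlite3_prepare_v2(db,
        "SELECT value FROM translations WHERE key = ?1", -1, &stmt, NULL);

    const int N = 100000;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    for (int i = 0; i < N; i++) {
        sqlite3_bind_text(stmt, 1, "en.home.title", -1, SQLITE_STATIC);
        sqlite3_step(stmt);          /* fetch (or miss) */
        sqlite3_reset(stmt);
        sqlite3_clear_bindings(stmt);
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%d lookups in %.3fs (%.1f us each)\n", N, secs, secs / N * 1e6);

    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}
```

Swap the inner loop for a memcached GET or a shared-memory probe and you have apples-to-apples numbers for each option.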