What kind of instances are you running for Redis/memcached? I'm a bit surprised by the numbers here, though to be fair I don't do much in the virtualization world. With CPU overhead that low, and if it's not a bandwidth issue, it sounds like you might be saturating interrupt handling for the network card. Memcached can usually push 100-300k/s on an 8-core Westmere (it could go higher if you removed the big lock). Redis, on the other hand, with a process pinned to each physical core, can do about 500,000/s. We (Twitter) saw saturation at around 100,000 on CPU0; what tipped us off was ksoftirqd spinning at 100%. If you have a modern server and network card, just pin the IRQ for each TX/RX queue to its own physical core.
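A rough sketch of the pinning step described above, assuming a Linux box where each queue's IRQ is steered by writing a hex CPU bitmask to /proc/irq/<irq>/smp_affinity. The IRQ numbers and core IDs used here are hypothetical; the real ones come from /proc/interrupts and your CPU topology. The code only computes the masks and does not write anything.

```python
def affinity_mask(cpu: int) -> str:
    """Hex bitmask selecting a single CPU, written as comma-separated
    32-bit groups (most significant group first)."""
    if cpu < 0:
        raise ValueError("cpu index must be non-negative")
    mask = 1 << cpu
    groups = []
    while True:
        groups.append(f"{mask & 0xFFFFFFFF:08x}")
        mask >>= 32
        if mask == 0:
            break
    return ",".join(reversed(groups))


def plan_irq_pinning(irqs, cores):
    """Assign each IRQ to one core, round-robin over the given cores.

    Returns {irq: mask_string}. Both lists are supplied by the caller;
    nothing is read from or written to /proc here.
    """
    if not cores:
        raise ValueError("need at least one core")
    return {
        irq: affinity_mask(cores[i % len(cores)])
        for i, irq in enumerate(irqs)
    }
```

With one core per queue, every TX/RX interrupt lands on its own CPU instead of piling up on CPU0. Applying the plan would mean writing each mask to the matching smp_affinity file as root, and stopping irqbalance first if it is running so it does not overwrite the settings.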
A slight tangent, since I saw that Instagram is using both Graphite and Munin: collectd just added a plugin to send metrics to Graphite. You might want to try it for tracking your machine stats over time.<p><a href="http://collectd.org/wiki/index.php/Plugin:Write_Graphite" rel="nofollow">http://collectd.org/wiki/index.php/Plugin:Write_Graphite</a>
<a href="http://collectd.org/" rel="nofollow">http://collectd.org/</a>
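For anyone curious what "sending metrics to Graphite" looks like on the wire, here is a minimal sketch of Graphite's plaintext protocol, where each metric is one line of the form "path value timestamp". This is not the collectd plugin itself; the metric path, host name, and port below are assumptions for illustration (2003 is the usual Carbon plaintext port).

```python
import socket
import time


def format_graphite_line(path: str, value: float, timestamp=None) -> str:
    """Build one plaintext-protocol line: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(host: str, port: int, path: str, value: float) -> None:
    """Open a TCP connection to Carbon and send a single metric line.

    host and port are assumptions; point them at your own Carbon listener.
    """
    line = format_graphite_line(path, value)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

A call such as send_metric("graphite.example.com", 2003, "servers.web01.load", 0.42) would record one data point, assuming a Carbon listener is reachable at that hypothetical address.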
Isn't there a risk with EBS snapshots that a snapshot of a live instance could be taken while your DB engine is in the middle of a transaction, leaving the data on the newly spun-up instance in an inconsistent state?<p>Is it that EBS snapshots are engineered to prevent this? Or just that it's not likely to happen in practice?
Why use Graphite instead of Ganglia? Ganglia uses RRDs. It's been around forever, it's fairly low on resource use, it's fast, and you can generate custom graphs just as you can with Graphite. I actually ended up doing some graphs with Google Charts and Ganglia the last time I messed with it. (Also, nobody has really simple tools to tell you which of your 3,000 cluster nodes has red flags in real time and spit them into a fire-fighting IRC channel, so we had to write those ourselves in Python.)<p><i>"Takeaway: if read capacity is likely to be a concern, bringing up read-slaves ahead of time and getting them in rotation is ideal"</i><p>Sorry, but this is not 'ideal', this is Capacity Planning 101. If you're launching a new product that you expect to be very popular, take your expected peak traffic, double or quadruple it, and build out the infrastructure to handle it ahead of time. I thought this was the whole point of the "cloud"? Add a metric shit-ton of resources for a planned peak and dial it down afterwards.
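To make the "double or quadruple it" arithmetic concrete, here is a back-of-the-envelope sketch. The function name, the traffic figures, and the per-replica capacity are all made-up numbers for illustration, not measurements from anyone's deployment.

```python
import math


def replicas_needed(peak_reads_per_sec: float,
                    per_replica_capacity: float,
                    headroom: float = 2.0) -> int:
    """Read replicas required to serve the expected peak multiplied by a
    safety factor (2.0 = double, 4.0 = quadruple)."""
    if per_replica_capacity <= 0:
        raise ValueError("per_replica_capacity must be positive")
    return math.ceil(peak_reads_per_sec * headroom / per_replica_capacity)
```

For a hypothetical expected peak of 1,500 reads/s and replicas that each handle 1,000 reads/s, doubling calls for 3 replicas and quadrupling calls for 6, all brought into rotation before launch and scaled back once the peak has passed.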
We use statsd, Graphite, Redis and Node as well. You might be interested in some of my projects relating to these:<p><a href="https://github.com/gflarity/nervous" rel="nofollow">https://github.com/gflarity/nervous</a>
<a href="https://github.com/gflarity/response" rel="nofollow">https://github.com/gflarity/response</a>
<a href="https://github.com/gflarity/qdis" rel="nofollow">https://github.com/gflarity/qdis</a>
Hello!<p>A question about the quality of Instagram photos on Android.<p>I have a JPG from an SGS2: <a href="http://kia4sale.narod.ru/insta/01.jpg" rel="nofollow">http://kia4sale.narod.ru/insta/01.jpg</a><p>This is the Instagram photo (Earlybird filter) from the Android version: <a href="http://kia4sale.narod.ru/insta/02.jpg" rel="nofollow">http://kia4sale.narod.ru/insta/02.jpg</a><p>This is the Instagram photo made from the same SGS2 JPG, but on an iPhone 4: <a href="http://distilleryimage9.instagram.com/662ade7483ce11e19e4a12313813ffc0_7.jpg" rel="nofollow">http://distilleryimage9.instagram.com/662ade7483ce11e19e4a12...</a><p>Question: why is the photo from the Android version blurry?<p>Thanks.
I'm curious to know what kind of EC2 instance they are running the master PostgreSQL on and whether they've had any write bottlenecks. I'm using Postgres for an app and am worried about running into write issues.
pgFouine is nice, but it needs a major do-over. It would be better rewritten with a PL/pgSQL backend that runs against CSV log files loaded into the database, so that it could handle huge logs, which it can't do now.
I am curious to find out why there was a need to develop your own C2DM server - what was lacking in Google's C2DM server? I am a C2DM newbie so pardon my ignorance.