
How Netflix loaded 1 billion rows into SimpleDB

32 points, by petewarden, over 15 years ago

5 comments

pierrefar, over 15 years ago
I was running a website that was doing millions of writes a day to SDB for real-time analytics. The biggest PITA feature of SDB is that it throttles writes in a horribly stringent way. You can barely tickle a domain and it will throttle you. I never got consistently good batch puts - they all eventually fail.

After talking with SDB folks, they recommended that I shard my data, because each domain maps to a different network computer cluster. I'm glad it's the first recommendation in the OP's list, because it seriously is the best thing you can do.

Another trick that I experimented with: use multiple EC2 instances to write to the same domain. I managed to convince myself that the throttling is per EC2 instance per domain, not global per domain. However, cost ruled this solution out.

Reading was much more consistent but was also throttled, especially at high write loads. The solution was two-fold:

1. Cache everything "indefinitely" and break the cache when you know its contents will change. For the real-time stuff, you can't cache. I used memcached, and looked at other solutions like Tokyo Tyrant, memcachedb and redis. Use what you feel comfortable using, really.

2. Read as little as possible. Doing a "select * from domain where..." is horrible compared to doing "select attribute1, attribute2 from domain where...". Once you read, cache.
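A minimal sketch of the write-sharding, throttling backoff, and cache-aside reads described above, using the classic boto 2 SimpleDB bindings and python-memcached. The domain names, shard count, attribute names, and backoff policy are illustrative assumptions, not details from the comment:

```python
# Sketch: shard writes across several SimpleDB domains, batch them,
# and back off exponentially on throttling errors. Reads fetch only
# the attributes they need and cache the result until the next write.
import hashlib
import time

import boto          # classic boto 2
import memcache      # python-memcached

SHARDS = 8  # each domain maps to a different cluster, so sharding spreads load
conn = boto.connect_sdb()  # credentials taken from the environment
domains = [conn.create_domain("analytics_%02d" % i) for i in range(SHARDS)]
mc = memcache.Client(["127.0.0.1:11211"])

def shard_for(item_name):
    """Hash the item name to a domain so write load is spread evenly."""
    h = int(hashlib.md5(item_name.encode()).hexdigest(), 16)
    return domains[h % SHARDS]

def batch_put(items, retries=5):
    """items: {item_name: {attribute: value}}. BatchPutAttributes is
    capped at 25 items per call, so chunk within each shard."""
    by_shard = {}
    for name, attrs in items.items():
        by_shard.setdefault(shard_for(name), {})[name] = attrs
        mc.delete(name)  # break the cache on write instead of expiring
    for domain, chunk in by_shard.items():
        names = list(chunk)
        for i in range(0, len(names), 25):
            batch = dict((n, chunk[n]) for n in names[i:i + 25])
            for attempt in range(retries):
                try:
                    domain.batch_put_attributes(batch, replace=True)
                    break
                except boto.exception.SDBResponseError:
                    time.sleep(2 ** attempt)  # back off when throttled

def get_attrs(item_name, wanted=("attribute1", "attribute2")):
    """Cache-aside read that fetches only the attributes it needs."""
    cached = mc.get(item_name)
    if cached is not None:
        return cached
    attrs = shard_for(item_name).get_attributes(
        item_name, attribute_name=list(wanted))
    mc.set(item_name, attrs)
    return attrs
```

Sharding by a hash of the item name keeps any one domain's write rate below the throttle threshold, which matches the "one domain per cluster" advice; the 25-item chunk size is SimpleDB's documented BatchPutAttributes limit.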
jordyhoyt, over 15 years ago
The linked blog post gives more details on how he did it and the throughput he got.

http://practicalcloudcomputing.com/post/284222088/forklift-1b-records

Very interesting that Oracle became the bottleneck.
pvg, over 15 years ago
The post is notable for its absence of any hint as to why, and with what kind of data, this was done. Was the driver cost? Performance? A pleasant cloudy feeling? It seems a given that you can, if you try, get a billion rows into SimpleDB. You can probably get a billion rows mechanical-turked onto clay tablets. The interesting thing to learn would be why doing so is advantageous.
lsb, over 15 years ago
You can buy a machine with 128GB of memory and 2 TB of disk space for under $10k at Dell. A billion rows could be an in-memory dataset now.
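A quick back-of-envelope check of that claim, assuming ~100 bytes per row (the row size is my assumption, not the commenter's):

```python
rows = 10 ** 9
bytes_per_row = 100                     # assumed average row size
total_gb = rows * bytes_per_row / 10 ** 9
print("%.0f GB" % total_gb)             # ~100 GB, inside a 128 GB machine
```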
elq, over 15 years ago
Bah, that's nothing! My team put over 3 billion rows into Amazon's cloud in a matter of hours without having to deal with the vagaries of SDB :)

To the best of my knowledge, Oracle was the bottleneck because the Oracle instance is an actual high-volume production database, and the IR process was restrained to minimize the impact on production users.