SimpleDB is good for a lot of things, but the unpredictable costs (you're charged for box usage, which varies) and the network latency make it unsuitable for logging at scale. What if you get hundreds of error messages a second?<p>We use a decidedly low-tech scheme. We log everything to files on local storage. Then we use a command line tool to periodically sync these to S3. Each instance gets its own folder.<p>When we want to query, we have several options. We can run good old grep or tail on one instance -- or on several instances with pssh. We can use the S3 command line tool to process multiple files on an admin machine. Or we could use Hadoop to mess with files in S3 (haven't done that yet...!)<p>Not as snazzy as having each log record stored in a structured fashion, but it's a lot cheaper than SimpleDB.
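A rough sketch of the file-based scheme described above, assuming one sub-folder per instance and `.log` files inside it. The `logs/<instance-id>/<file>` key layout and the function names `s3_key_for` and `grep_logs` are made up for illustration; the comment does not say which sync tool or naming convention is actually used, and the upload itself is left to that tool.

```python
import re
from pathlib import Path, PurePosixPath


def s3_key_for(instance_id: str, log_path: Path, prefix: str = "logs") -> str:
    """Map a local log file to a per-instance object key (hypothetical layout)."""
    return str(PurePosixPath(prefix) / instance_id / log_path.name)


def grep_logs(root: Path, pattern: str):
    """Yield (instance_id, line) for every matching line under root/<instance_id>/*.log.

    This mimics running grep over a local copy of the synced folders.
    """
    regex = re.compile(pattern)
    for log_file in sorted(root.glob("*/*.log")):
        instance_id = log_file.parent.name
        with log_file.open(encoding="utf-8", errors="replace") as fh:
            for line in fh:
                if regex.search(line):
                    yield instance_id, line.rstrip("\n")
```

On an admin machine with the folders pulled down to, say, `./mirror`, something like `list(grep_logs(Path("mirror"), r"ERROR"))` would collect the error lines from every instance in one pass.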
I put together a similar logging setup for our servers recently, except I'm using CouchDB for my store and NLog (said servers are C#) for the library. Reasons I'm using Couch instead of SimpleDB:<p>1) The REST+JSON API made integrating it super-simple (had to add a single line to NLog to set MIME types to make Couch happy)<p>2) We use SDB in our service, so if SDB blows up, not only would things break, we also couldn't easily get at our logs to see why.<p>The biggest drawback I've found to Couch in this context is that it loves to devour disk space; you quickly learn to cron some compaction jobs.
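For readers who haven't used Couch, here is a minimal sketch of the two HTTP calls this setup relies on: `POST /{db}` to store a log record as a JSON document, and `POST /{db}/_compact` to trigger compaction (the kind of call a cron job would make). The commenter's stack is C# with NLog; this is Python only to illustrate the requests. The server URL, the `logs` database name, and the document fields are assumptions, and compaction normally needs admin credentials, which are omitted here.

```python
import json
import urllib.request

COUCH_URL = "http://localhost:5984"  # assumption: local CouchDB on its default port
DB_NAME = "logs"                     # hypothetical database name


def build_log_request(level, message, base_url=COUCH_URL, db=DB_NAME):
    """Build a POST that stores one log record as a JSON document."""
    body = json.dumps({"level": level, "message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{db}",
        data=body,
        headers={"Content-Type": "application/json"},  # Couch rejects other MIME types
        method="POST",
    )


def build_compact_request(base_url=COUCH_URL, db=DB_NAME):
    """Build a POST that asks CouchDB to compact the database."""
    return urllib.request.Request(
        f"{base_url}/{db}/_compact",
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(request, timeout=5) as resp:
        return json.load(resp)
```

Building the requests separately from sending them keeps the example inspectable without a live server; `send(build_compact_request())` is what a scheduled compaction job would boil down to.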
Thanks for your addition. At Peecho, we figured that if SimpleDB goes down, logging would be the least of our troubles. Read here how we tried to maintain our uptime:
<a href="http://www.peecho.com/blog/minimizing-downtime-on-amazon-aws.html" rel="nofollow">http://www.peecho.com/blog/minimizing-downtime-on-amazon-aws...</a>
Logging seems like the perfect candidate for a relational database. It's a write-heavy system with a need for powerful ad-hoc query capabilities. Am I missing something? Why put this in a key-value store?
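To make the "ad-hoc query" point concrete, here is a small sketch using SQLite as a stand-in for whatever relational database one would actually pick. The `logs` table, its columns, and the function names are hypothetical, and an in-memory SQLite database says nothing about how a real server would cope with a write-heavy load.

```python
import sqlite3


def open_log_db(path=":memory:"):
    """Open (or create) a log database with a simple, made-up schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS logs ("
        "ts TEXT NOT NULL, host TEXT NOT NULL, "
        "level TEXT NOT NULL, message TEXT NOT NULL)"
    )
    # An index on the timestamp keeps time-range queries cheap.
    conn.execute("CREATE INDEX IF NOT EXISTS logs_ts ON logs (ts)")
    return conn


def write_log(conn, ts, host, level, message):
    """Append one log record."""
    conn.execute("INSERT INTO logs VALUES (?, ?, ?, ?)", (ts, host, level, message))


def errors_per_host(conn, since):
    """Ad-hoc query: count ERROR records per host from a given ISO timestamp on."""
    rows = conn.execute(
        "SELECT host, COUNT(*) FROM logs "
        "WHERE level = 'ERROR' AND ts >= ? "
        "GROUP BY host ORDER BY host",
        (since,),
    )
    return rows.fetchall()
```

The grouping and filtering in `errors_per_host` is the kind of question that takes one SQL statement here but needs client-side code against a plain key-value store.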
How do you get around the 10GB/domain limit?<p>What do you do when your fleet gets big enough that you start getting throttled?<p>What's the query latency?<p>How do you delete old log data?