TechEcho

8 comments

snewman, about 10 years ago

Hi! Great to see this pop back up on HN. I'm the author of the blog post (and Scalyr founder), happy to answer any questions.

Downthread, someone mentioned that they couldn't find the HN discussion from when this was originally posted; it's here: https://news.ycombinator.com/item?id=7715025

hglaser, about 10 years ago

Great post.

BTW, this product (Scalyr) is a lifesaver. We (Periscope) are able to operate about a dozen heterogeneous servers with no full-time DevOps, largely because of Scalyr.

twotwotwo, about 10 years ago

Lots of attention goes to OLTP-type loads for good reasons, but when you do design to just stream fast, some fun things happen:

You can use lots of relatively cheap spindles in parallel, and think of each one as (at least) 100MB/s of sequential read speed and a couple TB of space. You have fast compression available that can increase your effective bandwidth and make the effective cost of space cheaper.

You can draw on well-understood ways to search, sort, do hash- or sort-based joining and grouping, and so on.

Streaming doesn't need a big in-memory cache to avoid disk seeks, so you can use those gobs of RAM for other things--aggregating results or holding data to join against, say. (Of course, if you don't need the RAM, disk cache might still be useful for some access patterns.)

Besides log search, you see a stream-fast approach in analytics-focused DBs: BigQuery, Redshift, Vertica, and open-source ones--Facebook put up a good post about the work that led to their Hive ORCFile design.

Some bioinformatics tools load a big hashtable into memory and, roughly, hash-join against a ton of raw data streamed from disk, then sometimes repeat the process with another hashtable.

These are not at all original observations, but I managed to hear about these sorts of analytics and bioinformatics tools for a while before really getting how or why they did things all that differently from a typical random-access-oriented database.
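To make the "load a hashtable, then hash-join against streamed data" pattern concrete, here is a minimal sketch in Python. It is not taken from any of the tools mentioned; the function names, the tab-separated line format, and the sample keys are all assumptions chosen for illustration.

```python
import io
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def build_table(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Load the small side of the join into an in-memory hash table."""
    table: Dict[str, List[str]] = defaultdict(list)
    for key, value in rows:
        table[key].append(value)
    return table


def stream_hash_join(
    stream: Iterable[str], table: Dict[str, List[str]]
) -> Iterator[Tuple[str, str, str]]:
    """Probe the table once per streamed line, reading strictly front to back."""
    for line in stream:
        # Hypothetical record format: "<key>\t<payload>" per line.
        key, _, payload = line.rstrip("\n").partition("\t")
        # .get() avoids inserting empty entries for keys that do not match.
        for match in table.get(key, ()):
            yield key, payload, match


# Illustrative data only: a small lookup side and a "large" streamed side.
small_side = [("chr1", "gene_a"), ("chr2", "gene_b"), ("chr1", "gene_c")]
raw_stream = io.StringIO("chr1\tread_1\nchr3\tread_2\nchr2\tread_3\n")
joined = list(stream_hash_join(raw_stream, build_table(small_side)))
```

The streamed side is read once, sequentially, and never held in memory as a whole; only the lookup table occupies RAM, which is the trade the comment describes.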

imaginenore, about 10 years ago

I have another idea for you guys. Instead of relying on expensive AWS SSD instances, why not switch to Hetzner and keep everything in RAM?

128 GB RAM for $135/month: https://www.hetzner.de/en/hosting/produkte_rootserver/px120

And you will have so much extra disk space, you can use it for backups. Or even resell it.

Your i2.4xlarge costs you $2,455/month.
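For scale, the two prices quoted above can be compared directly. This is only a back-of-the-envelope calculation using the figures in the comment; it assumes nothing about the machines beyond those numbers and ignores differences in storage, networking, and operations.

```python
# Figures as quoted in the comment above (USD per month).
hetzner_per_month = 135.0   # server with 128 GB RAM
aws_per_month = 2455.0      # i2.4xlarge
hetzner_ram_gb = 128

# How many times more the AWS instance costs per month.
price_ratio = aws_per_month / hetzner_per_month

# Monthly cost of each GB of RAM on the Hetzner box.
hetzner_usd_per_gb_ram = hetzner_per_month / hetzner_ram_gb

print(f"{price_ratio:.1f}x, ${hetzner_usd_per_gb_ram:.2f} per GB of RAM per month")
```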


imaginenore, about 10 years ago

I wonder why they chose Java for substring search. Why not C (strstr) or grep?

http://www.arstdesign.com/articles/fastsearch.html
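As a rough illustration of what a brute-force streaming substring scan involves in any language, here is a sketch in Python. It is not Scalyr's implementation; `find_all`, the chunk size, and the sample data are assumptions. `bytes.find` (implemented in C in CPython) plays the role that `strstr` would play in C, and the only subtlety is carrying a few bytes across chunk boundaries so that matches straddling two reads are not missed.

```python
import io
from typing import BinaryIO, Iterator


def find_all(stream: BinaryIO, needle: bytes, chunk_size: int = 1 << 20) -> Iterator[int]:
    """Yield the absolute byte offset of every occurrence of needle in stream."""
    if not needle:
        raise ValueError("needle must be non-empty")
    overlap = len(needle) - 1   # bytes to carry over between chunks
    tail = b""                  # carried-over end of the previous window
    base = 0                    # absolute offset of window[0]
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        window = tail + chunk
        pos = window.find(needle)
        while pos != -1:
            yield base + pos
            pos = window.find(needle, pos + 1)
        # Keep only len(needle) - 1 bytes: too short to contain a full match,
        # so nothing is reported twice, but enough to catch a straddling one.
        keep = min(overlap, len(window))
        tail = window[len(window) - keep:]
        base += len(window) - keep


# Tiny chunk size to force matches across chunk boundaries.
offsets = list(find_all(io.BytesIO(b"abcabcab"), b"abc", chunk_size=2))
```

The scan reads the input strictly sequentially and uses memory proportional to the chunk size, not the input size.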


lostmsu, about 10 years ago

Nice. But does not scale.

swatow, about 10 years ago

Judging from the comments, this article was written around May 8 2014. Can we get a (2014) in the title?


kiallmacinnes, about 10 years ago

The linked article has been posted before, but I can't find the old HN thread. It was certainly worth a re-read :)

I wonder, has Scalyr reached their expected 100GB/s yet?



Searching 20 GB/sec: Systems Engineering Before Algorithms (2014)
