TechEcho

Infrastructure for Data Streams

75 points by tadasv over 10 years ago

7 comments

eikenberry over 10 years ago
From everything I've read, Kafka is a really bad fit for AWS. It is not tolerant of partitioning; they state this in their own design document, where they present it as a CA system. In his Jepsen post on Kafka, Kyle backed this up with more data.

Given this, why do people deploy it to AWS? It seems like an invitation to disaster.
nostrademons over 10 years ago
Curious whether Cap'n Proto or another zero-copy serialization format might've been a better choice than protobufs? Protobufs still need to parse the message; it's just that the code to do so is automatically generated for you. With Cap'n Proto you can just read them directly off the wire and save them, or mmap a file full of them and access them.

Most of the downsides of Cap'n Proto also don't apply here. Compressing with Snappy will elide all the zero-valued padding bytes. The format of an HTTP message is relatively stable, so you don't get a lot of churn in the message layout. HTTP doesn't have a lot of optional fields, so that's another potential source of Cap'n Proto bloat that doesn't apply to your use case.
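The zero-copy idea can be sketched without Cap'n Proto itself: if the message layout is fixed, a single field can be read straight out of the received buffer at a known offset, with no decode step for the rest of the message. A toy illustration in Python — the layout and field names here are invented for the example, not Cap'n Proto's actual wire format:

```python
import struct

# Hypothetical fixed layout: u32 status, u64 timestamp, u16 body length,
# followed by the body bytes. (Invented for illustration only.)
HEADER = struct.Struct("<IQH")

def encode(status, timestamp, body):
    # Writing still serializes once, as any format must.
    return HEADER.pack(status, timestamp, len(body)) + body

def read_status(buf):
    # Zero-copy-style access: pull one field out of the buffer at a
    # known offset, without decoding the whole message into objects.
    return struct.unpack_from("<I", buf, 0)[0]

msg = encode(200, 1415000000, b"GET / HTTP/1.1")
view = memoryview(msg)  # a view over the bytes, no copy made
assert read_status(view) == 200
```

The contrast with protobufs is that nothing here walks the whole message: only the bytes you touch are ever looked at, which is also why an mmap'd file of such records can be queried without loading it.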
felipesabino over 10 years ago
My lazy self always wonders how nice it would be if some of these infrastructure designs were always accompanied by a docker/fig configuration example, to be used as a starting point/proof of concept for people looking for similar solutions.

It obviously happens sometimes [1] [2], but it should be more common...

[1] http://alvinhenrick.com/2014/08/18/apache-storm-and-kafka-cluster-with-docker/

[2] https://registry.hub.docker.com/u/ches/kafka/
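In the fig-era syntax this comment is asking for, a proof-of-concept Kafka setup using the image from [2] might look roughly like the sketch below. The image names and environment variables are assumptions based on those images' READMEs and should be checked against them before use:

```yaml
# fig.yml / docker-compose v1 sketch — a starting point, not production config.
zookeeper:
  image: jplock/zookeeper
  ports:
    - "2181:2181"
kafka:
  image: ches/kafka
  ports:
    - "9092:9092"
  links:
    - zookeeper
  environment:
    ZOOKEEPER_IP: zookeeper   # per the ches/kafka README (assumed)
```

Running `fig up` (or `docker-compose up`) would then give a single-broker cluster to prototype against.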
zerop over 10 years ago
We use Netty for transport in a similar scenario. We haven't hard-tested it at the limits mentioned, but wouldn't a write-behind cache be able to handle a large volume of writes? Of course there will be a delay, but it is not hard to implement.
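A minimal sketch of the write-behind idea the comment alludes to: buffer writes in memory and hand them to a slower sink in batches, trading a bounded delay for throughput. This is an illustrative toy, not a claim about how Chartbeat or Netty does it:

```python
import threading

class WriteBehindCache:
    """Buffers writes in memory and flushes them to a slower sink in
    batches, trading a bounded delay for higher write throughput."""

    def __init__(self, sink, batch_size=100, interval=1.0):
        self.sink = sink              # callable that takes a list of records
        self.batch_size = batch_size
        self.interval = interval
        self.buffer = []
        self.lock = threading.Lock()
        self.timer = None

    def write(self, record):
        with self.lock:
            self.buffer.append(record)
            if len(self.buffer) >= self.batch_size:
                self._flush_locked()
            elif self.timer is None:
                # Flush after `interval` even if the batch never fills,
                # which bounds the write-behind delay.
                self.timer = threading.Timer(self.interval, self.flush)
                self.timer.daemon = True
                self.timer.start()

    def flush(self):
        with self.lock:
            self._flush_locked()

    def _flush_locked(self):
        if self.timer is not None:
            self.timer.cancel()
            self.timer = None
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.sink(batch)

out = []
cache = WriteBehindCache(out.extend, batch_size=3)
for i in range(7):
    cache.write(i)
cache.flush()
assert out == [0, 1, 2, 3, 4, 5, 6]
```

The delay the comment mentions is the `interval` here; a real implementation would also need to decide what happens to buffered records on crash, which is exactly the durability gap a log like Kafka closes.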
eva1984 over 10 years ago
Just curious, how does Kafka handle data retention? Can it be easily configured, or do you need to build something from scratch?
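For what it's worth, retention is built into Kafka and configured declaratively, both broker-wide and per topic — nothing needs to be built from scratch. A sketch of the relevant settings (values are examples, not recommendations):

```properties
# Broker defaults (server.properties): delete log segments older than
# 7 days, or once a partition exceeds the size limit, whichever first.
log.retention.hours=168
log.retention.bytes=53687091200
log.cleanup.policy=delete

# Per-topic override, e.g. keep a hypothetical "pings" topic for 1 day:
#   kafka-topics.sh --zookeeper localhost:2181 --alter \
#     --topic pings --config retention.ms=86400000
```

The broker prunes old segments in the background; consumers that fall further behind than the retention window simply lose access to the expired data.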
hbz over 10 years ago
I was hoping he'd post the http-to-kafka adapter, but I'm guessing that's ChartBeat IP.
suchitpuri over 10 years ago
One thing that is not clear about Kafka or Kinesis: when you have multiple consumers for the same topic, how will they get the data, and in what order? And what happens when consumers die? How do you handle consumers in your data pipeline?
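Kafka's answer, in outline: consumers in the same group divide a topic's partitions among themselves, each partition is owned by exactly one consumer in the group, ordering is guaranteed only within a partition, and when a consumer dies a rebalance reassigns its partitions to the survivors. A toy sketch of a range-style assignment (illustrative only, not Kafka's actual rebalancing code):

```python
def assign(partitions, consumers):
    """Range-style assignment: split the sorted partition list into
    contiguous chunks, one chunk per consumer. Each partition belongs
    to exactly one consumer, so per-partition order is preserved."""
    consumers = sorted(consumers)
    parts = sorted(partitions)
    per, extra = divmod(len(parts), len(consumers))
    assignment, start = {}, 0
    for i, c in enumerate(consumers):
        count = per + (1 if i < extra else 0)  # spread the remainder
        assignment[c] = parts[start:start + count]
        start += count
    return assignment

# Six partitions shared by two consumers in one group:
print(assign(range(6), ["c1", "c2"]))  # {'c1': [0, 1, 2], 'c2': [3, 4, 5]}

# "c2" dies: a rebalance hands everything to the survivor.
print(assign(range(6), ["c1"]))        # {'c1': [0, 1, 2, 3, 4, 5]}
```

Because ordering holds only per partition, producers that need ordered delivery for a key route all messages for that key to the same partition.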