Hi all,<p>I'm trying to build real-time data infrastructure for logging. For the ingestion layer, I'm thinking about using Kafka or Logstash, so that afterwards I can store the data in any database and easily swap out the store later without changing the ingestion layer.<p>Any experience running Logstash or Kafka in production?<p>An additional question: I'm quite concerned about missing data when shipping to Logstash or Kafka with a lightweight shipper like Filebeat. Any experience handling missing data at scale?
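For context, here's a rough sketch of the shipper-to-Kafka hop I have in mind, just to illustrate the kind of delivery guarantee I'm worried about losing. It uses kafka-python; the broker address and topic name are placeholders, not anything we run today.

```python
# Sketch: produce a log line to Kafka and wait for acknowledgement,
# so a failed send surfaces as an exception rather than a silently
# dropped log entry. Broker and topic names are placeholders.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="kafka-1:9092",   # placeholder broker address
    acks="all",                         # wait for all in-sync replicas
    retries=5,                          # retry transient broker errors
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

future = producer.send("app-logs", {"level": "INFO", "msg": "user logged in"})
metadata = future.get(timeout=10)       # block until acked or raise
print(metadata.topic, metadata.partition, metadata.offset)

producer.flush()
```

In practice the shipper (Filebeat or similar) would handle this for me, but this is the behaviour I'd want from it: don't advance past a log line until the broker has acknowledged it.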
Are you really ready to forgo hundreds of megabytes of RAM merely for log shipping? Fluentd could be a cheaper alternative to JVM-based routers.<p>Also, what exactly is your question?<p>> Any experience for handling missing data at scale?<p>For <i>logs</i>? Missing logs <i>don't matter</i> (unless they're required audit data). Your system should be prepared not to fall apart on missing hours or days of logs, just as it should tolerate missing metrics and other monitoring data.<p>And what volume is "at scale"?
We're using Kafka as a log delivery platform and are quite happy with it. Kafka is highly available by nature and can be scaled quite trivially with the log load by adding new cluster nodes.<p>We've decided to use journald for storing all of our application logs. We pump the entries from journald to Kafka using a tool that we open sourced: <a href="https://github.com/aiven/journalpump" rel="nofollow">https://github.com/aiven/journalpump</a>.<p>From Kafka, we export the logs to Elasticsearch for viewing and analysis. Some specific logs are also stored in S3 for long-term retention, e.g. for audit purposes.
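To give a feel for the Kafka-to-Elasticsearch export step, here is a minimal sketch of a consumer that indexes log entries and only commits its offset after a successful index, which gives at-least-once delivery. Topic, group id, index name, and host addresses are made up; our actual exporter is more involved and batches writes.

```python
# Sketch: consume log entries from Kafka and index them into Elasticsearch,
# committing offsets manually only after a successful index (at-least-once).
# All names and addresses below are placeholders.
import json
from kafka import KafkaConsumer
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

consumer = KafkaConsumer(
    "app-logs",                           # placeholder topic
    bootstrap_servers="kafka-1:9092",     # placeholder broker
    group_id="log-indexer",
    enable_auto_commit=False,             # commit only after indexing succeeds
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    # Index one log entry per document; in production you'd batch these
    # with the bulk helper instead of indexing one at a time.
    es.index(index="logs", document=message.value)
    consumer.commit()                     # advance the offset only after success
```

The important property is that a crash between indexing and committing re-delivers the entry on restart, so you may see duplicates but you don't lose logs.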