A log/event processing pipeline you can't have (2019)

162 points by sigil, over 4 years ago

6 comments

fluential, over 4 years ago
Great article based on real-life experience. I have been building logging and protective monitoring pipelines for a while now.

In my experience, when it comes to log shipping from hosts, rsyslog + RELP + disk-assisted in-memory asynchronous queues are preferred. Most of the time you only have network I/O, as logs would not touch disk.

The idea is to ship logs off the device ASAP, with the destination acting as a sink server capable of handling most spikes without stressing the local source. All of this is done via rsyslog, which also wraps the actual logs into JSON format locally. The glue could be syslog-tag.

At the other end you could have the ELK stack, with Logstash using the json_lines codec input (pretty fast) to structure the data further to your liking.

Looking at the metrics just now, the average time for logs to show up in ELK is 7-200 ms (the latency comes mostly from specific reads happening against the ES cluster).

As ELK is always the slowest component, dropping logs compressed in-memory directly onto disk is also an option.

One thing to note is that RELP can produce extra duplicates, which are easily handled by inserting into Elasticsearch using a specific document ID (at some performance penalty). The ID could be a unique hash computed on (log content, timestamp, host), etc. With this in place you can also easily "replay" a stream of logs to fill potential gaps.

This type of setup scales really well too.

Edit: typos
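The deduplication idea in the comment above can be sketched in a few lines. This is a minimal illustration, not the commenter's actual setup: the function names `log_doc_id` and `index_events` are hypothetical, the field names `host`, `timestamp`, and `message` are assumptions, and a plain dict stands in for an index in which writing a document under an existing ID overwrites it rather than creating a second copy.

```python
import hashlib
import json


def log_doc_id(host: str, timestamp: str, message: str) -> str:
    """Derive a deterministic document ID from the fields that identify a log line."""
    # Serialize as a JSON array so field boundaries stay unambiguous:
    # ("ab", "c") and ("a", "bc") must not produce the same hash input.
    payload = json.dumps([host, timestamp, message], ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def index_events(store: dict, events: list) -> dict:
    """Insert events keyed by their derived ID; a duplicate overwrites itself."""
    for event in events:
        doc_id = log_doc_id(event["host"], event["timestamp"], event["message"])
        store[doc_id] = event  # same ID means same slot, so redelivery is harmless
    return store


if __name__ == "__main__":
    events = [
        {"host": "web1", "timestamp": "2020-08-26T10:00:00Z", "message": "disk full"},
        {"host": "web1", "timestamp": "2020-08-26T10:00:00Z", "message": "disk full"},  # redelivered
        {"host": "web2", "timestamp": "2020-08-26T10:00:01Z", "message": "disk full"},
    ]
    store = index_events({}, events)
    index_events(store, events)  # replaying the whole stream adds nothing new
    print(len(store))
```

Because the ID depends only on the event's content, replaying a stream to fill gaps is idempotent. The trade-off is that two genuinely distinct events with identical host, timestamp, and message collapse into one, so the timestamp resolution matters.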
user5994461, over 4 years ago
Good read, except the part where the author says there are no existing solutions for processing logs. There are quite a few robust, scalable ones.

syslog-ng, Logstash or Fluentd on the host to collect and aggregate logs. (Logstash/Fluentd can parse text messages with regex and handle a hundred different things like S3/Kafka/HTTP, but they are much more resource intensive.)

Kibana or Graylog to centralize and search logs; the storage is Elasticsearch.

A simple syslog-ng on the devices could probably do the job. A little-known fact about syslog: it can reliably forward messages over TCP, logs are numbered and retried, and syslog-ng can do DNS load balancing.
ezekiel68, over 4 years ago
Loved this. Great, actionable advice that's still applicable over a year later -- and a true geek's sense of humor. The hidden contrarian in all of us cheers along with his trials and triumphs. I didn't mind the soft-sell final paragraph at all, since he gave away the keys to the kingdom in the rest of the article anyway.
winrid, over 4 years ago
Fun read, nice to see some "down to Earth" engineering. :)
gbrown_, over 4 years ago
> So, the pages are still around when the system reboots.

...

> The kernel notices that a previous dmesg buffer is already in that spot in RAM (because of a valid signature or checksum or whatever) and decides to append to that buffer instead of starting fresh.

This sounds like it should be very unreliable. Perhaps it works in practice but I couldn't see myself relying on such a mechanism.
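For readers wondering what "a valid signature or checksum" might look like, here is a rough sketch of the general technique. It is not the kernel's actual format: the magic value `LOG1`, the header layout, and the function names `read_previous` and `append` are all made up for illustration, and a `bytearray` stands in for the reserved region of RAM.

```python
import struct
import zlib

MAGIC = b"LOG1"                  # hypothetical signature marking an initialized buffer
HEADER = struct.Struct("<4sII")  # magic, payload length, CRC32 of the payload


def read_previous(region: bytearray) -> bytes:
    """Return the payload left by an earlier boot, or b"" if the region fails validation."""
    if len(region) < HEADER.size:
        return b""
    magic, length, crc = HEADER.unpack_from(region, 0)
    if magic != MAGIC or length > len(region) - HEADER.size:
        return b""  # no signature, or a length that cannot be right
    payload = bytes(region[HEADER.size:HEADER.size + length])
    if zlib.crc32(payload) != crc:
        return b""  # contents decayed or were overwritten
    return payload


def append(region: bytearray, data: bytes) -> None:
    """Append to a valid existing buffer, or start fresh when validation fails."""
    capacity = len(region) - HEADER.size
    if capacity <= 0:
        raise ValueError("region too small for header plus payload")
    payload = (read_previous(region) + data)[-capacity:]  # keep the newest bytes
    HEADER.pack_into(region, 0, MAGIC, len(payload), zlib.crc32(payload))
    region[HEADER.size:HEADER.size + len(payload)] = payload


if __name__ == "__main__":
    ram = bytearray(64)          # pretend this survives a reboot
    append(ram, b"boot1 ")
    append(ram, b"boot2")
    print(read_previous(ram))
    ram[HEADER.size] ^= 0xFF     # simulate a flipped bit in the payload
    print(read_previous(ram))
```

The checksum does not make the memory any more likely to survive a reboot. It only means that when the contents do decay, the reader detects it and starts fresh instead of appending to garbage, which is the part of the reliability concern it addresses.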
traceroute66, over 4 years ago
It was all making for such an interesting read until the last paragraph.

I don't know about anyone else, but I have this inherent hatred of company marketing material disguised as blog posts.

If you are going to write a decent blog post, then write a decent blog post. If people are curious about the author they can look them up (and their affiliation). Don't turn it into a sales pitch.