This arch is how the big players do it at scale (e.g. Datadog, New Relic: the second data passes their edge, it lands in a Kafka cluster). Also, OTel components lack rate limiting(1), which makes it easy to overload your backend storage (S3).<p>Grafana has some posts on how they softened the S3 blow with memcached(2,3).<p>1. <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/6908">https://github.com/open-telemetry/opentelemetry-collector-co...</a>
2. <a href="https://grafana.com/docs/loki/latest/operations/caching/" rel="nofollow noreferrer">https://grafana.com/docs/loki/latest/operations/caching/</a>
3. <a href="https://grafana.com/blog/2023/08/23/how-we-scaled-grafana-cloud-logs-memcached-cluster-to-50tb-and-improved-reliability/" rel="nofollow noreferrer">https://grafana.com/blog/2023/08/23/how-we-scaled-grafana-cl...</a><p>I know the post is about telemetry data and my comments on Grafana are about logs, but the architecture bits still apply.
I've heard several times that Kafka was put in front of Elasticsearch clusters to handle traffic bursts. You can also use Redpanda, Pulsar, NATS, and other distributed queues.<p>One thing that is also very interesting with Kafka is that you can achieve exactly-once semantics without too much effort: keep track of the partition offsets in your own database and acknowledge them only once you are sure the data is safely stored in your DB (see the sketch below). That's what we did with our engine Quickwit; so far it's the most efficient way to index data into it.<p>One obvious drawback with Kafka is that it's one more piece to maintain... and it's not a small one.
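The pattern looks roughly like this. A minimal sketch, not Quickwit's actual code, with confluent-kafka and SQLite standing in for the real store (topic, table, and consumer-group names are made up):

```python
# Sketch: exactly-once indexing by committing Kafka offsets in the same DB
# transaction as the data itself. Topic/table names here are invented.
import sqlite3
from confluent_kafka import Consumer, TopicPartition

TOPIC, PART = "telemetry", 0          # hypothetical topic, single partition

db = sqlite3.connect("index.db")
with db:
    db.execute("CREATE TABLE IF NOT EXISTS docs (body TEXT)")
    db.execute("CREATE TABLE IF NOT EXISTS offsets (part INTEGER PRIMARY KEY, next_offset INTEGER)")

# Resume from the offset recorded in *our* DB, not the offset committed to Kafka.
row = db.execute("SELECT next_offset FROM offsets WHERE part = ?", (PART,)).fetchone()
start = row[0] if row else 0

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "indexer",
    "enable.auto.commit": False,      # Kafka never owns the position
})
consumer.assign([TopicPartition(TOPIC, PART, start)])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    # The document and the new position land in one transaction: either both
    # are stored or neither is, which is what gives exactly-once on replay.
    with db:
        db.execute("INSERT INTO docs (body) VALUES (?)", (msg.value().decode(),))
        db.execute("INSERT OR REPLACE INTO offsets (part, next_offset) VALUES (?, ?)",
                   (PART, msg.offset() + 1))
```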
If you have distributed concurrent data streams that exhibit coherent temporal events, then at some point you pretty much have to implement a queuing balancer.<p>One simply trades latency for capacity and eventual coherent data locality.<p>It's almost an arbitrary detail whether you use Kafka, RabbitMQ, or Erlang channels. If you can add smart application-layer predictive load balancing on the client, it is possible to cut burst traffic loads by an order of magnitude or two. Cost-optimized dynamic host scaling does not solve every problem.<p>Good luck out there =)
A similar idea [^1] has cropped up in the serverless OpenTelemetry world: collate OpenTelemetry spans in a Kinesis stream before forwarding them to a third-party service for analysis. This obviates the need for a separate collector, reduces forwarding latency, and removes the cold-start overhead of the AWS Distro for OpenTelemetry Lambda Layer.<p>[^1] <a href="https://x.com/donkersgood/status/1662074303456636929?s=20" rel="nofollow noreferrer">https://x.com/donkersgood/status/1662074303456636929?s=20</a>
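I haven't seen the code behind that tweet, but the span-to-Kinesis leg presumably looks something like a custom exporter. A rough sketch with the OpenTelemetry Python SDK and boto3 (class name, stream name, and the JSON serialization are my assumptions):

```python
# Sketch: push finished spans straight onto a Kinesis stream from the Lambda,
# skipping a local collector. Stream name and serialization are assumptions.
import boto3
from opentelemetry.sdk.trace.export import SpanExporter, SpanExportResult

kinesis = boto3.client("kinesis")

class KinesisSpanExporter(SpanExporter):
    def __init__(self, stream_name="otel-spans"):   # hypothetical stream
        self._stream = stream_name

    def export(self, spans) -> SpanExportResult:
        # put_records accepts at most 500 records per call; fine for a sketch.
        records = [
            {"Data": s.to_json().encode(),
             "PartitionKey": format(s.context.trace_id, "032x")}
            for s in spans
        ]
        kinesis.put_records(StreamName=self._stream, Records=records)
        return SpanExportResult.SUCCESS

    def shutdown(self) -> None:
        pass
```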
Seems like overkill, no? OTel collectors are fairly cheap; why add expensive Kafka into the mix? If you need to buffer, why not just dump to S3 or a similar data store as temporary storage?
I expect it would be far cheaper to scale up Tempo/Loki than it would be to even run an idle Kafka cluster. This feels like spending thousands of dollars to save tens of dollars.
Are there any client-side dynamic samplers that can target a maximum event rate? Burstiness with OTel has been a thorn in everything that uses it, in my experience, and it's frustrating.
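Something like this token-bucket sampler is what I have in mind. A rough sketch against the Python SDK's sampler interface (class name and the per-second budget are made up, and this only caps spans created by the local process):

```python
# Sketch: a client-side sampler that admits at most `max_per_second` spans via
# a token bucket and drops the rest. Class name and rate are illustrative.
import threading, time
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import Decision, Sampler, SamplingResult

class RateLimitingSampler(Sampler):
    def __init__(self, max_per_second: float):
        self._max = max_per_second
        self._tokens = max_per_second
        self._last = time.monotonic()
        self._lock = threading.Lock()

    def should_sample(self, parent_context, trace_id, name,
                      kind=None, attributes=None, links=None, trace_state=None):
        with self._lock:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at the burst size.
            self._tokens = min(self._max, self._tokens + (now - self._last) * self._max)
            self._last = now
            if self._tokens >= 1:
                self._tokens -= 1
                return SamplingResult(Decision.RECORD_AND_SAMPLE)
        return SamplingResult(Decision.DROP)

    def get_description(self):
        return f"RateLimitingSampler({self._max}/s)"

provider = TracerProvider(sampler=RateLimitingSampler(max_per_second=100))
```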
It'd be nice to have something simpler as an OTel processor. OTel could just dump events to local disk as sequential writes, then read them back, load permitting (rough sketch below).<p>I'm curious how long things stay in Kafka on average and in the worst case. If it's more than a few minutes, I imagine it lowers the quality of tail-based sampling.
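Roughly what I'm imagining, with a made-up spool path and send/has_capacity hooks:

```python
# Sketch: append events to a local file as sequential writes, then drain them
# in order when the backend has headroom. Path and hook names are invented.
import json, os

SPOOL = "/var/tmp/otel-spool.jsonl"   # hypothetical spool file

def spool(event: dict) -> None:
    # Sequential append: cheap even during bursts.
    with open(SPOOL, "a") as f:
        f.write(json.dumps(event) + "\n")

def drain(send, has_capacity) -> None:
    # Read events back in arrival order and forward them, load permitting.
    if not os.path.exists(SPOOL):
        return
    remaining = []
    with open(SPOOL) as f:
        for line in f:
            if has_capacity():
                send(json.loads(line))
            else:
                remaining.append(line)
    with open(SPOOL, "w") as f:
        f.writelines(remaining)
```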