Did I seriously just read through the very worst kind of fiction, business process fiction, just to be bait-and-switched with an ad for something that isn’t actually Kafka? I don’t even use Kafka or Kubernetes, but I thought this might be an interesting look at the possible failure modes and how to handle them.
We run Kafka on Kubernetes, but it's so locked down and customized that we might as well run it on bare metal. We started out on Rook network storage, but after multiple filesystem corruption incidents we now run every broker with local storage. Kubernetes still lets us reuse our deployment software and many of our monitoring conventions, so we do get some benefit from it.

The only reason we did not lose data on the Rook setup is that we run 5 brokers with replication factor 4 and 2 replicas required in sync.
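A minimal sketch of that per-broker local-storage shape (the image tag, sizes, and StorageClass name are assumptions for illustration, not the commenter's actual manifests): a StatefulSet whose volumeClaimTemplates bind each broker pod to a local PersistentVolume instead of network storage.

```yaml
# Sketch only: 5 brokers, each with a dedicated local disk provided by a
# local-storage StorageClass (e.g. a local static provisioner).
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka
  replicas: 5                          # 5 brokers, so RF 4 with min 2 in sync fits
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: apache/kafka:3.7.0    # illustrative image/tag
          ports:
            - containerPort: 9092
          volumeMounts:
            - name: data
              mountPath: /var/lib/kafka   # broker log.dirs points here
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: local-storage   # assumed local-volume StorageClass
        resources:
          requests:
            storage: 500Gi
```

Note that local PersistentVolumes pin each pod to the node holding its disk, which is part of why such a deployment ends up feeling close to bare metal.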
I would recommend looking at Strimzi for running Kafka on K8s. It works well and addresses all the issues mentioned in this piece. There's no reason to switch to Redpanda just because you're running Kubernetes.

https://strimzi.io/
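For context, Strimzi drives the whole cluster from a single Kafka custom resource that its operator reconciles. A trimmed example in the shape of Strimzi's own quickstart manifests (replica counts, sizes, and listener choices here are placeholders, and exact fields vary by Strimzi version):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: external
        port: 9094
        type: loadbalancer        # operator wires up per-broker external access
        tls: true
    config:
      default.replication.factor: 3
      min.insync.replicas: 2
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 20Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
```

The operator then handles broker rollouts, certificates, and external listener plumbing, which is most of what the article treats as manual toil.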
As some have mentioned, there are benefits and there are costs to applying 'lift and shift' to big things like Kafka (and Elasticsearch, DBs, etc).
The main assumption is that most of your apps (especially the ones affected by Kafka access latency) already run in k8s. Access from outside should be mainly for:

a. integrations with other systems that do not have very strict latency constraints

b. replication to a disaster recovery site

The benefits are listed (between the lines) both in the article and below. The price is that you usually still end up treating those pods like pets (see the sketch after this list):

- one pod per k8s node (taints & node selectors)

- special sizing and tuning of the targeted nodes (resources, kernel params, etc.)

- one LB per pod. Yes, costly and not what you would expect, but that's what a bulletproof deploy requires. The added latency is not always there, especially in clouds, where LBs have very efficient implementations (GCP in particular).

- bulletproof storage, with the required performance computed in your sizing phase.
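A rough sketch of those "pet" mechanics in plain Kubernetes terms (all names here are illustrative): node selectors and tolerations pin brokers to a dedicated, tainted node pool, and each broker gets its own LoadBalancer Service keyed on the per-pod label that StatefulSets set.

```yaml
# Pod template excerpt: schedule brokers only onto nodes labelled and
# tainted for Kafka (nodes carry the taint dedicated=kafka:NoSchedule).
spec:
  nodeSelector:
    workload: kafka
  tolerations:
    - key: dedicated
      operator: Equal
      value: kafka
      effect: NoSchedule
---
# One LoadBalancer per broker: selects a single pod via the
# statefulset.kubernetes.io/pod-name label the StatefulSet controller adds.
apiVersion: v1
kind: Service
metadata:
  name: kafka-0-external
spec:
  type: LoadBalancer
  selector:
    statefulset.kubernetes.io/pod-name: kafka-0
  ports:
    - port: 9094
      targetPort: 9094
```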
I know many caution against doing this kind of thing, but what is not to like about the concept? Why can’t we have a single, reliable, and well-understood “substrate” on which to deploy backend infrastructure for our apps?
TL;DR: this is a plug for Redpanda.

If you're going to look at Kafka inside of Kubernetes at all, just look at running Apache Pulsar in K8s instead.

One of the main arguments in this article is the worry about exposing the brokers, but Pulsar has a proxy for exactly this, and the Helm chart supports a public ingress endpoint just by enabling a flag.
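A hedged sketch of what that flag looks like in a Helm values file; these keys follow the general shape of the Apache Pulsar Helm chart, but the exact names should be checked against the values.yaml of the chart version you deploy:

```yaml
# Illustrative values fragment (key names approximate, verify against the
# chart): run the proxy component and expose it publicly instead of
# exposing the brokers directly.
components:
  proxy: true          # brokers stay cluster-internal; clients hit the proxy
ingress:
  proxy:
    enabled: true      # publish the proxy as the public endpoint
    type: LoadBalancer
```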