Launch HN: ContainIQ (YC S21) – Kubernetes Native Monitoring with eBPF

79 points by NWMatherson over 3 years ago
Hi HN, I’m Nate, and together with my co-founder Matt we are the founders of ContainIQ (https://www.containiq.com/). ContainIQ is a complete K8s monitoring solution that is easy to set up and maintain and provides a comprehensive view of cluster health.

Over the last few years, we noticed a shift: more of our friends and other founders were using Kubernetes earlier on. (Whether or not they actually need it so early is not as clear, but that’s a point for another discussion.) From our past experience using open-source tooling and other platforms on the market, we knew that the existing tooling out there wasn’t built for this generation of companies building with Kubernetes.

Many early to middle-market tech companies don’t have the resources to manage and maintain a bunch of disparate monitoring tools, and most engineering teams don’t know how to use them. But when scaling, engineering teams do know that they need to monitor cluster health and core metrics, or else end users will suffer. Measuring HTTP response latency by URL path, in particular, is important for many companies, but installing application packages for each individual microservice is time-consuming.

We decided to build a solution that was easy to set up and maintain. Our goal was to get users 95% of the way there almost instantly.

Today, our Kubernetes monitoring platform has four core features: (1) metrics: CPU and memory for pods/nodes, view limits and capacity, correlate to events, alert on changes; (2) events: K8s events dashboard, correlate to logs, alerts; (3) latency: monitor RPS, p95, and p99 latencies by microservice, including by URL path, alerts; and (4) logs: container-level log storage and search.

Our latency feature set was built using a technology called eBPF. BPF, the Berkeley Packet Filter, was developed from a need to filter network packets in order to minimize unnecessary packet copies from kernel space to user space. Since version 3.18, the Linux kernel provides extended BPF, or eBPF, which uses 64-bit registers and increases the number of registers from two to ten. We install the necessary kernel headers for users automatically.

With eBPF, we are monitoring from the kernel and OS level, not at the application level. Our users can measure and monitor HTTP response latency across all of their microservices and URL paths, as long as their kernel version is supported. We are able to deliver this experience immediately by parsing the network packet from the socket directly. We then correlate the socket and sk_buff information to your Kubernetes pods to provide metrics like requests per second, p95, and p99 latency at the path and microservice level, without you having to instrument each microservice at the application level. For example, with ContainIQ you can track how long your Node.js application takes to respond to HTTP requests from your users, ultimately allowing you to see which parts of your web application are slowest and alerting you when users are experiencing slowdowns.
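For a concrete sense of what socket-level latency tracing looks like, here is a minimal BCC (eBPF) sketch in Python. It illustrates the general technique only, not ContainIQ’s implementation: the probe points (tcp_recvmsg/tcp_sendmsg), the recv-to-send heuristic, and the histogram output are assumptions for illustration; it needs root, BCC, and kernel headers, and a real agent would additionally parse the HTTP payload for per-path metrics and map sockets to pods.

```python
#!/usr/bin/env python3
# Minimal sketch (not ContainIQ's code): approximate per-socket HTTP response
# latency from the kernel by timing the gap between the last tcp_recvmsg
# (request read) and the next tcp_sendmsg (response write) on the same socket.
# Requires root, BCC, and kernel headers; probe signatures assume kernel >= 4.1.
import time
from bcc import BPF

bpf_text = r"""
#include <uapi/linux/ptrace.h>
#include <net/sock.h>

BPF_HASH(recv_ts, struct sock *, u64);   // socket -> ns timestamp of last read
BPF_HISTOGRAM(latency_us);               // log2 histogram of recv->send gaps

int kprobe__tcp_recvmsg(struct pt_regs *ctx, struct sock *sk) {
    u64 ts = bpf_ktime_get_ns();
    recv_ts.update(&sk, &ts);            // a request (probably) just arrived
    return 0;
}

int kprobe__tcp_sendmsg(struct pt_regs *ctx, struct sock *sk) {
    u64 *tsp = recv_ts.lookup(&sk);
    if (tsp == 0)
        return 0;                        // no read seen on this socket yet
    u64 delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
    latency_us.increment(bpf_log2l(delta_us));
    recv_ts.delete(&sk);
    return 0;
}
"""

b = BPF(text=bpf_text)  # compiles and auto-attaches the kprobe__* functions
print("Tracing recv->send gap on all TCP sockets... Ctrl-C to print histogram")
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    pass
b["latency_us"].print_log2_hist("usecs")
```

Because everything runs in kernel space, no application changes or per-service instrumentation are needed; the trade-off is that mapping a kernel socket back to a pod and URL path requires the extra correlation work described above.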
Users can correlate events to logs and metrics in one view. We knew how annoying it was to toggle between multiple tabs and then scroll endlessly through logs trying to match up timestamps. We fixed this. For example, a user can click from an event (e.g., a pod dying) to the logs at that point in time.

Users can set alerts across essentially all data points (e.g., p95 latency, a K8s job failing, a pod eviction).

Installation is straightforward, either using Helm or with our YAML files.

Pricing is $20 per node / month + $1 per GB of log data ingested. You can sign up on our website directly with the self-service flow. You can also book a demo if you would like to talk to us, but that isn’t required. Here are some videos (https://www.containiq.com/kubernetes-monitoring) if you are curious to see our UX before signing up.

We know that we have a lot of work left to do. And we welcome your suggestions, comments, and feedback. Thank you!
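To make the pricing arithmetic concrete, here is a quick back-of-the-envelope estimate; the node count and log volume below are hypothetical, not figures from the post.

```python
# Back-of-the-envelope ContainIQ cost estimate using the pricing in the post:
# $20 per node / month plus $1 per GB of log data ingested.
NODE_PRICE_USD = 20        # per node, per month
LOG_PRICE_USD_PER_GB = 1   # per GB of logs ingested, per month

def monthly_cost(nodes: int, log_gb: float) -> float:
    """Estimated monthly bill for `nodes` nodes ingesting `log_gb` GB of logs."""
    return nodes * NODE_PRICE_USD + log_gb * LOG_PRICE_USD_PER_GB

# Hypothetical example: a 10-node cluster ingesting 200 GB of logs a month.
print(monthly_cost(10, 200))   # -> 400.0  ($200 for nodes + $200 for logs)
```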

7 comments

rlyshw over 3 years ago
I recently had an issue where my UDP service worked fine exposed directly as a NodePort type, but not through an nginx UDP ingress. I _think_ the issue was that the ingress controller's forwarding operation was just too slow for the service's needs, but I had no way of really knowing.

Now if I had this kernel-level network monitoring system, I probably could have had a clearer picture of what was going on.

Really, one of the hardest problems I've had with learning/deploying in k8s is trying to trace down the multiple levels of networking, from external TLS termination to LoadBalancers, through ingress controllers, all the way down to application-level networking. I've found more often than not the easiest path is to just get rid of those layers of complexity completely.

In the end I just exposed my server on NodePort, forwarded my NAT to it, and called it done. But it sounds like something like ContainIQ can really add to a k8s admin's toolset for troubleshooting these complex network issues. I also agree with other comments here that a limited, personal-use/community tier would be great for wider adoption and home-lab users like me :)
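For readers curious what that fallback looks like in practice, here is a minimal sketch of exposing a UDP workload as a NodePort Service via the official Kubernetes Python client; the names, ports, and selector label are made up for illustration.

```python
# Sketch only: create a NodePort Service for a UDP workload using the official
# kubernetes Python client. Service name, ports, and label selector are
# hypothetical; the NAT/router forward would then point at the node_port.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

svc = client.V1Service(
    metadata=client.V1ObjectMeta(name="udp-game-server"),
    spec=client.V1ServiceSpec(
        type="NodePort",
        selector={"app": "udp-game-server"},
        ports=[client.V1ServicePort(
            protocol="UDP",
            port=9000,        # cluster-internal port
            target_port=9000, # container port
            node_port=30900,  # reachable on every node; forward the NAT here
        )],
    ),
)
client.CoreV1Api().create_namespaced_service(namespace="default", body=svc)
```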
gigatexal over 3 years ago
A community edition/non-paid tier would be quite nice, to be able to trial this out before paying.

This is how an old employer adopted CockroachDB: we trialed the non-enterprise version and then ultimately bought a license.
nodesocket over 3 years ago
Hello. I own and run a DevOps consulting company and use DataDog exclusively for clients. DD works pretty well as it integrates with cloud providers (such as AWS), physical servers (agent), and Kubernetes (Helm chart). The pain point is still creating all the custom dashboards, alerts, and DataDog integrations and configuration. Managing the DataDog account can almost be a full-time job for somebody, especially with clients who have lots of independent k8s clusters all in a single DD account (lots of filtering on tags and labels).

What does ContainIQ offer in terms of benefits over well-established players like DataDog? I will say, the Traefik DataDog integration is horrible and hasn't been updated in years, so that's something I wish was better. DataDog does support Kubernetes events (into the feed), and their logging offering is quite good (though very expensive).
kolanos over 3 years ago
How does this compare to Pixie? [0]

[0]: https://github.com/pixie-io/pixie
nyellin over 3 years ago
Nice to see a new eBPF based solution out there. Good luck.
MoSattler over 3 years ago
How does this compare to Opstrace? [0]

[0]: https://opstrace.com
Kletiomdm over 3 years ago
GCP wants 50 cents per ingested log GB.

GCP is already quite expensive in this regard, and you want double.

I think that's way too expensive.