After many years of operating production Kubernetes clusters with several thousand nodes, I've never seen any of these observability tools that query kube-apiserver work at that scale. Even popular tools like k9s issue very expensive queries, like listing every pod in the cluster, which, if you don't have enough load protection in place, can tip your Kubernetes apiserver over and cause an incident. If you're serious about these querying capabilities, I highly recommend building your own data sources (e.g. watch objects with a controller and dump the data into a SQL database) and stop hitting the apiserver for these things. You'll be better off in the long run.
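To make the "watch and mirror" suggestion concrete, here's a minimal sketch of that pattern (an illustration, not the setup described above): a client-go shared informer mirrors Pods into a local SQLite file, and ad-hoc queries then go to the database instead of the apiserver. The schema, the objects.db file name, and the resync period are arbitrary choices for the example.

    // Sketch: mirror Pods into SQLite via a shared informer so queries
    // don't hit kube-apiserver. Assumes in-cluster credentials and the
    // mattn/go-sqlite3 driver; extend to other resources as needed.
    package main

    import (
        "database/sql"
        "encoding/json"
        "log"
        "time"

        _ "github.com/mattn/go-sqlite3"
        corev1 "k8s.io/api/core/v1"
        "k8s.io/client-go/informers"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/rest"
        "k8s.io/client-go/tools/cache"
    )

    func main() {
        cfg, err := rest.InClusterConfig()
        if err != nil {
            log.Fatal(err)
        }
        client := kubernetes.NewForConfigOrDie(cfg)

        db, err := sql.Open("sqlite3", "objects.db")
        if err != nil {
            log.Fatal(err)
        }
        if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS pods (
            uid TEXT PRIMARY KEY, namespace TEXT, name TEXT, manifest TEXT)`); err != nil {
            log.Fatal(err)
        }

        upsert := func(obj interface{}) {
            pod, ok := obj.(*corev1.Pod)
            if !ok {
                return
            }
            raw, _ := json.Marshal(pod)
            db.Exec(`INSERT INTO pods(uid, namespace, name, manifest) VALUES (?,?,?,?)
                ON CONFLICT(uid) DO UPDATE SET
                namespace=excluded.namespace, name=excluded.name, manifest=excluded.manifest`,
                string(pod.UID), pod.Namespace, pod.Name, string(raw))
        }

        factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
        factory.Core().V1().Pods().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
            AddFunc:    upsert,
            UpdateFunc: func(_, newObj interface{}) { upsert(newObj) },
            DeleteFunc: func(obj interface{}) {
                // Tombstones (DeletedFinalStateUnknown) are skipped here for brevity.
                if pod, ok := obj.(*corev1.Pod); ok {
                    db.Exec(`DELETE FROM pods WHERE uid = ?`, string(pod.UID))
                }
            },
        })

        stop := make(chan struct{})
        factory.Start(stop) // one LIST+WATCH per resource, then served from the informer's cache
        factory.WaitForCacheSync(stop)
        select {} // from here on, ad-hoc queries go to SQLite, not the apiserver
    }

With something like this running, a "find pods named foo*" question becomes a SELECT against the local database rather than a cluster-wide LIST against the apiserver.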
I'm not against replacing jq/jsonpath with the right tool; they're not the most ergonomic. What isn't clear to me, though, is why this isn't SQL? It's so nearly SQL, and seems to support almost identical semantics. I realise SQL isn't perfect, but the goal of this project isn't (I assume) to invent a new query language, but to make Kubernetes more easily queryable.
This looks great for scripting. I will say that the query language looks a bit too verbose for daily use, meaning when you're interacting with a cluster to diagnose a problem, follow a job, test the rollout of something experimental, or similar.

For example, I'd love to be able to write just this as the whole query:

    metadata.name =~ "foo%"

or maybe:

    .. =~ "foo%"                        // Any field matches

or maybe:

    $pod and metadata.name =~ "foo%"    // Shorthand to filter by type
I think a query language for Kubernetes ought to start with predicate-based filtering as the foundation. Graph operators seem like a nice addition, but maybe not the first thing people generally need?

It's not quite clear who this tool is for, so maybe that's not the intended purpose?
This is fantastic. I've always enjoyed the Cypher language that the Neo4j team created for querying graph data. The connected k8s API objects seem like a great place to apply that lens.
I really, really like Steampipe for this kind of query: https://steampipe.io. It's essentially PostgreSQL (literally) for querying many different kinds of APIs, which means you have access to everything PostgreSQL's SQL language can offer when requesting data.

They have a Kubernetes plugin at https://hub.steampipe.io/plugins/turbot/kubernetes and there are a couple of things I really like:

* it's super easy to query multiple Kubernetes clusters transparently: define one Steampipe "connection" for each of your clusters, plus an "aggregator" connection that aggregates all of them, then query the aggregator. You get a "context" column that indicates which Kubernetes cluster each row came from.
* it's relatively fast in my experience, even for large result sets. It's also possible to configure a caching mechanism inside Steampipe to speed up your queries
* it also understands custom resource definitions, although you need to help Steampipe a bit (explained here: https://hub.steampipe.io/plugins/turbot/kubernetes/tables/kubernetes_%7Bcustom_resource_singular_name%7D)

Last but not least: you can of course join multiple "plugins" together. I used it a couple of times to join content exposed only in GCP with content from Kubernetes, which was quite useful.

The things I don't like so much, but can live with:

* several columns are just exposed as plain JSON fields; you need to get familiar with PostgreSQL's JSON operators to get anything useful out of them. There's a page in Steampipe's docs explaining how to use them (see also the sketch after this comment)
* it also helps to be familiar with PostgreSQL's common table expressions: they're not that difficult to use, and they make the SQL much easier to read
* it's SQL, so you have to write the columns you want before the table they come from; not ideal for autocompletion
* the Steampipe "psql" client is good, but sometimes a bit counterintuitive; I don't have specific examples, but I have the feeling it behaves slightly differently from other CLI clients I've used.

All in all: I think Steampipe is a cool tool to know about, for Kubernetes but also for other API systems.
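As a small illustration of the JSON operators and the per-cluster context column mentioned above, here's a rough sketch of querying Steampipe from Go rather than through its interactive client. It assumes you've run `steampipe service start` and exported the connection string it prints as STEAMPIPE_DSN; the kubernetes_pod table and its labels / context_name columns are taken from the Kubernetes plugin docs, so verify them against your plugin version.

    // Sketch: Steampipe exposes a PostgreSQL endpoint, so any Postgres
    // client works. Here we list pods labelled app=nginx across all
    // clusters behind an aggregator connection.
    package main

    import (
        "database/sql"
        "fmt"
        "log"
        "os"

        _ "github.com/lib/pq" // standard Postgres driver; Steampipe speaks the Postgres wire protocol
    )

    func main() {
        db, err := sql.Open("postgres", os.Getenv("STEAMPIPE_DSN"))
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // ->> is one of the PostgreSQL JSON operators referred to above: it
        // extracts a key from the jsonb labels column as text. context_name
        // tells you which cluster each row came from.
        rows, err := db.Query(`
            SELECT context_name, namespace, name
            FROM kubernetes_pod
            WHERE labels ->> 'app' = $1
            ORDER BY context_name, namespace, name`, "nginx")
        if err != nil {
            log.Fatal(err)
        }
        defer rows.Close()

        for rows.Next() {
            var cluster, ns, name string
            if err := rows.Scan(&cluster, &ns, &name); err != nil {
                log.Fatal(err)
            }
            fmt.Printf("%s\t%s/%s\n", cluster, ns, name)
        }
        if err := rows.Err(); err != nil {
            log.Fatal(err)
        }
    }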
Our project https://github.com/stackql/stackql has a k8s provider which might be of interest here. We implement our own front-end SQL parser and expose all control plane routes (and data plane routes in many cases) through overloaded SQL methods. It's not FDW-based and doesn't require a server (Postgres, etc.).
This is way cool. The ability to visualize the k8s object model as a graph and query it as such makes so much sense!
The hottest feature in my mind is applying this in an operator: maintaining state as defined by a simple graph query. That's much more readable, and it takes very little code.
Well Done!
Since it's Cypher-based (instead of SQL), is the key question whether my k8s data is more graph-like or relational?

Adjacent, but there are lots of experts here: independent of Cyphernetes or specific tooling, what are you doing to secure the k8s API / kubectl / the k8s control plane?
I dunno, Kubernetes already has a query language: it's called jq. As in, kubectl get pods -A -ojson | jq -r '.items[] | ...'. Cyphernetes seems simpler, perhaps, but it's not the 10x improvement I'd need to switch and introduce a new dependency.