Great to see it in ClickHouse.<p>Snowflake released a white paper back in its pre-IPO days that mentioned this same feature (secretly exploding JSON into columns). That explains why Snowflake feels faster than it should: they've quietly done a lot of amazing things and simply offered it all as a polished product, Apple-style.
I've heard wonderful things about ClickHouse, but every time I try to use it, I get stuck on "how do I get data into it reliably". I search around, and inevitably end up with "by combining ClickHouse and Kafka", at which point my desire to keep going drops to zero.<p>Are there any setups for reliable data ingestion into ClickHouse that <i>don't</i> involve spinning up Kafka & ZooKeeper?
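For what it's worth, one Kafka-free pattern is to let ClickHouse do the batching itself with asynchronous inserts and send rows straight from your app over the native or HTTP interface. A minimal sketch, assuming a recent ClickHouse version where the async_insert settings are available; the `events` table is purely illustrative:

    CREATE TABLE events
    (
        ts      DateTime,
        user_id UInt64,
        payload String
    )
    ENGINE = MergeTree
    ORDER BY (user_id, ts);

    -- Each client issues small INSERTs directly; the server buffers them and
    -- flushes them into parts, so no external queue is needed.
    INSERT INTO events
    SETTINGS async_insert = 1, wait_for_async_insert = 1
    VALUES (now(), 42, '{"path": "/pricing"}');

The trade-off is that you rely on the server (plus retries in your client) instead of a durable log, which is often acceptable for analytics workloads.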
>Dynamically changing data: allow values with different data types (possibly incompatible and not known beforehand) for the same JSON paths without unification into a least common type, preserving the integrity of mixed-type data.<p>I'm so excited for this! One of my major bugbears with storing logs in Elasticsearch is the set-the-type-on-first-seen-occurrence headache.<p>Hope to see this graduate from experimental support soon!
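For anyone curious what that looks like in practice, here is a minimal sketch against the new type (assumes ClickHouse 24.8+ with the experimental JSON type enabled; the table and values are made up):

    SET allow_experimental_json_type = 1;

    CREATE TABLE logs (data JSON) ENGINE = MergeTree ORDER BY tuple();

    -- The same path "level" arrives as a string, a number, and an array,
    -- and each value keeps its own type instead of being coerced.
    INSERT INTO logs VALUES
        ('{"level": "warn"}'),
        ('{"level": 3}'),
        ('{"level": [1, 2]}');

    SELECT data.level, dynamicType(data.level) FROM logs;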
We've been waiting for better JSON support in ClickHouse - the new type looks promising, and the dynamic columns, with no need to specify subtypes, are particularly helpful for us.
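On the "no need to specify subtypes" point: as I read the announcement, you can declare the column as bare JSON and let everything be inferred, and optionally pin or skip specific paths if you do know them. A rough sketch based on the syntax shown there (path names are invented, and the experimental setting from above is assumed):

    -- Everything inferred dynamically:
    CREATE TABLE t1 (json JSON) ENGINE = Memory;

    -- Optional hints for paths you do know, plus paths to skip entirely:
    CREATE TABLE t2
    (
        json JSON(user.id UInt64, SKIP debug.trace)
    )
    ENGINE = Memory;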
If you're evaluating ClickHouse, take a look at Apache Pinot as well. ClickHouse was designed for single-machine installations and has since been enhanced to support clusters, but that support is lacking; for example, if you add nodes, it is not easy to redistribute existing data. Pinot is much easier to scale horizontally. Also take a look at Pinot's star-tree indexes [1]. If you're doing multi-dimensional analysis (pivot tables etc.), there is a huge difference in performance when you take advantage of star-tree.<p>[1] <a href="https://docs.pinot.apache.org/basics/indexing/star-tree-index" rel="nofollow">https://docs.pinot.apache.org/basics/indexing/star-tree-inde...</a>
ClickHouse is great stuff. I use it for OLAP with a modest database (~600 million rows, ~300 GB before compression) and it handles everything I throw at it without issues. I'm hopeful this new JSON data type will be better at a use case that I currently solve with nested tuples.
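For context on the nested-tuples approach versus the new type, a rough before/after sketch (column names invented; the Tuple version needs every field declared up front, the JSON version doesn't, assuming the experimental type is enabled):

    -- Before: the structure is fixed in the schema.
    CREATE TABLE events_tuple
    (
        attrs Tuple(device Tuple(os String, version String))
    )
    ENGINE = MergeTree ORDER BY tuple();

    -- After: same dotted access, but new keys can appear without a schema change.
    CREATE TABLE events_json (attrs JSON) ENGINE = MergeTree ORDER BY tuple();

    SELECT attrs.device.os FROM events_json;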
When I tried it a few weeks ago, ClickHouse named its data files after column names, so weird JSON keys produced very long filenames (some containing slashes) that did not play well with the file system and caused errors. I wonder if that has been fixed?
>For example, if we have two integers and a float as values for the same JSON path a, we don’t want to store all three as float values on disk<p>Well, if you want to do things exactly the way JS does, then storing them all as floats is correct, since JS represents every number as a double. However, the JSON standard doesn't require numbers to be handled the way JS handles them.
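If I'm reading the post right, you can see the non-unification directly: a sketch like the following (assuming the experimental JSON type) should report Int64 for the integer rows and Float64 for the float row, rather than Float64 for all three:

    CREATE TABLE nums (data JSON) ENGINE = Memory;

    INSERT INTO nums VALUES ('{"a": 2}'), ('{"a": 3}'), ('{"a": 1.5}');

    -- Expect Int64 for the first two rows and Float64 for the third.
    SELECT data.a, dynamicType(data.a) FROM nums;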
This sounds like, instead of storing one specific kind of JSON value (int, string, array) in the column, you can store any JSON type in it - much like "enums with fields" in Swift, Kotlin, or Rust, or algebraic data types in Haskell, a feature not present in many other languages.
Looks like Snowflake was the first popular warehouse to have a variant type that could put JSON values into separate columns.<p>It turned out to be a great idea that inspired other databases.
Oracle 23ai also has a similar feature that "explodes" JSON into relational tables/columns for storage while still providing JSON-based access APIs: <a href="https://www.oracle.com/database/json-relational-duality/" rel="nofollow">https://www.oracle.com/database/json-relational-duality/</a>
Using ClickHouse is one of the best decisions we've made here at PostHog. It has allowed us to scale performance while letting us build more products on the same set of data.<p>Since we'd been using ClickHouse long before this JSON functionality was available (or even before the earlier version of it, `Object('json')`, was available), we ended up setting up a job that would materialize JSON fields out of a JSON blob into materialized columns, based on query patterns against the keys in the blob. Then, once those materialized columns were created, we would route queries to them at runtime if they were available. This saved us a _ton_ of CPU and IO. Even though ClickHouse uses some really fast SIMD JSON functions, the best way to make a computer go faster is to make it do less, and this new JSON type does exactly that - and it's so turnkey!<p><a href="https://posthog.com/handbook/engineering/databases/materialized-columns">https://posthog.com/handbook/engineering/databases/materiali...</a><p>The team over at ClickHouse Inc., as well as the community behind it, moves surprisingly fast. I can't recommend it enough, and I'm excited for everything else on the roadmap, especially what's on the horizon with Parquet and Iceberg support.
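For anyone who wants the pre-JSON-type version of this trick, the core of it is just a materialized column computed from a JSON string blob, something like the following sketch (column and key names are made up, not PostHog's actual schema):

    -- Assume events(properties String) holds the raw JSON blob.
    ALTER TABLE events
        ADD COLUMN mat_plan String
        MATERIALIZED JSONExtractString(properties, 'plan');

    -- New inserts populate mat_plan automatically; hot queries can then read
    -- the narrow column instead of re-parsing the whole blob.
    SELECT count() FROM events WHERE mat_plan = 'enterprise';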
ClickHouse is criminally underused.<p>It's common knowledge that "Postgres is all you need" - but if you somehow reach the stage of "Postgres isn't all I need, and I have hard proof", this should be the next tech you look at.<p>Also, clickhouse-local is rather amazing at CSV processing using SQL. Highly recommended for when you are fed up with Google Sheets or even Excel.
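Seconding clickhouse-local: you can point plain SQL at a CSV with the file() table function and run it via `clickhouse-local --query "..."`. A small sketch (the file name and columns are invented):

    -- Aggregate a CSV without loading it into a server first.
    SELECT
        status,
        count() AS hits,
        avg(duration_ms) AS avg_ms
    FROM file('access_log.csv', 'CSVWithNames')
    GROUP BY status
    ORDER BY hits DESC;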
I admit that I didn't read the entire article in depth, but I did my best to meaningfully skim-parse it.<p>Can someone briefly explain how or if adding data types to JSON - a standardized grammar - leaves something that still qualifies as JSON?<p>I have no problem with people creating supersets of JSON, but if my standard lib JSON parser can't read your "JSON" then wouldn't it be better to call it something like "CH-JSON"?<p>If I am wildly missing something, I'm happy to be schooled. The end result certainly sounds cool, even though I haven't needed ClickHouse yet.