Show HN: Dassana. JSON-native, schema-less logging solution built atop ClickHouse

25 points by gauravphoenix about 3 years ago
Hello HN, I'm Gaurav, founder & CEO of Dassana. We are coming out of stealth today and would like to invite the community to give us a try: https://lake.dassana.io/

First, a bit of backstory. I grew up using grep to search log files; I'm the kind of person whose grep was aliased to grep -i. Then along came Splunk. It was a game-changer. At every start-up I founded (there are a few) I used Splunk, and quite often we would run out of our ingestion quota. SumoLogic wasn't cheaper either, so we looked into DataDog. It was good until we started running into issues with aggregate queries (facets etc.), rehydration took forever, and the overall query experience was not fun (it wasn't fun with Splunk and SumoLogic either).

All these experiences over the last two decades led me to wish for a simple solution where I can just throw in a bunch of JSON/CSV data and query it with simple SQL. These days most logs are structured to begin with, and the complexity of parsing logs to extract fields etc. has moved to log shippers such as fluentd and logstash.

Enter HackerNews and ClickHouse.

I first learned about ClickHouse from HackerNews and was completely floored by its performance. Given its performance and the storage savings from columnar storage, it was an obvious choice to build a logging solution on top of it. As we started doing a POC with it, it became clear that it would be a perfect solution for us if we could solve the problem of schema management. Over the last six months or so, that's what we have been working on. We designed a storage scheme that flattens the JSON objects and exposes an SQL interface that takes a SQL query and converts it to a query against our schemaless table.

Being JSON native, we allow querying specific JSON objects in arrays. This is something that is not possible with many logging vendors, and if you use something like Athena, good luck figuring out the query; it is possible but quite complicated. Here is a sample query: select count(distinct eventName) from aws_cloudtrail where awsRegion='us-east-1'

Also, there are no indices, fields, facets etc. in Dassana. You just send JSON/CSV logs and you query them with 0 latency. And yes, we do support distributed joins among different data sources (we call them apps). Like any other distributed system, it has limitations, but it generally works great for almost all log-related use cases.

One amazing side effect of what we built is that we can offer a unique pricing model that is a perfect match for logging data. Generally speaking, log queries tend to be specific. There is always some sort of predicate: a user name, a hostname, an IP address. But these queries run over large volumes of data. As such, these queries run insanely fast on our system, and we are able to charge separately for queries and reduce the cost of ingestion dramatically. In general, we expect our solution to be about 10x cheaper (and 10x faster) than other logging systems.

When not to use Dassana? It is not suitable for unstructured data. We don't offer full-text search (FTS) yet. We are more like a database for logs than a Lucene index for text files. With more and more people starting to use structured logs, this problem may go away on its own, but as I said, we do plan to offer FTS in the future. Note that you can already use log shippers such as fluent, vector, logstash etc. to give structure to logs.

What's next?

1. Grafana plugin. Here is a sneak preview: https://drive.google.com/file/d/1JKnX5Aa6cp_pYnMiFzAojA24bjUn28WM/view?usp=sharing

2. Alerting/Slack notifications. You will be able to save queries and get Slack notifications when results match.

3. JDBC driver.

4. TBD. You tell us what to build. Email me and I will personally follow up with you: gk @ dassana dot input/output

I will be online all day today, happy to answer any questions. Feel free to reach out by email too.
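
The idea of flattening JSON objects so they can live in a schemaless table can be sketched roughly as below. This is only an illustration, not Dassana's actual storage scheme, which the post does not describe in detail. The function name flatten, the dotted-path and [index] key convention, and the sample event are all assumptions made for this sketch.

```python
from typing import Any, Dict


def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested JSON-like data into {path: scalar} pairs.

    Hypothetical convention: dict keys are joined with "." and
    list positions are written as "[i]".
    """
    out: Dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            out.update(flatten(value, path))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            out.update(flatten(value, f"{prefix}[{i}]"))
    else:
        # Scalars (str, int, float, bool, None) end the recursion.
        out[prefix] = obj
    return out


# Made-up, CloudTrail-like event used only for illustration.
event = {
    "eventName": "ConsoleLogin",
    "awsRegion": "us-east-1",
    "resources": [{"type": "AWS::S3::Bucket"}],
}

for path, value in flatten(event).items():
    print(path, "=", value)
```

With keys such as resources[0].type, a predicate on one element of an array becomes a lookup on a single flattened path, which is one plausible way a SQL layer could address objects inside arrays without a fixed schema.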

5 comments

the-alchemist about 3 years ago
* Who do you see as your competition? AWS's CloudWatch / Centralized Logging? Splunk? GCP's Logging? Logstash? Graylog?

* What kind of query language are you thinking? I imagine SQL-like, as that's ClickHouse's native language.

* Business-wise, how are you gonna integrate with the cloud providers, AWS / GCP / Azure? Most people who use those services just use the built-ins.

* More than Grafana, I think you need something like Metabase integrated OOTB. That might be a killer feature.

* IMHO, FTS is a must-have from day 1. Most software that folks run produces non-structured logs OOTB (sad, I know), so folks won't even be able to try your service without changing their software. And getting a lot of software, even popular ones like Python/Flask, Ruby/Rails, Java/Spring, to produce structured logs is not a simple task.

Best of luck!!

mritchie712 about 3 years ago
Are you using the new JSON column type released in ClickHouse 22.3?

https://clickhouse.com/blog/clickhouse-22-3-lts-released/

peapod91 about 3 years ago
How do you compare to https://betterstack.com/logtail which also seems to be built on ClickHouse?

shaeqahmed about 3 years ago
Cool product and pricing model

> Cloud Log Lake

That's the first time I'm hearing a ClickHouse backend described as a lake. Care to explain?

pachico about 3 years ago
I'm wondering if all these logging solutions that don't offer traces have any kind of future.