How we built ngrok's data platform

164 points by samber, 8 months ago

8 comments

Fripplebubby, 8 months ago
I found the technical details really interesting, but I think this gem applies more broadly:

> I find this is often an artifact of the DE roles not being equipped with the necessary knowledge of more generic SWE tools, and general SWEs not being equipped with knowledge of data-specific tools and workflows.

> Speaking of, especially in smaller companies, equipping all engineers with the technical tooling and knowledge to work on all parts of the platform (including data) is a big advantage, since it allows people not usually on your team to help on projects as needed. Standardized tooling is a part of that equation.

I have found this to be so true. SWE vs. DE is one division where this applies, and I think it also applies to SWE vs. SRE (if you have those in your company), data scientists, and "analysts": basically, anyone in a technical role should ideally know what kinds of problems other teams work on and what kinds of tooling they use to address those problems, so that you can cross-pollinate.
1a527dd5, 8 months ago
Blimey, that is a lot of moving parts.

Our data team currently has something similar, and its costs are astronomical.

On the other hand, our internal platform metrics are fired at BigQuery [1], and then we use scheduled queries that run daily (looking at the trailing 24 hours) to aggregate and export to Parquet. And it's cheap as chips. From there it's just a flat file stored on GCS that can be pulled for analysis.

Do you have more thoughts on Preset/Superset? We looked at both (slightly leaning towards cloud-hosted, as we want to move away from on-prem) but ended up going with Metabase.

[1] https://cloud.google.com/bigquery/docs/write-api
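The daily rollup this comment describes (a scheduled query over the trailing 24 hours, aggregated into flat rows ready for Parquet export) might look roughly like this. This is a pure-Python sketch of the aggregation logic only; the `(timestamp, metric, value)` event shape and the `latency_ms` metric are illustrative assumptions, and the real setup would be a BigQuery scheduled query writing Parquet to GCS rather than application code.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

def daily_rollup(events, now):
    """Aggregate the last 24 hours of raw metric events into one row per
    (day, metric): count, sum, max, avg -- the kind of rollup a scheduled
    query would produce before exporting to a flat Parquet file."""
    cutoff = now - timedelta(hours=24)
    acc = defaultdict(lambda: {"count": 0, "sum": 0.0, "max": float("-inf")})
    for ts, metric, value in events:
        if ts < cutoff:
            continue  # the scheduled query only scans the trailing 24h window
        row = acc[(ts.date().isoformat(), metric)]
        row["count"] += 1
        row["sum"] += value
        row["max"] = max(row["max"], value)
    # one flat record per key, ready to be written out as columnar rows
    return [
        {"day": day, "metric": m, **stats, "avg": stats["sum"] / stats["count"]}
        for (day, m), stats in sorted(acc.items())
    ]

now = datetime(2024, 10, 1, 12, 0, tzinfo=timezone.utc)
events = [
    (now - timedelta(hours=1), "latency_ms", 120.0),
    (now - timedelta(hours=2), "latency_ms", 80.0),
    (now - timedelta(hours=30), "latency_ms", 999.0),  # outside the 24h window
]
print(daily_rollup(events, now))
```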
zurfer, 8 months ago
Kudos to the author, who is responsible for the whole stack. A lot of effort goes into ingesting data into Iceberg tables to be queried via AWS Athena.

But I think it's great that analytics and data transformation are distributed, so developers are also somewhat responsible for correct analytical numbers.

In most companies there is a strong split between building the product and maintaining analytics for it, which leads to all sorts of inefficiencies and errors.
tonymet, 8 months ago
A 15k/s event rate and 650 GB/day of volume is massive. Of course that's confidential, but I'd guess they are below 10k concurrent connections, so they are recording 1.5 events/second/user. Does every packet need discrete, real-time telemetry? I've seen games with millions of active users only hit 30k concurrents, and this is a developer tool.

Most events can be aggregated over time with a statistic (count, avg, max, etc.). Even discrete events can be aggregated with a 5-minute latency. That should reduce their event volume by 90%. Every layer in that diagram is CPU wasted on encode/decode that costs money.

The paragraph on integrity-violation queries was helpful; it would be good to understand more of the query and latency requirements.

The article is a great technical overview, but it would also be helpful to discuss whether this system is a viable business investment. Sure, they are making high margins, but why burn good cash on something like this?
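The windowed aggregation this comment suggests can be sketched in a few lines: collapse discrete events into fixed 5-minute buckets per event name, keeping only summary statistics instead of every raw record. The `(unix_ts, name, value)` event shape and the `conn_open` metric are illustrative assumptions, not anything from the article.

```python
from collections import defaultdict

def window_aggregate(events, window_s=300):
    """Collapse discrete (unix_ts, name, value) events into fixed windows,
    keeping count/sum/min/max per (window_start, name) instead of one
    record per event."""
    buckets = defaultdict(lambda: {"count": 0, "sum": 0.0,
                                   "min": float("inf"), "max": float("-inf")})
    for ts, name, value in events:
        key = (ts // window_s * window_s, name)  # window start time
        b = buckets[key]
        b["count"] += 1
        b["sum"] += value
        b["min"] = min(b["min"], value)
        b["max"] = max(b["max"], value)
    return buckets

# 1000 raw events spread over ~17 minutes collapse into 4 aggregate rows
raw = [(i, "conn_open", 1.0) for i in range(1000)]
agg = window_aggregate(raw)
print(len(raw), "->", len(agg))  # 1000 -> 4
```

The reduction grows with event density: the busier a window, the more raw records fold into a single row, which is where the "90% fewer events" intuition comes from.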
jmuguy, 8 months ago
I wonder if this data collection is why ngrok's tunnels are now painfully slow to use. I've just gone back to localhost unless I specifically need to test omniauth or something similar.
valzam, 8 months ago
I pity the developer who has to maintain tagless-final plumbing code after the "functional programming enthusiast" moves on… in a Go-first org, no less.
moandcompany, 8 months ago
At the end of the day, we're all pushing protobufs from place to place.
LoganDark, 8 months ago
> Note that we do not store any data about the traffic content flowing through your tunnels—we only ever look at metadata. While you have the ability to enable full capture mode of all your traffic and can opt in to this service, we never store or analyze this data in our data platform. Instead, we use Clickhouse with a short data retention period in a completely separate platform and strong access controls to store this information and make it available to customers.

Don't worry, your sensitive data isn't handled by our platform; we ship it to a third party instead. This is for your protection!

(I have no idea if Clickhouse is actually a third party here, but it sounds like one?)