How Palo Alto Networks Replaced Kafka with ScyllaDB for Stream Processing

126 points by cyndunlop almost 3 years ago

13 comments

EdwardDiego almost 3 years ago
Well, that was very uninformative.

> To meet our performance expectations, Kafka must work from memory, and we don’t have much memory to give it... ...Even the smallest customer required two or three Kafka instances

A) What performance expectations? Timeliness?

B) What's "not much" memory?

C) And why don't you have that much?

D) When you say instance, do you mean "broker", or actual clusters? Why did the smallest customer need 2 to 3 of them?
scottcodie almost 3 years ago
I get not wanting to add yet another system, to reduce operational complexity, but it seems more economical to use a system like Flink to do a time-windowed join and emit single records to be written to a persistence store. The Flink time window can be sufficiently large to encompass the disparity between ingest and event time without much RAM consumption by using a RocksDB state backend on the operator. Let me know if I missed something, every use case is different :)
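A rough sketch of the time-windowed join idea from this comment, in plain Python rather than Flink's API. The function name `windowed_join`, its parameters, and the event tuple layout are all assumptions made for illustration, and state is held in an in-memory dict where the comment suggests a RocksDB state backend.

```python
from collections import defaultdict


def windowed_join(events, window_ms, allowed_lateness_ms):
    """Merge out-of-order events per (key, tumbling window).

    `events` is an iterable of (key, event_time_ms, payload) tuples in
    arrival order. A window is emitted as a single record once the
    watermark (largest event time seen minus the allowed lateness) has
    passed the end of that window.
    """
    buffers = defaultdict(list)  # (key, window_start) -> payloads seen so far
    max_event_time = 0
    emitted = []

    for key, event_time, payload in events:
        window_start = event_time - (event_time % window_ms)
        buffers[(key, window_start)].append(payload)

        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - allowed_lateness_ms

        # Emit every buffered window that ends at or before the watermark.
        ready = sorted(k for k in buffers if k[1] + window_ms <= watermark)
        for k in ready:
            emitted.append((k[0], k[1], buffers.pop(k)))

    # End of input: flush whatever is still buffered.
    for k in sorted(buffers):
        emitted.append((k[0], k[1], buffers[k]))
    return emitted


events = [("a", 1000, "x"), ("a", 1500, "y"), ("b", 1200, "z"), ("a", 5000, "w")]
result = windowed_join(events, window_ms=1000, allowed_lateness_ms=500)
```

In this sketch an event that arrives after its window was already emitted starts a fresh group for the same window; a real pipeline would need an explicit policy for such late data.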
fizwhiz almost 3 years ago
From earlier in the article:

> Clock skew across different sensors: Sensors might be located across different datacenters, computers, and networks, so their clocks might not be synchronized to the millisecond.

And later on, in their final solution:

> Implementation4 cons: Producers and consumers must have synchronized clocks (up to a certain resolution)

How do they reconcile this skew in their final solution?
throwawaythekey almost 3 years ago
Does anyone on here have some real-world experience with Scylla?

We currently make heavy use of Dynamo and are interested in something cheaper/faster. The marketing material is pretty compelling but I'm unsure of how hard Scylla is to operate at scale.
bluelightning2k almost 3 years ago
If a general system becomes good enough, you see it displace specialized systems. In this case the Kafka paradigm can be replaced because there is such a performant NoSQL DB.

It's kind of like how standalone cameras became less and less desirable as phone cameras got better. Standalone could do better quality - but this matters less once both options are really good. There is some 'good enough' point where you hit vastly diminishing returns & simplifying into just phones became worthwhile.

Databases (certainly Scylla) may be hitting a point where specializing, actively optimizing, etc. are less desirable than just reusing one good system.
supermatt almost 3 years ago
I'm not seeing the "stream processing" piece here.

Looks like they went from polling an RDBMS to some triggered querying of Scylla, and then on to polling Scylla.

i.e. they went from polling an RDBMS to polling Scylla. They didn't replace Kafka with anything, so now their implementation isn't reactive.

This is effectively no different than implementing a message queue in a database, with all the negatives that brings.

They are sharding for each consumer to prevent multiple consumption due to lack of locks. What if a consumer goes down? How does it manage its own state? All things managed by Kafka (or pretty much any MQ) out of the box, and now they have to implement ALL of that themselves - none of which is mentioned in the article.
cvccvroomvroom almost 3 years ago
This reads like sales copy. It's freemium FOSS-washed crippleware with a radioactive license (AGPL). Hard pass.

I'll stick to FOSS solutions that don't require licenses to unlock closed-source components and can be patched by a community and/or yourself.

Edit: Previously, there are other commercialish OSS NoSQL solutions for large-scale apps that are less proprietary with better licenses like Couchbase (not Cassandra CQL).
throwaway81523 almost 3 years ago
ScyllaDB and its related parts like Seastar always struck me as real performance-oriented programming, though it was based on leveraging language tech (C++14 early on) that was painful. I wonder if a nicer approach is possible nowadays.
DeathArrow almost 3 years ago
Is there any more recent technical review of ScyllaDB than this?

https://jepsen.io/analyses/scylla-4.2-rc3
infogulch almost 3 years ago
So they're using Scylla to manually do what Kafka does: basically, processors poll for new records in a shard and update their watermark once processing is done. I'm surprised that this is faster than just using Kafka alone, though one of the reasons they wanted to avoid Kafka is the deployment complexity and memory usage of Kafka clusters.
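A minimal sketch of the poll-and-advance-watermark loop described in this comment. It is not the article's implementation and uses no ScyllaDB driver calls: `ShardStore` is a hypothetical in-memory stand-in for one shard of a table, and `PollingConsumer` and its method names are likewise assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ShardStore:
    """Hypothetical in-memory stand-in for one shard of a database table."""
    records: list = field(default_factory=list)  # (timestamp, payload) tuples

    def insert(self, timestamp, payload):
        self.records.append((timestamp, payload))

    def read_after(self, watermark):
        """Return records with timestamp > watermark, oldest first."""
        return sorted(r for r in self.records if r[0] > watermark)


@dataclass
class PollingConsumer:
    """Owns exactly one shard, so no locking between consumers is needed."""
    store: ShardStore
    watermark: int = 0
    processed: list = field(default_factory=list)

    def poll_once(self):
        """Process everything newer than the watermark, then advance it."""
        batch = self.store.read_after(self.watermark)
        for _timestamp, payload in batch:
            self.processed.append(payload)  # placeholder for real processing
        if batch:
            # Advance only after the whole batch has been processed.
            self.watermark = batch[-1][0]
        return len(batch)


store = ShardStore()
store.insert(10, "a")
store.insert(5, "b")
consumer = PollingConsumer(store)
first = consumer.poll_once()

store.insert(12, "c")
second = consumer.poll_once()

store.insert(8, "late")  # timestamp is below the current watermark
third = consumer.poll_once()
```

The last insert shows the weak spot of this pattern: a record written with a timestamp at or below the watermark is never picked up, which is why the scheme depends on producers and consumers agreeing on time. The watermark here also lives only in memory, so a restarted consumer would need it persisted somewhere.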
DeathArrow almost 3 years ago
> Like the first solution, normalized data is stored in a database – but in this implementation, it’s a NoSQL database instead of a relational database.

What does data normalization mean in a NoSQL context? I think most normal forms make sense in a context where we have tables, rows, and relational algebra.
redwood almost 3 years ago
Interesting to see space for multiple commercial backers of Cassandra.

Anyone seeing Cassandra adoption for new use cases in the public cloud?
Thev00d00 almost 3 years ago
This is clearly just an advertisement, and not even an informative one!