A Newbie’s Guide to Cassandra

119 points by ddrum001, over 7 years ago

5 comments

nemothekid, over 7 years ago
One thing that Cassandra doesn't have a good story for, and that intro guides continue to gloss over, is the ops situation. I've recently moved some of our largest Cassandra tables to BigTable for this reason. The compaction / repair / garbage collection death cycle is probably the most difficult thing to manage, and in the past 3 years of using Cassandra, managing it has gotten worse. Tools have been deprecated (like OpsCenter) and new features can exacerbate the problem. There is still no reliable way to detect when repairs have finished, and if you have a large enough table, repairs can take a week to finish. Combine that with the fact that a table that large probably has a high write volume, meaning it has a lot of compactions as well. So you have repairs and compactions going on which thrash the heap, and now you also have a GC tuning problem.

It took a lot of experimentation to get right, but once I did, scaling started to mean smaller drives and more nodes, which meant a more expensive cluster, where I was largely paying for CPUs to repair and garbage collect data.

Other than ops, however, Cassandra is a great tool and does everything it says on the box.
schmichael, over 7 years ago
This article does a massive disservice by using the pre-CQL Column Family and Row terminology. While it's the Cassandra data modelling I'm personally most accustomed to, it causes endless confusion for users who find themselves in the CQL documentation trying to understand how it all maps to Primary Keys, Partition Keys, static columns, etc.

This transition has been causing confusion for at least 5 years now, and it appears people are still using the old terminology! https://www.datastax.com/dev/blog/thrift-to-cql3
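
To make the terminology mapping concrete, here is a minimal Python sketch of how one CQL-style row could be laid out in the older wide-row (Column Family) model. The table shape, the column names, and the `to_thrift_cells` helper are all hypothetical, and the layout is a simplification of the idea, not the exact storage format.

```python
from typing import Any


def to_thrift_cells(row: dict[str, Any], partition_key: str, clustering: list[str]):
    """Flatten one CQL-style row into (row_key, {composite_column_name: value}).

    Assumption: the partition key becomes the old-style row key, and the
    clustering values are prefixed onto each remaining column name.
    """
    row_key = row[partition_key]
    prefix = tuple(row[c] for c in clustering)
    cells = {
        prefix + (name,): value
        for name, value in row.items()
        if name != partition_key and name not in clustering
    }
    return row_key, cells


# Hypothetical table: PRIMARY KEY ((user_id), posted_at)
row = {"user_id": "u1", "posted_at": "2017-09-06", "title": "hello", "body": "first post"}
row_key, cells = to_thrift_cells(row, "user_id", ["posted_at"])
```

Under these assumptions, every CQL row sharing the same partition key lands in the same old-style "row", which is why the two vocabularies are so easy to mix up.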
mi100hael, over 7 years ago
> Cassandra’s data model is a partitioned row store with tunable consistency where each row is an instance of a column family that follows the same schema

"Total Newbie" apparently means well-versed in database paradigms and terminology.
errantmind, over 7 years ago
These days I work with Cassandra on a daily basis. The company I am contracting with switched to Cassandra a while back for their primary data store. A few poor decisions later, they were spending tens of thousands of dollars a month running Cassandra in Azure. The cost was high because they modeled and queried their data as if they were still using a SQL database, which was incredibly inefficient.

The lesson here is to think long and hard about how you are going to access your data before switching to a database like Cassandra. This will help you decide if Cassandra is the right database for your use cases. If so, be sure to model your data appropriately.

In this case, based on how the company wants to query the data, they would have been better off with PostgreSQL.
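
As a rough illustration of modeling around access patterns, the Python sketch below keeps one dictionary per query, each keyed by the value that query filters on, and writes every record to all of them. The order data and the names `orders_by_customer`, `orders_by_day`, and `record_order` are hypothetical. This is a toy stand-in for the idea of one table per query, not Cassandra code.

```python
from collections import defaultdict

# One "table" per query, each keyed by the value that query filters on.
orders_by_customer = defaultdict(list)
orders_by_day = defaultdict(list)


def record_order(order):
    """Write the same order to every query table (denormalize on write)."""
    orders_by_customer[order["customer"]].append(order)
    orders_by_day[order["day"]].append(order)


record_order({"id": 1, "customer": "acme", "day": "2017-09-06", "total": 40})
record_order({"id": 2, "customer": "acme", "day": "2017-09-07", "total": 15})
record_order({"id": 3, "customer": "globex", "day": "2017-09-06", "total": 99})

# Each read is a single key lookup against the table built for that query.
acme_ids = [o["id"] for o in orders_by_customer["acme"]]
day_ids = [o["id"] for o in orders_by_day["2017-09-06"]]
```

The trade-off is visible even in the toy: reads stay cheap, but every new kind of query means another table and another write path.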
bkeroack, over 7 years ago
CQL is the best and worst thing about Cassandra. The pro is that it is obviously very similar to SQL, so it's easy to understand; the con is that C* is nothing like an RDBMS, so you can easily be fooled into doing dumb/inefficient things with the nice CQL syntax.

I think that Cassandra is best thought of as a fancy K/V store that lets extra data ride along with query results. Don't think of rows/columns at first; it will just screw you up in your modeling. Also keep in mind that the cost of very fast queries is a lot of extra time spent figuring out how to model new data access patterns in the future.
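
The "fancy K/V store" mental model can be sketched in a few lines of Python. The `ToyTable` class, its method names, and the sensor data are hypothetical and only mimic the shape of the idea: a lookup by partition key is one dictionary access, while a filter on any other field has to walk every partition.

```python
class ToyTable:
    """Toy model: partition key -> {clustering key: payload dict}."""

    def __init__(self):
        self.partitions = {}

    def put(self, pk, ck, payload):
        self.partitions.setdefault(pk, {})[ck] = payload

    def get_partition(self, pk):
        """Cheap path: one key lookup, rows returned in clustering-key order."""
        rows = self.partitions.get(pk, {})
        return [rows[ck] for ck in sorted(rows)]

    def scan_where(self, field, value):
        """Expensive path: visits every partition, like filtering on a non-key column."""
        return [
            payload
            for rows in self.partitions.values()
            for payload in rows.values()
            if payload.get(field) == value
        ]


t = ToyTable()
t.put("sensor-1", 2, {"temp": 21, "site": "north"})
t.put("sensor-1", 1, {"temp": 20, "site": "north"})
t.put("sensor-2", 1, {"temp": 25, "site": "south"})

by_key = t.get_partition("sensor-1")    # the "K/V" lookup, payload rides along
by_scan = t.scan_where("site", "south")  # the SQL-style habit that gets expensive
```

Both calls look equally innocent at the call site, which is roughly the trap described above with CQL syntax.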