Making Postgres Bloom

69 points by usman-m, almost 10 years ago

7 comments

gleb, almost 10 years ago
The example doesn't quite make sense:

    SELECT user_id IN (SELECT DISTINCT user_id FROM user_actions);

is not valid SQL. You may mean something like:

    SELECT 123 IN (SELECT DISTINCT user_id FROM user_actions);

which is a strange query, as it's equivalent to:

    SELECT 123 IN (SELECT user_id FROM user_actions);
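The equivalence gleb describes can be checked with a small sketch. It assumes SQLite (through Python's standard `sqlite3` module) as a stand-in for Postgres, and a hypothetical three-row `user_actions` table; the `seen` helper is invented for the illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_actions (user_id INTEGER, action TEXT)")
conn.executemany(
    "INSERT INTO user_actions VALUES (?, ?)",
    [(123, "click"), (123, "view"), (456, "click")],  # hypothetical sample rows
)


def seen(user_id: int, distinct: bool) -> bool:
    """Return whether user_id appears in user_actions, with or without DISTINCT."""
    keyword = "DISTINCT " if distinct else ""
    subquery = f"SELECT {keyword}user_id FROM user_actions"
    (result,) = conn.execute(f"SELECT ? IN ({subquery})", (user_id,)).fetchone()
    return bool(result)


# Membership does not depend on duplicates, so both forms agree.
for uid in (123, 456, 789):
    assert seen(uid, distinct=True) == seen(uid, distinct=False)
```

DISTINCT only removes duplicate rows from the subquery, and IN only asks whether at least one matching row exists, so the two forms return the same answer.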
mamikonyana, almost 10 years ago
When you say adding more online algorithms, is that mostly going to be limited to adding more probabilistic data structures?
pbnjay, almost 10 years ago
Don't get me wrong, I love Postgres and use it in pretty much all of my projects... but for this kind of application it's not very well suited. Leave your relational data for the database and use something more efficient!

Redis comes with both bitfields (see http://redis.io/commands/bitcount) and hyperloglog counters (see http://redis.io/commands/pfcount), requires almost no setup and has very minimal overhead.
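For readers unfamiliar with the hyperloglog counters mentioned here, a minimal pure-Python sketch of the idea follows. It is an illustration only, not Redis's implementation: the class name, the SHA-1 hash, and the default precision of 12 (4096 registers) are assumptions chosen for the example.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog counter; illustrative, not Redis's implementation."""

    def __init__(self, precision: int = 12) -> None:
        if not 7 <= precision <= 16:
            raise ValueError("precision must be between 7 and 16")
        self.p = precision
        self.m = 1 << precision            # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # Take 64 bits of a SHA-1 digest as the hash (an assumption).
        h = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
        width = 64 - self.p
        index = h >> width                 # first p bits pick a register
        rest = h & ((1 << width) - 1)      # remaining bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def count(self) -> int:
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return round(self.m * math.log(self.m / zeros))
        return round(raw)
```

With 4096 registers the expected standard error is roughly 1.04 / sqrt(4096), or about 1.6%, while memory use stays fixed no matter how many distinct items are added.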
matsur, almost 10 years ago
Semi-related in the land of Postgres and probabilistic data structures -- Redshift supports APPROXIMATE COUNT. Much, much faster than a raw COUNT, and their stated error is ±2%.

http://docs.aws.amazon.com/redshift/latest/dg/r_COUNT.html
jordibunster, almost 10 years ago
Internally using hashtext(), which is not a good idea for a bloom filter for a few reasons, one of which is http://www.postgresql.org/message-id/CABUevExTx2whgSpKaoMVowDxBe=pm7w4LJkb=-k8NTohQT12Kg@mail.gmail.com
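As a point of comparison, here is a minimal Bloom filter sketch that uses an explicitly chosen, stable hash instead of a database-internal function. It is not PipelineDB's implementation: the class name, the use of SHA-256 with double hashing, and the sizing parameters are assumptions made for the illustration.

```python
import hashlib
import math
from typing import Iterator


class BloomFilter:
    """Toy Bloom filter sized from an expected capacity and error rate."""

    def __init__(self, capacity: int, error_rate: float = 0.01) -> None:
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(
            1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
        )
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit halves
        # of a single SHA-256 digest (an assumption, not hashtext()).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

Because the hash is defined by the code rather than by the database, the bit positions for a given value stay the same wherever the filter is rebuilt or read.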
zallarak, almost 10 years ago
The idea of using probabilistic data structures instead of counting every point of data (for things like customer analytics) is pretty significant -- getting caught in the weeds of managing every data point is error-prone and inefficient.
ahachete, almost 10 years ago
usman-m, the approach of PipelineDB seems really interesting. However, I'd like to understand how, in your opinion, it compares with processing the stream of data changes accessed over PostgreSQL's logical decoding (http://www.postgresql.org/docs/9.4/static/logicaldecoding.html) interface. Thank you.