TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Making Postgres Bloom

69 pointsby usman-malmost 10 years ago

7 comments

glebalmost 10 years ago
The example doesn&#x27;t quite make sense:<p><pre><code> SELECT user_id IN (SELECT DISTINCT user_id FROM user_actions); </code></pre> is not valid SQL. You may mean something like:<p><pre><code> SELECT 123 IN (SELECT DISTINCT user_id FROM user_actions); </code></pre> which is a strange query, as it&#x27;s equivalent to:<p><pre><code> SELECT 123 IN (SELECT user_id FROM user_actions);</code></pre>
评论 #9977533 未加载
评论 #9981150 未加载
mamikonyanaalmost 10 years ago
When you say adding more online algorithms, is that mostly going to be limited to adding more probabilistic data structures?
评论 #9977012 未加载
pbnjayalmost 10 years ago
Don&#x27;t get me wrong, I love Postgres and use it in pretty much all of my projects... but for this kind of application it&#x27;s not very well suited. Leave your relational data for the database and use something more efficient!<p>Redis comes with both bitfields (see <a href="http:&#x2F;&#x2F;redis.io&#x2F;commands&#x2F;bitcount" rel="nofollow">http:&#x2F;&#x2F;redis.io&#x2F;commands&#x2F;bitcount</a>) and hyperloglog counters (see <a href="http:&#x2F;&#x2F;redis.io&#x2F;commands&#x2F;pfcount" rel="nofollow">http:&#x2F;&#x2F;redis.io&#x2F;commands&#x2F;pfcount</a>), requires almost no setup and has very minimal overhead.
评论 #9977150 未加载
评论 #9977631 未加载
matsuralmost 10 years ago
Semi-related in the land of Postgres and probabilistic data structures -- Redshift supports APPROXIMATE COUNT. Much, much faster than a raw COUNT, and their stated error is +-2%<p><a href="http:&#x2F;&#x2F;docs.aws.amazon.com&#x2F;redshift&#x2F;latest&#x2F;dg&#x2F;r_COUNT.html" rel="nofollow">http:&#x2F;&#x2F;docs.aws.amazon.com&#x2F;redshift&#x2F;latest&#x2F;dg&#x2F;r_COUNT.html</a>
评论 #9977885 未加载
jordibunsteralmost 10 years ago
Internally using hashtext(), which is not a good idea for a bloom filter for a few reasons, one of which is <a href="http:&#x2F;&#x2F;www.postgresql.org&#x2F;message-id&#x2F;CABUevExTx2whgSpKaoMVowDxBe=pm7w4LJkb=-k8NTohQT12Kg@mail.gmail.com" rel="nofollow">http:&#x2F;&#x2F;www.postgresql.org&#x2F;message-id&#x2F;CABUevExTx2whgSpKaoMVow...</a>
评论 #9978450 未加载
zallarakalmost 10 years ago
The idea of using probabilistic data structures instead of counting every point of data (for things like customer analytics) is pretty significant -- getting caught in the weeds of managing every data point is error-prone and inefficient.
ahachetealmost 10 years ago
usman-m, the approach of PipelineDB seems really interesting. However, I&#x27;d like to understand how in your opinion it compares with processing the stream of data changes accessed over PostgreSQL&#x27;s logical decoding (<a href="http:&#x2F;&#x2F;www.postgresql.org&#x2F;docs&#x2F;9.4&#x2F;static&#x2F;logicaldecoding.html" rel="nofollow">http:&#x2F;&#x2F;www.postgresql.org&#x2F;docs&#x2F;9.4&#x2F;static&#x2F;logicaldecoding.ht...</a>) interface. Thank you
评论 #9977513 未加载