科技回声

7 comments

gleb, almost 10 years ago

The example doesn't quite make sense:
<pre><code>SELECT user_id IN (SELECT DISTINCT user_id FROM user_actions);</code></pre>
is not valid SQL. You may mean something like:
<pre><code>SELECT 123 IN (SELECT DISTINCT user_id FROM user_actions);</code></pre>
which is a strange query, as it's equivalent to:
<pre><code>SELECT 123 IN (SELECT user_id FROM user_actions);</code></pre>
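The equivalence claimed above is easy to check empirically. The following is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for Postgres (an assumption made only so the example runs without a server); the `user_actions` rows and the `action` column are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_actions (user_id INTEGER, action TEXT)")
conn.executemany(
    "INSERT INTO user_actions VALUES (?, ?)",
    [(123, "click"), (123, "view"), (456, "click")],  # hypothetical sample rows
)

WITH_DISTINCT = "SELECT ? IN (SELECT DISTINCT user_id FROM user_actions)"
WITHOUT_DISTINCT = "SELECT ? IN (SELECT user_id FROM user_actions)"


def is_member(sql, user_id):
    """Run a membership query and return its single boolean result."""
    return bool(conn.execute(sql, (user_id,)).fetchone()[0])


# For each probe value, compare the result with and without DISTINCT.
results = {
    uid: (is_member(WITH_DISTINCT, uid), is_member(WITHOUT_DISTINCT, uid))
    for uid in (123, 789)
}
print(results)  # {123: (True, True), 789: (False, False)}
```

Membership in a set does not depend on how many times a value appears, so DISTINCT changes nothing about the result of IN.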

mamikonyana, almost 10 years ago

When you say adding more online algorithms, is that mostly going to be limited to adding more probabilistic data structures?

pbnjay, almost 10 years ago

Don't get me wrong, I love Postgres and use it in pretty much all of my projects... but for this kind of application it's not very well suited. Leave your relational data for the database and use something more efficient! Redis comes with both bitfields (see http://redis.io/commands/bitcount) and hyperloglog counters (see http://redis.io/commands/pfcount), requires almost no setup and has very minimal overhead.
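To make the bitfield idea concrete: one bit per user ID, and a population count to get the number of distinct users. The sketch below is a pure-Python illustration of that technique, not Redis itself; the `Bitmap` class and its method names are hypothetical and only loosely modeled on the SETBIT and BITCOUNT commands.

```python
class Bitmap:
    """Growable bit array: one bit per integer ID (illustrative, not Redis)."""

    def __init__(self):
        self.bits = bytearray()

    def setbit(self, offset, value=1):
        """Set the bit at `offset` and return its previous value (0 or 1)."""
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self.bits):
            # Grow with zero bytes so the requested offset is addressable.
            self.bits.extend(b"\x00" * (byte_index + 1 - len(self.bits)))
        mask = 0x80 >> bit_index  # most significant bit first within each byte
        previous = 1 if self.bits[byte_index] & mask else 0
        if value:
            self.bits[byte_index] |= mask
        else:
            self.bits[byte_index] &= ~mask & 0xFF
        return previous

    def bitcount(self):
        """Count the set bits, i.e. the number of distinct IDs recorded."""
        return sum(bin(byte).count("1") for byte in self.bits)


seen = Bitmap()
for user_id in (3, 3, 1000, 42, 42):  # hypothetical stream with repeats
    seen.setbit(user_id)
print(seen.bitcount())  # 3
```

This is exact rather than probabilistic, at the cost of memory proportional to the largest ID rather than to the number of distinct users.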

matsur, almost 10 years ago

Semi-related in the land of Postgres and probabilistic data structures -- Redshift supports APPROXIMATE COUNT. Much, much faster than a raw COUNT, and their stated error is ±2%: http://docs.aws.amazon.com/redshift/latest/dg/r_COUNT.html
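For a sense of how an approximate distinct count can stay within a few percent using a fixed, small amount of memory, here is a minimal HyperLogLog sketch in Python. It is an illustration of the general technique only, not Redshift's implementation; the `HyperLogLog` class, the use of SHA-256 as the hash, and the default of 2^12 registers are all assumptions chosen for the example.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog cardinality estimator (illustrative sketch)."""

    def __init__(self, p=12):
        # p index bits -> m = 2**p registers; the alpha formula below assumes m >= 128.
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # Take 64 bits of a SHA-256 digest as the hash value.
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        index = h >> width                # top p bits select a register
        rest = h & ((1 << width) - 1)     # remaining bits feed the rank
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog()
for i in range(100_000):
    hll.add(f"user-{i}")
    hll.add(f"user-{i}")  # duplicates do not change the estimate
estimate = hll.count()
print(round(estimate))  # close to 100000; typical relative error is about 1.04 / sqrt(m)
```

With 4096 registers the expected relative error is roughly 1.6%, which is in the same range as the ±2% figure quoted above.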

jordibunster, almost 10 years ago

It internally uses hashtext(), which is not a good idea for a bloom filter for a few reasons, one of which is described here: http://www.postgresql.org/message-id/CABUevExTx2whgSpKaoMVowDxBe=pm7w4LJkb=-k8NTohQT12Kg@mail.gmail.com
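For comparison, here is a minimal Bloom filter sketch in Python that derives its bit positions from a publicly specified hash (SHA-256 via `hashlib`) rather than a database-internal function. It is an illustration only and not PipelineDB's implementation; the `BloomFilter` class, its parameters, and the double-hashing scheme are assumptions made for the example.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter using SHA-256 and double hashing (illustrative sketch)."""

    def __init__(self, capacity, error_rate=0.01):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.m = max(1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, so the stride is never zero
        # Double hashing: the i-th position is (h1 + i * h2) mod m.
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bf = BloomFilter(capacity=1000, error_rate=0.01)
for i in range(1000):
    bf.add(f"user-{i}")

print("user-42" in bf)  # True: added items are never reported missing
false_positives = sum(f"other-{i}" in bf for i in range(10_000))
print(false_positives / 10_000)  # should be in the neighborhood of the 1% target
```

Because the hash is fixed by a public specification, a filter built this way produces the same bit positions for the same input on any machine or software version.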

zallarak, almost 10 years ago

The idea of using probabilistic data structures instead of counting every point of data (for things like customer analytics) is pretty significant -- getting caught in the weeds of managing every data point is error-prone and inefficient.

ahachete, almost 10 years ago

usman-m, the approach of PipelineDB seems really interesting. However, I'd like to understand how in your opinion it compares with processing the stream of data changes accessed over PostgreSQL's logical decoding (http://www.postgresql.org/docs/9.4/static/logicaldecoding.html) interface. Thank you.

Making Postgres Bloom