To Be Continuous

153 points by Fergi almost 10 years ago

19 comments

perplexes almost 10 years ago
I'm surprised no one has mentioned Esper yet: http://www.espertech.com/esper/

Esper does exactly this - you run streams of events over it and it continuously executes SQL to see if it matches. If so you can:

- run code
- make new streams
- store the results

Esper's been doing this kind of thing for 9 years now.
res0nat0r almost 10 years ago
I downloaded the OSX .pkg installer and didn't see anything in /Applications or /opt after running it and telling it to install to my root drive. Glancing at some docs on your site I saw pipeline-init, so I ran a find on / to see where it placed the binaries, and it installed to:

/usr/lib/pipelinedb/usr/lib/pipelinedb/bin/pipeline-init

Is this intentional?

EDIT:

After playing around with the .pkg file, it looks like the packed Payload contains '/usr/bin/pipelinedb/usr/lib/pipelinedb', which is probably the problem. I see broken symlinks for pipeline-init etc. in /usr/bin pointing to /usr/lib/pipelinedb, so I'm guessing the repetition in the path above is a mistake.

Also, I see a postinstall script creating a symlink from pipeline to psql. This seems like a bad idea, as psql is already pretty universal as the name of the PostgreSQL CLI binary. Maybe 'pipesql' would be better?
tuckermi almost 10 years ago
How does PipelineDB differ from or build on the ideas from Aurora/Borealis/StreamBase? At least at a high level, something like LiveView[1] seems to provide similar functionality to PipelineDB's concept of a Continuous View.

I was under the impression that the academic projects had proposed StreamSQL as a general language, though since StreamBase's acquisition it now seems to have been branded as TIBCO StreamSQL[2]. Have you guys been part of any efforts to make sure that there is an open language standard?

[1] http://streambase.typepad.com/streambase_stream_process/2013/05/liveview-14-new-continuous-queries.html

[2] http://www.streambase.com/developers/docs/latest/streamsql/
chadthenderson almost 10 years ago
This looks very cool. Although, I'm not sure I totally understand how it can be used to replace batch ETL processes. So, PipelineDB eliminates ETL batch processing by incrementally inserting data into continuous views, but the documentation says that it's not meant for ad-hoc data warehouses as the raw data is discarded. So, does that leave me still using batch processes to load my data warehouse? Is PipelineDB going to be my data warehouse as long as I only want the resulting streamed data? Just trying to figure out what this would look like and where its place is in a data warehouse environment.
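To make the trade-off in this question concrete, here is a minimal Python sketch of the general idea of an incrementally maintained aggregate that discards raw events. It is not PipelineDB's API; the `ContinuousCount` class, its methods, and the sample URLs are hypothetical names chosen only for illustration.

```python
from collections import defaultdict


class ContinuousCount:
    """Toy stand-in for a continuously maintained aggregate (hypothetical)."""

    def __init__(self):
        # One running aggregate row per key; raw events are never stored.
        self.totals = defaultdict(lambda: {"count": 0, "sum": 0.0})

    def insert(self, key, value):
        # Fold the event into the aggregate, then let the event go.
        row = self.totals[key]
        row["count"] += 1
        row["sum"] += value

    def select(self):
        # Reading returns a copy of the current aggregates.
        return {key: dict(row) for key, row in self.totals.items()}


view = ContinuousCount()
for url, latency_ms in [("/home", 12.0), ("/home", 20.0), ("/about", 7.5)]:
    view.insert(url, latency_ms)
result = view.select()
```

Because only the aggregates survive, a question that was not anticipated when the aggregate was defined (for example, the median latency) cannot be answered afterwards, which is why ad-hoc warehouse queries would still need the raw data kept somewhere else.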
WaxProlix almost 10 years ago
As someone who's made a lot of use of `tail` and similar, this is appealing.

But I don't have a lot of use cases in personal projects, and am unlikely to find a good use-case at work in the near future. What's the 'adoption path' for something like this?

I think a really robust sample data set with example queries (think the neo4j imdb examples) would be a great way to show how powerful and easy something like this can be.
therealmocker almost 10 years ago
How similar is this to something like http://riemann.io for processing events from a stream?
zallarak almost 10 years ago
Very cool that it is open sourced - seems like there would be a lot to learn from the code. Link: https://github.com/pipelinedb/pipelinedb
djupblue almost 10 years ago
This is awesome, thanks for making it open source!

Would it be possible to set triggers or something on the continuous views? Let's say I want to take action (immediately) when a value calculated over a sliding window goes above a limit.

It's a bit late here but I'll definitely play with PipelineDB tomorrow.
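The behavior asked about here can be sketched in plain Python: keep a sum over a time window and call a handler as soon as it exceeds a limit. This is an illustration of the idea only and says nothing about whether or how PipelineDB supports triggers; `SlidingWindowAlert`, `on_breach`, and the numbers below are assumptions, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class SlidingWindowAlert:
    """Sum over the last `window_seconds`; fires a callback above `limit`."""

    def __init__(self, window_seconds, limit, on_breach):
        self.window_seconds = window_seconds
        self.limit = limit
        self.on_breach = on_breach
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        self.events.append((timestamp, value))
        self.total += value
        # Evict everything at or before the cutoff, so the window
        # covers the half-open interval (timestamp - window, timestamp].
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_value = self.events.popleft()
            self.total -= old_value
        # React immediately, on the same insert that crossed the limit.
        if self.total > self.limit:
            self.on_breach(timestamp, self.total)


alerts = []
window = SlidingWindowAlert(60, 100, lambda ts, total: alerts.append((ts, total)))
window.add(0, 60)   # total 60, below the limit
window.add(30, 50)  # total 110, above the limit: handler fires
window.add(90, 10)  # both earlier events have expired, total 10
```

The check runs on every insert, so the reaction is as immediate as the write path; a window that should also fire purely because time passed (with no new events) would need a timer in addition.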
asgard1024 almost 10 years ago
This claim about ETL not being needed in the future sounds dubious. I work on a large application that is all about ETL. If we wanted to use this new method instead, I am not sure how it would deal with the following:

- State in the data. In many of our sources, processing depends on some internal state, which must be kept over time. For example, some process has started and we will only learn later when it ended, so we must keep its state in order to correctly process the ending event (to match it up). I am not clear how this would work with continuous views. I would say this is actually the main thing that makes ETL processing non-trivial.

- Processing failure. Let's say something goes wrong and the data processing fails (or it could even be planned downtime). How do we know where to restart, to avoid processing data twice or missing data? Does the continuous stream take care of this metadata? And how does it deal with the state information described above? If you do data processing in batches, there is an obvious point of restart. Again, I think the extra complexity that the "continuous" approach says is unnecessary relates to the fact that you want to be able to checkpoint the state of processing for various reasons.
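Both concerns, matching start and end events and knowing where to restart, come down to checkpointing the processing state together with the input position. A minimal Python sketch under stated assumptions: events are `(kind, process_id, timestamp)` tuples in a replayable list, and the checkpoint is a JSON file; the `process` function and the file layout are hypothetical and unrelated to how PipelineDB works internally.

```python
import json
import os
import tempfile


def process(events, checkpoint_path):
    """Match start/end events, persisting state and offset after each one."""
    state = {"offset": 0, "open": {}, "durations": {}}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)  # resume exactly where the last run stopped

    for offset in range(state["offset"], len(events)):
        kind, process_id, timestamp = events[offset]
        if kind == "start":
            state["open"][process_id] = timestamp
        elif kind == "end" and process_id in state["open"]:
            started = state["open"].pop(process_id)
            state["durations"][process_id] = timestamp - started
        state["offset"] = offset + 1
        # Write to a temp file and rename, so a crash never leaves a
        # half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
    return state


events = [("start", "a", 10), ("start", "b", 12), ("end", "a", 25), ("end", "b", 40)]
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
process(events[:2], path)       # simulate a failure after two events
final = process(events, path)   # restart: resumes at offset 2
```

Checkpointing after every event is slow and is done here only to keep the example short; a batch job gets the same guarantee more cheaply by committing state once per batch, which is the "obvious point of restart" mentioned above.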
buremba almost 10 years ago
It seems PipelineDB doesn't have a clustered version, so all the data must be sent to one server, as with PostgreSQL. Stream processing is usually most useful with big data (if the data is small enough to fit in memory, complex aggregation queries usually don't take more than 1 second on a columnar database). Given that, is it possible to use PipelineDB for millions of events per second?
moatra almost 10 years ago
Do Continuous Views work with table-table joins, or must there always be at least one stream present? The documentation[1] doesn't specify.

If so, this could be an interesting alternative to RethinkDB's changefeeds, as RethinkDB doesn't support joins on the change stream.

[1] http://docs.pipelinedb.com/joins.html
mfenniak almost 10 years ago
Cool. Very cool.

My first thought (aside from "Cool") was that the current time would be the tricky thing that can't be incorporated into a continuous view. But even that seems to be handled! http://docs.pipelinedb.com/sliding-windows.html

Looks pretty impressive. :-)
buremba almost 10 years ago
We needed to implement continuous queries in our application code. (It's actually hard to do it right in PostgreSQL, so it's very limited.) https://github.com/buremba/rakam/wiki/Postgresql-Backend#continuous-query-tables Since stream processing and real-time analytics are quite hot topics nowadays, I think real-time databases will get much more attention in the near future.
jxramos almost 10 years ago
Well said! Good timing too: I'm beginning to sketch out how to tackle processing a large file set that has to stitch together data from corresponding files. The magnitude I'm imagining is such that I can't just read all the files into memory and do the matching, number crunching, and whatnot against them. I like the concepts and terminology in this article. Definitely worth keeping in the back pocket going forward, if not diving into it all outright. Thanks so much.
infodroid almost 10 years ago
It looks like PipelineDB is implemented as a fork of PostgreSQL. I would be interested to understand what is different about the architecture of PipelineDB that it couldn't be integrated into upstream PostgreSQL.
sheraz almost 10 years ago
Can PipelineDB be used to run projections for an EventStore?

I'm experimenting with the EventStore pattern for a side project, and I have struggled to implement projections. Could PipelineDB be a way to deliver that?
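For context on what a projection is, here is a minimal Python sketch of one as a fold over events, written so it can be updated incrementally from a previous result. It illustrates the pattern only and does not answer whether PipelineDB fits; the event shape, the `project_balances` function, and the account names are hypothetical.

```python
def project_balances(events, balances=None):
    """Fold account events into a balance per account (the projection)."""
    # Copy so the caller's previous projection is left untouched.
    balances = dict(balances or {})
    for event in events:
        account = event["account"]
        if event["type"] == "deposited":
            balances[account] = balances.get(account, 0) + event["amount"]
        elif event["type"] == "withdrawn":
            balances[account] = balances.get(account, 0) - event["amount"]
    return balances


events = [
    {"type": "deposited", "account": "acc-1", "amount": 100},
    {"type": "withdrawn", "account": "acc-1", "amount": 30},
    {"type": "deposited", "account": "acc-2", "amount": 5},
]

full = project_balances(events)                       # replay from scratch
partial = project_balances(events[:2])                # projection so far
incremental = project_balances(events[2:], partial)   # apply only new events
```

A full replay and an incremental update give the same result, which is the property a continuously maintained aggregate would have to preserve to serve as a projection.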
jpitz almost 10 years ago
In the example for sf_proximity_count, you state the view covers a 5-minute sliding window, but the WHERE clause does not reference clock_timestamp(). Is 5 minutes an implicit default?
pradn almost 10 years ago
What does ETL mean in this context?
maslam almost 10 years ago
Wonderful! Great job.