Show HN: Riko – A Python stream processing engine modeled after Yahoo! Pipes

283 points by reubano almost 9 years ago

15 comments

reubano almost 9 years ago
`riko` is a pure Python stream processing library for analyzing and processing streams of structured data. It's modeled after Yahoo! Pipes [1] and was originally a fork of pipe2py [2]. It has both synchronous and asynchronous (via Twisted) APIs, and supports parallel execution (via multiprocessing).

Out of the box, `riko` can read csv/xml/json/html files; create text and data based flows via modular pipes; parse and extract RSS/ATOM feeds; and a bunch of other neat things. You can think of `riko` as a poor man's Spark/Storm... stream processing made easy!

Feedback welcome, so let me know what you think!

Resources: FAQ [3], cookbook [4], and IPython notebook [5]

Quickie Demo:

    >>> from riko.modules import fetch
    >>>
    >>> stream = fetch.pipe(conf={'url': 'https://news.ycombinator.com/rss'})
    >>> item = next(stream)
    >>> item['title'], item['link']
    ('Master Plan, Part Deux', 'https://www.tesla.com/blog/master-plan-part-deux')

[1] https://web.archive.org/web/20150930021241/http://pipes.yahoo.com/pipes/

[2] https://github.com/ggaughan/pipe2py/

[3] https://github.com/nerevu/riko/blob/master/docs/FAQ.rst

[4] https://github.com/nerevu/riko/blob/master/docs/COOKBOOK.rst

[5] http://nbviewer.jupyter.org/github/nerevu/riko/blob/master/examples/usage.ipynb
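For readers who never used Yahoo! Pipes, the "modular pipes over a stream of items" idea can be sketched with plain Python generators. This is only an illustration of the concept and is not riko's API: the function names (`fetch_items`, `filter_items`, `rename_field`, `build_flow`) and the sample records are hypothetical, and a real flow would read from a feed rather than an in-memory list.

```python
# Illustrative sketch only: plain-Python generators, NOT riko's API.
from itertools import islice


def fetch_items(records):
    """Source pipe: yield each record as a dict item (stands in for a feed reader)."""
    for record in records:
        yield dict(record)


def filter_items(stream, field, substring):
    """Filter pipe: keep items whose `field` contains `substring` (case-insensitive)."""
    needle = substring.lower()
    for item in stream:
        if needle in str(item.get(field, "")).lower():
            yield item


def rename_field(stream, old, new):
    """Transform pipe: copy each item with the key `old` renamed to `new`."""
    for item in stream:
        renamed = {key: value for key, value in item.items() if key != old}
        if old in item:
            renamed[new] = item[old]
        yield renamed


# Hypothetical sample data; only the first entry comes from the demo above.
SAMPLE_FEED = [
    {"title": "Master Plan, Part Deux",
     "link": "https://www.tesla.com/blog/master-plan-part-deux"},
    {"title": "Show HN: Riko", "link": "https://example.com/riko"},
    {"title": "Ask HN: Favorite Python tools?", "link": "https://example.com/ask"},
]


def build_flow(records):
    """Chain the pipes; nothing runs until the result is iterated."""
    stream = fetch_items(records)
    stream = filter_items(stream, "title", "hn")
    return rename_field(stream, "link", "url")


if __name__ == "__main__":
    # Pull only the first two matching items from the lazy stream.
    for item in islice(build_flow(SAMPLE_FEED), 2):
        print(item["title"], item["url"])
```

Because every stage is a generator, items flow through one at a time, so a consumer can stop early (as `next(stream)` does in the demo) without processing the whole source.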
Fuzzwah almost 9 years ago
I was a heavy user of Pipes and I'm now a heavy user of Python. I have built my own dodgy simple replacement for some of the things I used to rely on Pipes for. I'm very eager to see what you've got here; at first glance it seems like an excellent fit for my needs.

Thanks!
tanlermin almost 9 years ago
Can you consider Dask integration? http://distributed.readthedocs.io/en/latest/queues.html https://github.com/dask/dask

It can handle parallel and distributed parts for you.
oellegaard almost 9 years ago
If you're looking for a stream processing engine closer to Storm, etc. but also simple, check out Motorway: https://github.com/plecto/motorway :-)
raimue almost 9 years ago
I am still a user of Plagger [1], but development halted quite some time ago. Maybe this could be a good replacement.

[1] https://github.com/miyagawa/plagger
ecesena almost 9 years ago
This is really interesting. Have you looked at Apache Beam? What I think is interesting about Beam, in this specific context, is that it has a standalone runner (Java) that, similarly to riko, lets you write pipelines without worrying about a complex setup. But then, if you need to scale your computation, Beam is runner-independent and you can take the same code and run it at scale on a cluster, whether it's Spark, Flink, or Google Cloud. You can read more here [1].

As for riko more specifically, Beam will soon have a Python SDK, but I'm unsure if there will be a Python standalone runner. Maybe this is something to look into...

[1] https://www.oreilly.com/ideas/future-proof-and-scale-proof-your-code
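The "runner-independent" idea mentioned above, where a pipeline is defined once and then handed to different execution backends, can be sketched roughly as follows. This is plain Python and not the Apache Beam API; `make_pipeline`, `run_local`, and `run_threaded` are hypothetical names, and a thread pool merely stands in for a real cluster runner.

```python
# Illustrative sketch only: plain Python, NOT the Apache Beam API.
from concurrent.futures import ThreadPoolExecutor


def make_pipeline(*steps):
    """A pipeline here is just an ordered tuple of per-item functions."""
    return tuple(steps)


def apply_steps(pipeline, item):
    """Run a single item through every step in order."""
    for step in pipeline:
        item = step(item)
    return item


def run_local(pipeline, items):
    """Standalone runner: process items one by one in the current thread."""
    return [apply_steps(pipeline, item) for item in items]


def run_threaded(pipeline, items, workers=4):
    """Stand-in for a scaled-out runner: same pipeline, items spread over a pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order.
        return list(pool.map(lambda item: apply_steps(pipeline, item), items))


# The pipeline definition knows nothing about how it will be executed.
word_lengths = make_pipeline(str.strip, str.lower, len)

if __name__ == "__main__":
    data = [" Spark ", "Flink", " Beam"]
    print(run_local(word_lengths, data))
    print(run_threaded(word_lengths, data))
```

The point of the split is that `word_lengths` never changes; only the runner function does, which is the property the comment attributes to Beam.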
tudorw almost 9 years ago
If someone can spin up a usable GUI, charge enough to make a living without compromising on performance, and promise some longevity and a way to export my stuff, I would probably pay for that. I loved Pipes; the GUI was a big deal for me.
ewindisch almost 9 years ago
Sweet. I put together something similar for NodeJS which is now called 'turtle' (because it's turtles all the way down...). There's a bit of a focus on AWS Lambda & other FaaS solutions as a means of building Lambda architectures, but it can be used by itself.

https://github.com/iopipe/turtle
et2o almost 9 years ago
Looks interesting. What kind of applications do people use this for?
mxuribe almost 9 years ago
While I didn't use Yahoo Pipes too often, I loved it. Having this as a Python library (I'm trying to get deeper into Python) is great! Kudos and good luck!
svieira almost 9 years ago
Also in this space (and worth looking at for inspiration, especially for other potential sources and sinks of data): Apache Camel [1].

[1]: http://camel.apache.org/
aioprisan almost 9 years ago
Is there anything like this available that's based on Node.js with a decent GUI?
pastaking almost 9 years ago
Also might want to check out http://concord.io. It's a bit more work to set up, but it's much faster than most stream processing systems.
DyslexicAtheist almost 9 years ago
This is absolutely beautiful. Love the fact that it's using RSS for this.
satai almost 9 years ago
Looks nice. Are there any plans for Twitter support?