`riko` is a pure Python stream processing library for analyzing and processing streams of structured data. It's modeled after Yahoo! Pipes [1] and was originally a fork of pipe2py [2]. It has both synchronous and asynchronous (via Twisted) APIs, and supports parallel execution (via multiprocessing).

Out of the box, `riko` can read csv/xml/json/html files; create text- and data-based flows via modular pipes; parse and extract RSS/Atom feeds; and do a bunch of other neat things. You can think of `riko` as a poor man's Spark/Storm... stream processing made easy!

Feedback welcome, so let me know what you think!

Resources: FAQ [3], cookbook [4], and IPython notebook [5]

Quickie Demo:

    >>> from riko.modules import fetch
    >>>
    >>> stream = fetch.pipe(conf={'url': 'https://news.ycombinator.com/rss'})
    >>> item = next(stream)
    >>> item['title'], item['link']
    ('Master Plan, Part Deux', 'https://www.tesla.com/blog/master-plan-part-deux')
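Since `fetch.pipe` returns an ordinary generator of dicts (as shown above), the stream composes with plain Python. A minimal sketch; the keyword filter here is just an illustrative assumption, not a riko feature:

    >>> # keep only items whose titles mention "python" (illustrative filter)
    >>> stream = fetch.pipe(conf={'url': 'https://news.ycombinator.com/rss'})
    >>> titles = [item['title'] for item in stream if 'python' in item['title'].lower()]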
[1] https://web.archive.org/web/20150930021241/http://pipes.yahoo.com/pipes/

[2] https://github.com/ggaughan/pipe2py/

[3] https://github.com/nerevu/riko/blob/master/docs/FAQ.rst

[4] https://github.com/nerevu/riko/blob/master/docs/COOKBOOK.rst

[5] http://nbviewer.jupyter.org/github/nerevu/riko/blob/master/examples/usage.ipynb
I was a heavy user of Pipes and I'm now a heavy user of Python. I've built my own dodgy, simple replacement for some of the things I used to rely on Pipes for. I'm very eager to see what you've got here; at first glance it seems like an excellent fit for my needs.

Thanks!
Can you consider Dask integration?

http://distributed.readthedocs.io/en/latest/queues.html

https://github.com/dask/dask

It can handle the parallel and distributed parts for you.
If you're looking for a stream processing engine closer to Storm, etc. but still simple, check out Motorway: https://github.com/plecto/motorway :-)
I am still a user of Plagger [1], but development halted quite some time ago. Maybe this could be a good replacement.

[1] https://github.com/miyagawa/plagger
This is really interesting. Have you looked at Apache Beam? What I think is interesting about Beam (in this specific context) is that it has a standalone runner (Java) that, much like riko, lets you write pipelines without worrying about a complex setup. But then, if you need to scale your computation, Beam is runner-independent: you can take the same code and run it at scale on a cluster, whether it's Spark, Flink, or Google Cloud. You can read more here [1].

As for riko more specifically, Beam will soon have a Python SDK, but I'm unsure whether there will be a Python standalone runner. Maybe this is something to look into...

[1] https://www.oreilly.com/ideas/future-proof-and-scale-proof-your-code
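For illustration only (riko has no Beam integration, and the Python SDK mentioned above was still in the works at the time), a word-count-style pipeline in Beam's Python SDK looks roughly like this; the portability argument is that the same code runs on the local DirectRunner by default and on Spark, Flink, or Dataflow by swapping the runner:

    import apache_beam as beam

    # uses the local DirectRunner by default; pass a different runner via
    # PipelineOptions to execute the same pipeline on a cluster
    with beam.Pipeline() as p:
        (p
         | beam.Create(['stream processing made easy', 'made with riko'])
         | beam.FlatMap(str.split)            # split lines into words
         | beam.combiners.Count.PerElement()  # count each distinct word
         | beam.Map(print))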
If someone can spin up a usable GUI, charge enough to make a living without compromising on performance, and promise some longevity and a way to export my stuff, I would probably pay for that. I loved Pipes; the GUI was a big deal for me.
Sweet. I put together something similar for NodeJS which is now called 'turtle' (because it's turtles all the way down...). There's a bit of a focus on AWS Lambda & other FaaS solutions as a means of building Lambda architectures, but it can be used by itself.

https://github.com/iopipe/turtle
While I didn't use Yahoo Pipes too often, I loved it. Having this as a Python library (I'm trying to get deeper into Python) is great! Kudos and good luck!
Also in this space (and worth looking at for inspiration, especially for other potential sources and sinks of data): Apache Camel [1].

[1]: http://camel.apache.org/
You might also want to check out http://concord.io. It's a bit more work to set up, but it's much faster than most stream processing systems.