`riko` is a pure Python stream processing library for analyzing and processing streams of structured data. It's modeled after Yahoo! Pipes [1] and was originally a fork of pipe2py [2]. It has both synchronous and asynchronous (via Twisted) APIs, and supports parallel execution (via multiprocessing).

Out of the box, `riko` can read csv/xml/json/html files; create text- and data-based flows via modular pipes; parse and extract RSS/Atom feeds; and do a bunch of other neat things. You can think of `riko` as a poor man's Spark/Storm... stream processing made easy!

Feedback welcome, so let me know what you think!

Resources: FAQ [3], cookbook [4], and IPython notebook [5]

Quickie Demo:

    >>> from riko.modules import fetch
    >>>
    >>> stream = fetch.pipe(conf={'url': 'https://news.ycombinator.com/rss'})
    >>> item = next(stream)
    >>> item['title'], item['link']
    ('Master Plan, Part Deux', 'https://www.tesla.com/blog/master-plan-part-deux')
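Since `fetch.pipe` returns an ordinary generator of dicts (as shown above), the stream composes with plain Python. A minimal sketch; the keyword filter here is just an illustrative assumption, not a riko feature:

    >>> # keep only items whose titles mention "python" (illustrative filter)
    >>> stream = fetch.pipe(conf={'url': 'https://news.ycombinator.com/rss'})
    >>> titles = [item['title'] for item in stream if 'python' in item['title'].lower()]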
[1] https://web.archive.org/web/20150930021241/http://pipes.yahoo.com/pipes/

[2] https://github.com/ggaughan/pipe2py/

[3] https://github.com/nerevu/riko/blob/master/docs/FAQ.rst

[4] https://github.com/nerevu/riko/blob/master/docs/COOKBOOK.rst

[5] http://nbviewer.jupyter.org/github/nerevu/riko/blob/master/examples/usage.ipynb
I was a heavy user of Pipes and I'm now a heavy user of Python. I've built my own dodgy, simple replacement for some of the things I used to rely on Pipes for. I'm very eager to see what you've got here; at first glance it seems like an excellent fit for my needs.

Thanks!
Can you consider Dask integration?

http://distributed.readthedocs.io/en/latest/queues.html

https://github.com/dask/dask

It can handle the parallel and distributed parts for you.
If you're looking for a stream processing engine closer to Storm, etc. but still simple, check out Motorway: https://github.com/plecto/motorway :-)
I am still a user of Plagger [1], but development halted quite some time ago. Maybe this could be a good replacement.

[1] https://github.com/miyagawa/plagger
This is really interesting. Have you looked at Apache Beam? What I think is interesting about Beam (in this specific context) is that it has a standalone runner (Java) that, much like riko, lets you write pipelines without worrying about a complex setup. But then, if you need to scale your computation, Beam is runner-independent: you can take the same code and run it at scale on a cluster, whether it's Spark, Flink, or Google Cloud. You can read more here [1].

As for riko more specifically, Beam will soon have a Python SDK, but I'm unsure whether there will be a Python standalone runner. Maybe this is something to look into...

[1] https://www.oreilly.com/ideas/future-proof-and-scale-proof-your-code
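For illustration only (riko has no Beam integration, and the Python SDK mentioned above was still in the works at the time), a word-count-style pipeline in Beam's Python SDK looks roughly like this; the portability argument is that the same code runs on the local DirectRunner by default and on Spark, Flink, or Dataflow by swapping the runner:

    import apache_beam as beam

    # uses the local DirectRunner by default; pass a different runner via
    # PipelineOptions to execute the same pipeline on a cluster
    with beam.Pipeline() as p:
        (p
         | beam.Create(['stream processing made easy', 'made with riko'])
         | beam.FlatMap(str.split)            # split lines into words
         | beam.combiners.Count.PerElement()  # count each distinct word
         | beam.Map(print))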
If someone can spin up a usable GUI, charge enough to make a living without compromising on performance, and promise some longevity and a way to export my stuff, I would probably pay for that. I loved Pipes; the GUI was a big deal for me.
Sweet. I put together something similar for NodeJS which is now called 'turtle' (because it's turtles all the way down...). There's a bit of a focus on AWS Lambda & other FaaS solutions as a means of building Lambda architectures, but it can be used by itself.

https://github.com/iopipe/turtle
While I didn't use Yahoo Pipes too often, I loved it. Having this as a Python library (I'm trying to get deeper into Python) is great! Kudos and good luck!
Also in this space (and worth looking at for inspiration, especially for other potential sources and sinks of data): Apache Camel [1].

[1]: http://camel.apache.org/
You might also want to check out http://concord.io. It's a bit more work to set up, but it's much faster than most stream processing systems.