A free (as in speech and beer) alternative to Periscope.io would be <a href="http://redash.io" rel="nofollow">http://redash.io</a>.
You’ll have to host it yourself, but it supports more than just SQL, and is centered around the idea of sharing queries, not reports, which helps shift the discussion from “how did you get those numbers” to “what do those numbers mean”.
Airflow by Airbnb is a nice alternative to Luigi. We've been using it for our ETL and it's been working great so far. <a href="http://airbnb.io/projects/airflow/" rel="nofollow">http://airbnb.io/projects/airflow/</a>
One of my main takeaways from this article is that there is no such thing as easy analytics. In particular, you have to put thought and energy into designing your analytics data stores and your ETL process.<p>It's a bit of a relief to see this after wondering why I've been unable to find an analytics solution that felt complete.
For anyone setting up a similar system, our product <a href="https://fivetran.com" rel="nofollow">https://fivetran.com</a> automates the most annoying part of this: getting the data into Redshift! We support MySQL, Postgres, Salesforce, and lots of other data sources.
Great post.<p>I didn't have this to read several months ago so I did end up writing my own ETL solution. It has been fun though and now we're into the analytics phase and using Chart.io - did you compare that to Periscope? It's a tad pricey but the charts are pretty easy to generate using their query builder if you don't want to get dirty with SQL (or have team members who don't have SQL skills yet).<p>I do think that rolling your own ETL can be rewarding though - especially if you are wrapping each attempt at an ETL process in a class and storing long term data about the monthly/daily/hourly/irregular processes for internal analysis, forecasting, bug-reporting, and providing fodder for visuals to sell the rest of the team on what you're doing.
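To make the "wrap each ETL attempt in a class and keep long-term data about it" idea concrete, here is a minimal sketch. The names `EtlJob` and `RunRecord` and the recorded fields are assumptions for illustration, not anything from the article or from an existing library, and the in-memory `history` list stands in for whatever table you would actually persist to.

```python
import time
import traceback
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RunRecord:
    """Metadata about one ETL attempt (hypothetical schema)."""
    job_name: str
    schedule: str            # e.g. "hourly", "daily", "monthly", "irregular"
    started_at: float        # Unix timestamp when the attempt began
    duration_s: float
    rows_loaded: int
    succeeded: bool
    error: Optional[str] = None


class EtlJob:
    """Wraps an ETL callable and records every attempt, including failures."""

    def __init__(self, name: str, schedule: str, step: Callable[[], int]):
        self.name = name
        self.schedule = schedule
        self.step = step                     # must return the number of rows loaded
        self.history: List[RunRecord] = []   # assumption: persist this somewhere durable

    def run(self) -> RunRecord:
        started = time.time()
        t0 = time.perf_counter()
        rows, ok, err = 0, True, None
        try:
            rows = self.step()
        except Exception:
            # Keep the traceback so failed runs can feed bug reports later.
            ok, err = False, traceback.format_exc()
        record = RunRecord(self.name, self.schedule, started,
                           time.perf_counter() - t0, rows, ok, err)
        self.history.append(record)
        return record

    def success_rate(self) -> float:
        """Fraction of recorded attempts that succeeded (0.0 if none yet)."""
        if not self.history:
            return 0.0
        return sum(r.succeeded for r in self.history) / len(self.history)
```

Something like `EtlJob("orders", "daily", load_orders).run()` would then leave one record per attempt, and the accumulated records are the raw material for forecasting run times or charting reliability for the rest of the team.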
A little bit more about Luigi would be handy. I believe the same can be achieved using Pentaho Kettle, as it has a complete workflow structure and it's open source as well. However, are there any more examples or use cases of Luigi?
This seems to echo something I said in another thread; the hackers are further away from their users:<p><i>Even for technical people, having access to analytics is helpful. Being informed about product metrics helps developers see the ‘why’ behind their work — a key ingredient to high performance</i><p>You mean that the technical people who actually implement the software are so alienated from their users that they have to wait for a royal decree to spur them to work?<p>I wonder how few of their ideas flow back up to the top.<p>I like the article; it's nice to see one use case of ETL. Too often people directly hit the production database to get reports instead of ETLing into a read-only db that won't affect site performance.
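As a rough illustration of ETLing into a separate reporting database instead of running reports against production, here is a small sketch using Python's built-in `sqlite3`. The `users` and `daily_signups` tables and the function name are assumptions made up for the example; a real setup would point the two connections at different servers.

```python
import sqlite3


def etl_daily_signups(prod: sqlite3.Connection, reporting: sqlite3.Connection) -> int:
    """Copy a daily signup summary from the production DB into the reporting DB."""
    # Extract + transform: a single aggregate read against production.
    rows = prod.execute(
        "SELECT date(created_at) AS day, COUNT(*) FROM users GROUP BY day ORDER BY day"
    ).fetchall()

    # Load: rebuild the summary table in the reporting DB in one transaction.
    with reporting:
        reporting.execute(
            "CREATE TABLE IF NOT EXISTS daily_signups (day TEXT PRIMARY KEY, signups INTEGER)"
        )
        reporting.execute("DELETE FROM daily_signups")
        reporting.executemany("INSERT INTO daily_signups VALUES (?, ?)", rows)
    return len(rows)
```

Reports then query `daily_signups` in the reporting database, so however heavy the dashboards get, production only sees the one scheduled aggregate query.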
Is there a good guide to getting started using logfiles as the basis for analytics? Best practices on using client vs server-side signals (and if both, not double-counting), etc. I'm interested in learning more about it, but a google search doesn't turn up anything relevant. Snowplow seems to be using a setup like this, but not sure how they'd fit in with what the author was discussing.<p>I'm using Django, but I imagine this kind of system would be mostly platform-agnostic.
How much of this could be accomplished using something like Mixpanel? Honestly curious.<p>I know a true ETL solution like this will be more powerful and flexible, but how do you decide when it's worth the investment? Are there key use-cases or reports that only this type of solution can provide? Is the main issue with external analytics (like Mixpanel) that you need to know what data to collect before you can report on it?
Excellent writeup.
I love Splunk for many reasons (built a custom security app for Splunk as a pretty successful fraud detection tool at an enterprise financial firm).<p>Did you ever look into the Splunk DB Connect app?<p>It lets you use a structured DB (such as MySQL and others) as a data source for Splunk. I haven't used it yet but am interested in feedback.
Can Airflow be used to load data from a file on the filesystem (not in Hive) into MySQL? Also, instead of calling a Bash script, can we call a PHP script? Any pointers would be helpful.
Nice post! We went through a very similar process back at Zalora (<a href="http://www.zalora.com" rel="nofollow">http://www.zalora.com</a>), building our own Data Warehouse and Analytics function for over 300 internal users at the time.<p>Back when Redshift was in its initial beta release (2012), there were almost no ETL or charting tools available to us, so we ended up building most of the tools and libraries ourselves. A few of them were open-sourced too:<p>- <a href="https://github.com/zalora/redsift" rel="nofollow">https://github.com/zalora/redsift</a> A web-interface SQL tool for Redshift, letting users run queries and export the results to S3 (and sending the user an email alert once done).<p>- <a href="https://github.com/zalora/postgresql-user-manager" rel="nofollow">https://github.com/zalora/postgresql-user-manager</a> for managing user privileges<p>- <a href="https://github.com/lenguyenthedat/aws-redshift-to-rds" rel="nofollow">https://github.com/lenguyenthedat/aws-redshift-to-rds</a> for copying tables from Redshift to RDS (PostgreSQL)<p>- <a href="https://github.com/zalora/kraken" rel="nofollow">https://github.com/zalora/kraken</a> a bit similar to Luigi or Airflow.<p>Things have changed quite a lot since then; there are now a lot of great solutions to our problems that are either free or very cheap and production-ready:<p>- redash.io (<a href="https://github.com/EverythingMe/redash" rel="nofollow">https://github.com/EverythingMe/redash</a>) - web-interface SQL tool with visualization - FREE<p>- redshift_console (<a href="https://github.com/EverythingMe/redshift_console" rel="nofollow">https://github.com/EverythingMe/redshift_console</a>) - Redshift ops tool - FREE<p>- flydata (<a href="https://www.flydata.com/resources/flydata-sync/sync-rds-mysql-to-redshift/" rel="nofollow">https://www.flydata.com/resources/flydata-sync/sync-rds-mysq...</a>) syncs live data from MySQL to Redshift - subscription-based<p>- dreamfactory (<a 
href="https://aws.amazon.com/marketplace/pp/B00GXYDK18?sr=0-3&qid=1385831342361" rel="nofollow">https://aws.amazon.com/marketplace/pp/B00GXYDK18?sr=0-3&qid=...</a>) for providing a REST API interface to your database (supports Redshift and other databases) - FREE<p>We didn't use Tableau at Zalora either (due to pricing and the number of in-house users we had), and ended up building our own customized data dashboards with d3js and a few other frameworks.<p>However, as long as you are OK with the price, Tableau is pretty good. It's being used widely at my current company (<a href="http://commercialize.tv" rel="nofollow">http://commercialize.tv</a>):<p>- You can minimize processing on the Tableau server by creating another data mart layer from Redshift with your ETL tools / scripts and having Tableau connect directly to it.<p>- Creating visualizations / charts is pretty straightforward. The end result looks exceptional compared to other solutions that we have tried.<p>- They also have a really good and active community.
For anyone looking for a reasonably priced visualisation solution where you don't need to write SQL: <a href="http://www.viurdata.com" rel="nofollow">http://www.viurdata.com</a>
We will be launching support for Amazon Redshift very very soon.
Nice post.<p>Looker (<a href="http://looker.com" rel="nofollow">http://looker.com</a>) is an alternative to Periscope that a ton of venture-backed tech companies are using (mostly in conjunction with Redshift). Here are two posts from Buffer and SeatGeek on their stacks using Redshift, Luigi and Looker. Buffer: <a href="https://overflow.bufferapp.com/2014/10/31/buffers-new-data-architecture/" rel="nofollow">https://overflow.bufferapp.com/2014/10/31/buffers-new-data-a...</a>. SeatGeek: <a href="http://chairnerd.seatgeek.com/building-out-the-seatgeek-data-pipeline/" rel="nofollow">http://chairnerd.seatgeek.com/building-out-the-seatgeek-data...</a><p>Looker does not require each member of the team to know SQL to explore the data or create reports. It also has a text-based modeling language that is a thin abstraction layer over SQL. It makes SQL modular and reusable, which makes it far more efficient to support a wide range of analysis. It's more expensive than Periscope, but it's way more powerful.