I would strongly recommend against anyone using Seaborn.<p>I recently used it for charts in a paper, since it includes swarm plots. I hit a problem when overlaying certain types of plot on the same axes (I think it was swarm plots on top of box plots, so I could show every data point as well as the quartiles). The data would end up shifted, so the x-axis labels were wrong, even though each plot worked fine on its own when the other was commented out. The only reason I noticed was that a peak I knew occurred at x=13 was showing up at x=14.<p>Although it took a specific set of circumstances to trigger, I thought this was still a serious problem: it misrepresents data without any warning or other sign that something is wrong. I made a minimal example script and opened a GitHub issue ( <a href="https://github.com/mwaskom/seaborn/issues/1409" rel="nofollow">https://github.com/mwaskom/seaborn/issues/1409</a> ), where the library author insulted me and locked the issue. I sent them a follow-up email to explain that I was not after "free tech support" (I'd already worked around the issue for my paper by 'faking' the labels). I wanted to help improve the library so that others would avoid incorrect plots, especially those not lucky enough to spot the problem as I did. As evidence that I wasn't trying to be a freeloader, I mentioned that I'd already spent considerable time narrowing the problem down to the minimal example. I also began the email by apologizing for using this side channel and promising not to contact them again unless they consented. The author replied with more insults.<p>I didn't contact them again, as promised, but now I'm actively opposed to anyone using this project because of the author's complete disregard for the corruption of scientific data.
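A minimal sketch of one way this kind of silent shift can happen. It assumes, without confirming that this is the cause of issue 1409, that each categorical layer independently maps the distinct x values it sees to slots 0, 1, 2, and so on. If one layer is missing a level, the two layers disagree about which slot a value occupies. Seaborn's categorical functions accept an `order=` argument, and passing the same explicit order to both calls is a common way to keep layers aligned. The helper `category_positions` is hypothetical and is not Seaborn code.

```python
def category_positions(values, order=None):
    """Hypothetical helper: map each category to an integer slot, the way
    categorical plotting layers typically place their groups."""
    levels = list(order) if order is not None else sorted(set(values))
    return {level: pos for pos, level in enumerate(levels)}


# Two layers drawn on the same axes. The box-plot layer has data at
# x=12, 13, 14; the swarm layer happens to have no points at x=12.
box_x = [12, 12, 13, 13, 14, 14]
swarm_x = [13, 13, 13, 14]

box_pos = category_positions(box_x)
swarm_pos = category_positions(swarm_x)

# Without a shared order, x=13 lands in a different slot in each layer.
print(box_pos[13], swarm_pos[13])  # 1 0

# Giving both layers the same explicit order keeps them aligned.
shared = sorted(set(box_x) | set(swarm_x))
print(category_positions(box_x, shared)[13] == category_positions(swarm_x, shared)[13])  # True
```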
I find Pandas+Bokeh to be a pretty killer combination. You get nice vectorized operations for manipulating the data upfront, and a pretty wide range of visualizations that are plenty customizable. My main beefs with Bokeh are:<p>- It's great that I can manipulate a view by zooming, panning, turning a series off, whatever, but all that state is lost when I send the URL of the plot to someone else (e.g., when the plot was generated by a CI job rather than locally on my machine). Bokeh should make it easier to grab all that state and stuff it into a query string that can rehydrate the same view later on.<p>- Doing plots with more than one y-axis is a lot more awkward and fiddly than it should be, and even once you succeed, the axes can't be panned independently.<p>- If your data series name overlaps with the label in your legend, the whole plot breaks in an extremely non-obvious way (basically, Bokeh thinks that you want a legend entry <i>per row in the named data series</i>). Even knowing about it, I lose time to this every once in a while, and it's almost a rite of passage that every junior dev hits it and burns an hour or two trying to figure out what is wrong.
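The first complaint, persisting view state in the URL, can be sketched independently of Bokeh using only the standard library. The idea is to serialize the visible ranges and the hidden series into a query string and parse them back later. The field names (`x_start`, `hidden`, and so on) and both helpers are hypothetical. The code does not wire the values to a Bokeh plot, for example to a range's `start`/`end` or a renderer's `visible` flag; that part is assumed and left out.

```python
from urllib.parse import urlencode, parse_qs


def encode_view_state(x_range, y_range, hidden_series):
    """Pack a plot's view state into a query string (hypothetical format)."""
    params = {
        "x_start": x_range[0],
        "x_end": x_range[1],
        "y_start": y_range[0],
        "y_end": y_range[1],
        # Series names are comma-joined; assumes names contain no commas.
        "hidden": ",".join(sorted(hidden_series)),
    }
    return urlencode(params)


def decode_view_state(query):
    """Rehydrate the view state produced by encode_view_state."""
    # keep_blank_values so that "hidden=" (nothing hidden) is not dropped.
    params = parse_qs(query, keep_blank_values=True)
    x_range = (float(params["x_start"][0]), float(params["x_end"][0]))
    y_range = (float(params["y_start"][0]), float(params["y_end"][0]))
    hidden_raw = params["hidden"][0]
    hidden = set(hidden_raw.split(",")) if hidden_raw else set()
    return x_range, y_range, hidden


qs = encode_view_state((10.0, 42.5), (0.0, 100.0), {"latency", "errors"})
print(qs)  # x_start=10.0&x_end=42.5&y_start=0.0&y_end=100.0&hidden=errors%2Clatency
print(decode_view_state(qs))
```

A CI job could append the string to the plot's URL, and a small script on page load could read it back and apply it to the plot.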
Python visualization needs a visionary like Hadley Wickham. Once you get used to the clarity of the "Grammar of Graphics", every other plotting library feels like a clumsy tool.<p>Also, people pointing out the diversity of needs behind the multiverse of plotting libraries in Python probably aren't aware of the large number of extensions built on top of ggplot2[1]. In Python, such extensions end up as completely new packages.<p>1. <a href="http://www.ggplot2-exts.org/gallery/" rel="nofollow">http://www.ggplot2-exts.org/gallery/</a>
Altair, hands down. I've tried and worked with a lot of these for data analysis, and Altair just seems like 'magic'.<p>What should be easy is easy, and what is hard is still possible; for me it's the perfect mix. The only drawback (but a big one) is the performance cost of the JSON generated for each chart: past about 50k lines of raw data per chart, you start to feel the lag.
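By default Altair embeds a chart's data inline in the Vega-Lite JSON spec, so the spec grows linearly with the number of rows. The standard-library sketch below compares an inline spec with one that references the data by URL through Vega-Lite's `data.url`. Altair also accepts a URL string as a chart's data. The file name `data.json`, the column names, the synthetic series, and the schema version are assumptions made up for the example.

```python
import json

# Vega-Lite v4 schema URL (assumed version; any recent one behaves the same here).
SCHEMA = "https://vega.github.io/schema/vega-lite/v4.json"


def line_spec(data):
    """A Vega-Lite line chart; `data` is either inline values or a URL reference."""
    return {
        "$schema": SCHEMA,
        "data": data,
        "mark": "line",
        "encoding": {
            "x": {"field": "t", "type": "quantitative"},
            "y": {"field": "value", "type": "quantitative"},
        },
    }


def make_rows(n):
    """Hypothetical synthetic series with n rows."""
    return [{"t": i, "value": (i * 37) % 100} for i in range(n)]


for n in (1_000, 10_000, 50_000):
    inline = line_spec({"values": make_rows(n)})
    print(f"{n:>6} rows inline: {len(json.dumps(inline)):>9,} characters")

# Referencing the data by URL keeps the spec a few hundred characters long;
# the browser fetches "data.json" separately.
by_url = line_spec({"url": "data.json"})
print(f"url-referenced spec: {len(json.dumps(by_url))} characters")
```

The rendering work in the browser still scales with the data, but keeping the data out of the spec at least stops the notebook or page from shipping one ever-growing JSON blob.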
I'm pretty good at Python and do most of my work in it nowadays, but if I need to make a data visualization, I still go to R just for ggplot2. Nothing currently in Python compares (not even the "ggplot2 port"), and it takes an order of magnitude longer to make a comparable viz.
It seems like a common critique of Python is that there’s too much choice in which tool to use for a given task (serving the web, fast numerical code, and here, visualization). This is ironic for a language whose Zen says there should be one obvious way to do it, but perhaps it reflects a diversity of applications. Similarly, enterprise Java seems overengineered until the day that dependency injection mumbo jumbo lets you cover a paying use case that would otherwise have taken ages.<p>If I want pretty web-ish scatter plots, I use Bokeh or Plotly; if I need desktop GUI visualizations, PyQtGraph, etc. Thankfully there’s no one saying we need a single toolkit for everything (though Matplotlib does try to do this to some extent).
Missed one! Spotify just released their own dataviz library yesterday <a href="https://labs.spotify.com/2018/11/15/introducing-chartify-easier-chart-creation-in-python-for-data-scientists/" rel="nofollow">https://labs.spotify.com/2018/11/15/introducing-chartify-eas...</a>
A friend of mine created a page to show code snippets and comparisons between a few of the plotting libraries available in Python (and R) -- <a href="http://pythonplot.com/" rel="nofollow">http://pythonplot.com/</a>
The reason there are so many libraries is that none of them has emerged as a clear leader. If there were one really great library for the majority of use cases, you'd see consolidation around it, as you mostly have with ggplot2 in R.
I can totally recommend Vega and Vega-Lite. It's just JSON and web-based, so it's pretty language-independent. There's no difference between messing around with data-sciency stuff in a REPL and making a graph for your website; it's all the same.
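The sketch below makes the "it's just JSON" point concrete. It writes a complete Vega-Lite bar chart as a plain Python dict and drops it into a static HTML page that renders it with vega-embed. The CDN script URLs and version numbers are assumptions, so pick whichever Vega/Vega-Lite/vega-embed releases you use. The sample values are made up. The same JSON could be pasted unchanged into the online Vega editor or emitted from any other language.

```python
import json

# A complete Vega-Lite bar chart: plain JSON, no Python plotting library needed.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
    "data": {"values": [
        {"category": "A", "amount": 28},
        {"category": "B", "amount": 55},
        {"category": "C", "amount": 43},
    ]},
    "mark": "bar",
    "encoding": {
        "x": {"field": "category", "type": "nominal"},
        "y": {"field": "amount", "type": "quantitative"},
    },
}

# Minimal page that loads Vega, Vega-Lite and vega-embed (assumed CDN versions).
PAGE = """<!DOCTYPE html>
<html>
<head>
  <script src="https://cdn.jsdelivr.net/npm/vega@5"></script>
  <script src="https://cdn.jsdelivr.net/npm/vega-lite@4"></script>
  <script src="https://cdn.jsdelivr.net/npm/vega-embed@6"></script>
</head>
<body>
  <div id="vis"></div>
  <script>vegaEmbed("#vis", {spec});</script>
</body>
</html>
"""


def to_html(spec):
    """Embed the spec in the page; assumes no string in it contains '</script>'."""
    # str.replace rather than str.format, so the JSON braces need no escaping.
    return PAGE.replace("{spec}", json.dumps(spec))


print(to_html(spec))
```

Saving the output to a `.html` file and opening it in a browser should show the chart. The spec is identical whether it came from a REPL session or a website build.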
I tried a number of libraries a few years ago and eventually settled on Plotly. Between that and Dash (<a href="https://plot.ly/products/dash/" rel="nofollow">https://plot.ly/products/dash/</a>) I can build some pretty powerful visualizations very easily.
While I appreciate this catalog of options, this is a lengthy article about different visualization libraries, and there are no actual images of what they look like.
Also missed my Veusz GUI plotting package and library. <a href="https://veusz.github.io/" rel="nofollow">https://veusz.github.io/</a>
I'm a huge fan of Altair, which I recommend as the 'sensible default' tool in the data science team I work in. I recently wrote a blog post about why I think it's the best option:<p><a href="https://medium.com/@robin.linacre/why-im-backing-vega-lite-as-our-default-tool-for-data-visualisation-51c20970df39" rel="nofollow">https://medium.com/@robin.linacre/why-im-backing-vega-lite-a...</a>
Python is an easy-to-use, powerful development language. Ironically, this leads to too many projects solving the same problems. It isn’t just visualization that has so many libraries; it’s almost everything.<p>I first noticed this effect with Python web frameworks years ago. There was always some new framework that worked better or differently for some use case. (There are 73 web frameworks listed in the python.org wiki.)<p>A few more examples from pypi.org searches:<p>“Reed-Solomon” 84 packages<p>“Elliptic-curve” 535 packages<p>“Nearest neighbor” 408 packages<p>“Simplex” (as in simplex method) 29<p>“Django” 10,000+<p>I just noticed an HN post about a RAFT implementation in Haskell. PyPI says there are 22 hits for RAFT in the Python Package Index.
The more I try to understand plotting documentation, the more I end up going to SO for even a minor detour from the conventional use case. The same goes for pandas. I wonder how I would ever work with these without an internet connection.
At least some of the Python visualization libs (Seaborn, for example) are built on Matplotlib. I do feel that Matplotlib is kind of unintuitive, trying in some ways to emulate MATLAB, which isn't an elegant system to start with. For people using Python without any prior MATLAB experience, that is unnecessary baggage.<p>Also, 3D plotting needs many more features. Trying to plot vectors in 3D, I found things are still very rudimentary. Even for surfaces: to plot an ellipsoid, for example, you need to reformulate the surface in parametric (polar-angle) form before Matplotlib can generate it, whereas Mathematica can generate surfaces from Cartesian expressions.
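The parametric route described above looks like this. The ellipsoid x²/a² + y²/b² + z²/c² = 1 is traced by x = a·cos u·sin v, y = b·sin u·sin v, z = c·cos v for u in [0, 2π] and v in [0, π]. The resulting 2-D grids go straight into Matplotlib's `plot_surface`. This is a sketch that assumes NumPy, and the semi-axis values are arbitrary. Matplotlib is imported only inside the plotting function.

```python
import numpy as np


def ellipsoid_grid(a, b, c, n=50):
    """Return X, Y, Z grids for an ellipsoid with semi-axes a, b, c,
    using the spherical-angle parametrization."""
    u = np.linspace(0.0, 2.0 * np.pi, n)  # azimuthal angle
    v = np.linspace(0.0, np.pi, n)        # polar angle
    x = a * np.outer(np.cos(u), np.sin(v))
    y = b * np.outer(np.sin(u), np.sin(v))
    z = c * np.outer(np.ones_like(u), np.cos(v))
    return x, y, z


def plot_ellipsoid(a=3.0, b=2.0, c=1.0):
    """Draw the ellipsoid; Matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt

    X, Y, Z = ellipsoid_grid(a, b, c)
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.plot_surface(X, Y, Z)
    plt.show()


X, Y, Z = ellipsoid_grid(3.0, 2.0, 1.0)
# Every grid point satisfies the Cartesian equation (up to rounding).
print(np.allclose((X / 3.0) ** 2 + (Y / 2.0) ** 2 + Z ** 2, 1.0))  # True
```

Calling `plot_ellipsoid()` opens the figure.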
I wonder if it's too easy to make one of these libraries. Don't get me wrong, <i>I've</i> never tried to build one, and I'm sure it's hard. But maybe it's not so hard that people won't just make their own that fits their needs better.<p>If this is true, I would say it's a positive feature of Python, but also a reason for fragmentation. You tend to see less fragmentation in earlier and lower-level technologies. This is partially due to time: only good software or hardware lasts. But I also hypothesize it's because things are harder down there; you need bigger teams and more resources, which tends toward centralization.
I don’t really understand the appeal of D3. It’s basically just a thin wrapper over SVG, so thin, in fact, that to do anything interesting you’re stuck manipulating SVG elements yourself.<p>There’s a good idea there, but D3 just isn’t quite right.
Remember the old joke, "Python is the only language that has more web frameworks than keywords"?<p>What eventually happened was that the BDFL blessed Django and most of the others withered. Some had enough of an ecosystem, were components of some larger project, or had a niche advantage over Django, and they managed to survive.<p>I think the reason web frameworks and data viz systems proliferate in Python is just that they are so easy to write, yet still challenging enough to be really fun, and you get a lot of highly, uh, visible feedback and reward for doing it.
I teach Python for Data Analysis and struggle to recommend good visualization solutions in Python.<p>Most of my students are used to Excel and are new to programming.<p>Expensive proprietary packages like Tableau (and even Power BI) provide a much better experience when it comes to presenting your results to management.<p>So we do the data gathering/wrangling/munging/analysis in Python but export the results to commercial visualization packages.<p>I can't in good conscience recommend any one Python library for visualization.<p>I mean for people who are just getting the hang of Python, not Wes McKinney :)
I think SQL can be used for this, so I just use SQLite for data visualization. I wrote an extension to do so, though it doesn't yet include many things. I think SQLite extensions for doing graphics would be a good idea.
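The commenter's extension isn't shown, so what follows is a small, hypothetical illustration of the idea using Python's built-in `sqlite3` module. `Connection.create_function` registers a Python function that SQL queries can call. Here it is a `bar()` that renders a count as a text bar, so the query itself produces a crude histogram. This is an application-defined function rather than a loadable C extension. The table name, column name, and sample values are made up.

```python
import sqlite3


def bar(n, scale=1):
    """Render a non-negative number as a run of '#' characters."""
    return "#" * int(n // scale)


conn = sqlite3.connect(":memory:")
# Register bar() as a one-argument SQL function (scale keeps its default).
conn.create_function("bar", 1, bar)

conn.execute("CREATE TABLE measurements (value REAL)")
conn.executemany(
    "INSERT INTO measurements (value) VALUES (?)",
    [(v,) for v in (1.2, 1.7, 2.1, 2.4, 2.9, 3.3, 2.2, 1.1)],
)

# Bucket values into unit-wide bins and draw one bar per bin.
query = """
    SELECT CAST(value AS INTEGER) AS bin, COUNT(*) AS n, bar(COUNT(*)) AS histogram
    FROM measurements
    GROUP BY bin
    ORDER BY bin
"""
for bin_, n, histogram in conn.execute(query):
    print(f"{bin_}  {histogram} ({n})")
# 1  ### (3)
# 2  #### (4)
# 3  # (1)
```

A real extension could return richer output (for example, SVG markup) in the same way. The query stays the place where binning and aggregation happen.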
I can absolutely recommend Jupyter + <a href="https://github.com/maartenbreddels/ipyvolume" rel="nofollow">https://github.com/maartenbreddels/ipyvolume</a>
I wonder if Plotly will ever take privacy and security seriously... I've had this ticket open for years. <a href="https://github.com/plotly/plotly.js/issues/316" rel="nofollow">https://github.com/plotly/plotly.js/issues/316</a>