I like Pandas and I'd like to move to Python for my data analysis. Python is a beautiful language that I can use for many purposes beyond data analysis, and I find it more readable and self-documenting than R.<p>Nonetheless, I still use R: ggplot2 is excellent and its graphs look better than matplotlib's. I really like R Markdown for interleaving R code and Markdown in one document; I suppose IPython is the Python equivalent. R especially shines with its numerous libraries.<p>Anyway, for every project I debate whether to use R or Python. Perhaps I should look into rmagic/rpy2 for IPython as a go-between.
To install IPython and the whole set of Python modules needed in a virtualenv (trick: there is no separate "pylab" module to install; it comes with matplotlib):<p><pre><code> % virtualenv --distribute --no-site-packages pandas_venv
[blahblah]
% . pandas_venv/bin/activate
(pandas_venv) % easy_install readline # Probably only needed on Mac OS X for IPython to behave
[blahblah]
(pandas_venv) % pip install ipython
[blah blah]
(pandas_venv) % pip install numpy
[lots of blahblah]
(pandas_venv) % pip install matplotlib
[quite a bit of blahblah]
(pandas_venv) % pip install pandas
[some more blah blah]
(pandas_venv) % pandas_venv/bin/ipython --no-banner
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: import pylab as pl
In [4]:</code></pre>
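A quick way to confirm that an environment like the one above is complete is to ask Python which modules it can locate. This is only a minimal sketch using the standard library; `missing_modules` is a hypothetical helper name, not part of any of these packages. `pylab` is in the list on purpose: it should resolve once matplotlib is installed, since matplotlib ships it.

```python
import importlib.util


def missing_modules(names):
    """Return the names that cannot be located on the current import path."""
    # find_spec() looks a top-level module up without importing it
    # and returns None when nothing by that name is found.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # pylab is not a separate package; it is expected to come with matplotlib.
    wanted = ["IPython", "numpy", "matplotlib", "pylab", "pandas"]
    missing = missing_modules(wanted)
    if missing:
        print("Not importable in this environment:", ", ".join(missing))
    else:
        print("All modules found.")
```

Run it with the virtualenv's own interpreter (for example `pandas_venv/bin/python`) so that it inspects the right environment.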
I am in the process of migrating all of my analyses from Matlab and R to Python. I have been meaning to do this for quite some time, and pandas is finally mature enough to completely replace both Matlab and R for straightforward tasks. If I need something Python doesn't offer, it's still fairly simple to do isolated tasks elsewhere. For me the biggest reasons for the change are easy integration with the web and better language features (that applies to Matlab; the R language is great, just terribly slow for intensive tasks).<p>What I miss the most:<p>- The Matlab <--> Excel link (on Windows): an Excel add-on that lets you send arrays back and forth very easily. You need a spreadsheet when you work with datasets, and interchanging data through files just isn't that convenient.<p>- Matlab's IDE features (debugging, documentation, publishing, variable inspection).<p>- ggplot2
The creator of pandas wrote a book, <i>Python for Data Analysis</i>, which covers NumPy and pandas. I found it an excellent primer.<p><a href="http://oreilly.com/shop/product/0636920023784.html" rel="nofollow">http://oreilly.com/shop/product/0636920023784.html</a>
Interesting analysis, but it would really benefit from a section about data.table. For me and many others, data.table has almost completely replaced data.frame (of which data.table is a subclass) and completely replaced plyr. In speed and ease of use, data.table compares much more favorably with pandas than the R tools mentioned here do.
That's useful. Hadn't really looked at Pandas before.<p>Slightly OT:<p>I'm using in-memory sqlite3 with rtree to find objects within bounds in a 2D space. Is there a different library people would recommend for this in Python?
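For reference, the in-memory sqlite3 + rtree setup described above might look roughly like this. It is a sketch under assumptions: the table name `objects`, its column names, and the helpers `build_index` and `ids_within` are made up for illustration, and it only works if the SQLite library behind Python's `sqlite3` module was built with the R*Tree extension.

```python
import sqlite3


def build_index(boxes):
    """Load (id, min_x, max_x, min_y, max_y) tuples into an in-memory R*Tree."""
    conn = sqlite3.connect(":memory:")
    # Raises sqlite3.OperationalError if this SQLite build lacks the rtree module.
    conn.execute(
        "CREATE VIRTUAL TABLE objects USING rtree(id, min_x, max_x, min_y, max_y)"
    )
    conn.executemany("INSERT INTO objects VALUES (?, ?, ?, ?, ?)", boxes)
    return conn


def ids_within(conn, min_x, max_x, min_y, max_y):
    """Return ids of objects whose bounding box lies entirely inside the query box."""
    rows = conn.execute(
        "SELECT id FROM objects "
        "WHERE min_x >= ? AND max_x <= ? AND min_y >= ? AND max_y <= ? "
        "ORDER BY id",
        (min_x, max_x, min_y, max_y),
    )
    return [row[0] for row in rows]


# Hypothetical sample data: three unit squares, two of them near the origin.
conn = build_index([
    (1, 0.0, 1.0, 0.0, 1.0),
    (2, 2.0, 3.0, 2.0, 3.0),
    (3, 10.0, 11.0, 10.0, 11.0),
])
print(ids_within(conn, -1.0, 5.0, -1.0, 5.0))
```

The query here tests containment; for "overlaps the bounds" rather than "lies inside the bounds", the comparisons would be flipped to `max_x >= ? AND min_x <= ?` (and likewise for y).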
Are there any comments as to the maturity of Pandas as compared to R?<p>I am used to the Python syntax, and while R is another language to learn, my assumption is that, for data analysis, its greater age compared to Pandas implies stability.<p>I could of course be wrong.