TechEcho

1 comment

gjreda, over 11 years ago

I'm super interested in the chapter on creating reusable command line tools.

I've found the command line to be ideal for performing a lot of simple, memory-intensive tasks (filtering, munging, sorting, etc. on a massive text file).

However, after data collection (and munging), data science is typically A LOT of _exploratory_ analysis. I think it's extremely important that all practitioners approach analysis with the mindset of making it easily reproducible (and, if possible, flexible: don't hard-code date ranges, file paths, etc.).

I tend to stick with IPython Notebook (and highly recommend it). I fear that heavy analysis at the command line would consist of too many one-liners and thus be difficult to read and maintain.
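To make the "don't hard-code date ranges, file paths" point concrete, here is a minimal sketch of a small reusable filter written in Python. Everything in it is an assumption made for illustration, not something from the book or the comment: the `date` column name, the `--start`/`--end`/`--date-column` flags, and the helper names `parse_args`, `filter_rows`, and `main` are all hypothetical.

```python
import argparse
import csv
import sys
from datetime import date


def parse_args(argv=None):
    """Take the input path and date range as arguments instead of hard-coding them."""
    parser = argparse.ArgumentParser(description="Filter CSV rows by date range.")
    parser.add_argument("path", nargs="?", default="-",
                        help="input CSV file, or - to read standard input")
    parser.add_argument("--date-column", default="date",
                        help="name of the column holding ISO dates (assumed: 'date')")
    parser.add_argument("--start", type=date.fromisoformat, default=date.min,
                        help="first date to keep, as YYYY-MM-DD")
    parser.add_argument("--end", type=date.fromisoformat, default=date.max,
                        help="last date to keep, as YYYY-MM-DD")
    return parser.parse_args(argv)


def filter_rows(rows, column, start, end):
    """Yield only the rows whose date falls within [start, end]."""
    for row in rows:
        if start <= date.fromisoformat(row[column]) <= end:
            yield row


def main(argv=None, stdin=None, stdout=None):
    """Read CSV from a file or stdin and write the filtered rows to stdout."""
    args = parse_args(argv)
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    source = stdin if args.path == "-" else open(args.path, newline="")
    try:
        reader = csv.DictReader(source)
        writer = csv.DictWriter(stdout, fieldnames=reader.fieldnames or [])
        writer.writeheader()
        writer.writerows(
            filter_rows(reader, args.date_column, args.start, args.end)
        )
    finally:
        if source is not stdin:
            source.close()
```

Calling `main()` under the usual `if __name__ == "__main__":` guard would let a script like this sit in a shell pipeline, and the same `filter_rows` function could be imported into a notebook, so the date range lives in one documented place rather than in a one-liner.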


Lean, mean data science machine
