TechEcho
Why I Still Use Python for High Performance Scientific Computing

273 points by subnaught over 9 years ago

17 comments

msellout over 9 years ago
Summary in the conclusion:

"The end result is an implementation several orders of magnitude faster than the current reference implementation in Java. ... [Python] makes the first version easy to implement and provides plenty of powerful tools for optimization later when you understand where and how you need it." [edited to be a statement instead of rhetorical question]
lmm over 9 years ago
But you can have both. In Scala I can write prototypes just as rapidly as Python, but I can run them with close-to-native performance. I can even explore interactively in a REPL but backed by the power of my company's big computer cluster, using spark-shell. The profiling capabilities are excellent, but when I spot a bottleneck I can solve it in the language directly, without needing the awkwardness of cython or of converting to/from numpy formats.

(And while I personally love Scala, there's nothing magic about it in this regard. There's no reason a language can't offer Python-like expressiveness and Java-like performance, and many modern languages do.)
kitd over 9 years ago
Large-scale data processing jobs normally arrange themselves into data acquisition/cleaning, grunt numerical work and result formatting/display. These tasks have very different requirements, so a tool that can do all the data handling easily (i.e. Python) plus a tool that can throw the CPU at a numerical problem (i.e. C) make a great combination.

In contrast, if you work in Java, you are trying to use the same tool for both jobs, and you may well fall between two stools. And I say that as a typical Java-head.

My only question about the 2-tool combination is whether there are better combinations. Python has all the libraries and community support, so any alternative would need similar. Maybe Node?

As for the number crunching, I think Rust would be a better choice here. Good memory management is its USP and that can have significant performance benefits.
pen2l over 9 years ago
This may be a liiiitle bit off-topic, but I really need to get it off my chest: Python for high-performance scientific computing works *beautifully*... it's a dream. Scipy/numpy, matplotlib, pandas, ipython. They're all unbelievably awesome. It all just works.

*Except* when you're on Windows, and it just doesn't. Just installing things and doing the 'hello world' for the aforementioned libraries is laughably impossible.

So, use Python, but use it only on Linux.

(Okay, if you absolutely must do it in Windows: Use Anaconda.)
pbowyer over 9 years ago
> once I had a decent algorithm, I could turn to Cython to tighten up the bottlenecks and make it fast.

What are your preferred ways to profile Python code? Coming recently from PHP, where we have XDebug/KCachegrind, the excellent Facebook-sponsored Xhprof, https://blackfire.io and https://tideways.io, it's felt a step backwards.

I've tried line_profiler, and used memory_profiler and cProfile with pyprof2calltree and KCachegrind. I've found the cProfile output confusing when it crosses the Python-C barrier for numpy, sklearn etc.
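
For readers following the profiling question, here is a minimal sketch of the cProfile workflow mentioned above, using only the standard library. The function names (`slow_sum_of_squares`, `run_workload`, `profile_report`) are hypothetical and purely illustrative; this is not the article author's code, just one plausible way to find a hot spot before reaching for Cython.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive pure-Python loop, standing in for a hot spot."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_workload():
    """Hypothetical workload that calls the hot spot a few times."""
    return [slow_sum_of_squares(50_000) for _ in range(5)]


def profile_report(func, top=5):
    """Run func under cProfile and return (result, text report).

    The report lists at most `top` entries, sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    result = func()
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_report(run_workload)
    print(report)
```

Sorting by cumulative time tends to surface the function worth optimizing first. Calls into C extensions show up as single opaque entries, which may be part of why the output feels confusing at the Python-C boundary.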
cballard over 9 years ago
Why isn't Haskell, or any other functional language, popular for this sort of thing? Turning A into B is what FP excels at, and you shouldn't have to reason about side effects, besides writing the graph images somewhere.

From what I've heard from a friend of using other people's code in one particular scientific field (stringly type some of the things, probably accidentally, don't document this), an at-least-passable type system would be a huge improvement.
kfk over 9 years ago
I have in my hands a pretty interesting BI project for a big company. So far, the proposal on the table has been .NET and SQL Server, but I am wondering if I should at least try to give Python a chance. Pandas is a great library, with great people working on it. Django the same. On the other hand, .NET has lots of professional (aka: with paid licenses) libraries that seem more fit for an enterprise project. Looking from a company perspective, the drawback Python has is, strangely, the lack of paid-for alternatives. It's not that people in companies don't trust open source (Hadoop is becoming big here too), but one wonders if the developers will be able to find the support they need if any issue arises from a free library.
banku_brougham over 9 years ago
A very beginner Java programmer here. It's a nicely organized notebook, great demo, but: seems like a lot of effort was put into optimizing the Python version, and none for Java. Isn't that an unfair comparison?

My real question is, is it so much easier to do this exercise in Python than Java, assuming equal proficiency in either case?
rdtsc over 9 years ago
Agreed. Python is an excellent tool in that respect. Batteries included helps. Being able to access fast C routines helps. Compile-to-C projects like Numba and Cython also help. And of course, IPython (Jupyter) notebooks for exploration.
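
To illustrate the point about reaching fast C routines from Python, here is a small standard-library-only sketch. It compares an explicit interpreted loop against the built-in `sum()`, which CPython implements in C. The helper names (`python_loop_sum`, `builtin_sum`, `compare`) are hypothetical, and the timings will vary by machine, so no figures are claimed here.

```python
import timeit


def python_loop_sum(values):
    """Sum the values with an explicit interpreted loop."""
    total = 0
    for v in values:
        total += v
    return total


def builtin_sum(values):
    """Sum the values with the built-in sum(), implemented in C in CPython."""
    return sum(values)


def compare(n=100_000, repeats=20):
    """Return (loop_seconds, builtin_seconds) for summing n integers."""
    data = list(range(n))
    # Both approaches must agree before timing them.
    assert python_loop_sum(data) == builtin_sum(data)
    loop_t = timeit.timeit(lambda: python_loop_sum(data), number=repeats)
    builtin_t = timeit.timeit(lambda: builtin_sum(data), number=repeats)
    return loop_t, builtin_t


if __name__ == "__main__":
    loop_t, builtin_t = compare()
    print(f"python loop: {loop_t:.4f}s  built-in sum: {builtin_t:.4f}s")
```

The same pattern, keeping the orchestration in Python and pushing the inner loop into compiled code, is what numpy, Cython and Numba apply on a larger scale.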
cosmoharrigan over 9 years ago
Jake Vanderplas previously wrote an excellent blog post about Python performance and scientific computing: https://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/
daemonk over 9 years ago
The interesting insight from the article is that Python might be a good language for learning algorithms. The fast development time allows you to write a complete (albeit clunky) program without the premature optimization you might be tempted to do in other languages.
kriro over 9 years ago
My current setup for "scientific computing" is RStudio and CSV files for quickly running typical stats tests (a couple of t-tests and a TOST + Krippendorff's alpha over the last couple of months) and Python + libraries for anything that resembles "building stuff" (mostly scikit-learn to build some classifiers). I mostly use R "as a consumer", i.e. I basically use RStudio whenever my colleagues fire up SPSS. That combination works fairly well. I'd recommend it to anyone who enters academia in any field that involves statistics and who doesn't want to use the typical proprietary tools. (I've also tried PSPP and it works OK for basic tasks but lacks a lot of functionality. If all you want to do is run a quick t-test or ANOVA, it's a decent tool.)
hcrisp over 9 years ago
Good article. Couldn't find who wrote it since it doesn't have a byline. I'm guessing it was Leland McInnes?
SeanDav over 9 years ago
Couple of questions:

- Could this be/Was this developed in Python 3.x?

- What is this "notebook" he keeps on referring to?
buildops over 9 years ago
Absolutely, and it is even easier if you use Ceemple for your IDE.
bipin_nag over 9 years ago
I use Spark. Will using Python help a lot?
boulos over 9 years ago
If Numpy, Pandas, etc. were wrappable from JavaScript this could have easily been titled "Why I use Node.js for High Performance Scientific Computing".

The "Python" here isn't particularly material to the result, it's mostly a wrapper around C. Toss in Cython, and now you've really gone outside the bounds of "I'm just using 'Python' for HPC!".

I agree some of the tooling and niceties are beyond a doubt best in breed with Python, but it's disingenuous to equate this to "writing HPC code in Python". If you had written an RPython to Verilog translator that produced an FPGA of your algorithm, would you call that "using Python"?