It's a detail, but I keep seeing it:<p>> Yet, [the scientists] struggle to move away from Python, because of network effects, and because Python’s beginner-friendliness is appealing to scientists for whom programming is not a first language.<p>I don't believe that's the whole story.<p>In my case, during my 13 years in academia, I saw my field move away from C++ and towards Python. Not because of network effects (quite the opposite: it was harder not to use what everybody else was using), and not because scientists were unable to program (the entry language of the whole field was C++, and Python only arrived because scientists with deep C++ knowledge themselves made the core library usable from Python).<p>I think something computer scientists forget when they consider the subject is that the way computer scientists build software just does not work when you do science.<p>In science, you use code as an exploratory tool. You are lucky if 10% of your code ends up being used in your final publication; the other 90% was only there to build understanding and find the right direction. For this reason, things like declaring variables, which are very important when building professional software, are too costly to be worthwhile when you need to write a piece of code you will only ever run once to check a small hypothesis, especially when another language doesn't require them.
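To make that concrete, here is a minimal sketch of the kind of throwaway check I mean (the data and the question are made up for illustration):

    import numpy as np

    # One-off hypothesis check: is quantity B roughly linear in quantity A?
    # Fake data standing in for an experiment's output.
    rng = np.random.default_rng(0)
    a = rng.normal(size=1000)
    b = 2.0 * a + rng.normal(size=1000)

    # No declarations, no build step; the script gets deleted
    # once the question is answered.
    print(np.corrcoef(a, b)[0, 1])

Nobody would ship this, and nobody needs to: its whole job is to be written, run once, and thrown away.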
Another aspect is that you present your scientific results to your colleagues, not your code (they are not interested in that), and they will come up with questions and good ideas, all very good for science but rarely compatible with the way your algorithm was built in the first place, and you need to shoe-horn them into your code (to test them) without it taking 3 weeks. Here, Python's flexibility and hackability are very useful (see the sketch below).<p>You can also see it in the popularity of things like Jupyter notebooks (I have to acknowledge that, even if I personally don't like working with such tools), which reuse a working approach similar to Mathematica and MATLAB, both created with the scientific workflow in mind.<p>I'm sure Python's simplicity has played a role. But I have the feeling that some people are totally oblivious to the fact that there may be other reasons.
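A hedged illustration of that hackability (the pipeline and function names are hypothetical, purely for the sketch): because Python functions are first-class and nothing is locked down by types, a colleague's "what if" can often be bolted on from the outside:

    import numpy as np

    def smooth(signal, width=5):
        # The smoothing step originally baked into the analysis.
        kernel = np.ones(width) / width
        return np.convolve(signal, kernel, mode="same")

    def pipeline(signal, smoother=smooth):
        # Any step can be swapped in from the outside, without
        # restructuring the pipeline itself.
        return smoother(signal).max()

    # Colleague's question after the talk: "what about smoothing
    # in log space?" -- answered in three lines, not three weeks.
    def log_smooth(s):
        return np.exp(smooth(np.log(s)))

    data = np.abs(np.random.default_rng(0).normal(size=100)) + 1.0
    print(pipeline(data), pipeline(data, smoother=log_smooth))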
I'm curious how you would do data-oriented programming in a language with no type system and no control over memory layout. And I guess the answer is "you can't, but JITs might exist someday that do it for you".<p>But you can't just wave your hands and say compiler optimizations will fix performance problems. They can, but they're not magic, and the proverbial arrow in the knee for optimization passes is language semantics that make them impossible to apply (forcing the authors either to abandon the passes or to rely on things like dynamic deoptimization, which is not free).
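To make "control over memory layout" concrete, here is a minimal sketch of the array-of-structs vs struct-of-arrays choice, expressed through NumPy's structured arrays (the escape hatch, since pure Python objects offer no such control):

    import numpy as np

    n = 1_000_000

    # Array-of-structs: fields interleaved record by record.
    aos = np.zeros(n, dtype=[("x", np.float64),
                             ("y", np.float64),
                             ("alive", np.bool_)])

    # Struct-of-arrays: each field contiguous in memory, the layout
    # data-oriented design wants for loops touching a single field.
    x = np.zeros(n, dtype=np.float64)

    # Updating x strides over interleaved records here...
    aos["x"] += 0.1
    # ...but streams through one contiguous block here.
    x += 0.1

With plain Python lists of objects, neither layout is even expressible, so there is nothing for an optimizer to grab onto.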
I'd like to plug riptable (<a href="https://github.com/rtosholdings/riptable" rel="nofollow">https://github.com/rtosholdings/riptable</a>), which is (more or less) a performance upgrade to pandas.
Hmm, nice article, but imho it skips over the biggest optimization: NumPy runs operations in optimized native code, so stuff like<p>> >>> multiply_by_two = homogenous_array * 2<p>is calculated in a compiled, vectorized loop, and linear-algebra operations are dispatched to whichever BLAS library your build uses (<a href="https://numpy.org/devdocs/user/building.html" rel="nofollow">https://numpy.org/devdocs/user/building.html</a>).
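A quick way to see both halves of that, including a check of which BLAS/LAPACK backend your NumPy build is linked against:

    import numpy as np

    # Prints the build configuration, including the BLAS/LAPACK
    # implementation this NumPy is linked against (OpenBLAS, MKL, ...).
    np.show_config()

    a = np.ones((1000, 1000))
    b = a * 2   # elementwise: NumPy's own compiled ufunc loop
    c = a @ a   # matrix multiply: dispatched to BLAS (gemm)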
> In practice, scientific computing users rely on the NumPy family of libraries e.g. NumPy, SciPy, TensorFlow, PyTorch, CuPy, JAX, etc..<p>This is a somewhat confusing statement: most of these libraries don't actually rely on NumPy. E.g. TensorFlow ultimately wraps C++/Eigen tensors [0], and NumPy only enters somewhere higher up, in its Python integration.<p>[0] <a href="https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/tensor.h" rel="nofollow">https://github.com/tensorflow/tensorflow/blob/master/tensorf...</a>
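You can see that boundary in TensorFlow's public API: NumPy arrays are converted at the edges, while the computation itself runs on TF's own tensors. A minimal sketch using standard TF 2.x calls:

    import numpy as np
    import tensorflow as tf

    # NumPy appears only at the boundary: conversion in...
    x = tf.constant(np.arange(6.0).reshape(2, 3))

    # ...the computation runs on TF's own C++/Eigen (or GPU) tensors...
    y = tf.matmul(x, tf.transpose(x))

    # ...and conversion back out, as a NumPy array.
    print(y.numpy())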