Data-Oriented Programming in Python

154 points by brilee over 2 years ago

8 comments

cauch over 2 years ago
It's a detail, but I keep seeing it:

> Yet, [the scientists] struggle to move away from Python, because of network effects, and because Python’s beginner-friendliness is appealing to scientists for whom programming is not a first language.

I don't believe that's the whole story.

In my case, during my 13 years in academia, I saw my field move away from C++ and towards Python. Not because of network effects (it was the opposite: it was more difficult not to use what everybody else was using), nor because scientists were unable to program (the entry language of the whole field was C++, and Python arrived only because scientists with a deep knowledge of C++ started making the core library usable from Python themselves).

I think something computer scientists forget when they consider the subject is that the way computer scientists build software just doesn't work when you do science.

In science, you use coding as an exploratory tool. You are lucky if 10% of your code ends up being used in your final publication, because the other 90% was only there to understand the problem and to progress in the right direction. For this reason, things like declaring variables, which is very important when building professional software, are too costly to be useful when you need to write a piece of code that you will only ever run once to check a small hypothesis, especially when another language doesn't require it. Another aspect is that you present your scientific results to your colleagues, not your code (they are not interested in that), and they come up with questions or good ideas, all very good for science but rarely compatible with the way your algorithm was built in the first place, and you need to shoehorn them into your code (to test them) without taking 3 weeks. In this case, Python's flexibility and hackability are very useful.

It's also visible in the popularity of things like Jupyter notebooks (I have to acknowledge it even if I personally don't like working with such tools), which reuse a working approach similar to that of Mathematica and MATLAB, which were created with the scientific workflow in mind.

I'm sure Python's simplicity has played a role. But I have the feeling that some people are totally oblivious to the fact that there may be other reasons.
duped over 2 years ago
I'm curious how you would do data-oriented programming in a language with no type system and no control over memory layout. And I guess the answer is "you can't, but JITs might exist someday that do it for you."

But you can't wave your hands around and say compiler optimizations will fix performance problems. They can, but they're not magic, and the proverbial arrow in the knee for optimization passes is language semantics that make them impossible to realize (forcing the authors to either abandon the passes or rely on things like dynamic deoptimization, which is not free).
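For readers wondering what the memory-layout point looks like in practice, here is a minimal sketch using only the standard library. It contrasts a list of objects (each element is a pointer to a separately allocated object) with one contiguous buffer of C doubles per field via `array.array`. The `Particle` class, the `aos`/`soa` names, and both helper functions are made up for illustration and are not from the article.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Particle:
    # Hypothetical record type, used only for this illustration.
    x: float
    y: float
    mass: float


# Array-of-structures: a list of pointers to separately allocated objects.
aos = [Particle(float(i), float(2 * i), 1.5) for i in range(4)]

# Structure-of-arrays: one contiguous buffer of C doubles per field.
soa = {
    "x": array("d", (p.x for p in aos)),
    "y": array("d", (p.y for p in aos)),
    "mass": array("d", (p.mass for p in aos)),
}


def total_mass_aos(particles):
    # Follows one pointer per particle, then one attribute lookup each.
    return sum(p.mass for p in particles)


def total_mass_soa(columns):
    # Walks a single contiguous buffer holding only the masses.
    return sum(columns["mass"])


# Raw payload size of one column, in bytes (itemsize * element count).
x_bytes = soa["x"].itemsize * len(soa["x"])
```

Note that pure-Python iteration still boxes every double into a `float` object as it is read, so this layout mostly pays off when compiled code (a C extension, NumPy, or similar) consumes the buffer directly.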
wheelerof4te over 2 years ago
To spare you a couple of minutes of your life, the article is saying this:

Python + C modules = Speed

Nothing new here, move along.
hgibbs over 2 years ago
I'd like to plug riptable (https://github.com/rtosholdings/riptable), which is (more or less) a performance upgrade to pandas.
_visgean over 2 years ago
Hmm, nice article, but IMHO it skips over the biggest optimization: NumPy uses BLAS libraries, so stuff like

> >>> multiply_by_two = homogenous_array * 2

will be calculated most of the time using a BLAS library, whichever one you are using (https://numpy.org/devdocs/user/building.html)
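A small sketch for checking this on your own install, assuming NumPy is available. `numpy.show_config()` prints the BLAS/LAPACK libraries the build was linked against. One caveat, as far as I understand it: BLAS backs linear-algebra routines such as `np.dot` and the `@` operator, while element-wise operations like `array * 2` go through NumPy's own compiled ufunc loops. Either way, the work happens in compiled code over a contiguous buffer rather than in the interpreter.

```python
import numpy as np

# A homogeneous float64 array, named as in the quoted snippet.
homogenous_array = np.arange(5, dtype=np.float64)

# Element-wise multiply: dispatched to a compiled ufunc loop.
multiply_by_two = homogenous_array * 2

# Inner product: the kind of operation typically handed to BLAS.
dot_product = homogenous_array @ homogenous_array

# Print which BLAS/LAPACK libraries this NumPy build is linked against.
np.show_config()
```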
tomrod over 2 years ago
This is a wonderfully technical article. I'd love to learn more about Python internals as a scientific coder.
college_physics over 2 years ago
> In practice, scientific computing users rely on the NumPy family of libraries e.g. NumPy, SciPy, TensorFlow, PyTorch, CuPy, JAX, etc..

This is a somewhat confusing statement. Most of these libraries don't actually rely on NumPy; e.g. TensorFlow ultimately wraps C++/Eigen tensors [0], and NumPy enters somewhere higher up in their Python integration.

[0] https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/tensor.h
hoppla over 2 years ago
I wonder if the concept of pointer lookup latency also applies to other languages, such as Go. I assume so though…