
I don't like NumPy

469 points · by MinimalAction · 2 days ago
63 comments

WCSTombs · 1 day ago
If your arrays have more than two dimensions, please consider using Xarray [1], which adds dimension naming to NumPy arrays. Broadcasting and alignment then become automatic, without needing to transpose, add dummy axes, or anything like that. I believe that alone solves *most* of the complaints in the article.

Compared to NumPy, Xarray is a little thin in certain areas like linear algebra, but since it's very easy to drop back to NumPy from Xarray, what I've done in the past is add little helper functions for any specific NumPy stuff I need that isn't already included, so I only need to understand the NumPy version of the API well enough one time to write that helper function and its tests. (To be clear, though, the majority of NumPy ufuncs are supported out of the box.)

I'll finish by saying, to contrast with the author, that I don't dislike NumPy, but I do find its API and data model insufficient for truly multidimensional data. For me, three dimensions is the threshold where using Xarray pays off.

[1] https://xarray.dev
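As a minimal sketch of that workflow (the dimension names and shapes here are illustrative, not from the comment):

```python
import numpy as np
import xarray as xr

temps = xr.DataArray(np.random.rand(4, 3, 2), dims=("time", "lat", "lon"))
weights = xr.DataArray(np.array([0.2, 0.3, 0.5]), dims=("lat",))

# Operations align on dimension *names*, so no transposes or dummy axes:
weighted = temps * weights                        # still (time, lat, lon)
mean_by_time = weighted.mean(dim=("lat", "lon"))
print(mean_by_time.dims)                          # ('time',)
```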
ChrisRackauckas · 1 day ago
One of the reasons why I started using Julia was that the NumPy syntax was so difficult. Going from MATLAB to NumPy, I felt like I suddenly became a mediocre programmer, spending less time on math and more time on "performance engineering" of just trying to figure out how to use NumPy right. Then when I went to Julia, it made sense to vectorize when it felt good and write a loop when it felt good. Because both are fast, you can focus on what makes the code easiest to read and understand. This blog post encapsulates exactly that experience and feeling.

Also, treating things like `np.linalg.solve` as a black box that is the fastest thing in the world, something you could never do better, so please mangle your code to call it correctly... that's just wrong. There are many reasons to build problem-specific linear algebra kernels, and that's something that is inaccessible without going deeper. But that's a different story.
brosco · 1 day ago
Compared to Matlab (and to some extent Julia), my complaints about numpy are summed up in these two paragraphs:

> Some functions have axes arguments. Some have different versions with different names. Some have Conventions. Some have Conventions and axes arguments. And some don't provide any vectorized version at all.

> But the biggest flaw of NumPy is this: Say you create a function that solves some problem with arrays of some given shape. Now, how do you apply it to particular dimensions of some larger arrays? The answer is: You re-write your function from scratch in a much more complex way. The basic principle of programming is abstraction—solving simple problems and then using the solutions as building blocks for more complex problems. NumPy doesn't let you do that.

Usually when I write Matlab code, the vectorized version just works, and if any changes are needed, they're pretty minor and intuitive. With numpy I feel like I have to look up the documentation for every single function, transposing and reshaping the array into whatever shape that particular function expects. It's not very consistent.
vector_spaces · 1 day ago
My main issue with numpy is that it's often unclear what operations will be vectorized or how they will be vectorized, and you can't be explicit about it the way you can with Julia's dot syntax.

There are also lots of gotchas related to the types returned by various functions and operations.

A particularly egregious example: for a long time, the class for univariate polynomial objects was np.poly1d. It had lots of conveniences for doing the usual operations on polynomials.

If you multiply a poly1d object P on the right by a complex scalar z0, you get what you probably expect: a poly1d with coefficients scaled by z0.

If however you multiply P on the left by z0, you get back an array of scaled coefficients -- there's a silent type conversion happening.

So

```python
P * z0  # gives a poly1d
z0 * P  # gives an array
```

I know that this is due to Python associativity rules and a laissez-faire approach to datatypes, but it's fairly ugly to debug something like this!

Another fun gotcha with poly1d: if you want to access the leading coefficient of a quadratic, you can do so with either P.coef[0] or P[2]. No one will ever get these confused, right?

These particular examples aren't really fair because the numpy documentation describes poly1d as part of the "old" polynomial API, advising new code to be written with the `Polynomial` API -- although it doesn't seem to be officially deprecated, and no warnings are emitted when you use poly1d.

Anyway, there are similar warts throughout the library. Lots of gotchas having the shape of silent type conversions and inconsistent datatypes returned by the same method depending on its inputs, which are downright nightmarish to debug.
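A sketch of that asymmetry (my own minimal repro; exactly which scalar types trigger the silent conversion can vary across NumPy versions, so treat this as illustrating the commenter's report rather than guaranteed behavior):

```python
import numpy as np

P = np.poly1d([1, 2, 3])   # represents x^2 + 2x + 3
z0 = np.complex128(2j)     # a NumPy scalar rather than a plain Python complex

print(type(P * z0))        # np.poly1d  (right multiplication, as expected)
print(type(z0 * P))        # np.ndarray (left multiplication: silent conversion)

# The two inconsistent routes to the leading coefficient:
print(P.coef[0])           # coef is ordered highest degree first -> 1
print(P[2])                # indexing is by power of x            -> 1
```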
SirHumphrey · 1 day ago
The main problem (from my point of view) with the Python data science ecosystem is a complete lack of standardization on anything.

You have ten different libraries that try to behave like four other languages, and the only point of standardization is that there is usually something like a .to_numpy() function. This means that most of the time I was not solving any specific problem related to data processing, but just converting data from a format one library would understand to something another library would understand.

In Julia (a language with its own problems, of course), things mostly just work. The library for calculating uncertainties interacts well with the library handling units, and all this works fine with the plotting library, or libraries solving differential equations, etc. In Python, this required quite a lot of boilerplate.
blululu · 1 day ago
The author brings up some fair points. I feel like I had all sorts of small grievances transitioning from Matlab to Numpy. Slicing data still feels worse in Numpy than in Matlab or Julia, but this doesn't justify the licensing costs for the Matlab stats/signal processing toolboxes.

The issues presented in this article mostly relate to tensors of rank >2. Numpy was originally just matrices, so it is not surprising that it has problems in this domain. A dedicated library like Torch is certainly better. But Torch is difficult in its own ways. IDK, I guess the author's conclusion that numpy is "the worst array language other than all the other array languages" feels correct. Maybe a lack of imagination on my part.
threeducks · 1 day ago
I thought I'd do something smart and inline all the matrix multiplications into the einsums of the vectorized multi-head attention implementation from the article, and set optimize="optimal" to make use of the optimal matrix chain multiplication algorithm (https://en.wikipedia.org/wiki/Matrix_chain_multiplication) to get a nice performance boost.

```python
def multi_head_attention_golfed(X, W_q, W_k, W_v, W_o, optimize="optimal"):
    scores = np.einsum('si,hij,tm,hmj->hst', X, W_q, X, W_k, optimize=optimize)
    weights = softmax(W_k.shape[-1]**-0.5 * scores, axis=-1)
    projected = np.einsum('hst,ti,hiv,hvd->shd', weights, X, W_v, W_o, optimize=optimize)
    return projected.reshape(X.shape[0], W_v.shape[2])
```

This is indeed twice as fast as the vectorized implementation, but, disappointingly, the naive implementation with loops is even faster. Here is the code if someone wants to figure out why the performance is like that: https://pastebin.com/raw/peptFyCw

My guess is that einsum could do a better job of considering cache coherency when evaluating the sum.
aborsy · 1 day ago
My main problem with numpy is that the syntax is verbose. I know that from a programming-language perspective this may not be considered a drawback (it might even be a strength). But in practice, the syntax is a pain compared to MATLAB or Julia. Code in the latter is easier to read and understand, and more consistent with math notation.
dima55 · 1 day ago
Hear hear! Some of these complaints have been resolved with numpysane: https://github.com/dkogan/numpysane/ . With numpysane and gnuplotlib, I now find numpy acceptable and use it heavily for everything. But yeah; without these it's unusable.
a_t48 · 1 day ago
My issue with it is how easy it is to allocate all over the place if you forget to use in-place operations. It's even worse with cupy: rather than applying a series of operations to some data to produce some other data, you end up producing a new set of data for each operation. Yes, there are workarounds, but they aren't as ergonomic (cupy.fuse() almost does the right thing, cleanly, but is a step you have to remember to use, and doesn't really work for anything that requires multiple shapes of array).
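To illustrate the allocation issue (a toy sketch of mine, not from the comment): every step of a chained expression materializes a temporary, while the `out=`/augmented-assignment forms reuse one buffer.

```python
import numpy as np

x = np.random.rand(1_000_000)

# Allocating style: each operation creates a fresh temporary array.
y = ((x * 2.0) + 1.0) ** 2

# In-place style: one result buffer, mutated step by step.
y = x * 2.0          # one allocation for the result
y += 1.0             # reuses y
np.square(y, out=y)  # reuses y again
```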
munchler · 1 day ago
I have to agree with this as someone coming from a strongly typed background (F#). PyTorch and NumPy are very flexible and powerful, but their APIs are insanely underspecified, because every function seems to take multiple combinations of vaguely typed objects. The library just figures out the right thing to do at runtime using broadcasting or other magic.

This kind of "clever" API seems to be both a benefit and a curse of the Python ecosystem in general. It makes getting started very easy, but also makes mastery maddeningly difficult.
cycomanic · 1 day ago
I sort of agree with the author that N>3 dimensional arrays are cumbersome in numpy; that said, I think this is partly because we are really not that great at thinking in higher dimensions. I'm interested in what the author's solution to the problem is, but unlike the author I'm not a big fan of the eigen notation; maybe I just don't use it often enough. I don't see the issue with the a[:,:,None] notation, and that's never given me trouble. However, I agree about the issue with index arrays: I often write something which I think should work, and then need to go back to the documentation to realise that's not how it works.

The inconsistency in argument naming is also annoying (even more so if we include scipy), e.g. why is it np.fft.fft(x, axis=1) but np.fft.fftshift(x, axes=1)?!
drhagen · 1 day ago
> Personally, I think np.einsum is one of the rare parts of NumPy that's actually good.

einsum only being able to do multiplication makes it quite limited. If we leaned into Einstein notation (e.g. [1]), we could make something that is both quite nice and quite performant.

[1] https://tensora.drhagen.com/
lvl155 · 1 day ago
I am not a big fan of the Python data libraries. They're not cohesive in style across the board. That's probably why I found R to be a better "classroom" solution. Julia is nice, and so is Mathematica purely for math (and hat tip to Maple).
bee_rider · 1 day ago
Numpy is mostly just an interface to BLAS/LAPACK, but for Python, right? BLAS/LAPACK aren't clever libraries for doing a ton of small operations; they are simple libraries for doing the easy thing (operations on big dense matrices) about as well as the hardware can.

Numpy is what it is. It seems more disappointing that he had trouble with the more flexible libraries like Jax.

Anyway, there's a clear split between the sort of functionality that Numpy specializes in and the sort that Jax does, and they don't benefit much from stepping on each other's toes, right?
sundarurfriend · 1 day ago
> D = 1/(L*M) * np.einsum('klm,ln,km->kn', A, B, C)

The first time I came across einsums was via the Tullio.jl package, and it seemed like magic to me. I believe the equivalent of this would be:

```julia
@tullio D[k, n] = 1/(L*M) * A[k, l, m] * B[l, n] * C[k, m]
```

which is really close to the mathematical notation.

To my understanding, template strings from PEP 750 will allow for something like:

```python
D = 1/(L*M) * np.einsum(t'{A}[k,l,m] * {B}[l,n] * {C}[k,m]')
```

right? If so, that'd be pretty neat to have.
CreRecombinase · 1 day ago
It's kind of wild how much work really smart people will do to get Python to act like Fortran. This is why R is such a great language, IMO. Get your data read and your arrays in order in a dynamic, Scheme-like language, then just switch to Fortran and write actual Fortran like an adult.
albertzeyer · 1 day ago
It seems the main complaint is about confusing shapes/dimensions. Xarray has already been mentioned, but this is a broader concept, called named dimensions, sometimes also named tensors, named axes, labeled tensors, etc., which has often been proposed before, and many implementations exist:

https://nlp.seas.harvard.edu/NamedTensor
https://namedtensor.github.io/
https://docs.pytorch.org/docs/stable/named_tensor.html
https://github.com/JuliaArrays/AxisArrays.jl
https://github.com/ofnote/tsalib
https://github.com/google-deepmind/tensor_annotations
https://github.com/google-deepmind/penzai
https://github.com/tensorflow/mesh/
https://github.com/facebookresearch/torchdim
https://xarray.dev/
https://returnn.readthedocs.io/en/latest/getting_started/data.html (that's my own development; unfortunately the doc is very much outdated on that)
huqedato · 1 day ago
Try Julia's built-in arrays, or the LinearAlgebra standard library for matrix operations. You'll never go back to Python (my case).
harpiaharpyja · 1 day ago
I don't use numpy much, but based on the documentation for that function and the fact that you can index by None to add another dimension, it seems like the correct call is:

```python
y = linalg.solve(A, x[:, :, None])
```

The documentation says that A should be (..., M, M), and x should be (..., M, K). So if A is 100x5x5 and x is 100x5, then all you need to do is convert x to 100x5x1.

Is that right? It doesn't seem that bad.
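That reading of the docs checks out numerically; here is a quick runnable sanity check (random placeholder data, my own sketch) comparing the batched call against a per-system loop:

```python
import numpy as np

A = np.random.rand(100, 5, 5)   # batch of 100 5x5 systems
x = np.random.rand(100, 5)      # batch of 100 right-hand sides

# Batched: promote x to (100, 5, 1) so it matches the (..., M, K) rule.
y_batched = np.linalg.solve(A, x[:, :, None])[:, :, 0]

# Reference: one small solve per batch element.
y_loop = np.stack([np.linalg.solve(A[i], x[i]) for i in range(100)])

print(np.allclose(y_batched, y_loop))  # True
```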
DrFalkyn · 1 day ago
Back when our lab transitioned from Matlab to Python, I used numpy/scipy quite a bit. I remember leaning heavily on numpy.reshape to get things to work correctly. In some cases I did resort to looping.
zahlman · 1 day ago
TFA keeps repeating "you can't use loops", but aren't they, like, *merely* less performant? I understand that there are going to be people out there doing complex algorithms (perhaps part of an ML system) where that performance is crucial, and you might as well not be using NumPy in the first place if you skip any opportunities to do things in The Clever NumPy Way. But say I'm just, like, processing a video frame by frame, by using TCNW on each frame and iterating over the time dimension; surely that won't matter?

Also: TIL you can apparently use *multi-dimensional* NumPy arrays as NumPy array indexers, and they don't just collapse into 1-dimensional iterables. I expected `A[:,i,j,:]` not to work, or to be the same as if `j` were just `(0, 1)`. But instead, it apparently causes transposition with the previous dimension...?
hatthew · 1 day ago
Am I the only one who feels like this is a rant where the author is projecting their unfamiliarity with numpy? Most of the examples seem like straw men, and I can't tell if that's because they are, or if the author genuinely thinks that these are all significant problems.

For example, in the linalg.solve problem, based on reading the documentation for under 60 seconds I had two guesses for what would work, and from experimenting it looks like both work equally well. If you don't want to take the time to understand the documentation or experiment to see what works, then just write the for loop.

For the indexing problems, how about just not using weird indexing techniques you don't understand? I have literally never needed to use a list of lists to index an array in my years of using numpy. And if you do need to use it for some reason, spend a minute to test out what happens. "What shape does B have?" is a question that can be answered with `print(B.shape)`. If the concern is about reading code written by other people, then context should make it clear what's happening, and if it's not clear, then that's the fault of the person who wrote the code for not using sensible variable names/comments.
rrr_oh_man · 1 day ago
Coming from the excellent tidyverse, or even data.table in R, Numpy has always felt like twenty steps back into mindfuck territory.
Imnimo · 1 day ago
The trouble is that I can never tell whether the cause of my issues is "numpy's syntax/documentation is bad" or "I am very bad at thinking about broadcasting and shape arithmetic".
actinium226 · 1 day ago
For the first example:

```python
import numpy as np

A = np.random.random((100, 5, 5))
b = np.random.random((100, 5, 1))
x = np.linalg.solve(A, b)
```

Admittedly the documentation is a little hard to read if you just look at 'type' for b, but then it says "Returned shape is (..., M) if b is shape (M,) and (..., M, K) if b is (..., M, K), where ... is broadcast between a and b".

So you have to make sure your b vector is actually a matrix and not a vector (if K=1 in your case).
WD-42 · 1 day ago
The answer to all these complaints is simple: use APL. Or rather these days, BQN.
wedesoft · about 17 hours ago
Implementing array operations basically requires macros and JIT compilation. I have implemented JIT compilation of tensor operations in Scheme using LLVM. Maybe I should redo it in Clojure, but I think most programmers don't care enough to leave the Python ecosystem. https://wedesoft.github.io/aiscm/operation.html
janalsncm · 1 day ago
I would like to be able to turn off implicit broadcasting entirely. It has caused me so many evil bugs (in PyTorch, but the same thing applies to Numpy).

Imagine all of the weirdness of js "truthiness" bugs (with silent unexpected behavior) combined with the inherent difficulty of debugging matrix multiplications in 2, 3, or 4 dimensions. I would rather herd goats in a Lumon basement.

Jax can do it, but I'm not willing to migrate everything.
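As an illustration of the failure mode (my example, not the commenter's): a stray trailing axis silently turns an elementwise subtraction into an outer product, and a shape assert is about the only guard NumPy offers.

```python
import numpy as np

pred = np.random.rand(32)        # intended shape: (32,)
target = np.random.rand(32, 1)   # oops: a trailing axis snuck in

diff = pred - target             # silently broadcasts, no error raised
print(diff.shape)                # (32, 32) instead of (32,)

# The usual defensive idiom, absent a "strict mode":
assert pred.shape == target.shape, (pred.shape, target.shape)
```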
zdevito · 1 day ago
I tried to do something similar with 'first-class' dimension objects in PyTorch: https://github.com/pytorch/pytorch/blob/main/functorch/dim/README.md . For instance, multi-head attention looks like:

```python
import torch
from torchdim import dims, softmax

def multiheadattention(q, k, v, num_attention_heads, dropout_prob, use_positional_embedding):
    batch, query_sequence, key_sequence, heads, features = dims(5)
    heads.size = num_attention_heads

    # binding dimensions, and unflattening the heads from the feature dimension
    q = q[batch, query_sequence, [heads, features]]
    k = k[batch, key_sequence, [heads, features]]
    v = v[batch, key_sequence, [heads, features]]

    # einsum-style operators to calculate scores
    attention_scores = (q * k).sum(features) * (features.size ** -0.5)

    # use first-class dim to specify dimension for softmax
    attention_probs = softmax(attention_scores, dim=key_sequence)

    # dropout works pointwise, following Rule #1
    attention_probs = torch.nn.functional.dropout(attention_probs, p=dropout_prob)

    # another matrix product
    context_layer = (attention_probs * v).sum(key_sequence)

    # flatten heads back into features
    return context_layer.order(batch, query_sequence, [heads, features])
```

However, my impression from trying to get a wider userbase is that while numpy-style APIs maybe are not as good as some better array language, they might not be the bottleneck for getting things done in PyTorch. However, other domains might suffer more, and I am very excited to see a better array language catch on.
theLiminator · 1 day ago
Anyone use xarray? Curious how it compares.
semiinfinitely · 1 day ago
All issues raised are addressed by jax.vmap.
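For context, a minimal sketch of the vmap approach on the thread's batched-solve example (shapes and names are mine): write the function for a single system and let vmap supply the batch axis.

```python
import jax
import jax.numpy as jnp

def solve_one(A, b):
    # Written for a single (5, 5) system; no batch logic anywhere.
    return jnp.linalg.solve(A, b)

key = jax.random.PRNGKey(0)
A = jax.random.normal(key, (100, 5, 5))
b = jax.random.normal(key, (100, 5))

# vmap maps solve_one over the leading axis of both arguments.
x = jax.vmap(solve_one)(A, b)
print(x.shape)  # (100, 5)
```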
FrameworkFred · 1 day ago
I'm actually super interested to see the next post.

TBH, if you'd asked me yesterday whether I'm the sort of person who might get sucked in by a cliffhanger story about a numpy replacement, I'm pretty sure I would've been an emphatic no.

But I have, in fact, just tried random things in numpy until something worked... so, you know... tell me more...
ryandrake · 1 day ago
Since this has turned into a session of "my problem with numpy is...", I'll add: import time. I have some short scripts whose business logic takes no time at all, so importing a dependency that imports numpy turns out to take a significant fraction of the script's runtime.
hodder · about 17 hours ago
When I worked as a quant in trading risk, we used MATLAB, and I always questioned the reasoning behind it... thinking it was mostly just historical plus the slick IDE, but it also makes this kind of thing dead simple. Vectorized code is just the default.
QuadmasterXLII · 1 day ago
Numpy and torch have opposite conventions as to which end to broadcast from if arrays have different numbers of dimensions, and as a result I completely forgot both. At this point I have to None both arrays up to the same dimension, which is ugly but actually fairly explicit.
fscknumpy · 1 day ago
I had to make an HN account just to show this numpy "feature".

Create an array like this: np.zeros((10, 20, 30)), with shape (10, 20, 30).

Clearly if you ask for array[0][:, [0,1]].shape you will get an output of shape (20, 2), because you first remove the first dimension, then : broadcasts across the second dimension, and you get the first two elements of the third dimension.

Also, obviously we know that in numpy array[a][b][c] == array[a,b,c]. Therefore, what do you think array[0, :, [0,1]].shape returns? (20, 2)? Nope. (2, 20).

Why? Nobody knows.
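This one is at least documented, if not intuitive: NumPy's advanced-indexing rules say that when advanced indices are separated by a slice, the dimensions they produce move to the front of the result, and a bare integer counts as an advanced index once another advanced index is present. A quick check:

```python
import numpy as np

array = np.zeros((10, 20, 30))

# Two indexing paths that "should" agree:
print(array[0][:, [0, 1]].shape)   # (20, 2): basic index, then fancy index
print(array[0, :, [0, 1]].shape)   # (2, 20): here 0 and [0, 1] are both
                                   # advanced indices separated by ':', so
                                   # their broadcast (2,) axis moves to front
```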
__mharrison__ · 1 day ago
Multiple dimensions (more than 2) are hard.

I was at a conference the other day, and my friend (a very smart professor) was asking me whether it would be possible to move away from Xarray to Pandas or Polars...

Perhaps using Numba or Cython (with loops) might make it fast but less prone to confusion.

Luckily for me, I mostly stay in tabular data (< 3 dimensions).
ebonnafoux · 1 day ago
What I would really like is a way to type-check the shape of a numpy array, to prevent dimension errors at runtime. You could use a third-party library like https://github.com/ramonhagenaars/nptyping or https://github.com/beartype/beartype#numpy-arrays, but it will not extend to the methods of NumPy itself.
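Absent those libraries, here is a minimal hand-rolled version of the idea (entirely my own sketch, a runtime check rather than static typing): a decorator that asserts declared shapes before the function body runs.

```python
import inspect
from functools import wraps

import numpy as np

def expect_shapes(**specs):
    """Assert declared shapes of ndarray arguments; None means 'any size'."""
    def decorator(fn):
        sig = inspect.signature(fn)
        @wraps(fn)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            for name, spec in specs.items():
                arr = bound.arguments[name]
                if arr.ndim != len(spec) or any(
                    s is not None and s != d for s, d in zip(spec, arr.shape)
                ):
                    raise TypeError(f"{name}: expected shape {spec}, got {arr.shape}")
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@expect_shapes(A=(None, 5, 5), b=(None, 5))
def batched_solve(A, b):
    return np.linalg.solve(A, b[:, :, None])[:, :, 0]
```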
frollogaston · 1 day ago
About optimization: I feel like NumPy is meant to be a de facto standard and reference implementation. It covers all use cases with decent efficiency, not the fastest way possible. There are more limited drop-in replacements that use more CPU parallelism or the GPU if NumPy isn't fast enough for your use case. I just wish it were clearer *which* NumPy build I'm installing, because apparently `pip3 install numpy` on my Mac gave me something built with the worst flags possible.

About >2 dimensions: I always found this confusing in NumPy, but just chalked it up to >2-dim arrays being inherently confusing. Maybe there really is a better way.
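One way to see what you actually got (the exact output fields vary by NumPy version): NumPy can report the BLAS/LAPACK backend and build options it was compiled against.

```python
import numpy as np

print(np.__version__)
np.show_config()  # prints the BLAS/LAPACK backend and compiler flags
```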
ris · 1 day ago
I would much *much* prefer the cited "awful" syntax over the proposed loop syntax any day. Don't make me run a little virtual machine in my head to figure out what the end result of a block of code is going to be.
neonsunset · 1 day ago
Sighs in F# and Julia
bee_rider · 1 day ago
A dumb thought: technically, scipy has a "solve_banded" function that does a banded solve. He could easily recast his problem as a single big banded problem, I guess, just with some silly explicit zeros added. I wonder how the performance of that would compare to iterating over a bunch of tiny solves.

Of course, it would be nice if scipy had a block-diagonal solver (maybe it does?). But yeah, I mean, of course it would be nice if my problem were always built-in functionality of the library, lol.

Maybe a bsr_matrix could work.
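On the block-diagonal question: scipy's sparse module can at least express it directly. A sketch (my own, not a benchmark; whether it beats the loop of tiny dense solves would need measuring):

```python
import numpy as np
from scipy.sparse import block_diag
from scipy.sparse.linalg import spsolve

blocks = [np.random.rand(5, 5) for _ in range(100)]  # 100 small systems
b = np.random.rand(500)                              # stacked right-hand sides

# One big sparse block-diagonal solve instead of 100 tiny dense ones.
A = block_diag(blocks, format="csc")
x = spsolve(A, b)
print(x.shape)  # (500,)
```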
jamesblonde · 1 day ago
In data for ML, everything has switched from NumPy (Pandas) to Arrow (Polars, DuckDB, Spark, Pandas 2.x, etc). However, Scikit-Learn is still a holdout, so it's Arrow from your data sources all the way to the pre-processing pipelines in Scikit-Learn, where you have to go back to NumPy. In practice, it now makes more sense to separate feature pipelines in Arrow from training pipelines with Pandas/NumPy and Scikit-Learn.*

*This is ML, not Deep Learning or Transformers.
phronimos · 1 day ago
Numba is a great option for speeding up (vectorizing) loops and NumPy code, apart from CuPy and JAX. Xarray is also worth trying for tensors beyond 2 dimensions.
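A minimal sketch of the Numba route (the function and shapes are mine): keep the plain Python loops and let the JIT compile them.

```python
import numpy as np
from numba import njit

@njit
def row_normalize(x):
    # Explicit loops, compiled to machine code on first call.
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        s = 0.0
        for j in range(x.shape[1]):
            s += x[i, j]
        for j in range(x.shape[1]):
            out[i, j] = x[i, j] / s
    return out

x = np.random.rand(1000, 100)
print(row_normalize(x).shape)  # (1000, 100)
```

The first call pays the compilation cost; later calls run the compiled loop directly.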
raydiak · 1 day ago
I vaguely recall having a few similar complaints about Perl's PDL back in the day. Makes me wonder why we can't also support something closer to syntactically loop-esque constructs to express the same things, without the need for special slicing/indexing notations with their own implicit semantics.
complex_pi · 1 day ago
NumPy allows a lot of science to happen. Grievance is fine, but a little respect is as well.

NumPy is the lingua franca for storing and passing arrays in memory in Python.

Thank you, NumPy!
Pompidou · about 23 hours ago
Array programming requires true array languages like APL and J.
nayuki · 1 day ago
I skimmed the article and agree with it at a high level, though I haven't faced most of those problems personally.

I have several gripes with NumPy, or more broadly the notion of using Python to call a C/asm library that vectorizes math operations. A lot of people speak of NumPy like it's the solution to all your high-performance math needs in Python, which I think is disingenuous. The more custom logic you do, the less suitable the tools get. Pure Python numeric code is incredibly slow - like 1/30× compared to Java - and as you find parts that can't be vectorized, you have to drop back down to pure Python.

I would like to give the simple example of the sieve of Eratosthenes:

```python
def sieve_primeness(limit):
    result = [False] + [True] * limit
    for i in range(2, len(result)):
        if result[i]:
            for j in range(i * i, len(result), i):
                result[j] = False
    return result
```

This code is scalar, and porting it to a language like C/C++/Rust/Java gives decent performance straight out of the box. Performance in Python is about 1/30× the speed, which is not acceptable.

People pretend that you can hand-wave the performance problem away by using NumPy. Please tell me how to vectorize this Python code. Go on, I'm waiting.

You can't vectorize the `if result[i]` because that controls whether the inner for-loop executes, so it must execute in pure Python. For the inner loop, you can vectorize it by creating a huge temporary array and then ANDing it with the result array, but that is extremely memory-inefficient compared to flipping bits of the result array in place, and probably messes up the cache too.

Alternatively, you can run the code in PyPy, but that usually gives a speed-up of maybe 3×, which is nowhere near enough to recover the 30× speed penalty.

Another example is that NumPy is not going to help you implement your own bigint library, because that also requires a lot of custom logic that executes between loop iterations. Meanwhile, I've implemented bigints in pure Java with acceptable performance and without issues.

Again, my point is that anytime you venture into custom numerical/logical code that is not the simple `C = A + B`, you enter a world of pain when it comes to Python performance or vectorization.
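For what it's worth, the commonly cited NumPy answer to this challenge vectorizes only the inner loop, via in-place slice assignment rather than a huge temporary; the outer `if` loop stays in Python, which is consistent with the commenter's broader point (this variant is my sketch, not the commenter's code):

```python
import numpy as np

def sieve_numpy(limit):
    result = np.ones(limit + 1, dtype=bool)
    result[:2] = False                   # 0 and 1 are not prime
    for i in range(2, int(limit ** 0.5) + 1):
        if result[i]:                    # still a scalar Python-level branch
            result[i * i :: i] = False   # inner loop vectorized, in place
    return result
```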
aanet · about 18 hours ago
I feel heard... I'm not the only one finding numpy confusing, esp. with more than 3 dimensions.
nuc1e0n · 1 day ago
I think Numpy is representative of the Python ecosystem as a whole. Powerful, but internally complex, bloated, poorly designed and poorly documented.
the_clarence · 1 day ago
Why do people use numpy instead of sage?
eurekin · 1 day ago
That's one area where LLMs have helped hugely. You can just ask one, and also ask it to generate tests to verify.
constantcrying · about 19 hours ago
One of the biggest hurdles of numpy is Python itself, which is very clearly not a language designed for numerical analysis.

Comparing it to Julia and Matlab, you can see many, many cases where the Python design just does not fit.
coolcase · 1 day ago
Sounds like a hammer-and-nails problem. Your nail gun is called PyTorch.
josefrichter · 1 day ago
You could try Elixir’s Nx and related ecosystem.
kccqzy · 1 day ago
I have an unpopular opinion: I don't like numpy.einsum because it is too different from the rest of numpy. You label your axes with letters, but none of the other regular numpy functions do that. I usually avoid numpy.einsum in favor of a combination of indexing notation with numpy.newaxis (None), broadcasting, and numpy.swapaxes.

And I learned from someone more senior than me that you should instead label your variables with single-letter axis names. That way, the reader reads regular non-einsum operations and still has the axes information in their mental model. When you use numpy.einsum, this axis-labeling information becomes duplicated.
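To make the contrast concrete (my example, not the commenter's): the same contraction written both ways, with axis names carried in the variable names as suggested.

```python
import numpy as np

x_bij = np.random.rand(4, 3, 5)   # axes: batch, i, j
w_jk = np.random.rand(5, 2)       # axes: j, k

# einsum style: axes labeled inside the subscript string.
y1 = np.einsum('bij,jk->bik', x_bij, w_jk)

# broadcasting style: axes tracked only through names and positions.
y_bik = (x_bij[:, :, :, None] * w_jk[None, None, :, :]).sum(axis=2)

print(np.allclose(y1, y_bik))  # True
```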
srean · 1 day ago
I for one like NumPy's broadcasting far better than Matlab's. In NumPy it's a logical operation, with no unnecessary memory/copying cost.

The last time I checked Matlab (which was surely a decade ago), it actually filled memory with copied data.
credit_guy · 1 day ago
Here's how you do it: focus on the simple case, solve it, then ask Copilot to vectorize the code. Life is too short.
adipandas · 1 day ago
I completely agree with you, mate.
sunrunner · 1 day ago
"Young man, in numpy you don't understand things. You just get used to them." -- John von Neumann (probably)
the__alchemist · 1 day ago
I'm with the author here. It's great for simple use cases; anything beyond that, and I find the syntax inscrutable. It's like using a terse, feature-rich DSL such as regex. Good luck deciphering someone else's numpy code (or your own from a month back) unless you really take the time to commit it to memory.
betelgeuse6 · 1 day ago
Something wrong with that font.