
16 comments

BiteCode_dev, 8 months ago

One of the reasons Python is so popular as a scripting language in science and ML is that it has a very good story for installing Frankenstein code bases made of assembly, C and Pascal sprinkled with SIMD.

I was here before Anaconda popularized the idea of binary packages for Python and inspired wheels to replace eggs, and I don't want to go back to having to compile that nightmare on my machine.

People who have that kind of idea are likely capable of running k8s containers, understand vectorization, and can code a monad off the top of their head.

Half of Python coders struggle to use their terminal. You have Windows devs who live in Visual Studio, high-school teachers who barely show a few functions, mathematicians who are replacing R/Matlab, biologists forced to script something to write a paper, frontend devs who just learned JS is not the only language, geographers begging their GIS system to do something it's not made for, kids messing with their dad's laptop, and probably a dog somewhere.

Binary wheels are a gift from the gods.
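Part of why wheels "just work" for that audience is that a wheel's filename encodes exactly which interpreter, ABI, and platform the prebuilt binary targets, so pip can pick a compatible one without invoking a compiler. A minimal sketch of that naming convention (real tag sets can be compressed, e.g. `cp312.cp313`, which this toy parser ignores):

```python
# Wheel filenames follow the convention
#   {name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl
# This toy parser splits a filename into those fields.
def parse_wheel_name(filename: str) -> dict:
    stem = filename.removesuffix(".whl")
    parts = stem.split("-")
    # The build tag is optional, so there are 5 or 6 dash-separated fields.
    if len(parts) == 6:
        name, version, build, py, abi, plat = parts
    else:
        name, version, py, abi, plat = parts
        build = None
    return {"name": name, "version": version, "build": build,
            "python": py, "abi": abi, "platform": plat}

info = parse_wheel_name("numpy-2.1.1-cp312-cp312-manylinux_2_17_x86_64.whl")
print(info["platform"])  # manylinux_2_17_x86_64
```

A pure-Python package instead ships a single `py3-none-any` wheel, which is why only compiled projects multiply their storage footprint per platform.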
yaleman, 8 months ago

The fact that tensorflow takes up 12.9 TiB is truly horrifying, and most of that is because they use PyPI's storage as a dumping ground for their pre-release packages. What a nightmare they've put on other people's shoulders.
choeger, 8 months ago

The analysis contains an error. Binary artifacts don't cause *exponential* growth in storage requirements; the growth is still just linear. That's also quite clear from the fact that, after a phase of exponential growth, binary artifacts still only account for 75% of storage.

So this whole strategy (actually a pitch for zigbuild) would ideally reduce the storage requirements to 25%, which would buy the whole system maybe a year or two if the growth continues.

Of course, it's a good idea to build client-side, especially considering the security implications. But it won't fundamentally change the problem.
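The "a year or two" estimate can be made concrete: cutting storage to 25% means exponential growth has to multiply the archive by 4 before it is back where it started. A minimal sketch (the thread doesn't state PyPI's actual growth rate, so the 50%/year figure is an illustrative assumption):

```python
import math

# Dropping binaries shrinks the archive to 25% of its size, so growth must
# supply a factor of 1 / 0.25 = 4 to return to the old size. Under growth at
# rate r per year, that takes t = ln(4) / ln(1 + r) years.
def years_to_regrow(rate: float, factor: float = 4.0) -> float:
    return math.log(factor) / math.log(1.0 + rate)

print(round(years_to_regrow(0.50), 1))  # ~3.4 years at 50%/year growth
```

At 100%/year growth the reprieve is exactly 2 years, which is roughly the commenter's "a year or two".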
pxc, 8 months ago

There are already lots of passable package managers that know how to provide working binaries for the native, non-Python dependencies of Python packages. Instead of trying to make Python packages' build processes learn how to build everything else in the world, one thing Python packages could do is just record their external dependencies in a useful way. Then package managers that are actually already designed for, and committed to, working with multiple programming-language ecosystems could handle the rest.

This is something that could be used by Nix, Guix, and Spack, as well as more conventional software distributions like Conda, Pkgsrc, MacPorts, Homebrew, etc. With the former, users could even set up per-project environments that contain those external dependencies, like virtualenvs but much more general. This simple metadata would naturally be valuable, if provided well, to maintainers of all Linux distros and many other software distributions, where autogenerated packages are already the norm for languages like Rust and Go, while creating such tooling for Python is riddled with thorny problems. So these two proposals are not mutually exclusive, and perhaps each is individually warranted on its own.

Enriching package metadata in this simple way has already been proposed here:

https://peps.python.org/pep-0725/
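For a sense of what that metadata could look like: PEP 725 (a draft at the time of this thread) proposes an `[external]` table in `pyproject.toml` using PURL-style and virtual-dependency identifiers. A sketch along those lines — the package names are illustrative, and the details may have changed since the draft:

```toml
[external]
# tools needed only while building the package
build-requires = [
  "virtual:compiler/c",
]
# native libraries linked against for the target platform
host-requires = [
  "pkg:generic/openblas",
]
# external dependencies needed at runtime
dependencies = [
  "pkg:generic/libffi",
]
```

A distro or tool like Nix could map each identifier to its own package name and provision it, without the Python build system ever having to build OpenBLAS itself.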
trlampert, 8 months ago

"When Python came into existence, repeatable builds (i.e. not yet reproducible, but at least correctly functioning on more than one machine) were a pipe dream. Building C/C++ projects reliably has been an intractable problem for a long time, but that's not true anymore."

I'd dispute that. It used to be the case that building NumPy just worked; now there are Cython/Meson and a whole lot of other dependency issues, and the build fails.

"At the Zig Software Foundation we look up to the Python Software Foundation as a great example of a fellow 501(c)(3) non-profit organization that was able to grow an incredibly vibrant community ..."

Better not to meet your heroes. Python was a reasonable community made up of and driven by creative individuals. In the last 8 years, it has been taken over by corporate bureaucrats who take credit for the remnants of that community and who will destroy it. The PSF has never done anything except sell expensive conference tickets and take care of its own.
rightbyte, 8 months ago

These package repositories are used in a wasteful way, probably by thousands of CI servers spinning up blank-slate Docker containers and the like.
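One common mitigation for that blank-slate pattern is persisting pip's download cache between CI runs so each job doesn't re-fetch every wheel from PyPI. A sketch for GitHub Actions (the Python version and requirements file name are illustrative), using `setup-python`'s built-in pip caching:

```yaml
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-python@v5
    with:
      python-version: "3.12"
      cache: "pip"   # restores/saves pip's cache, keyed on the requirements files
  - run: pip install -r requirements.txt
```

This doesn't help with truly ephemeral self-hosted runners, which is arguably the commenter's point, but it removes most repeat downloads on hosted CI.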
faustin, 8 months ago

conda-forge handles the first part of this (reproducible builds) for most common platforms. The idea of rebuilding deleted artifacts on demand sounds nice in theory, but it has a complication: rebuilding something that depends on several other somethings will likely trigger a build cascade where a bunch of stuff has to get built in order. Hopefully none of those ancient build scripts require external resources hosted at dead links!
zamlag, 8 months ago

At work we don't use PyPI any longer. We have our own set of curated packages; the security issues are just too great:

https://developers.slashdot.org/story/24/09/15/0030229/fake-python-coding-tests-installed-malicious-software-packages-from-north-korea

https://jfrog.com/blog/revival-hijack-pypi-hijack-technique-exploited-22k-packages-at-risk/

https://jfrog.com/blog/leaked-pypi-secret-token-revealed-in-binary-preventing-suppy-chain-attack/

We are considering switching to Java, C++ or Rust because of general quality issues with Python.
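Pointing pip at a curated internal index instead of PyPI is a small configuration change. A sketch (the index URL is a placeholder for whatever mirror or artifact server is in use):

```ini
; ~/.config/pip/pip.conf on Linux, %APPDATA%\pip\pip.ini on Windows
[global]
; replace PyPI entirely with the curated index
index-url = https://pypi.internal.example.com/simple/
```

Using `index-url` (rather than `extra-index-url`) matters here: with an extra index, pip still consults PyPI, which reopens the door to dependency-confusion attacks.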
eesmith, 8 months ago

Some issues I see are:

- packages which only distribute binaries (e.g., closed-source or source-for-a-fee distributions)

- it looks like Zig's C compiler does not support OpenMP, which I use

- what is the cut-off time for source vs. binary distribution? My package takes about a minute to compile (it has a lot of auto-generated source).

- what's the user impact if they have 10 projects which are just under that threshold?

- compile-time dependencies which are not recorded in pyproject.toml (like having a Fortran compiler, having yacc/bison, etc.)
Wowfunhappy, 8 months ago

I'm a bit confused as to why this costs so much. I thought storage was cheap?

Bandwidth is more expensive, but it shouldn't be relevant to this problem. It doesn't matter whether 5 people request the same binary or 5 people request 5 different binaries for different platforms; if all the binaries are 1 GB, you're transferring 5 GB of data either way.
rwmj, 8 months ago

Just use the system packages! Fedora, Debian, AUR, brew/MacPorts on macOS, etc. are all a thing; use them.
Havoc, 8 months ago

12 TB for TensorFlow is absurd.
atemerev, 8 months ago

Nope. Wheels are the only thing that makes Python/PyPI usable. I don't want to wait tens of minutes to recompile PyTorch or something (and Conda is way too heavyweight for my tastes).
JohnMakin, 8 months ago

I don't pretend to know the answer here, but Python is unavoidable in my work and the packaging is a constant source of irritation for me. Disclaimer: I am not pretending to be a Python expert, but coming from a C background this anecdote is baffling to me.

I was writing some Lambda OAuth glue logic in Python because Python was the best choice for this particular implementation. I needed to package the "jwt" library, which worked absolutely fine for a while. Then I upgraded Python versions, and the particular pre-built container I was using to run pip and zip the packages up ended up with totally borked versions of jwt that seemed to lack functions I had previously been using fine.

I dug in and finally figured out that my older setup was *actually* importing PyJWT even though I had specified "jwt." The new container was breaking because it was actually installing a different "jwt" library. So the solution was to specify PyJWT, and which version I wanted, in my pip install. Great! That's how I think it *should* work, and I was a little baffled that pip had made that decision for me previously.

Anyway, it now has my missing functions but is still crashing in my blue deployments. Wtf? Oh, this PyJWT import is missing an algorithm. To fix that, I *also* needed to pip install "cryptography" (making sure to get the compatible version matrix here; at this point I had stopped trusting pip).

So maybe this is all very obvious and "duh" to veterans out there, but this was impossibly silly and wasted a stupid amount of time on something that should be dead simple. Yeah, wrangling makefiles and fussing with linkers can be annoying, but I'd take that any day over this BS.
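The fix described above amounts to naming the real distribution explicitly instead of relying on whichever package happens to provide `import jwt`. A sketch of the pinned requirement (the version number is only an example; check compatibility with your Python version):

```text
# requirements.txt
# "PyJWT" is the distribution that provides `import jwt`; the [crypto] extra
# pulls in the `cryptography` package needed for RS256 and other asymmetric
# algorithms, at a version PyJWT declares compatible.
PyJWT[crypto]==2.9.0
```

The `[crypto]` extra also sidesteps the second problem in the anecdote: instead of hand-picking a `cryptography` version, you let PyJWT's own dependency declaration choose a compatible one.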
zzzeek, 8 months ago

My own personal TL;DR would be: PyPI has to store too much data in the form of pre-built binaries that are uploaded by package authors. Python should adopt a repeatable build format so that PyPI itself can build wheels for any platform on demand (edit: am I misunderstanding? Did they mean that wheels can be built as part of the local install process?). The author is involved with a special compiler to do this.

Personally, I'd love it if PyPI could build our wheels for us. That would be great; we use GitHub Actions right now, which has its own complexities, and for years we had nothing. But that would mean a huge ramp-up in processing capability for PyPI. Considering PyPI can't even handle having its packages signed and "solved" the problem by sending authors obnoxious emails if we even dare to push up a signature file, I'm not too optimistic about such a change versus their just continuing to rely on corporate sponsors to deliver bandwidth.
Siecje, 8 months ago

Compatible releases should replace older versions.

Why would you want to install an old version of ruff?