Ask HN: How did Python become the lingua franca of ML/AI?

78 points by heyzk over 3 years ago

I've looked around a bit and can't really find a satisfying answer to this question. There are posts on answer sites, but these often boil down to "dynamic languages are good at glue", or "Tensorflow / Jupyter".

That can't be the whole story, can it? Or if it is, why did these projects choose Python over other scripting languages?

I bet there's some interesting history here.

29 comments

cameldrv over 3 years ago

NumPy, SciPy and the ecosystem around them. So much of what you do in ML involves matrix operations. People used to do this stuff in Matlab. Matlab is good at numerics but it's not a very good programming language, and doesn't have very good libraries outside of the numeric domain. The open source nature of NumPy and Python encouraged a big open source community that is hard to get going if you're building open source on top of a language that costs thousands of dollars per seat.

Python's dynamic nature also made a lot of what's in NumPy and the various ML libraries possible or more convenient to use. The performance is not as much of an issue if you start thinking in NumPy terms, doing operations on whole arrays where the loops are then in C. Really, Python itself is just acting as orchestration for a bunch of C code that's doing all the work. In the case of something like TensorFlow or PyTorch, it's actually a bunch of CUDA code that's doing all the work, orchestrated by Python.
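To make the "loops are then in C" point concrete, here is a minimal sketch that computes the same result with a pure-Python loop and with a whole-array NumPy expression. The function names and the array size are illustrative assumptions, not anything from the comment above, and timings vary by machine, so none are quoted.

```python
import time

import numpy as np


def scale_and_sum_loop(values, factor):
    """Pure-Python version: the interpreter executes every iteration."""
    total = 0.0
    for v in values:
        total += v * factor
    return total


def scale_and_sum_vectorized(values, factor):
    """NumPy version: the multiply and the sum both run in compiled code."""
    return float((values * factor).sum())


# 200,000 float64 values: 0.0, 1.0, 2.0, ...
data = np.arange(200_000, dtype=np.float64)

start = time.perf_counter()
loop_result = scale_and_sum_loop(data, 2.0)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
vec_result = scale_and_sum_vectorized(data, 2.0)
vec_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f}s, vectorized: {vec_seconds:.4f}s")
print(loop_result == vec_result)
```

Both functions return the same number; the only difference is where the iteration happens.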

MattGaiser over 3 years ago

My last job was at an ML company.

Most ML people there could not build large robust systems and some struggled with the non-algorithmic bits of software. I am sure that some out there in the world can, but for the most part our ML people were very good at creating models and not very good at the development part, especially as the program grew (part of the motivation to hire devs like me in the first place).

Python gets rid of as much of the developmental complexity as possible. No type declarations, no manual memory management, libraries for everything, no need to create a class to run "hello world." Pip makes it trivial to install things. Use PyCharm and you just need to click the run button, with no complicated JRE and JDK setup.

It is the fastest way to start writing models.

auntienomen over 3 years ago

Python's a lingua franca in AI/NN because it was already a dominant language in scientific computing. Its dominance in scientific computing grew steadily through the 1990s and 2000s, for a few reasons:

1) Python -- specifically CPython -- made it easy to wrap existing, thoroughly tested high performance libraries in Python APIs. So, you got easy access to things like GSL and BLAS and LAPACK, but you got to call numpy.linalg.svd instead of GESDD.

2) Python was a general purpose language, unlike R or MATLAB, so you could extend existing systems to do more without running into a wall.

3) Python was a heck of a lot less effort to use than C++.
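As a small illustration of point 1, the sketch below calls numpy.linalg.svd on a tiny matrix and checks that the factors multiply back to the original. The matrix values are arbitrary illustration data; the factorization itself is handed off to a compiled LAPACK routine, so the Python side is just one readable call.

```python
import numpy as np

# A small 3x2 matrix; the values are arbitrary illustration data.
a = np.array([[3.0, 0.0],
              [0.0, 4.0],
              [0.0, 0.0]])

# One call from Python; the numerical work is delegated to LAPACK.
u, s, vt = np.linalg.svd(a, full_matrices=False)

# Rebuild the matrix from its factors to check the decomposition.
reconstructed = u @ np.diag(s) @ vt

print(s)                               # singular values, largest first
print(np.allclose(a, reconstructed))   # True if the factors reproduce a
```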

alanfranz over 3 years ago

I don't think there's a master plan or a design idea about that.

There's an old saying that goes like "Python is the second best language for anything".

Python isn't the best for any kind of task; but you can do almost anything in any field with Python and some libraries. It's reasonably easy for a non-programmer to use it.

I think my first experience with GPU programming was using CUDA with C (I think it was a kind of customized C in the mid-2000s), so Python hasn't been there forever.

But if you need to do a bit of web scraping/input data manipulation, a bit of "offering a GUI" (e.g. a small web server that shows the data), a bit of matrix/vectorized operations, a bit of model training or even just inference... Python has everything and everything is reasonably good. At least some of those operations would be cumbersome in other programming languages.

Try using R for general-purpose programming. Or Java for number crunching/matrix operations. They just suck.

Try finding the "greatest common divisor", functionality-wise, for the many tasks that you need in an ML system (just as in many other systems), and you'll find Python.

The drawback is, IMHO, that it doesn't "scale" well. Python makes great proofs of concept and prototypes, but I'll always pick a different stack (possibly with multiple languages and technologies) if I want a long-running, maintainable production system.

bearly_legal over 3 years ago

The same reason that Python is heavily used in scientific computing.

ML/AI/Scientists aren't systems people. They don't want to care about memory management/parallelization/etc. - they want to write perfect little mathematical poems which get executed on a perfect Turing machine.

Python is good at that. Thanks to the efforts of actual systems people, its libraries (numpy, scipy, etc.) run quick enough to be practical on a lot of workloads.

bbulkow over 3 years ago

Another way to analyze the problem: what other language would it have been, given the moment ML hit?

You say "compared to other scripting languages". Let's list them.

- Ruby: no numeric support
- Go: unnecessary typing, modest numeric support, shitty generics
- Bash: ha ha ha
- Scala, Java, C, C++: not scripting languages, complex
- Tcl, PHP: out of favor
- Rust: hadn't happened yet
- R: in-memory bias, not as simple

Other languages were obscure or owned by monoliths (Kotlin, Swift, C#).

Python also has multiple implementations, which sounds like a minor thing but isn't really. PyPy keeps CPython on its toes.

C# really could be a contender. I am more productive in C# than in any other language except Python (although I think I will be more productive in Rust).

Python is, almost unarguably, the easiest language to code in, right now, period. It has the greatest expressiveness and the simplest syntax. I use it for large scale open source art projects, and you can use it for AI.

Why are you asking?

lokimedes over 3 years ago

I can only provide anecdotal material, but back in 2007-2014 when I was a particle physics researcher, we saw a high uptake of Python for steering data analysis jobs. The actual calculations were done in C++. Gradually over the years, as more students joined the LHC conquest, our tools evolved to allow more of the analyses to be directly programmed in Python. R was never a thing among the 10000+ physicists in our community. These people have since then drifted around the world working on Big Data, ML and recently Data Science. It’s hard to keep count, but I routinely recognize fellow particle physicists at various ML companies.

For the curious, our primary hammer was “ROOT” (https://root.cern) - note its well-evolved ability to connect Python and C++ code.

sbashyal over 3 years ago

I have a historical perspective on this topic. Data Science was rapidly growing in popularity, and R was the lingua franca around 2010 - 2014.

I attended the Strata conference in 2014, and after visiting various technology exhibition booths there I saw a common theme: tech companies were building data solutions using Python, as R was no good for the purpose.

In a meeting scheduled to share my takeaways from the conference, I predicted "Python will emerge to be the language of Data Science in a few years".

lordnacho over 3 years ago

NumPy, SciPy, pandas, etc. are a way to use scripty syntax to write CPP code.

Under the hood you get the benefits of CPP: stuff is dense in cache, operations are efficient.

But you can write it without a bunch of types, templates and allocators, which confuse people who aren't used to them. Most numeric code doesn't have a load of types anyway, it's just a few operations on some very large matrices.

Add to that the benefit that you can draw on Python's universe of libraries, which is quite large compared to rivals like MATLAB or R. Want to serve your model as a website? Jam it into Flask. Need a crypto lib to grab the data? No problem, just pip it and import.

holonomically over 3 years ago

Python has always been used as a nice layer over various C libraries, so when ML started taking off and people started using GPUs to accelerate training and inference, it was a natural choice for acting as the high level code for interfacing with the low level GPU code.

There were some other DSLs that were being developed at the time but the ones that stuck were the Python ones. [1]

[1]: https://terralang.org/

ThePhysicist over 3 years ago

Python's USP was and still is its ability to provide a simple & intuitive "glue layer" over lower-level libraries. Most of the performance-critical functionality that Python relies on for ML is written in C/C++/Fortran and Python mostly provides the UI layer (this is an oversimplification of course).

Wrapper generators and compiler tools like Cython and before that SWIG made it very easy to glue existing functionality to Python, so together with Python's great usability and user-friendly language it created a killer combination for productive data science & ML.

That said, other languages could've pulled this off as well, Ruby for example. Python had more early traction in the scientific and high-performance computing communities though, whereas Ruby was more popular in web development (due to Rails), which ultimately gave Python the edge and attracted more and more toolmakers to its ecosystem, which in turn spurred further growth. Great "IDEs" like the IPython/Jupyter notebook were also a key factor in Python's success, as they provided a super user-friendly UI for data scientists.

chubot over 3 years ago

Because Python has NumPy, which implements vectorized math on arrays and matrices. Machine learning algorithms are implemented naturally and efficiently with those primitives. PyTorch, TensorFlow, and I think every other machine learning framework in Python all use NumPy.

JavaScript, Ruby, and Perl either don't have this abstraction at all, or they have much weaker versions of it, and many fewer scientific libraries.

NumPy started in the early 2000's and continues to this day. It takes decades to build up this infrastructure! This recent interview with NumPy creator Travis Oliphant is great:

https://www.youtube.com/watch?v=gFEE3w7F0ww

He talks about how there were competing abstractions like "Numeric" and another library, and his goal with NumPy was to unify them. And how there are still some open design issues / regrets.

There were multiple people in the nascent Python community who were tired of MATLAB, not just because it's proprietary, but because it's a weak and inefficient language for anything other than its scientific use cases. You won't have a good time trying to write a web app wrapper in MATLAB, for example.

The much more recent Julia language is also inspired positively and negatively by MATLAB, and is very suitable for machine learning, though it doesn't have the decades of libraries that Python has.

-----

The NumPy extension was in turn enabled by operator overloading in Python (which is actually a very C++ influenced mechanism). JavaScript doesn't have operator overloading; I'm pretty sure Perl doesn't, but not sure about Ruby. Lua and Tcl do not have it. (Lua does have a machine learning framework though -- http://torch.ch/ -- but I think PyTorch is more popular now.)

So if Guido didn't design Python with operator overloading, then NumPy would not have grown out of it.

Also relevant is Guy Steele's famous talk Growing a Language (late 90's or early 2000's I think). He advocates for operator overloading in Java so end users can evolve the language with their domain expertise! Well, Java never got it, and Python ended up having the capabilities to grow linear algebra.

Guido has even said he doesn't really use or even "get" NumPy! So it turns out that an extensible design does have the benefits that Steele suggested (although it's a very difficult language design problem.) There have been several enhancements to Python driven by the NumPy community, like slicing syntax and semantics and the @ matrix multiplication operator. And I think many parts of the C API like buffers.

-----

Another interesting thing from Oliphant's interview is that he really liked that Python has complex numbers. (I don't think any of JavaScript, Ruby, Perl, or Lua have them in the core, which is important.) That piqued his interest and kicked off a few decades of hacking on Python.

He was an electrical engineering Ph.D. student and professor, and complex numbers are ubiquitous in that domain. Example:

    $ python3 -c 'print(3j * 2 + 1)'
    (1+6j)

This is another simple type built on Python's extensible core, and it's short.

    $ Python-3.9.4$ wc -l Objects/complexobject.c
    1125 Objects/complexobject.c

I recommend writing a Python extension in C if you want to see how it works. See Modules/xx*.c in the Python source code for some templates / examples. IMO the Python source code is a lot more approachable than Perl, Ruby, or any JS engine I've looked at.
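The operator-overloading point can be sketched with a toy class. Everything here (the `Vec` name, its methods' behavior) is a hypothetical illustration of the mechanism, not how NumPy is implemented: Python maps `+`, `*` and `@` onto the special methods `__add__`, `__mul__` and `__matmul__`, which is what lets a library define array arithmetic that reads like math.

```python
class Vec:
    """A tiny vector type (hypothetical, for illustration only)."""

    def __init__(self, *components):
        self.components = tuple(float(c) for c in components)

    def __add__(self, other):
        # Called for `self + other`: element-wise addition.
        return Vec(*(a + b for a, b in zip(self.components, other.components)))

    def __mul__(self, scalar):
        # Called for `self * scalar`: scale every component.
        return Vec(*(a * scalar for a in self.components))

    def __matmul__(self, other):
        # Called for `self @ other` (operator added in Python 3.5): dot product.
        return sum(a * b for a, b in zip(self.components, other.components))

    def __eq__(self, other):
        return isinstance(other, Vec) and self.components == other.components

    def __repr__(self):
        return f"Vec{self.components}"


v = Vec(1, 2, 3)
w = Vec(4, 5, 6)

print(v + w)   # element-wise sum
print(v * 2)   # scaled copy
print(v @ w)   # dot product
```

In a language without this hook, the same operations would have to be spelled as method calls such as `v.add(w)`, which gets hard to read for longer expressions.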

z3phyr over 3 years ago

At the time when the ML craze hit, Python was already very popular as a beginner's language and had good numeric libraries.

Many people in STEM fields without any programming background had their first taste of programming with Python. And it caught on.

Also, the real stuff is probably written in C/C++/CUDA/ASM. It's only the interface that is Python (because of its inertial popularity).

powersnail over 3 years ago

The use of Python in most ML/AI research is not even as a "glue" language.

It is used as a shell. It's merely an interface to some gigantic, highly optimized libraries (numpy, scipy, and later, Tensorflow, Pytorch, etc.), and it does a very decent job at being an interface.

- The language is easy to grasp, at least the part that is used in data science and ML;
- The syntax is "familiar", as compared with R;
- There are many more general purpose libraries in Python than in R;
- There are no memory management problems;
- The standard library is packed with batteries;
- No compiling, which is important for being a shell;
- It's better than bash etc. at dealing with non-text data, especially numerical values;
- The community was already writing extensions in C;

Some other language could work well, too, had someone written a numpy for it at the time. But there really aren't that many people who are capable, interested, and invested enough to write such a marvelous library.

habibur over 3 years ago

This happened when MIT switched from Scheme to Python some time in the '00s. Python's adoption in the scientific community increased further, and here we are.

oivey over 3 years ago

Python had basically already won numerical computing thanks to NumPy, SciPy, and Matplotlib before ML really blew up. The other two serious contenders were R and Matlab. Python is a much better general purpose language than either of those, and Matlab is proprietary.

rg111 over 3 years ago

1. It's extremely easy. Before the so-called revolution and CS people trying to get into it in droves, it was a niche topic dominated by the lifelong-researcher types. They could not be bothered with complex code. Writing code should not get in the way. Back then, Lisp dominated the ML/AI scene. Now Python does, for this reason, to some extent. Python being easy also helps non-CS engineering and other science grads learn it quickly.

2. Python has a huge ecosystem. NumPy, SciPy, and now Tensorflow, PyTorch, JAX. These make life easier.

3. Python and its ecosystem are FOSS. Students and hobbyists can learn it for free. (Quick anecdote: my uni in India, a very reputed non-IIT one, with sub-optimal funding, two years ago switched to Python + ecosystem for Physics and CS courses, both major and minor. This switch happened directly from C. Before that, Fortran was used. MATLAB, SPSS, etc. were never an option for cash-starved Indian unis. This is pretty much the same all across India. And thus you get a huge talent pool, already trained in Python, that graduates from hard-to-get-into unis.)

4. Python being general purpose also helps vis-a-vis R. R is heavily constrained. You cannot do much in it. R is used in analysis and Data Science. I have never seen it being used in ML, DL or RL. You learn Python, you can do non-trivial file manipulation in it. Good luck doing that with R or MATLAB.

5. The number of people who need to write code that reaches the metal is very small. I never needed to look under the sheets. I spend my life writing PyTorch, fastai, and TFLite. A friend of mine doing a PhD needed to write custom CUDA code and then a wrapper so that it could be accessed from Python. He said that it was a horrible experience. But the number of such people is too small to bring Julia into the mainstream. Julia removes the "two-language problem", but most people never need to use anything besides Python.

snicker7 over 3 years ago

It comes down to timing, really. Just like most technology fads.

Python is interpreted/dynamic, open source, general-purpose, relatively popular, and makes it easy to write low-overhead wrappers for C/C++/Fortran libraries. In 2008-2010, when ML took off, Python was the only language with these properties.

Python, however, has its problems: an atrocious concurrency story (GIL, colored coroutines, asyncio vs trio rift, fork is inefficient thanks to GC, etc.), it is highly non-composable (especially in scientific computing), and package management is broken. The language is also inherently slow (unfixable). It will be replaced by something else eventually.

threeseed over 3 years ago

I would put it mostly down to Spark.

Originally, it was only available in Scala/Java but then they added Python support courtesy of Py4J. And since Python was massively simpler than Scala, it exploded in popularity, very quickly becoming the default language.

So then you had Data Scientists who were already writing a lot of data transformations in Spark looking around at the rest of the Python ecosystem, finding libraries like pandas and IDEs like Jupyter, and basically staying there since it was so much easier than the alternatives.

Their interests aren't really in computer science and so they look for whatever language can get them to an outcome as quickly and easily as possible. Even if it's not the most optimal, elegant or maintainable.

blunte over 3 years ago

Because a lot of the early development in these areas was done by mathematicians and physicists who weren’t programmers (and who had less exposure to languages). These are folks who just wanted an answer to a question or a premise, and the elegance of the path that took them to the answer was utterly insignificant.

In some cases you might see a 3000 line python script with no defined functions… just loops and conditionals and lots of copy-pasted code with small variations in each section.

It’s really a shame, since there are so many more elegant languages which are equally or more powerful. But python is not a terrible language… it’s just an everyman get-shit-done language. We could be worse off.

nijave over 3 years ago

I suspect some of the popularity came from IT and engineering departments preferring to run Python compared to, say, Matlab, Excel, or some other GUI based application not designed to run on servers. R also has a lot of adoption but I think Python is a little more natural to run since so many infrastructure tools are written in Python.

It also works fairly well cross platform. That means you can develop on Windows and run on Linux without too many issues (at least for ML/AI stuff, common frameworks usually have per platform binaries published).

I.e. Python is familiar to the people supporting production systems.

ksec over 3 years ago

>I bet there's some interesting history here.

ML / AI were derived from Data Science, so they were built upon the same Python foundation.

As to why Python for Data Science: one needs to be reminded that most people doing Data Science, or any Matlab type of work, do not consider themselves programmers. They don't want to learn about 20 reasons why functional programming or object-oriented programming is better, and 100s of other best practices with 1000 tricks to write the same program.

Although I do wonder if Julia may have a chance to dethrone it in the next 10 years.

rcarmo over 3 years ago

I saw Fortran being wrapped in it because it could be compiled down into libraries with C-compatible ABIs, and Python can load C libraries directly without fuss.

A lot of NumPy and SciPy ensued, and the rest is history.
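For readers who have not seen Python load a C library directly, here is a minimal sketch using the standard-library ctypes module to call `cos` from the system C math library. It assumes a Unix-like system where the library can be found under the name "m"; where it cannot, the sketch falls back to math.cos so it still runs. Wrapping Fortran in practice usually goes through dedicated tooling, which is not shown here.

```python
import ctypes
import ctypes.util
import math

# Locate the C math library. The name "m" is a Unix-style assumption;
# find_library may return None on other platforms.
libm_path = ctypes.util.find_library("m")

if libm_path is not None:
    libm = ctypes.CDLL(libm_path)
    # Declare the C signature: double cos(double).
    libm.cos.argtypes = [ctypes.c_double]
    libm.cos.restype = ctypes.c_double
    c_cos = libm.cos
else:
    # Fallback so the sketch still runs where libm cannot be located.
    c_cos = math.cos

result = c_cos(0.0)
print(result)
```

Declaring `argtypes` and `restype` matters: without them ctypes assumes integer arguments and return values, which would give wrong results for a function that takes and returns a double.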

xvedejas over 3 years ago

My narrow perspective is that Python is the one language that is both used by companies engaged in ML research (like Google) and a very common language of instruction, for instance in the CS program at the school where I studied. If you were a CS student interested in ML/AI, starting school ten to fifteen years ago, but not particularly interested in software engineering, you'd be able to get by with only really knowing Python. Depending on how widespread this is, I'm guessing it's a part of the picture.

jandrewrogers over 3 years ago

Python became the de facto glue language for supercomputing a very long time ago because you could easily bind C code into it. If you needed linear algebra etc. to run on a massive supercomputer, there were highly optimized Python libraries for that so the researcher didn’t have to write C/C++/Fortran. This massively improved iteration times for a lot of scientific computing efforts with only a modest loss of performance. By the time data science/ML/AI/etc. became a thing these tools were already very mature and also relevant.

The tl;dr: Python had the advantage of a mature legacy in supercomputing doing many of the same types of computations done in AI/ML. Those libraries and bindings provided a massive leg up versus other scripting languages that did not have this kind of capability effectively built-in.

peter_retief over 3 years ago

Python is the default in many fields besides ML. However, ML has its own very efficient languages; I recently discovered Octave through a course I attended. Really worth having a look at https://www.gnu.org/software/octave/index (it is mostly compatible with MATLAB).

morelandjs over 3 years ago

I think there was a strong cohort of scientists using MATLAB, and Python and NumPy adopt the same language conventions. Going from MATLAB to Python is effortless.

Moreover, scientists are typically so-so programmers, so not having to worry about complexities like dereferencing pointers, specifying types etc. makes the language much easier to pick up.

marto1 over 3 years ago

At least in academic circles, Python has always looked like pseudocode for C (or similar) that you can execute, so everyone has "used it" at some point or another to describe algorithms and stuff. Then the same academic circles do a lot of ML/AI research, so Python had a natural advantage.

karmasimida over 3 years ago

Python was already the lingua franca before the whole deep learning/AI thing. It has numpy/scipy/pandas/scikit-learn, etc. And when did numpy happen? Its predecessor, Numeric, dates back to the mid-1990s.

Arguably its biggest competitor then was R, but R is not well accepted by programmers. Yet another alternative is Matlab, but OMG, using Matlab for anything string related is killing me.

While there is some history to it, that Python won in the end isn't a surprise to anyone. It is simple but not toyish for real-world systems. I am working at one of the big techs, and Python is running the production workload for most AI services just fine.

I take a LOT of issue with dynamic typing, but for ML/AI you are going to write a lot of ad-hoc data wrangling code, and sometimes even Python feels verbose.

TL;DR: It had already won.
