How Mojo gets a speedup over Python – Part 2

94 points by CoreyFieldens over 1 year ago

17 comments

CoreyFieldens over 1 year ago
I'm really interested in Mojo not for its AI applications, but as an alternative to Julia for high-performance computing. Like Julia, Mojo is attempting to solve the two-language problem, but I like that Mojo is coming at it from a Python perspective rather than trying to create new syntax. For better or for worse, Python is absolutely dominating the field of scientific computing, and I don't see that changing anytime soon. Being able to write optimizations at a lower level in a Python-like syntax is really appealing to me.

Furthermore, while I love Julia the language, I'm disappointed in how it really hasn't taken off in adoption by either academia or industry. The community is small, and that becomes a real pain point when it comes to tooling. Using the debugger is an awful experience, and the VSCode extension that is the recommended way to write Julia is very hit-or-miss. I think it would really benefit from a lot more funding, which doesn't actually seem to be coming. It's not a 1-to-1 comparison, but Modular has received three times the funding of JuliaHub despite being much younger.
frakt0x90 over 1 year ago
At least they included numpy in this one. On their last post, after all their optimizations, numpy.matmul() produced almost the exact same throughput as their most optimized example. Would still need to dig in to see if this one has issues. Benchmarks are always such a minefield.
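For context, here is the kind of comparison being described, as a hedged sketch (not the article's benchmark; the sizes and timings are purely illustrative):

    # Naive pure-Python matmul vs. numpy.matmul; only meant to show why the
    # choice of baseline dominates headline speedup numbers.
    import time
    import numpy as np

    def py_matmul(a, b):
        n, k, m = len(a), len(b), len(b[0])
        out = [[0.0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                s = 0.0
                for p in range(k):
                    s += a[i][p] * b[p][j]
                out[i][j] = s
        return out

    n = 256
    a, b = np.random.rand(n, n), np.random.rand(n, n)

    t0 = time.perf_counter()
    py_matmul(a.tolist(), b.tolist())
    t_py = time.perf_counter() - t0

    t0 = time.perf_counter()
    np.matmul(a, b)
    t_np = time.perf_counter() - t0

    print(f"pure Python: {t_py:.3f}s  numpy: {t_np:.5f}s  ratio ~{t_py / t_np:.0f}x")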
zengid over 1 year ago
I'm pretty excited about Mojo and have been keeping an eye on its development. I feel like the team has learned a lot from their experience, is taking the best from languages like Python, Rust, Swift, and Hylo (formerly known as Val), and is taking a really nice pragmatic approach to implementing those ideas so that the language is *approachable*, but also very safe and fast. Once it's out, I hope someone sits down and makes a SwiftUI-like cross-platform UI library with it ;).
pjmlp over 1 year ago
Still waiting to see whether all of this will be another Swift for TensorFlow, or actually make a difference.
leecarraher over 1 year ago
A 35Kx speedup is not a scaled speedup: throw this naively parallelizable task at a bigger computer and you get a 70Kx speedup, and so on.

While I think there are tons of optimizations to be done for Python (looking at you, GIL), giving access to low-level CPU primitives is not one I think will be broadly adopted by the Python community. That's one of the joys of Python: system-agnostic coding that looks pretty close to pseudocode. If you want speed, glue together a bunch of compiled code calls and hope the call overhead isn't too large, or write CPU-intensive operations in Numba or Pyrex. At the end of the day, Mojo's pay-to-play programming language harkens back to the early-'90s Borland days.
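A minimal sketch of the Numba route mentioned above, for readers who haven't seen it (illustrative only; the function, array shapes, and compiler flags are assumptions, not anything from the article):

    # Illustrative only: JIT-compile a CPU-heavy loop and parallelize it across cores.
    import numpy as np
    from numba import njit, prange

    @njit(parallel=True, fastmath=True)
    def row_sums(a):
        out = np.empty(a.shape[0])
        for i in prange(a.shape[0]):      # prange spreads iterations over threads
            s = 0.0
            for j in range(a.shape[1]):
                s += a[i, j]
            out[i] = s
        return out

    a = np.random.rand(4096, 4096)
    row_sums(a)                           # first call triggers compilation
    print(row_sums(a)[:4])                # later calls run the compiled parallel code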
erichocean over 1 year ago
Mojo needs to demonstrate Hugging Face's AI libraries with Mojo acceleration. Nothing else will have the kind of impact that would have.

Throw a half dozen engineers at it, develop a deployment plan for SD XL, profit.

You'll get a ton of open source developers working on improving the Mojo versions even further once you release it, researchers developing extensions, etc. GO TO WHERE THE DEVELOPERS ARE.

Stable Diffusion is crazy compute-heavy, so if Mojo is what it's purported to be, it should be possible to get speedups.
thebigspacefuck over 1 year ago
They lost me with the emoji for file extension. That’s not a world I want to live in.
dandiep over 1 year ago
I don't understand the play here for Modular. If this is a worthwhile improvement that is broadly applicable, won't it at some point make its way into Python, NumPy, etc.?

In Java land we had a bunch of other JVMs over the years offering better performance. The most important things got absorbed into what is now OpenJDK, and the other JVMs, if they even exist at all, are niche players.

Performance is a huge focus in Python and ML lands right now, so why would this be any different?
dist-epoch over 1 year ago
Cool, but it has very little to do with Python, except some similar-looking syntax.

So for a Python programmer with a performance problem, it doesn't look like a solution.
brrrrrm over 1 year ago
I just want to see real, un-hyped benchmarks. Comparing against random native Python code makes no sense and seems dishonest, and it deters me from actually trying out the tool.

I want a Python that can statically plan underlying GPU allocations, avoid CUDA kernel dispatch overhead, and enable a multi-GPU API that isn't some multiprocessing abomination.
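As a rough illustration of the dispatch-overhead complaint above, a hedged sketch using PyTorch (not Mojo); it assumes a CUDA-capable GPU, and the numbers are entirely machine-dependent:

    # For very small tensors, per-op launch/dispatch cost dominates the actual math.
    import time
    import torch

    x = torch.randn(32, 32, device="cuda")
    torch.cuda.synchronize()

    t0 = time.perf_counter()
    for _ in range(10_000):
        x.mul_(1.0001)                 # one tiny kernel launch per iteration
    torch.cuda.synchronize()
    per_call_us = (time.perf_counter() - t0) / 10_000 * 1e6
    print(f"~{per_call_us:.1f} µs per call on a 32x32 op, mostly dispatch overhead")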
two_handfuls over 1 year ago
A Python with easy-to-use SIMD and multithreading sounds awesome!
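Roughly the closest thing today, sketched with assumed shapes and worker counts: NumPy's vectorized ops use SIMD internally and release the GIL, so a plain thread pool can spread them across cores:

    # Hedged sketch: vectorized (SIMD-friendly) NumPy math fanned out over threads.
    from concurrent.futures import ThreadPoolExecutor
    import numpy as np

    chunks = [np.random.rand(2_000_000) for _ in range(8)]

    def work(chunk):
        # The heavy lifting happens in NumPy's C code, outside the GIL.
        return np.sqrt(chunk * chunk + 1.0).sum()

    with ThreadPoolExecutor(max_workers=8) as pool:
        totals = list(pool.map(work, chunks))
    print(sum(totals))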
spencerchubb over 1 year ago
Why is this a language superset of Python rather than a Python library? Genuinely asking, not trying to bash.
queuebert over 1 year ago
As a high-performance computing person, I'm usually I/O bound, not compute bound. I wish someone would come up with a 10x speedup for disk and network I/O.
Kelteseth over 1 year ago
So, TL;DR: using SIMD and multithreading is faster than doing no optimization in Python. The only real comparison here is the one where no optimization is applied:

> The above code produced a 90x speedup over Python and a 15x speedup over NumPy as shown in the figure below:

Am I missing something?
pantsforbirds over 1 year ago
Good blog post. I do wonder how it would compare to a PyCUDA implementation.
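For readers who haven't used it, a minimal hedged sketch of the PyCUDA style being asked about (not anything from the article; the kernel, sizes, and launch parameters are made-up examples, and it needs a CUDA GPU plus the pycuda package):

    # Illustrative PyCUDA usage: compile a tiny CUDA kernel at runtime and launch it.
    import numpy as np
    import pycuda.autoinit              # noqa: F401  (creates a CUDA context)
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void saxpy(float *y, const float *x, float a, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }
    """)
    saxpy = mod.get_function("saxpy")

    n = 1 << 20
    x = np.random.rand(n).astype(np.float32)
    y = np.random.rand(n).astype(np.float32)

    x_gpu = cuda.mem_alloc(x.nbytes)
    y_gpu = cuda.mem_alloc(y.nbytes)
    cuda.memcpy_htod(x_gpu, x)
    cuda.memcpy_htod(y_gpu, y)

    block = 256
    grid = (n + block - 1) // block
    saxpy(y_gpu, x_gpu, np.float32(2.0), np.int32(n),
          block=(block, 1, 1), grid=(grid, 1))

    cuda.memcpy_dtoh(y, y_gpu)
    print(y[:4])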
laweijfmvo over 1 year ago
nit: The text says 743x but the graph (Figure 3) shows 527x
deepsquirrelnet over 1 year ago
I don't understand this from a goals perspective. What is an "AI compiler", and why aren't they comparing benchmarks with technologies more commonly used in AI?

I think I should be impressed, but I feel like I'm missing the point.