Why Julia's DataFrames Are Still Slow

60 points by johnmyleswhite, over 9 years ago

4 comments

jpfr, over 9 years ago
Relevant video from JuliaCon on the type system core: https://www.youtube.com/watch?v=xUP3cSKb8sI

Julia's type system is geared towards JIT compilation. Methods are compiled when they are first called. Then, of course, full type information for the arguments is available. That's quite enough for Matlab-style code with the occasional JITted method. But Julia has one glaring disadvantage for everything beyond that: The return type of methods cannot be specified / enforced.

1) With "black box" methods (where the return type cannot be inferred as in this DataFrames article) the code becomes horribly slow. And you have to dig into the internal method representations to become aware of the type inference results.

2) It hurts the ability of Julia to produce binary executables. When the types are not 100% inferrable, the entire JIT infrastructure needs to be dragged along.

3) Types are not only an aid for the compiler, but also an aid for the programmer. With SIUnits [1] and method return types, Julia could even tell when the physics represented in the code is flawed!!

If Julia's type system were stronger, it could become a prime platform to develop Computer Algebra Systems (CAS). That could lead to a great unification of symbolic and numerical "computation platforms". However, current Julia is unable to represent the mathematics encoded in the type system of open source CAS like Axiom [2]. Also note the GitHub issue on Julia and dependent typing [3].

IMHO, there is still a great potential for the Julia type system without breaking existing code.

[1] https://github.com/Keno/SIUnits.jl

[2] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.27.2331&rep=rep1&type=pdf

[3] https://github.com/JuliaLang/julia/issues/6113
sevensor, over 9 years ago
Is there no way to take advantage of the fact that most columns, most of the time, are filled with doubles? This is both the expected case and the thing we want to go faster. I don't know compiler design, which is why I ask.
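
A rough sketch of the idea behind this question, written in Python rather than Julia purely for illustration: check the column's element type once, outside the loop, and hand columns of doubles to a specialized routine, while everything else falls back to a generic path. In Julia the analogous pattern is often called a function barrier. The names `column_sum`, `_sum_doubles`, and `_sum_generic`, and the rule for what counts as a "column of doubles", are assumptions made up for this example; none of this is taken from DataFrames itself.

```python
from array import array
import math


def column_sum(col):
    """Sum a column, taking a fast path when it is a packed array of doubles."""
    # One type check per column, outside the loop (hypothetical dispatch rule).
    if isinstance(col, array) and col.typecode == "d":
        return _sum_doubles(col)
    return _sum_generic(col)


def _sum_doubles(col):
    # Every element is known to be a C double, so no per-element checks are needed.
    return math.fsum(col)


def _sum_generic(col):
    # Fallback for mixed or boxed columns: inspect each element, skip missing values.
    total = 0.0
    for x in col:
        if x is None:
            continue
        total += float(x)
    return total


doubles = array("d", [1.5, 2.5, 3.0])   # homogeneous column: fast path
mixed = [1, 2.5, None, "3"]             # heterogeneous column: generic path
print(column_sum(doubles))
print(column_sum(mixed))
```

The type check costs the same regardless of column length, so the common all-doubles case pays almost nothing for the dispatch; whether a compiler can do this automatically is a separate question that this sketch does not answer.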
IndianAstronaut, over 9 years ago
One thing I would really like to see happen is out-of-core data and statistics in Julia, just like SAS. Not possible in either R or Python.
jbssm, over 9 years ago
Is there any good alternative library to DataFrames?