Why Julia's DataFrames Are Still Slow

60 points by johnmyleswhite over 9 years ago

4 comments

jpfr over 9 years ago
Relevant video from JuliaCon on the type system core: https://www.youtube.com/watch?v=xUP3cSKb8sI

Julia's type system is geared towards JIT compilation. Methods are compiled when they are first called. Then, of course, full type information for the arguments is available. That's quite enough for Matlab-style code with the occasional JITted method. But Julia has one glaring disadvantage for everything beyond that: the return type of methods cannot be specified / enforced.

1) With "black box" methods (where the return type cannot be inferred, as in this DataFrames article) the code becomes horribly slow. And you have to dig into the internal method representations to become aware of the type inference results.

2) It hurts the ability of Julia to produce binary executables. When the types are not 100% inferrable, the entire JIT infrastructure needs to be dragged along.

3) Types are not only an aid for the compiler, but also an aid for the programmer. With SIUnits [1] and method return types, Julia could even tell when the physics represented in the code is flawed!

If Julia's type system were stronger, it could become a prime platform to develop Computer Algebra Systems (CAS). That could lead to a great unification of symbolic and numerical "computation platforms". However, current Julia is unable to represent the mathematics encoded in the type system of open source CAS like Axiom [2]. Also note the GitHub issue on Julia and dependent typing [3].

Imho, there is still great potential for the Julia type system without breaking existing code.

[1] https://github.com/Keno/SIUnits.jl

[2] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.27.2331&rep=rep1&type=pdf

[3] https://github.com/JuliaLang/julia/issues/6113
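
To make point 1 concrete, here is a minimal sketch of a "black box" column lookup. `ToyFrame`, `sum_unstable` and `sum_asserted` are hypothetical names made up for illustration, not the DataFrames.jl API, and the syntax assumes a recent Julia (1.x). The first function cannot be inferred because the column comes out of a `Vector{Any}`; the second uses a type assertion at the point of use, which is one way to hand the compiler the missing information.

```julia
# Hypothetical stand-in for a data frame: columns live in a Vector{Any},
# so the compiler cannot know the type of `df.columns[j]` ahead of time.
struct ToyFrame
    columns::Vector{Any}
end

# Type-unstable: `col` is inferred as Any, so each `+` is dispatched at run time.
function sum_unstable(df::ToyFrame, j::Int)
    col = df.columns[j]
    s = 0.0
    for x in col
        s += x
    end
    return s
end

# Same loop, but the type assertion tells the compiler what `col` is.
# It throws a TypeError at run time if the column is not a Vector{Float64}.
function sum_asserted(df::ToyFrame, j::Int)
    col = df.columns[j]::Vector{Float64}
    s = 0.0
    for x in col
        s += x
    end
    return s
end

df = ToyFrame(Any[[1.0, 2.0, 3.0], ["a", "b"]])
println(sum_unstable(df, 1))
println(sum_asserted(df, 1))
```

Running `@code_warntype sum_unstable(df, 1)` in the REPL should show `col` and `s` as `Any`, which is the kind of inspection of inference results the comment refers to.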
sevensor over 9 years ago
Is there no way to take advantage of the fact that most columns, most of the time, are filled with doubles? This is both the expected case and the thing we want to go faster. I don't know compiler design, which is why I ask.
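
One possible answer to this question, sketched under assumptions: check for the expected column type once, outside the hot loop, and hand the column to a separate function that the compiler can specialize. `kernel_sum` and `column_sum` are hypothetical names for illustration, and this is not a claim about how DataFrames itself handles it.

```julia
# Inner kernel: Julia compiles a separate specialization for each concrete
# column type it is called with, so for Vector{Float64} the loop runs on
# plain doubles.
function kernel_sum(col::AbstractVector{T}) where {T<:Number}
    s = zero(T)
    for x in col
        s += x
    end
    return s
end

# Outer function: the column type is unknown here. Test for the expected
# case explicitly; inside that branch the compiler knows the type of `col`.
function column_sum(columns::Vector{Any}, j::Int)
    col = columns[j]
    if col isa Vector{Float64}
        return kernel_sum(col)   # call target known at compile time
    else
        return kernel_sum(col)   # call target looked up at run time
    end
end

columns = Any[[1.0, 2.5], [1, 2, 3]]
println(column_sum(columns, 1))
println(column_sum(columns, 2))
```

Either way the cost of not knowing the type is paid once per column rather than once per element, because the inner loop lives behind a function call.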
IndianAstronaut over 9 years ago
One thing I would really like to see happen is out-of-core data and statistics in Julia, just like SAS. That is not possible in either R or Python.
jbssm over 9 years ago
Is there any good alternative library to DataFrames?