My Journey from R to Julia

102 points by michelpereira over 2 years ago

16 comments

> For example, in R, we try to avoid loops because they are very inefficient

This was true before, but the performance of for loops has improved a lot in recent years, and while vectorization is still faster, for loops are no longer a no-no.

See <a href="https://www.r-bloggers.com/2022/02/avoid-loops-in-r-really/" rel="nofollow">https://www.r-bloggers.com/2022/02/avoid-loops-in-r-really/</a>
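To make the comparison concrete, here is a minimal sketch in base R that computes the same sum of squares with an explicit loop and with a vectorized call. The function names `sum_sq_loop` and `sum_sq_vec` are made up for illustration, and timings depend on the R version and the machine, so none are quoted here.

```r
# Hypothetical helper: sum of squares with an explicit for loop.
sum_sq_loop <- function(x) {
  total <- 0
  for (xi in x) {
    total <- total + xi^2
  }
  total
}

# Hypothetical helper: the vectorized equivalent.
sum_sq_vec <- function(x) sum(x^2)

x <- as.numeric(1:100000)

# Both approaches agree on the result.
stopifnot(isTRUE(all.equal(sum_sq_loop(x), sum_sq_vec(x))))

# Time each version locally to see how large the gap is on your setup.
loop_time <- system.time(sum_sq_loop(x))
vec_time  <- system.time(sum_sq_vec(x))
print(c(loop = loop_time[["elapsed"]], vectorized = vec_time[["elapsed"]]))
```

Running this on a recent R release is a quick way to check the claim for your own workload.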

kkoncevicius over 2 years ago

R can handle the examples in the article with generic functions:
<pre><code>oddsratio <- function(x, ...) UseMethod("oddsratio", x)
oddsratio.integer <- function(a, b, c, d) (a * d) / (b * c)
oddsratio.numeric <- function(p1, p0) ((p1)/(1 - p1)) / ((p0)/(1 - p0))
oddsratio.matrix <- function(x) (x[1, 1] * x[2, 2]) / (x[1, 2] * x[2, 1])
</code></pre>
Then:
<pre><code>oddsratio(12L, 6L, 2L, 29L)          # 29
oddsratio(12/(12+2), 6/(6+29))       # 29
oddsratio(matrix(c(12,6,2,29), 2))   # 29
</code></pre>

vitorsr over 2 years ago

This has been said many times before, but with these languages it is rarely about the languages themselves and more about their ecosystems:

<a href="https://cran.r-project.org/web/packages/available_packages_by_name.html" rel="nofollow">https://cran.r-project.org/web/packages/available_packages_b...</a>

To go from R to Julia, as an example, one would have to give up on a hundred or so high-quality packages potentially related to their activities.

sfpotter over 2 years ago

This is a pretty weak article. The author lists five reasons an epidemiologist would be interested in Julia and then only gives a (kind of simple and contrived) example for one of them.

bluedino over 2 years ago

I work with PhD chemists at an F500 company. Most everyone uses Python, and we have a pocket of users that are on the R train, mostly RStudio mixed with Python.

Someone asked to install Julia on the compute cluster just last week, so we'll see how many others start using it.

BrandonS113 over 2 years ago

Just now I was thinking of moving a long calculation from R to Julia (non-linear optimisation of a simple function with multiple local minima, for a lot of different datasets). No loops. Embarrassingly parallel. And to my great surprise, R and Julia took the same time.

mxkopy over 2 years ago

From my understanding Julia is closer to the metal than R. This means the semantics are much more specific than R's, and the syntax is more consistent/rigid.

For example, plotting in R always baffled me.

plot(x, y, col=..., col.name=...)

In this case, col.name is literally just a symbol. But in another context col.name is the data with index 'name' stored in col. Or something, it's been a while.

R seems to have a lot of these 'special contexts' that A. make understanding and writing code much quicker and B. reward familiarity over intuition. One line in R can be 100 in Julia, and both compile to 80 machine instructions, for example.

I'd say if you can agree with others on what R code does and you're comfortable with R, then use R. If you need to build something performant with many domains, then Julia is a great language for that sort of thing.

fithisux over 2 years ago

I'm afraid R is dragged down by its S legacy. I think it's time for these to evolve separately. I see Julia can do what R already does while following software engineering practices, with cleaner code and typing.

Julia is the new R for me. Unless R re-invents itself.

aljabadi over 2 years ago

It’s important to note that R’s S4 Object System now supports multiple dispatch & I have enjoyed using it. I would agree that it’s not quite as elegant as Julia’s. See <a href="https://www.mpjon.es/2021/05/31/r-julia-multiple-dispatch/" rel="nofollow">https://www.mpjon.es/2021/05/31/r-julia-multiple-dispatch/</a>
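As a rough illustration of S4 choosing a method from more than one argument, the sketch below defines a generic whose method depends on the classes of both `x` and `y`. The generic name `oddsratio2` is hypothetical (picked to avoid clashing with the S3 `oddsratio` posted earlier in the thread), and the numbers are the ones used there.

```r
library(methods)

# Hypothetical generic; S4 picks a method from the classes of x AND y.
setGeneric("oddsratio2", function(x, y) standardGeneric("oddsratio2"))

# Two probabilities (p1 and p0 in the S3 version).
setMethod("oddsratio2", signature("numeric", "numeric"),
          function(x, y) (x / (1 - x)) / (y / (1 - y)))

# A single 2x2 table, with the second argument omitted.
setMethod("oddsratio2", signature("matrix", "missing"),
          function(x, y) (x[1, 1] * x[2, 2]) / (x[1, 2] * x[2, 1]))

or_probs <- oddsratio2(12 / (12 + 2), 6 / (6 + 29))
or_table <- oddsratio2(matrix(c(12, 6, 2, 29), 2))

# Both routes give an odds ratio of 29 (up to floating-point error).
stopifnot(isTRUE(all.equal(or_probs, 29)), isTRUE(all.equal(or_table, 29)))
```

The `"missing"` class in the second signature is how S4 expresses "this argument was not supplied", which is what lets a one-argument call share the generic with a two-argument one.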

gozzoo over 2 years ago

The article is supposed to tell us why Julia is better than R, but it mainly focuses on one feature: multiple dispatch. Can someone please explain: does multiple dispatch provide any advantage over other function call strategies, and even if it does, how much effort would it save, and how much shorter or less ambiguous would our code become?

maxboone over 2 years ago

I'd love to see more Julia (or Python) adoption in non-cs/math/phys academic research.

It's a breeze doing such analyses with Stata, and with a bunch of weird syntax, some libraries and more lines you can get it done in R as well.

But I tried assisting my SO with setting up their statistical methods in Python and it was so much more work than Stata (or R).

dan-robertson over 2 years ago

Interestingly, I found myself going the other way. Let me first say that R is a hilariously weird-feeling and janky language. The Julia features mentioned (structures are good for organising; compilation and better data structures mean you need to worry less about accidentally writing code that is 10x or 100x slower than it ought to be, which tends to matter a lot for interactive use) are definitely useful, and magically getting e.g. arbitrary precision arithmetic is pretty cool.

I think the example in the post shows an annoying way for Julia’s generic functions to be difficult because the function seems to take a matrix but secretly it only wants a 2x2 matrix. If such a function gets called with the wrong value deep in some other computation, and especially if it silently doesn’t complain, you may end up with some pretty annoying bugs. This kind of bug can happen in R too (functions may dispatch on the type of their first arg and many are written to be somewhat generic by inspecting types at runtime). I think it’s a little less likely only because data structures are more limited. A related example that trips me up in R is min vs pmin.

The biggest issue I had in practice is that for either language, I wanted to input some data, fiddle with it, draw some graphs, maybe fit some models, and suchlike. R seems to have better libraries for doing the latter but maybe I just didn’t find the right Julia libraries.

- I feel like I had more difficulties reading csvs with Julia. But then when I was using Julia, I wanted to read a bunch of ns-precision time stamps which the language didn’t really like, and with R I didn’t happen to need this. I found neither language had amazing datetime type support (partly this is things like precision. Partly this is things like wanting to group by week/day/whatever. Partly this is things like wanting sensible graphs to appear for a time axis)
- R has a bigger standard library of functions that are useful to me, e.g. approx or nlm or cut. I think it’s a reasonable philosophy for Julia to want a small stdlib but it is less fun trying to find the right libraries all the time. Presumably if I knew the canonical libraries I would have been happier.
- R seems to have better libraries for stats.
- I found manipulating dataframes in Julia to be less ergonomic than dplyr, but maybe I just wasn’t using the Julia equivalent. In particular, instead of e.g. mutate(x=cumsum(yfilter)), I would have to write something like mutate(do, [:y, :filter]=>((y,f)-> cumsum(yfilter))=>:x). I didn’t like it, even though it’s clearly more explicit about scoping which I find desirable in a less interactive language.
- I much preferred ggplot2 to the options in Julia. It seems the standard thing is Plots.jl but I never had a great time with that. Gadfly seemed to have a better interface but had similar issues to manipulating data frames and I found myself hitting many annoying bugs with it. Ggplot is slow, however.
- Pluto crashed a lot on me, which wasn’t super fun. In general, I felt like Julia was more buggy. Though I also get an annoying bug with R where it starts printing new prompts every second or so, and sometimes just crashes after that. Pluto also doesn’t work with Julia’s parallelism features (but maybe it does now?)
- The thing that most frustrated me with Pluto/Gadfly was that I would want to take a bunch of data, draw it nice and big, and have a good look at it. Ggplot (probably because of bad hidpi support) does this well by throwing up the plot with a tiny font size on a nice 4k window and, with appropriate options, not doing a ton of X draw calls for partial results (downside: it is still quite slow with a lot of points). Gadfly in Pluto wants to generate an SVG with massive font size and thick borders on chonky scatter plot shapes, and crams it into a tiny rectangle in Pluto. Maybe this is more aesthetic or something but generally I plot things because I want to look at the data and this is not an easy way to look at it. The option to hide the thick borders in Gadfly is hilariously obscure. I never bothered learning how to not generate the SVG in the notebook. I would just suffer terrible performance while I zoomed in to get a higher resolution screenshot (before deleting the SVG in the dev console) or generate a png file.

That said, there are still things I don’t know how to do with either plotting system, like reversing a datetime scale, or having a scale where the output coordinate goes as -pseudolog(1-y) to see the tail of an ecdf, or having a scale where the labels come from one source but positions come from some weight, e.g. time on the x axis weighted by cpu-hours so that an equal x distance between points corresponds to equal cpu-hours rather than equal wall-time. Maybe I will learn how to do it someday with ggplot.
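Two of the R points in this comment can be sketched in base R: the difference between `min` and `pmin`, and an explicit shape check so that a function expecting a 2x2 matrix fails loudly instead of silently returning a number. `oddsratio_2x2` is a hypothetical name, and the guard is just one possible approach, not something proposed in the article.

```r
a <- c(3, 8, 1)
b <- c(5, 2, 7)

# min() pools all of its arguments and returns a single value.
stopifnot(min(a, b) == 1)

# pmin() is the element-wise ("parallel") minimum.
stopifnot(identical(pmin(a, b), c(3, 2, 1)))

# Hypothetical guarded odds ratio: reject anything that is not a 2x2 matrix.
oddsratio_2x2 <- function(x) {
  stopifnot(is.matrix(x), identical(dim(x), c(2L, 2L)))
  (x[1, 1] * x[2, 2]) / (x[1, 2] * x[2, 1])
}

oddsratio_2x2(matrix(c(12, 6, 2, 29), 2))  # 29

# A 3x3 input now raises an error instead of quietly using one corner.
result <- try(oddsratio_2x2(diag(3)), silent = TRUE)
stopifnot(inherits(result, "try-error"))
```

Without the `stopifnot` guard, the same indexing expression would happily run on a larger matrix and return a value computed from its top-left corner only.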

bluenose69 over 2 years ago

The key comment is that it's hard to know more than 1.5 languages. I think everyone has their own number for that. My number is higher than the author's. I use R for most work, but a lot of my computations involve large binary datasets that are best read with C/C++, so I use C/C++ and R in tandem for my data-analysis work.

Separate from that, I use python when I'm writing (undemanding) system-level work. I see it as a great replacement for the shell. (Python took over from perl, and once I got to 20% proficiency with python I had a sigh of relief, knowing that I would never really need to write in perl again.)

And, yes, I also use Julia. This is mainly for writing small numerical models. It is a lovely language. I would never start to write a small model in fortran anymore. But that doesn't mean I can leave fortran behind because it is still the language used for large numerical models. (These models involve many tens of person-years of effort by world experts. This is not just a coding thing.)

I suspect that quite a lot of people have language limits more like mine than the 1.5 stated by the author. For such people, Julia is definitely an arrow that ought to be in the quiver. It is elegant. It is fast. It is modern. Parts of it are simply delightful. But there are downsides.

1. The startup is slow enough to be annoying, for folks (like me) who like to use makefiles to coordinate a lot of steps in analysis, as opposed to staying in a language environment all day long. (Note, though, that julia is getting faster. In particular, the time-to-first-plot has been decreasing from an annoying minute or so, down to perhaps half a minute.)
2. The error messages often emanate from a low level, making it hard to understand what is wrong. In this, R and python and even C/C++ are much superior.
3. The language is still in rapid development, so quite often the advice you find on the web will not be the best advice.
4. There are several graphics systems, and they work differently. This wild-west approach is confusing to users. Which one to choose? If I run into problems with one and see advice to switch to another, what new roadblocks will I run into?
5. The graphical output is fairly crude, compared with R.
6. It has some great libraries, but in sheer number and depth and published documentation, it cannot really hold a candle to R. Nearly every statistical PhD involves R code, and I think quite a lot of packages come from that crucible. This environment ought not to be underestimated.

The bottom line? It only takes an hour or so to see that Julia is a wonderful open-source replacement for matlab, and for small tasks that might otherwise be done in Fortran. Anyone with a language capacity of 2 or 3 or more (and I suspect this is many folks on HN) will find Julia to be a great tool to learn, for certain tasks.

usgroup over 2 years ago

TLDR: Author switched to Julia because he “fell in love” with it, with no further qualification.

He then speaks a bit about multiple dispatch and how it’s useful when it’s suitable.

Personally I saw nothing here that might actually convince someone to switch. R + Tidyverse + Rcpp + CRAN is formidable.

hnarayanan over 2 years ago

I get confused by this every time this comes up. Is multiple dispatch the same as function-overloading (e.g. in C++)?

adenozine over 2 years ago

I’ve made most of my career turning scientific and mathematical code into maintainable and aesthetic code, and the red flag for me in this article is that he evidently couldn’t keep up with the Python learning curve and chose instead a language with no traits, no interfaces, and no classes. So, the amount of organization in his code is effectively zero.

I understand that Julia 2.0 is slated to have some sort of concrete interface mechanism, so that’s good. Thus far, I’ve seen some pretty low quality results. There’s just no way to have intuition about what method is going to be called in Julia. In python, I know it’s either going to be somewhere in dir(some-obj) or it’s gonna be some funky meta class stuff. Either way, pycharm can literally just hyperlink me.

Until Julia has the same capability, it just won’t be suitable for general purpose code. I know there will be some Julia fan in the replies about how I can approximate the behavior, and how Julia is the future and blah blah blah.

Just fix interfaces. It’s not that hard. They’ve got MIT grads for crying out loud!

I’m a little appalled there’s PhDs doing computer science work with public money that can’t wrap their head around python. That’s a failed curriculum imo.
