Experience report on a large Python-to-Go translation

274 points by psxuaw, more than 5 years ago

18 comments

mappu, more than 5 years ago

Here's another experience report: I ported a small 1 KLOC PHP project to Go this week (in some spare time between large C++ compiles). The primary goal was to reduce the number of languages we use and support.

The port was done in a mechanical, line-by-line way: copying each PHP file to a *.go file and fixing all the syntax. The project was small enough that automation wasn't interesting.

I agree with the "1/3 time spent debugging the result" point. Another complicated facet was the lack of insertion-order-preserving maps, which PHP applications end up relying on heavily. The error/exception impedance mismatch was not a problem in practice at all.

According to cloc, the original PHP project (excluding vendor) is 1.0 KLOC, and the resulting Go application is 1.2 KLOC. I imagined Go would be more verbose than this, but most lines remained 1:1 conversions, and the Go standard library happened to cover a lot of small utility functions that had to be written separately in PHP (e.g. for string suffix matching).

Another interesting point is that the number of comment lines reported by cloc dropped dramatically, since real type annotations are much less verbose than PHPDoc.
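
PHP arrays remember insertion order, while Go maps deliberately do not guarantee any iteration order. A minimal sketch of one way to get PHP-like behavior back, assuming Go 1.18+ generics (which postdate this thread); `OrderedMap` and its methods are hypothetical names, not a standard-library API:

```go
package main

import "fmt"

// OrderedMap is a hypothetical map that remembers insertion order,
// similar to how PHP arrays iterate. Not safe for concurrent use.
type OrderedMap[K comparable, V any] struct {
	keys   []K
	values map[K]V
}

func NewOrderedMap[K comparable, V any]() *OrderedMap[K, V] {
	return &OrderedMap[K, V]{values: make(map[K]V)}
}

// Set inserts or updates a key; updating keeps the original position.
func (m *OrderedMap[K, V]) Set(k K, v V) {
	if _, exists := m.values[k]; !exists {
		m.keys = append(m.keys, k)
	}
	m.values[k] = v
}

// Get returns the value and whether the key was present.
func (m *OrderedMap[K, V]) Get(k K) (V, bool) {
	v, ok := m.values[k]
	return v, ok
}

// Delete removes a key; it is O(n) because the key slice is scanned.
func (m *OrderedMap[K, V]) Delete(k K) {
	if _, exists := m.values[k]; !exists {
		return
	}
	delete(m.values, k)
	for i, key := range m.keys {
		if key == k {
			m.keys = append(m.keys[:i], m.keys[i+1:]...)
			break
		}
	}
}

// Keys returns a copy of the keys in insertion order.
func (m *OrderedMap[K, V]) Keys() []K {
	return append([]K(nil), m.keys...)
}

func main() {
	m := NewOrderedMap[string, int]()
	m.Set("zebra", 1)
	m.Set("apple", 2)
	m.Set("mango", 3)
	m.Set("zebra", 10) // update; position unchanged
	m.Delete("apple")
	for _, k := range m.Keys() {
		v, _ := m.Get(k)
		fmt.Println(k, v) // zebra 10, then mango 3
	}
}
```

The parallel key slice keeps iteration cheap at the cost of O(n) deletes; a linked list plus map would make deletes O(1) if that matters.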

jasonpeacock, more than 5 years ago

I'd like to compliment the author on the quality of this post. It's very well written, data- and example-driven, fair, and educational.

Overall, a joy to read. Thank you!

melling, more than 5 years ago

Go is probably more verbose because it's missing list comprehensions, for example. It needs map, filter, and reduce to cut down the line count. Swift, while probably not as performant as Go, makes it easier to write in a more Pythonic style:

<pre><code>
[1,2,3,4,5,6,7,8,9].filter {$0 % 2 == 0}.map {$0 * 2}.reduce(0, +)
["550", "a", "6", "b", "42", "99", "100"].compactMap{Int($0)}.filter {$0 < 100}
</code></pre>

https://github.com/melling/SwiftCookBook/blob/master/functional.md
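
For comparison, a sketch of the same two Swift pipelines in Go. It assumes Go 1.18+ generics, which did not exist when this thread was written; `Filter`, `Map`, and `Reduce` are hypothetical helpers, not standard-library functions, and `parseBelow100` shows the plain-loop style most Go code would use instead:

```go
package main

import (
	"fmt"
	"strconv"
)

// Filter keeps the elements for which keep returns true.
func Filter[T any](xs []T, keep func(T) bool) []T {
	var out []T
	for _, x := range xs {
		if keep(x) {
			out = append(out, x)
		}
	}
	return out
}

// Map applies f to every element.
func Map[T, U any](xs []T, f func(T) U) []U {
	out := make([]U, 0, len(xs))
	for _, x := range xs {
		out = append(out, f(x))
	}
	return out
}

// Reduce folds xs into a single value, starting from init.
func Reduce[T, A any](xs []T, init A, f func(A, T) A) A {
	acc := init
	for _, x := range xs {
		acc = f(acc, x)
	}
	return acc
}

// sumOfDoubledEvens mirrors the Swift filter/map/reduce chain.
func sumOfDoubledEvens(xs []int) int {
	evens := Filter(xs, func(x int) bool { return x%2 == 0 })
	doubled := Map(evens, func(x int) int { return x * 2 })
	return Reduce(doubled, 0, func(a, x int) int { return a + x })
}

// parseBelow100 mirrors compactMap{Int($0)}.filter{$0 < 100}
// with an ordinary loop; unparsable strings are skipped.
func parseBelow100(ss []string) []int {
	var out []int
	for _, s := range ss {
		n, err := strconv.Atoi(s)
		if err != nil {
			continue // like compactMap dropping nil
		}
		if n < 100 {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	fmt.Println(sumOfDoubledEvens([]int{1, 2, 3, 4, 5, 6, 7, 8, 9})) // 40
	fmt.Println(parseBelow100([]string{"550", "a", "6", "b", "42", "99", "100"})) // [6 42 99]
}
```

The generic helpers recover some of the one-line style, but the explicit loop in `parseBelow100` is what the Go community generally favors.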

hartzell, more than 5 years ago

[edit: fixed links] This was discussed recently on the golang-nuts mailing list: https://groups.google.com/d/msg/golang-nuts/u-L7PRa2Z-w/kfUSx81PAAAJ

There was also discussion of an earlier post he made about the work: https://groups.google.com/d/msg/golang-nuts/WstriKt2jTA/lsZyX4hYAwAJ

downerending, more than 5 years ago

Interesting. I'd have expected more than a 50% code expansion going to Go, maybe even 3x or 5x.

Similarly, he's using a 40x speedup as a rule of thumb. I usually think of Python as 20x slower than C.

Personally, I'd be loath to convert a working Python system to Go, but it sounds like he had good reasons. I do wonder a bit whether divide-and-conquer or a C extension might have worked instead.

outworlder, more than 5 years ago

> The problem directed the choice of Go, not the other way around. I seriously considered OCaml or a compiled Lisp as alternatives. I concluded that in either case the semantic gap between Python and the target language was so large that translation would be impractical. Only Go offered me any practical hope.

I wish they had expanded more on this point. Do they mean that rewriting in, say, Lisp would take longer because it wouldn't be a "port" but more like writing a new program from scratch?

EDIT: Spoke too soon. Reading more carefully, I answered my own question.

> Python reposurgeon was 14 KLOC of dense code. At that scale, any prudent person in a situation like this will perform as linear and literal a translation as possible;

DougBTX, more than 5 years ago

There would probably be a much longer list of issues if ESR had converted to Rust instead, but the syntax for error returns is quite interesting. Rust and Go both opt not to have exceptions; instead, they use error return values.

The original Python code using exceptions was:

<pre><code>
sink = transform3(transform2(transform1(source)))
</code></pre>

Making that use error return values looks quite verbose in Go, but Rust has syntax specifically for that case, making it quite manageable:

<pre><code>
let sink = transform3(transform2(transform1(source)?)?)?;
</code></pre>

patrec, more than 5 years ago

Adding lookbehinds to the regexp library is a terrible idea. From the Go regexp package documentation:

> The regexp implementation provided by this package is guaranteed to run in time linear in the size of the input.

Python's, by contrast, can take exponential time in the worst case, because it inherits all the non-regular "regular" expression mess (such as lookbehinds and backreferences) from Perl.

One would assume esr has marinated in Unix culture long enough to be aware of this.

cik, more than 5 years ago

I love reading ESR, and it's such a pleasure to see how great his writing still is in 2020. This made me hark back to The Cathedral and the Bazaar, and it also echoed my own experience.

I do, however, find that when you have a Python project that depends heavily on third-party libraries, these things get significantly larger and more problematic. That's not really a commentary on Go so much as a byproduct of Python's longevity.

ageofwant, more than 5 years ago

I do not get why people do these total rewrites, especially of working Python systems. Why throw out the baby with the bathwater? Python is fundamentally a toolkit for composition. Rewrite the slow bits in C++/Rust/Go and wrap them. That's how all the major Python components like NumPy, SciPy, TensorFlow, PyTorch, etc. do it, and that's a major reason why Python dominates today.

Align with the core strengths of Python's philosophy and its toolset, and reap the benefits. Why fight it?

3fe9a03ccd14ca5, more than 5 years ago

> The main barrier to translation was that, while at 14KLOC of Python reposurgeon was not especially large, the code is very dense.

Reasoning about somebody else's dense code is probably one of my least favorite activities. When I hear about a language being “expressive” or having “flexible syntax”, I shudder.

mistrial9, more than 5 years ago

Significant mastery ahead! This is a success story and a teaching document. I have to point to this: "Now that I’ve seen Go strings… holy hell, Python 3 unicode strings sure look like a nasty botch in retrospect." (!)
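
To illustrate the contrast ESR is pointing at: in Go a string is an immutable sequence of bytes, conventionally UTF-8, and `len` counts bytes, whereas a Python 3 `str` is a sequence of code points. A small sketch using only the standard `unicode/utf8` package; `describe` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe reports the byte length and the number of runes
// (Unicode code points) in s.
func describe(s string) (bytes, runes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "h\u00e9llo" // "héllo": é is two bytes in UTF-8
	b, r := describe(s)
	fmt.Println(b, r) // 6 5

	// Ranging over a string decodes runes; i is the byte offset.
	for i, c := range s {
		fmt.Printf("%d:%c ", i, c)
	}
	fmt.Println()

	// Indexing yields raw bytes, not characters.
	fmt.Println(s[1]) // 195, the first byte of é
}
```

Because there is one string type and decoding happens only where you ask for it, there is no separate bytes-versus-text split to manage at every API boundary.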

xiaodai, more than 5 years ago

If it was too slow in Python and is now moving to Go, could there come a time when it needs to move to Rust/C/C++ for even faster performance? Go seems an odd choice based on performance considerations alone.

jrockway, more than 5 years ago

Pretty interesting. It is scary to make porting 14,000 lines of code your "learn a new language" task, but with that in mind, this all seems to have gone well. Some random thoughts:

> I had to write my own set-of-int and set-of-string classes

Use map[int]struct{} and map[string]struct{}:

<pre><code>
ints := map[int]struct{}{}
ints[42] = struct{}{}            // insert
delete(ints, 42)                 // delete
for i := range ints { ... }      // iterate
if _, ok := ints[42]; ok { ... } // exists?
</code></pre>

> Catchable exceptions require silly contortions

I am not sure why Go has panic/recover, but it's not something to use. panic means "programming error"; recover means "well, the code is broken, but I'm just a humble generic web framework and maybe the next request won't be broken, so let's keep running". It is absolutely not for things like "timeout waiting for webserver" or "no rows in the database", which other languages use exceptions for. For those, you return an error and either wrap it with fmt.Errorf("waiting for webserver: %w", err) or check it with errors.Is and move on. Yup, you have to remember to do that, or your program will behave weirdly. It's just how it is. There is not something better that you will figure out with some experimentation. You just have to do the tedious, boring, and simple thing.

I have used recover exactly once in my career. I wrote a program that accepted mini-programs (more like rules) via a config file that could be reloaded at runtime. We tried to prove them safe, but recover was there so that we could disable the faulty rule and keep running in the event some sort of null pointer snuck in. (I don't think one ever did!)

> Pass-by-reference vs. pass-by-value

I feel like the author wants []*Type instead of []Type here.

> Absence of sum/discriminated-union types

True. Depending on what your goals are, there are many possibilities:

<pre><code>
type IntOrString struct {
	i        int
	s        string
	iok, sok bool
}

func (i IntOrString) String() (string, error) {
	if i.sok {
		return i.s, nil
	}
	return "", errors.New("is not a string")
}

func NewInt(x int) IntOrString { return IntOrString{i: x, iok: true} }

...
</code></pre>

This uses more memory than interface{}, but it's also very clear what you intend for this thing to be.

I will also point out that switch can bind the value for you:

<pre><code>
switch x := foo.(type) {
case int:
	return x + 1
case string:
	i, err := strconv.Atoi(x)
	if err != nil {
		return 0 // or propagate err
	}
	return i + 1
}
</code></pre>

And that you need not name types merely to call methods on them:

<pre><code>
if x, ok := foo.(interface{ IntValue() int }); ok {
	return x.IntValue()
}
</code></pre>

You can also go crazy like the proto compiler does for implementations of "oneof" and have infinite flexibility. It is not very ergonomic, but it is reliable.

> Keyword arguments

<pre><code>
type Point struct{ X, Y float64 }

func EuclideanDistance(a, b Point) float64 {
	return math.Hypot(a.X-b.X, a.Y-b.Y)
}

EuclideanDistance(Point{X: 1, Y: 2}, Point{3, 4})
</code></pre>

> No map over slices

This one is like returning errors. You will press a lot of buttons on your keyboard. It is how it is.

I personally hate typing the average "simple" for loop:

<pre><code>
func fooToInterface(foos []foo) []interface{} {
	var result []interface{}
	for _, f := range foos {
		result = append(result, f)
	}
	return result
}
</code></pre>

But it's also not that hard. I used to be a Python readability reviewer at Google. I always had the hardest time reading people's very aggressive list comprehensions. It was as if they HAD to get their entire program into one line of code, or people wouldn't think they were smart. The result was that the line became a black box; nobody would read it, and it was just assumed to work.

I really like seeing the word "for" twice when you're iterating over two things.

luord, more than 5 years ago

I liked this summary because the migration happened for a truly valid reason: Python really was a bottleneck. Not that I expected ESR to succumb to hype-driven development, but it's nice to see for sure.

On the article itself: I just knew that error handling would get the biggest write-up, even when the one writing was someone like ESR. Gods, the error handling in Go is odious.

Now my obligatory opinion: if only [insert language here; Go at my current job]'s promise of producing more maintainable code were true. The reality is that it's the same nigh-unmaintainable hell I've found in nearly every other project I've worked on. At least Python is nice to read, even (mostly) when awfully written. Oh, how I miss it.

Insanity, more than 5 years ago

The missing "keyword arguments" could have been replaced with a struct passed to a function, no? Unless I'm missing something from Python, in Go you could replace this type of function:

<pre><code>
func f(x int, y int, c string)
</code></pre>

with something like this:

<pre><code>
type funcOptions struct {
	x, y int
	c    string
}

func f(o funcOptions) {}

f(funcOptions{x: 3, y: -1, c: "hello"})
</code></pre>

So the readability hit would have been much smaller.
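
Python keyword arguments usually come with defaults, and the options-struct approach can cover that too, since omitted fields get their zero value. A sketch under that assumption; `drawOptions`, `withDefaults`, and `draw` are hypothetical names, and treating a zero value as "not set" only works when zero is never a meaningful value for that field:

```go
package main

import "fmt"

// drawOptions plays the role of Python keyword arguments.
type drawOptions struct {
	X, Y  int
	Color string
	Width int
}

// withDefaults fills in fields the caller left at their zero value.
func (o drawOptions) withDefaults() drawOptions {
	if o.Color == "" {
		o.Color = "black"
	}
	if o.Width == 0 {
		o.Width = 1
	}
	return o
}

// draw formats its options instead of drawing anything.
func draw(o drawOptions) string {
	o = o.withDefaults()
	return fmt.Sprintf("(%d,%d) %s w=%d", o.X, o.Y, o.Color, o.Width)
}

func main() {
	// Like draw(x=3, y=-1, color="red") in Python, with width defaulted.
	fmt.Println(draw(drawOptions{X: 3, Y: -1, Color: "red"})) // (3,-1) red w=1
}
```

If zero must be a legal explicit value, pointer fields or a separate "set" flag are the usual workarounds.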

tanilama, more than 5 years ago

One of the better reads in a long time.

The translation assistant you wrote is actually very interesting. It's heavily rule-based, but it's surprising to see that it actually helps at all.

But the scale of the project itself still seems pretty limited, so a reimplementation could still have been an option.

Overall, a good read and an interesting approach.

transfire, more than 5 years ago

If you want a real surprise, try a rewrite in Elixir.
