The Hardest Program I've Ever Written (2015)

105 points by tanu057 almost 8 years ago

13 comments

eesmith almost 8 years ago

"If it took that much thrashing to get it right, you’d expect it to do something pretty deep right? Maybe a low-level hardware interface or .. I’m talking, of course, about an automated code formatter."

This reminds me of the Tao of Programming 3.3, at http://www.mit.edu/~xela/tao.html, the relevant part of which I will copy here:

There was once a programmer who was attached to the court of the warlord of Wu. The warlord asked the programmer: "Which is easier to design: an accounting package or an operating system?"

"An operating system," replied the programmer.

The warlord uttered an exclamation of disbelief. "Surely an accounting package is trivial next to the complexity of an operating system," he said.

"Not so," said the programmer, "When designing an accounting package, the programmer operates as a mediator between people having different ideas: how it must operate, how its reports must appear, and how it must conform to the tax laws. By contrast, an operating system is not limited by outside appearances. When designing an operating system, the programmer seeks the simplest harmony between machine and ideas. This is why an operating system is easier to design."

harpocrates almost 8 years ago

Pretty-printing is tough. That said, please don't reinvent the wheel. There is research that has gone into this that should make most of this stuff pretty straightforward. I personally recommend Wadler's "A Prettier Printer" [0] (although credit goes to Hughes for laying a lot of the groundwork [1]). It too uses an IR and has several possible heuristics for rendering.

I've been using an implementation of it [2] with a lot of success for pretty printing Rust code [3].

<pre><code>[0]: https://homepages.inf.ed.ac.uk/wadler/papers/prettier/prettier.pdf
[1]: http://belle.sourceforge.net/doc/hughes95design.pdf
[2]: https://hackage.haskell.org/package/prettyprinter
[3]: https://github.com/harpocrates/language-rust</code></pre>
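For readers who have not seen the "document IR" idea, here is a minimal Python sketch in the spirit of Wadler-style printers. It is not the API of the `prettyprinter` package linked above: the names `Text`, `Line`, `Nest`, `Concat`, `Group`, `fits` and `render` are hypothetical, and the greedy "does this group fit flat on the rest of the line?" rule is only one of several possible rendering heuristics.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Text:
    s: str


@dataclass(frozen=True)
class Line:
    """A possible line break; renders as one space when laid out flat."""


@dataclass(frozen=True)
class Nest:
    indent: int
    doc: "Doc"


@dataclass(frozen=True)
class Concat:
    parts: Tuple["Doc", ...]


@dataclass(frozen=True)
class Group:
    """Lay the contents out flat if they fit, otherwise break every Line."""
    doc: "Doc"


Doc = Union[Text, Line, Nest, Concat, Group]


def fits(width_left, items):
    """True if the pending items fit in width_left up to the next real newline."""
    stack = list(items)  # work on a copy; top of the stack is the last element
    while width_left >= 0:
        if not stack:
            return True
        indent, flat, doc = stack.pop()
        if isinstance(doc, Text):
            width_left -= len(doc.s)
        elif isinstance(doc, Line):
            if not flat:
                return True  # a real newline ends the current line
            width_left -= 1
        elif isinstance(doc, Nest):
            stack.append((indent + doc.indent, flat, doc.doc))
        elif isinstance(doc, Concat):
            stack.extend((indent, flat, part) for part in reversed(doc.parts))
        elif isinstance(doc, Group):
            stack.append((indent, flat, doc.doc))
    return False


def render(doc, width=80):
    """Render a Doc, choosing flat or broken layout for each Group."""
    out, col = [], 0
    stack = [(0, False, doc)]  # entries are (indent, flat mode?, doc)
    while stack:
        indent, flat, d = stack.pop()
        if isinstance(d, Text):
            out.append(d.s)
            col += len(d.s)
        elif isinstance(d, Line):
            if flat:
                out.append(" ")
                col += 1
            else:
                out.append("\n" + " " * indent)
                col = indent
        elif isinstance(d, Nest):
            stack.append((indent + d.indent, flat, d.doc))
        elif isinstance(d, Concat):
            stack.extend((indent, flat, part) for part in reversed(d.parts))
        elif isinstance(d, Group):
            flat_item = (indent, True, d.doc)
            if flat or fits(width - col, stack + [flat_item]):
                stack.append(flat_item)
            else:
                stack.append((indent, False, d.doc))
    return "".join(out)


body = Nest(2, Concat((Line(), Text("stmt1;"), Line(), Text("stmt2;"))))
example = Group(Concat((Text("begin"), body, Line(), Text("end"))))
print(render(example, width=80))
print(render(example, width=10))
```

The same `example` document renders on one line when the width allows it and as an indented block when it does not, which is the point of separating the IR from the layout decision.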

jemfinch almost 8 years ago

It sounds like perhaps a case of being so focused on whether the program could be built that they didn't stop to ask whether it should be built.

Automatic code formatters don't need to be perfect or complete; they simply must format good code well. If bad code (as many of the difficult examples were) formats poorly, that's just another reason for people to write better code.

peterburkimsher almost 8 years ago

I wrote a pretty-printer for bash scripts using PHP. Colouring the keywords was fun, but I moved on to other personal projects instead of tackling line breaks.

More recently I've been trying to learn Chinese, and one of the features of Pingtype is to put spaces between words.

http://pingtype.github.io

To my surprise, this article linked to a Wikipedia page about line wrapping, which says that line wrapping in CJK is unsolved.

https://en.wikipedia.org/wiki/Line_wrap_and_word_wrap#Word_wrapping_in_text_containing_Chinese.2C_Japanese.2C_and_Korean

"Most existing word processors and typesetting software cannot handle either [personal names or compound words]."

My method works, but I don't know who to give it to. This is Hacker News and there's people from all different backgrounds here, so I'll just throw it out there - if anyone is interested, please contact me.

wheresvic1 almost 8 years ago

Why do you necessarily need a line limit?

I would simply indent all chained function calls and be done with it. E.g.

<pre><code>return foo(param)
  .then(bar)
  .catch(err => {
    logger.error(err);
    return -1;
  });</code></pre>

dracodoc almost 8 years ago

One of my side projects was an auto formatter for R. There are some limits in existing formatters:

- I think most of them don't recognize multi-line string literals, which is difficult if you consider that you can have "" in comments, comment symbols in "" string literals, and line breaks. The only way to deal with it is to scan linearly with context.
- It's tricky to wrap a long line:
  + some points in a long line are more suitable as breaking points at the logic level
  + but sometimes you want fewer lines and not to break too often, even if breaking is clearer in logic. The lines could be just a parameter list that is equally well represented in one column or in multiple columns.
  + with nested code the natural indent position could be at the far right, which makes each line very short if you stick to the 80-column rule.

After quite some effort my code can deal with all the comments, multi-line strings, and all the operators I know (I need to separate unary and binary operators to determine whether to insert a space), but the script takes several seconds to run, and I haven't started to deal with indentation. I could probably save some time if I did more optimization, but I don't have time to finish it now.

This Python formatter talks about its algorithm and is worth a read: https://github.com/google/yapf
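The "scan linearly with context" point can be illustrated with a small state machine. This is a rough Python sketch, not the commenter's code: the function name `split_segments` is made up, and it assumes only `#` comments and `"`/`'` strings with backslash escapes, ignoring backtick-quoted names and raw strings.

```python
from typing import List, Tuple


def split_segments(src: str) -> List[Tuple[str, str]]:
    """Split R-like source into ("code" | "string" | "comment", text) segments."""
    segments: List[Tuple[str, str]] = []
    kind = "code"   # current context
    start = 0       # index where the current segment began
    quote = ""      # quote character that opened the current string
    i = 0
    n = len(src)

    def flush(end: int, next_kind: str) -> None:
        """Close the current segment at `end` and switch context."""
        nonlocal start, kind
        if end > start:
            segments.append((kind, src[start:end]))
        start = end
        kind = next_kind

    while i < n:
        ch = src[i]
        if kind == "code":
            if ch in "\"'":
                flush(i, "string")
                quote = ch
            elif ch == "#":
                flush(i, "comment")
        elif kind == "string":
            if ch == "\\":
                i += 1                # skip the escaped character
            elif ch == quote:
                flush(i + 1, "code")  # the string may have spanned several lines
        elif kind == "comment":
            if ch == "\n":
                flush(i, "code")      # the newline itself counts as code
        i += 1
    flush(n, "code")
    return segments


source = 'x <- "a # not a comment\nstill string" # real "comment"\ny <- 1\n'
for segment_kind, text in split_segments(source):
    print(segment_kind, repr(text))
```

Because each character is interpreted according to the current context, a `#` inside a string and a `"` inside a comment are both left alone, and a formatter can then restrict whitespace changes to the `code` segments.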

StefanKarpinski almost 8 years ago

Cool post. Some insight into why this problem is so hard: what this post is describing seems to be an integer linear programming problem [1]. I.e. optimizing a linear cost function constrained by (convex) linear bounds with integer-valued variables. The reason it's so difficult is that ILP is an NP-hard problem. Finding the right way to represent program source is also tricky, but, as the post says, doing so in a way that caters to the extremely performance-sensitive solver code is the really difficult bit. A better approach might be to produce an explicit ILP program and use an ILP solver to decide where the line breaks should go. As with many NP-hard problems (e.g. SAT, TSP), there are very good solvers these days that are much faster than anything you could ever hope to write yourself, and they produce fully optimal solutions.

[1] https://en.wikipedia.org/wiki/Integer_programming

matt_wulfeck almost 8 years ago

> There are thirteen places where a line break is possible here according to our style rules. That’s 8,192 different combinations if we brute force them all

This is why the language should be designed with a formatter in mind from the beginning, as Go was designed. Just enough mustaches to make formatters accurate and fast. How many possibilities should there be? Exactly one.

pmoriarty almost 8 years ago

"There are thirteen places where a line break is possible here according to our style rules. That’s 8,192 different combinations if we brute force them all. The search space we have to cover is exponentially large..."

Sounds like this might be a good candidate for some AI methods which are not intimidated by such large search spaces.
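The quoted figure is simply 2^13 = 8,192: each of the thirteen candidate positions is either split or not. A toy Python sketch of what brute forcing means here follows; the function names and the cost function (a heavy penalty per overflowing character plus one point per line) are invented for illustration and are not the article's actual rules.

```python
from itertools import product


def count_split_combinations(num_split_points: int) -> int:
    """Count every split/no-split assignment by enumerating them."""
    return sum(1 for _ in product((False, True), repeat=num_split_points))


def brute_force_best(chunks, width):
    """Try every way of splitting between adjacent chunks; keep the cheapest."""
    best_cost, best_lines = None, None
    for choice in product((False, True), repeat=len(chunks) - 1):
        lines = [chunks[0]]
        for chunk, split in zip(chunks[1:], choice):
            if split:
                lines.append(chunk)   # start a new line at this split point
            else:
                lines[-1] += chunk    # stay on the current line
        overflow = sum(max(0, len(line) - width) for line in lines)
        cost = 100 * overflow + len(lines)  # hypothetical cost function
        if best_cost is None or cost < best_cost:
            best_cost, best_lines = cost, lines
    return best_lines


print(count_split_combinations(13))
print(brute_force_best(["aaa", "bbb", "ccc"], width=6))
```

Every extra candidate split point doubles the number of layouts to score, which is why plain enumeration stops being practical quickly.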

igravious almost 8 years ago

I remember this coming up before :) https://news.ycombinator.com/item?id=10195091

ninja edit: I mostly jump remembered the picture of Robert with his/a dog and the text, “Hi! I'm Bob Nystrom, the one on the left.”

Bromskloss almost 8 years ago

> That means adding line breaks (or “splits” as the formatter calls them), and determining the best place to add those is famously hard.

Naive question here: What is so hard? It can be solved with dynamic programming, right? Doesn't he even link to solutions of the problem?
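For context, the dynamic-programming solution usually meant here is the one for wrapping a plain paragraph with minimum raggedness. Below is a minimal Python sketch of that textbook version, assuming a squared-slack penalty on every line except the last; the name `wrap_min_raggedness` is made up, and this handles prose only, without the nesting and indentation rules a code formatter has to respect.

```python
def wrap_min_raggedness(words, width):
    """Break words into lines minimizing the sum of squared trailing space."""
    n = len(words)
    infinity = float("inf")
    best = [infinity] * n + [0]   # best[i] = minimal cost of laying out words[i:]
    next_break = [n] * (n + 1)    # next_break[i] = index where the line after i starts
    for i in range(n - 1, -1, -1):
        length = -1               # cancels the space counted before the first word
        for j in range(i, n):
            length += 1 + len(words[j])    # words[i..j] joined by single spaces
            if length > width and j > i:
                break                      # line too long; an overlong single word is allowed
            slack = max(0, width - length)
            penalty = 0 if j == n - 1 else slack ** 2  # the last line is free
            cost = penalty + best[j + 1]
            if cost < best[i]:
                best[i] = cost
                next_break[i] = j + 1
    lines = []
    i = 0
    while i < n:
        j = next_break[i]
        lines.append(" ".join(words[i:j]))
        i = j
    return lines


print(wrap_min_raggedness("aaa bb cc ddddd".split(), width=6))
```

On this input a greedy wrapper would produce `aaa bb / cc / ddddd`, while the dynamic program prefers `aaa / bb cc / ddddd` because the total squared slack is lower.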

ycmbntrthrwaway almost 8 years ago

http://suckless.org/philosophy

gnode almost 8 years ago

Please mark with (2015).
