> I wrote this code putting brevity over readability, which is something I usually never do<p>Shouldn't the point of such a post be to show interesting code? I'm having trouble reading through the densely packed source.<p>In addition to tromp's minor nitpick, I have several major ones.<p>- The code is full of redundant parentheses. HLint can detect those (and many other style errors) automatically. LPaste has HLint installed so you have a linting pastebin available online. <a href="http://lpaste.net/116871" rel="nofollow">http://lpaste.net/116871</a><p>- A lot of the functions are written in a non-idiomatic way. "m >>= return . f" is "fmap f m", and "(.)" can combine functions much more readably than Lisp-style stacks of parentheses.<p>- ByteString.Char8 is usually the wrong choice, more on that here: <a href="https://github.com/quchen/articles/blob/master/fbut.md#bytestringchar8-is-bad" rel="nofollow">https://github.com/quchen/articles/blob/master/fbut.md#bytes...</a><p>- If you count to "length x" then often there's a more elegant solution that avoids calculating the length altogether. For example "splits xs = zip (inits xs) (tails xs)".<p>- Brevity is never better than readability.<p>- No top-level definitions should lack a type signature. GHC even has warnings for that (I think they start firing with -Wall).<p>- A function should do one thing and then be composed with other functions. "lowerWords" converts to words and then maps them all to lower case, for example. These are two completely different operations in one long line.<p>- In order of increasing generality: foldr union empty = unions = mconcat = fold<p>- Use pattern matching, avoid "(!!)". 
transposes w = [ a ++ [b1,b0] ++ bs | (a, b0:b1:bs) <- splits w] - also see <a href="https://github.com/quchen/articles/blob/master/fbut.md#head-tail-isjust-isnothing-fromjust-" rel="nofollow">https://github.com/quchen/articles/blob/master/fbut.md#head-...</a><p>- For large numbers of words that you split and concatenate again, String is probably not the right type. Text is good for dealing with such things.<p>- replaces w = [as ++ [c] ++ bs | (as, _:bs) <- splits w , c <- alphabet]<p>... and so on.
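To make the pattern-matching suggestions concrete, here is a minimal sketch of how those three definitions might look together, with type signatures added. It assumes plain String input and a local lowercase alphabet, and it is not the OP's actual code. Note that a transposition has to emit the two matched characters in swapped order.

```haskell
import Data.List (inits, tails)

-- All ways to cut a word into a prefix and a suffix, without using length.
splits :: String -> [(String, String)]
splits xs = zip (inits xs) (tails xs)

-- Swap each pair of adjacent characters. Splits whose suffix has fewer
-- than two characters fail the pattern match and are silently skipped.
transposes :: String -> [String]
transposes w = [as ++ [b1, b0] ++ bs | (as, b0:b1:bs) <- splits w]

-- Replace each character in turn with every letter of the alphabet.
replaces :: String -> [String]
replaces w = [as ++ [c] ++ bs | (as, _:bs) <- splits w, c <- alphabet]
  where
    alphabet = ['a'..'z']  -- assumption: lowercase ASCII only
```

For example, `transposes "abc"` evaluates to `["bac","acb"]`, and no index or `(!!)` is needed anywhere.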
Minor nitpick: the first real line of code<p><pre><code> alphabet = "abcdefghijklmnopqrstuvwxyz"
</code></pre>
is better written as<p><pre><code> alphabet = ['a'..'z']
</code></pre>
This is really syntactic sugar for<p><pre><code> enumFromTo 'a' 'z'
</code></pre>
using the function<p><pre><code> enumFromTo :: Enum a => a -> a -> [a]
</code></pre>
from the typeclass Enum for enumerable types,
and the fact that a string (type String) is just
a list of characters (type [Char]).
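A quick way to convince yourself of the equivalence is to write all three forms side by side. This is only an illustrative sketch, and the three names are made up for the comparison.

```haskell
-- The spelled-out literal from the original code.
alphabetLiteral :: String
alphabetLiteral = "abcdefghijklmnopqrstuvwxyz"

-- The range syntax suggested above.
alphabetRange :: String
alphabetRange = ['a'..'z']

-- What the range syntax desugars to. String and [Char] are the same type.
alphabetEnum :: [Char]
alphabetEnum = enumFromTo 'a' 'z'
```

All three values compare equal with `(==)`.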
Interesting read for a Haskell newcomer like me!<p>Regarding the original webpage of Norvig's spelling corrector, I think it is not up to date as I remember browsing the web and finding some shorter versions in other languages.<p>I've shortened the Python version to 14/15 lines using some features of Python3.
Cool! Since we're suggesting changes, here's what I'd do. (Not that anything is wrong with the OP's code, just that it's good to point out all the different stylistic techniques you can adopt.)<p><pre><code> 7. alphabet = ['a'..'z']
8. nWords = B.readFile "big.txt" >>= return . train . lowerWords . B.unpack
</code></pre>
or:<p><pre><code> 8. nWords = train . lowerWords . B.unpack <$> B.readFile "big.txt"
</code></pre>
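The two spellings of line 8 do the same thing for any monad, not just IO. Here is a small sketch that shows this without needing "big.txt"; the `lowerWords` below is a hypothetical stand-in, not the OP's definition.

```haskell
import Data.Char (toLower)

-- Hypothetical stand-in for the OP's lowerWords: lower-case the text,
-- then split it on whitespace.
lowerWords :: String -> [String]
lowerWords = words . map toLower

-- The ">>= return . f" spelling.
viaBind :: Monad m => m String -> m [String]
viaBind m = m >>= return . lowerWords

-- The equivalent fmap spelling, which only needs a Functor.
viaFmap :: Functor f => f String -> f [String]
viaFmap m = lowerWords <$> m
```

With `Just "Hello World"` as the input, both return `Just ["hello","world"]`.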
Make `splits`, `deletes`, etc. values (not functions). `splits` has access to `w`, so there's no need to pass it as an argument 4 times (or even to pass `w` as an argument to the other functions).<p><pre><code> 27. sortCandidates = sortBy (flip (comparing snd)) . M.toList</code></pre>
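Here is a rough sketch of both suggestions. The name `edits1`, the choice of only two edit kinds, and the `sortByCount` helpers are assumptions made for illustration, and the sort works on a plain list of pairs rather than on the result of `M.toList`.

```haskell
import Data.List (inits, sortBy, sortOn, tails)
import Data.Ord (Down (..), comparing)

-- splits, deletes and transposes are plain values in the where clause.
-- They all see w directly, so nothing has to be passed around.
edits1 :: String -> [String]
edits1 w = deletes ++ transposes
  where
    splits     = zip (inits w) (tails w)
    deletes    = [as ++ bs | (as, _:bs) <- splits]
    transposes = [as ++ [b1, b0] ++ bs | (as, b0:b1:bs) <- splits]

-- Sort (word, count) pairs by descending count, as in line 27.
sortByCount :: [(String, Int)] -> [(String, Int)]
sortByCount = sortBy (flip (comparing snd))

-- An alternative spelling of the same ordering, using Down.
sortByCount' :: [(String, Int)] -> [(String, Int)]
sortByCount' = sortOn (Down . snd)
```

`edits1 "ab"` gives `["b","a","ba"]`. Both sorts are stable, so they agree on ties as well.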
Not to make any particular point, but mainly just because I fancied a bit of procrastination this afternoon, here's a CoffeeScript version (heavily leaning on Underscore): <a href="https://gist.github.com/benshimmin/2ee78c932797faadfc89" rel="nofollow">https://gist.github.com/benshimmin/2ee78c932797faadfc89</a>