<p><pre><code> ^(?!(..+)(\1)+$)
</code></pre>
Why does that work on primes? I got it by mistake when fiddling with the parenthesis locations but I was expecting to have to deal with xx separately.
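One way to see it, as a sketch: on a string of n identical characters, (..+) grabs a block of k >= 2 characters and (\1)+ demands one or more further copies of that block up to the end. The lookahead therefore succeeds exactly when n = k*m with k >= 2 and m >= 2, i.e. when n is composite, and the negation rejects those lengths. For xx the lookahead would need at least 4 characters, so it passes without special handling. The Python below is only an illustration (the helper name is made up, and the site may well use a different regex engine); note that lengths 0 and 1 pass too.

```python
import re

# The pattern from the comment above, unchanged.
NOT_COMPOSITE = re.compile(r'^(?!(..+)(\1)+$)')

def passes(n: int) -> bool:
    """True if a string of n 'x' characters is accepted by the pattern."""
    return NOT_COMPOSITE.search('x' * n) is not None

# Lengths from 2 to 29 that the pattern accepts.
accepted = [n for n in range(2, 30) if passes(n)]
print(accepted)
```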
Many times I would match the exact opposite of what I wanted. Is there a general rule for inverting regexps? ^ and ?! don't seem general purpose.
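As far as I know there is no purely syntactic inversion once backreferences are involved, but for puzzles like these a negative-lookahead wrapper usually does the job: anchor at the start and assert that the pattern matches nowhere in the string. A sketch in Python, where the invert helper is hypothetical and the input is assumed to be a single line (since . does not cross newlines by default):

```python
import re

def invert(pattern: str) -> str:
    """Wrap an unanchored pattern so the result matches only strings
    in which the original pattern matches nowhere."""
    # (?:...) adds no capture group, so backreference numbers stay valid.
    return r'^(?!.*(?:' + pattern + r'))'

doubled = r'(.)\1'            # a doubled character somewhere in the word
no_doubled = invert(doubled)

print(bool(re.search(doubled, 'effusive')))     # contains "ff"
print(bool(re.search(no_doubled, 'effusive')))
print(bool(re.search(no_doubled, 'abac')))
```

The wrapper only consumes nothing and asserts, so it can be combined with further conditions after it.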
I figured out Ranges!<p><pre><code> abac|accede|adead|babe|bead|bebed|bedad|bedded|bedead|bedeaf|caba|caffa|dace|dade|daff|dead|deed|deface|faded|faff|feed
</code></pre>
Edit: /s
Glob (333) without cheating:<p><pre><code> ^(\*?)(\w*)(\*?)(\w*)(\*?)(\w*) .* ((.(?!\1))+|\1)\2((.(?!\3))+|\3)\4((.(?!\5))+|\5)\6$
</code></pre>
Edit:
((.(?!\1))+|\1) is used to conditionally match .+ iff a * has been found.
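.(?!\1) matches any character that is not followed by \1. When a * has been found (\1 is "*"), it matches any character not followed by a literal *, which in practice is every character. When no * was found (\1 is empty), the lookahead always fails, so it matches no character and the group falls back to the empty \1.<p>Edit 2: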
Formatting to avoid the *s becoming italics :/
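A reduced, single-wildcard version of the same trick may make the explanation easier to follow. This is only a sketch in Python, assuming lines of the form "glob subject" with at most one leading *; the names are made up and this is not the 333-point answer itself.

```python
import re

# Group 1 is "*" or empty, group 2 is the literal tail of the glob.
# ((.(?!\1))+|\1) then acts like ".+" when group 1 is "*" (for subjects
# that contain no "*"), and like an empty match when group 1 is empty.
ONE_STAR = re.compile(r'^(\*?)(\w*) ((.(?!\1))+|\1)\2$')

def glob_line_matches(line: str) -> bool:
    """True if the subject after the space satisfies the glob before it."""
    return ONE_STAR.search(line) is not None

print(glob_line_matches('*ing walking'))  # wildcard present
print(glob_line_matches('ing walking'))   # no wildcard, subject differs
print(glob_line_matches('ing ing'))       # no wildcard, exact match
```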
An interesting bit on the computational complexity of solving this problem (with a slightly different scoring function):<p><a href="http://cstheory.stackexchange.com/questions/1854/is-finding-the-minimum-regular-expression-an-np-complete-problem" rel="nofollow">http://cstheory.stackexchange.com/questions/1854/is-finding-...</a>
Hrm, on number 8 "Four" using:<p><pre><code> (.)(.*\1){3,}
</code></pre>
I got all but the "do not match" for "Ternstroemiaceae"<p>The challenge appeared to be to match words with four instances of the same letter. "Ternstroemiaceae" contains four 'e's, and thus should be in the "match" column, instead of the "don't match" column, no? Did I miss something?
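For what it's worth, a quick check agrees with the count. The sketch below is in Python rather than whatever engine the site uses, so treat it as a sanity check only. It also tries the stricter pattern quoted further down the thread, which requires two of the four occurrences to be exactly one character apart; the thread does not say what the intended rule is.

```python
import re
from collections import Counter

word = 'Ternstroemiaceae'

# Count occurrences of each letter, ignoring case.
counts = Counter(word.lower())
print(counts['e'])

# The regex from the comment above: some character followed by at least
# three more occurrences of itself, anywhere later in the word.
four_any = re.compile(r'(.)(.*\1){3,}')
print(bool(four_any.search(word)))

# The stricter pattern posted elsewhere in the thread.
four_strict = re.compile(r'(.).*\1.\1.*\1')
print(bool(four_strict.search(word)))
```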
Gist with my answers: <a href="https://gist.github.com/jpsim/8057500" rel="nofollow">https://gist.github.com/jpsim/8057500</a><p>If you look at the revisions, you'll see my 1st iteration was mostly identifying patterns, then with more and more cheating (and looking at this thread) to squeeze every point possible.
What are the rules? That is, are these Perl regexes, POSIX regexes, …? (Come to that, what <i>is</i> this site? Going up one level to alf.nu gives me a lot of suggestions for what I can do by modifying the address, but no clue of who's doing it on my behalf.)
challenge: use machine learning to find the best solutions.<p>They might improve on those intended by exploiting accidental regularity in the corpus - though charmingly, the golf-cost of regex length helps combat this overfitting. They might also find genuinely cleverer solutions.
Ternstroemiaceae contains four es; anyone else hit that issue / know why that's not in the valid results for Four? I'm guessing there's a pun in there that I didn't get :/
I like the concept but the word choice doesn't seem too "regular", it is more about catching all the particulars rather than finding a pattern as far as I can tell.
I got Four for 196 with<p><pre><code> (.).*\1.\1.*\1
</code></pre>
and Order for 156 with<p><pre><code> ^a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*$</code></pre>
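The Order pattern accepts exactly the words whose letters are in non-decreasing alphabetical order, which can be cross-checked against a plain sort. A sketch in Python, assuming lowercase a-z words only; the sample words are my own, not the site's list.

```python
import re
import string

# Same pattern as above, built programmatically: ^a*b*c*...z*$
ORDER = re.compile('^' + ''.join(c + '*' for c in string.ascii_lowercase) + '$')

def in_order_regex(word: str) -> bool:
    return ORDER.search(word) is not None

def in_order_sorted(word: str) -> bool:
    # Letters appear in non-decreasing alphabetical order.
    return list(word) == sorted(word)

for w in ['almost', 'billowy', 'regex', 'abba']:
    print(w, in_order_regex(w), in_order_sorted(w))
```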
For Abba...<p>Why doesn't (.)(.)\2[^\1] work?<p>I thought backreferences matched the captured literal, so negating it would match? But this looks the same as (.)(.)\2\1...
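A backreference isn't meaningful inside a character class. In Python's re, and as far as I know in JavaScript's non-Unicode mode, \1 inside [...] is read as an octal escape for the character U+0001, so [^\1] means "any character except U+0001" and happily accepts the captured letter. A negative lookahead is the usual way to say "not the captured character". A sketch in Python, with example words of my own:

```python
import re

in_class = re.compile(r'(.)(.)\2[^\1]')      # \1 here is the character U+0001
lookahead = re.compile(r'(.)(.)\2(?!\1).')   # next char must differ from group 1

print(bool(in_class.search('abba')))    # the class accepts the final "a"
print(bool(lookahead.search('abba')))   # final char equals group 1
print(bool(lookahead.search('abbc')))   # final char differs from group 1
```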
what's the pattern on "Abba"? I thought it was just to exclude doubled letters but I have doubles on two words on the left hand side as well (noisefully and effusive, in case the word lists are the same)