This regular expression is due to Abigail of Perl fame. Who is still very active, see <a href="http://search.cpan.org/~abigail/" rel="nofollow">http://search.cpan.org/~abigail/</a> for more.<p>However while cute, it is very slow. It tries every possible factorization as a pattern match. When it succeeds, on a string of length n that means that n times it tries to match a string of length n against a specific pattern. This is O(n^2). Try it on primes like 35509, 195341, 526049 and 1030793 and you can observe the slowdown. (Some regular expression engines may employ tricks that speed this up over the naive analysis, but none will be super speedy about it.)<p>Usual arithmetic is much more efficient. :-)
For some this will be obvious, but for others it may still be interesting:<p>If this was a "true" regular expression, in the sense that it describes a regular language (<a href="http://en.wikipedia.org/wiki/Regular_language" rel="nofollow">http://en.wikipedia.org/wiki/Regular_language</a>), then checking whether the number is prime or not, by checking if the expression matches, would run in linear time. This would be worth a few nobel prizes at least.<p>Unfortunately, it is an "extended" regular expression, i.e. not regular at all in the strictest sense, and (as the article says) requires backtracking.
It has been posted before [2], so I'll repost my comment from back then [3]:<p>It's cool, but it's regexp, not regular expression -- regular expressions cannot test for primeness. The proof is a simple application of pumping lemma[1]: we want to show that language { a^p: p prime } is not regular. Suppose it is. By pumping lemma we have numbers N, n >= 1, such that for every k >= 0, we have a^(N+nk) in our language. But that means that for every k, N+nk is prime, and it fails for k = N, since N+nN = N(1+n) is not a prime, since both N and 1+n are greater than 1.<p>[1] - <a href="http://en.wikipedia.org/wiki/Pumping_lemma_for_regular_languages" rel="nofollow">http://en.wikipedia.org/wiki/Pumping_lemma_for_regular_langu...</a>
[2] - <a href="https://news.ycombinator.com/item?id=3391547" rel="nofollow">https://news.ycombinator.com/item?id=3391547</a>
[3] - <a href="https://news.ycombinator.com/item?id=3391572" rel="nofollow">https://news.ycombinator.com/item?id=3391572</a>
This is indeed beautiful. But the author limits itself to a low-level unpacking of the regex and running a few tests cases, all the while remaining mystified as to why this works.<p>A high-level description helps makes obvious what's happening:<p><pre><code> To check the primality of a number N with string matching,
repeat a string (say, "1") N times
and declare N prime:
UNLESS the repetitions match the string 0 or 1 times
OR the repetitions can be EVENLY matched into groups of MORE than one string.</code></pre>
Uses back-references, so it doesn't describe a regular language in the strict sense. Still, it's pretty cool to use them to trick a string-processing engine to do integer factoring.
I'm currently working on a ruby gem to generate <i>examples</i> of given regular expression. For example:<p><pre><code> /this|is|awesome/.examples # => ["this", "is", "awesome"]
</code></pre>
So, I tried it out on this "prime number validator" regex:<p><pre><code> /1?|(11+?)\1+/.examples
#=> ["", "1", "1111", "111111", "11111111"]
/1?|(11+?)\1+/.examples(max_repeater_variance: 50, max_group_results: 100).uniq.sort.map(&:length).first(10)
#=> [0, 1, 4, 6, 8, 9, 10, 12, 14, 15]
require 'prime'
Array(1..100) - /1?|(11+?)\1+/.examples(max_repeater_variance: 50, max_group_results: 3000).uniq.sort.map(&:length) == Prime.first(25)
#=> true
</code></pre>
Horribly inefficient, but pretty cool! Here's my gem if case anyone's interested: <a href="https://github.com/tom-lord/regexp-examples" rel="nofollow">https://github.com/tom-lord/regexp-examples</a>
Nice (abuse of regexp) !<p>AFAICT the way the string is going to be parsed is equivalent to this:<p><pre><code> def isprime(n):
if n == 0 or n == 1:
return False
i = 2
while i <= n:
if (n - i) % i == 0:
return False
i += 1
return True
</code></pre>
A parser implementation may optimize and do "while (n - i) >= 1" instead, which cuts in half the number of iterations.
I explored this technique applied to a few other problems just for fun, integer square root, coprimes, continued fractions, modulo, etc.<p><a href="https://github.com/fxn/math-with-regexps/blob/master/one-liners.sh" rel="nofollow">https://github.com/fxn/math-with-regexps/blob/master/one-lin...</a>
less sexy when you write<p><pre><code> /^(11+?)\1+$/ is a prime test on whether a unary string > 2 is composite
</code></pre>
then people have a real chance of figuring it out. (go ahead and try, if you haven't read the article.)<p>As it stands that 1 x shift is super confusing. who would have thought 1 would become a literal! we wouldn't even write it that way in english.
The thing that made this click for me was realizing the "1" in the regex was not special. The regex is just looping through integers (starting with 2) and checking if the target number is divisible by any of them. It can use any digit if the target string uses the same digit. Ruby example using 5 instead of 1:<p>def is_prime(n); str = "5"*n; str !~ /^5?$|^(55+?)\1+$/; end<p>The regex essentially says, for integer n, "does 2 ones fit evenly into n ones", then "does 3 ones fit into n ones", etc., which is the same as "does 2 fit into n", "does 3 fit into n", etc.
Thanks for sharing. This is a post I wrote way back in 2007 and, even though, as many have pointed out, the algorithm is very inefficient, it is still a good illustration of how powerful regular expressions are.<p>When I am explaining the basics of regular expressions to people, I always refer them to this post to make them realize that regular expressions are not only glorified search terms. In general, they are amazed :-)
An aspect of the syntax is confusing me here. I would have assumed that the expression 'x+?' would be parsed as '((x)+)?', ie. 'x*'. However, it seems the parser special-cases this as (x)(+?) instead, and interprets that as a non-greedy operator. Isn't this horribly inconsistent?
nice trick!<p>here a JS implementation<p><pre><code> function isPrime(n) {
var re = /^1?$|^(11+?)\1+$/;
var s = [];
while (n -- > 0)
s.push(1);
return !re.test(s.join(''));
}</code></pre>