According to Donald Knuth, "This algorithm goes back to Nārāyaṇa Paṇḍita in 14th-century India; it also appeared in C. F. Hindenburg's preface to _Specimen Analyticum de Lineis Curvis Secundi Ordinis_ by C. F. Rudiger (Leipzig: 1784), xlvi--xlvii, and it has been frequently rediscovered ever since". (Volume 4A, Section 7.2.1.2, Algorithm L)
This sounds like what next_permutation does in the C++ STL. <a href="https://en.cppreference.com/w/cpp/algorithm/next_permutation" rel="nofollow">https://en.cppreference.com/w/cpp/algorithm/next_permutation</a>
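Btw, XOR swap: it's probably slower than an ordinary swap through a temporary (which the compiler keeps in a register anyway), but it exchanges two values without using a third variable. Note that it breaks if both indices are the same:<p><pre><code> /* XOR swap of s[a] and s[b]. A and B are the original values.
    Skip a == b: x ^ x == 0, so the first step would zero the element. */
 void swap(char *s, int a, int b)
{
    if (a == b) return;
    s[b] ^= s[a]; /* s[b] = A ^ B */
    s[a] ^= s[b]; /* s[a] = A ^ (A ^ B) = B */
    s[b] ^= s[a]; /* s[b] = (A ^ B) ^ B = A */
}</code></pre>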