As I thought I understood it, Lisp union implements set union. So, I thought that<p><pre><code> (union '(a a) '(a b))
</code></pre>
would return<p><pre><code> (a b)
</code></pre>
And, in fact, it does. This led me to think that<p><pre><code> (union '(a a) '(b))
</code></pre>
would also return<p><pre><code> (A B)
</code></pre>
However, it returns:<p><pre><code> (A A B)
</code></pre>
Why does the fact that there is one less 'a in the second argument in the second call mean that there is one more 'a in the result? This is definitely not set-theoretic union. Why is the behavior of Lisp union defined this way?<p>Here is the sequence of calls again:<p><pre><code> [1]> (union '(a a) '(a b))
(A B)
[2]> (union '(a a) '(b))
(A A B)</code></pre>
Union seems to be implemented in clisp by copying the second set, and adding to the copy any elements from the first set that aren't in the second. If there's a duplicate in the first set, they both get added because it's the original of the second set that gets tested against rather than the copy.<p>So (union '(a b) '(a a)) gives (b a a).
Actually, I take back my statement that this is not set-theoretic union.<p>If I'm willing to accept '(a a) as representing the set '(a) when I use it as an input to union, I should be just as willing to accept '(a a b) as representing the set '(a b) when I receive it back from union.
From <a href="http://www.cs.queensu.ca/software_docs/gnudev/gcl-ansi/gcl_932.html" rel="nofollow">http://www.cs.queensu.ca/software_docs/gnudev/gcl-ansi/gcl_9...</a>:<p>... If either list-1 or list-2 has duplicate entries within it, the redundant entries might or might not appear in the result ...
My set theory knowledge just says that the union of two sets gives you all the elements in them, with no repeats, so I would also follow your thinking. Even when I work with mathematica, I often union two lists to remove duplicate entries, because mathematica makes sure to remove duplicate entries.<p>The no-duplicates would seem to be a very important thing about union, and when they are removed or not. Union is just the complement of intersect, which would be the empty set for (a a) int. (b). So the union would be all the elements: (a a b) and after removing all duplicates you'd get (a b), which is exactly the behavior what you and I expect. Must be something with the duplicates rule.
makes sense to me, and it seems right by set-theory too. Take this all with a grain of salt, though, because it's been a while since I was taught set-theory, but here's my stab at it:<p>there is an intersection between '(a a) and '(a b); the element 'a. Thus, the union would be the set of things inside the intersection --- 'a --- and the parts outside it --- 'b.<p>With the second union, there is no intersection, so the union is merely the set of things not in the intersection, which is the original sets, resulting in '(a a b). There is no relation defined in the union operation between those things not in the intersection, so they remain in the union set.