Worth a read for the vector maths interpretation:<p><i>I could calculate the vectors for all of my notes and use something like the k-means algorithm to find semantically-related clusters of notes.</i><p>If you're familiar with information retrieval techniques, there's probably nothing new here, but it's eye-opening if you're rusty like me.
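<p>For anyone equally rusty, here is a rough sketch of what the quoted idea could look like. It is not the article's code: the word-count vectors, the Euclidean distance, the farthest-point seeding, and the names <i>vectorize</i> and <i>kmeans</i> are all assumptions made for illustration.

```python
from collections import Counter
import math


def vectorize(notes):
    """Turn each note into a word-count vector over a shared vocabulary."""
    counts = [Counter(note.lower().split()) for note in notes]
    vocab = sorted(set().union(*counts))
    return [[c[w] for w in vocab] for c in counts]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iterations=20):
    """Return one cluster label per vector (hypothetical helper)."""
    # Deterministic seeding (an assumption; k-means is usually seeded randomly):
    # start from the first vector, then keep adding the vector that is
    # farthest from every centroid chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors, key=lambda v: min(distance(v, c) for c in centroids)
        )
        centroids.append(list(farthest))

    labels = []
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda i: distance(v, centroids[i]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [v for v, label in zip(vectors, labels) if label == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels


notes = ["cat dog pet", "dog cat animal", "python code bug", "code python script"]
labels = kmeans(vectorize(notes), k=2)
```

With these four toy notes, the two pet notes should share one label and the two programming notes the other. Real notes would likely need TF-IDF weighting and cosine distance rather than raw counts.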
This is very similar to something I did a few years ago with 6-7 years of my blog entries. I wrote a script that generates timeline-based tag clouds from plain text: <a href="http://chir.ag/projects/tagline/" rel="nofollow">http://chir.ag/projects/tagline/</a> and here's an example: <a href="http://chir.ag/projects/preztags/" rel="nofollow">http://chir.ag/projects/preztags/</a><p>The basic algorithm is nearly the same, and it does use stemming (though it doesn't handle synonyms, only related spellings). It takes an XML input file and spits out an HTML file with the required JS embedded.
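<p>The general shape of a timeline tag cloud with stemming can be sketched roughly as below. This is not the actual script: it skips the XML input and HTML/JS output entirely, and the stopword list, the suffix-stripping <i>crude_stem</i>, and the <i>timeline_tags</i> function are hypothetical stand-ins chosen for illustration.

```python
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not from the original script).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "i", "my"}


def crude_stem(word):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def timeline_tags(entries, top_n=3):
    """Map each period to its most frequent stemmed terms.

    `entries` is an iterable of (period, text) pairs, e.g. (year, post body).
    """
    per_period = defaultdict(Counter)
    for period, text in entries:
        for raw in text.lower().split():
            word = raw.strip(".,;:!?\"'()")
            if word and word not in STOPWORDS:
                per_period[period][crude_stem(word)] += 1
    return {p: c.most_common(top_n) for p, c in sorted(per_period.items())}


entries = [
    (2005, "Testing the parser."),
    (2005, "Tested parsers, tests pass"),
    (2006, "Traveling, traveled, travels"),
]
tags = timeline_tags(entries)
```

Here "testing", "tested" and "tests" all collapse to one stem, which is the "related spelling" effect; synonyms would still be counted separately. The per-period counts could then drive font sizes in the rendered cloud.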