Thanks for introducing a new (to me) idea. I didn't watch the video, but I felt the write-up could have been more cohesive; even just a conclusion tying the ideas together would help. I'm also left wondering why we would use this WME approach over other document embedding techniques (averaging word vectors, paragraph vectors, smooth inverse frequency weighting, etc.). Is it faster, does it give better similarity estimates, etc.?
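For concreteness, the cheapest baseline I have in mind is plain averaged word vectors plus cosine similarity. A rough sketch (Python; assumes a gensim KeyedVectors `wv` is already loaded, nothing here is from the article):

    import numpy as np

    def avg_embedding(tokens, wv):
        # mean of the in-vocabulary word vectors; zero vector if none are present
        vecs = [wv[t] for t in tokens if t in wv]
        return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    # sim = cosine(avg_embedding(doc1_tokens, wv), avg_embedding(doc2_tokens, wv))

If WME doesn't clearly beat something this simple on speed or similarity quality, it's hard to see the payoff.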
Interesting idea!

Perhaps the 'random' docs could instead be generated (or even trained) for even greater significance of the new embeddings.

For example: after doing LDA, generate a 'paragon' doc for each topic. Or coalesce all docs of a known label together, then reduce them to D summary pseudo-words: the D 'words' with minimum total WMD to all docs of the same label. Or add further R docs into regions of maximum confusion. A rough sketch of the first variant is below.
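Something like this (Python, leaning on gensim; `wv` is a KeyedVectors with `.wmdistance`, `lda` is a trained LdaModel, and `gamma` is just a kernel width, all assumed to exist rather than taken from the article):

    import numpy as np

    def wme_features(doc_tokens, ref_docs, wv, gamma=1.0):
        # one soft-WMD kernel value per reference doc, in the spirit of WME's random features
        dists = np.array([wv.wmdistance(doc_tokens, ref) for ref in ref_docs])
        return np.exp(-gamma * dists) / np.sqrt(len(ref_docs))

    def lda_paragon_docs(lda, D):
        # one 'paragon' pseudo-doc per LDA topic: its D highest-probability words
        return [[w for w, _ in lda.show_topic(k, topn=D)] for k in range(lda.num_topics)]

    # refs = lda_paragon_docs(lda, D=10)                      # instead of R random docs
    # emb  = wme_features(some_doc.lower().split(), refs, wv)

The label-coalescing variant would just swap lda_paragon_docs for something that greedily picks the D words minimizing total WMD to all docs of a given label.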