This is pretty neat. I've been puttering on one on and off, but it's horribly broken so I haven't released it, so this one gets extra points for actually existing. :)<p>In case my half-done thoughts are useful to anyone looking to build something in this space:<p>My aim is/was to allow configurable matching, so you can match, e.g., "XxxXxx / XxxXxx1 / XxxXxx / XxxXxx1", meaning four consecutive lines of six syllables each, where X is a stressed and x an unstressed syllable, and where the last syllables of the 2nd and 4th lines share the same phonemes, denoted "1". There are no phonemic constraints on any other syllables (this allows a crude approach to rhyme).<p>I'm not entirely happy with cmudict because, since it works one word at a time, it can't really do much about stress, which can vary depending on the surrounding words. I've been using the output of <i>espeak -x</i> instead, which gives a phonetic rendering of an entire sentence, assigning both phonemes and stress. I'm not sure it's genuinely an improvement though. Its poorly documented output format certainly isn't! And in particular it gives a normal prosaic reading of a sentence, which might be too constraining for poetry-finding, since poems often allow a bit of freedom in moving the stresses around.<p>To scan large amounts of text, the idea is to compile the configurable pattern into a regex that matches espeak -x output, so, for example, X gets mapped to a "match any stressed syllable" regex snippet. Alas, that's error-prone, especially since the espeak -x phoneme format is a bit quirky (e.g. there's no fixed length per syllable and there are no syllable markers, so you need some per-language rules to figure out which sequences of ASCII constitute what, and I haven't debugged mine).
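<p>For what it's worth, here is a rough sketch of the pattern-to-regex step in Python. It does NOT parse real espeak -x output. It assumes a made-up intermediate format in which the syllable-splitting has already been done: syllables are separated by ".", lines by newlines, and a stressed syllable starts with "'". The function name compile_pattern, the phoneme alphabet, and the "whole last syllable must be identical" notion of rhyme are all my own simplifications for illustration.

```python
import re

# Assumed alphabet for the phoneme characters of one syllable.
# Hypothetical: the real espeak -x mnemonics use more characters than this.
SYL = r"[a-zA-Z@]+"


def compile_pattern(pattern: str) -> re.Pattern:
    """Compile e.g. "Xx / Xx1 / Xx / Xx1" into a regex over the assumed format.

    X = stressed syllable, x = unstressed syllable. A trailing digit tags the
    last syllable of a line; lines with the same tag must end in an identical
    syllable (a crude stand-in for rhyme).
    """
    seen_tags = set()
    regex_lines = []
    for line in pattern.split("/"):
        m = re.fullmatch(r"([Xx]+)(\d?)", line.strip())
        if m is None:
            raise ValueError(f"bad pattern line: {line!r}")
        syllables, tag = m.groups()
        parts = []
        for i, s in enumerate(syllables):
            stress = "'" if s == "X" else ""
            is_last = i == len(syllables) - 1
            if is_last and tag:
                name = f"r{tag}"
                if name in seen_tags:
                    # Later occurrence: must repeat the captured syllable.
                    body = f"(?P={name})"
                else:
                    # First occurrence: capture the syllable under this tag.
                    seen_tags.add(name)
                    body = f"(?P<{name}>{SYL})"
            else:
                body = SYL
            parts.append(stress + body)
        regex_lines.append(r"\.".join(parts))
    # MULTILINE so that ^ and $ anchor at line boundaries inside a big text.
    return re.compile("^" + "\n".join(regex_lines) + "$", re.MULTILINE)


rx = compile_pattern("Xx / Xx1 / Xx / Xx1")
poem = "'hap.i\n'ba.na\n'sil.i\n'ma.na"
print(bool(rx.search(poem)))  # True
```

<p>The named backreference is what makes the "1" tag work: the first tagged line captures its final syllable and any later line with the same tag has to repeat it. The hard part I mentioned, getting from raw espeak -x output to cleanly split syllables, is exactly what this sketch skips.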