Tech Echo

7 comments

shalmanese, about 13 years ago

I created haiku_robot (http://www.reddit.com/user/haiku_robot) on reddit and found from experience that optimizing for accuracy wasn't very worthwhile. The comments where I got the syllable count wrong seemed to get about the same distribution of upvotes as the ones where I got it right, and regional variations in pronunciation meant that I was accused of being wrong more often when I was right than when I was wrong.


mjn, about 13 years ago

This is pretty neat. I've been puttering on one on and off, but it's horribly broken so I haven't released it, so this one gets extra points for actually existing. :)

In case my half-done thoughts are useful to anyone looking to build something in this space:

My aim is/was to allow configurable matching, so you can match, e.g., "XxxXxx / XxxXxx1 / XxxXxx / XxxXxx1", meaning four consecutive lines of six syllables, where X is a stressed and x an unstressed syllable, and where the last syllables of the 2nd and 4th lines have the same phoneme, denoted "1", whereas there are no phonemic constraints on any other syllables (this allows a crude approach to rhyme).

I'm not entirely happy with cmudict because, since it works one word at a time, it can't really do much about stress, which can vary depending on the surrounding words. I've been using the output of espeak -x instead, which gives a phonetic rendering of an entire sentence, including assigning both phonemes and stress. I'm not sure if it's genuinely an improvement though. Its poorly documented output surely isn't an improvement! And in particular it gives a normal prosaic reading of a sentence, which might be too constraining for poetry-finding, since poems often allow a bit of freedom in moving the stresses around.

The idea for scanning large amounts of text is to compile the configurable pattern into a regex that matches espeak -x output, so for example X gets mapped to a "match any stressed syllable" regex snippet. Alas, that's error-prone, especially since the espeak -x phoneme format is a bit quirky (e.g. no fixed length per syllable and no syllable markers, so you need some per-language rules to figure out which sequences of ASCII constitute what, and I haven't debugged those).
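The pattern-to-regex idea above can be sketched roughly as follows. This is only an illustration, not mjn's code, and it does not parse real espeak -x output: it assumes a made-up encoding in which syllables are separated by `.`, stressed syllables are prefixed with `'`, and lines are separated by ` / `. The name `compile_pattern` is hypothetical. A digit after a syllable symbol is treated as a rhyme tag and compiled to a named group plus backreferences, so all syllables sharing a tag must have identical phonemes.

```python
import re

# Hypothetical encoding of one syllable's phonemes: lowercase ASCII letters.
# Real espeak -x output is NOT this regular; this is a stand-in for the sketch.
SYL = r"[a-z]+"


def compile_pattern(pattern: str) -> re.Pattern:
    """Compile e.g. "Xx1 / Xx1" into a regex over the assumed encoding.

    X = stressed syllable, x = unstressed syllable, an optional digit tags
    syllables that must share the same phonemes (a crude rhyme constraint).
    """
    seen_tags = set()
    line_regexes = []
    for line in pattern.split("/"):
        parts = []
        for stress, tag in re.findall(r"([Xx])(\d?)", line.strip()):
            prefix = "'" if stress == "X" else ""
            if not tag:
                body = SYL                      # no phonemic constraint
            elif tag in seen_tags:
                body = f"(?P=r{tag})"           # must repeat the earlier syllable
            else:
                seen_tags.add(tag)
                body = f"(?P<r{tag}>{SYL})"     # first use: capture the phonemes
            parts.append(prefix + body)
        line_regexes.append(r"\.".join(parts))
    return re.compile(" / ".join(line_regexes))


if __name__ == "__main__":
    couplet = compile_pattern("Xx1 / Xx1")
    print(bool(couplet.fullmatch("'hap.py / 'snap.py")))  # True: stress and rhyme fit
    print(bool(couplet.fullmatch("'hap.py / 'snap.pa")))  # False: final syllables differ
```

Using `fullmatch` sidesteps the question of where a candidate starts. Scanning a long text would need explicit boundaries, and a real version would also need the per-language syllable-splitting rules the comment mentions.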


Jun8, about 13 years ago

Fantastic! This shows the possibilities of what can be created from the text in the Gutenberg archives. Assuming all the fiction ever created is available on your laptop (quite feasible now, except, of course, for the small matter of copyright), what new expressions can be derived?

On a different note, I read the about section of the blog and saw that the OP, in addition to this great stuff, is a beekeeping, hacking attorney who also spins fire. Amazing!

talos, about 13 years ago

for placing every moment of
the labourer's time and that of
his family at the
disposal of the
capitalist for the purpose of

greater quantity of labour
In addition to a measure
of its extension
ie duration
labour now acquires a measure

-Karl Marx

chronomex, about 13 years ago

It may be interesting to adapt the TeX hyphenation methods to this problem.

mfringel, about 13 years ago

Great stuff! Seeing the thought processes intertwined with the implementation is fascinating.

msutherl, about 13 years ago

I am (the man) from Nantucket. Any other Nantucketers on HN?

Nantucket: an accidental limerick detector
