Nantucket: an accidental limerick detector

59 points by Omni5cience about 13 years ago

7 comments

shalmanese about 13 years ago
I created haiku_robot (http://www.reddit.com/user/haiku_robot) on reddit and, from experience, found that it wasn't too worthwhile optimizing for accuracy. The cases where I got the syllable count wrong seemed to have an equal distribution of upvotes compared to the ones where I got it right and regional variations in pronunciation meant that I was accused of being wrong more often when I was right than when I was wrong.
mjn about 13 years ago
This is pretty neat. I've been puttering on one on and off, but it's horribly broken so I haven't released it, so this one gets extra points for actually existing. :)

In case my half-done thoughts are useful to anyone looking to build something in this space:

My aim is/was to allow configurable matching, so you can match, e.g. "XxxXxx / XxxXxx1 / XxxXxx / XxxXxx1", meaning four consecutive lines of six syllables, where X is a stressed and x an unstressed syllable, and where the last syllables of the 2nd and 4th lines have the same phoneme, denoted "1", whereas there are no phonemic constraints on any other syllables (this allows a crude approach to rhyme).

I'm not entirely happy with cmudict because, since it works one word at a time, it can't really do much about stress, which can vary depending on the surrounding words. I've been using the output of espeak -x instead, which gives a phonetic rendering of an entire sentence, including assigning both phonemes and stress. I'm not sure if it's genuinely an improvement though. Its poorly documented output surely isn't an improvement! And in particular it gives a normal prosaic reading of a sentence, which might be too constraining for poetry-finding, since poems often allow a bit of freedom on moving around the stresses.

The idea to scan large amounts of text is to compile the configurable pattern into a regex that matches espeak -x output, so for example X gets mapped to a "match any stressed syllable" regex snippet. Alas, that's error-prone, especially since the espeak -x phoneme format is a bit quirky (e.g. no fixed length per syllable or syllable markers, so you need to have some per-language rules to figure out what sequences of ASCII constitute what, which I haven't debugged).
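A minimal sketch of the pattern-to-regex idea described above, in Python. It does not parse real espeak -x output. Instead it assumes a made-up encoding in which every syllable is written as a stress digit ("1" stressed, "0" unstressed), an uppercase phoneme string, and a "|" terminator. The names encode and compile_pattern are hypothetical, the "/" line separators in the pattern are treated as purely visual, and "rhyme" is reduced to the crude test that two tagged syllables have identical phoneme strings.

```python
import re

# Stress letter in the pattern -> stress digit in the (assumed) syllable encoding.
STRESS_DIGIT = {"X": "1", "x": "0"}


def encode(syllables):
    """Encode (stressed, phonemes) pairs as text, e.g. '1KAT|0IH|'.

    This encoding is an assumption made for the sketch, not espeak's format.
    """
    return "".join(
        f"{'1' if stressed else '0'}{phonemes}|" for stressed, phonemes in syllables
    )


def compile_pattern(pattern):
    """Compile a pattern such as 'Xx / Xx1 / Xx / Xx1' into a regex."""
    parts = []
    seen_groups = set()
    # Each token is a stress letter, optionally followed by a rhyme-group digit.
    # Spaces and '/' are skipped because findall only returns the tokens.
    for stress, group in re.findall(r"([Xx])(\d?)", pattern):
        if not group:
            body = "[A-Z]+"                   # any phonemes
        elif group in seen_groups:
            body = f"(?P=r{group})"           # must repeat the captured phonemes
        else:
            seen_groups.add(group)
            body = f"(?P<r{group}>[A-Z]+)"    # first use: capture the phonemes
        parts.append(f"{STRESS_DIGIT[stress]}{body}\\|")
    return re.compile("".join(parts))


# Four two-syllable lines; lines 2 and 4 end on the same (invented) phonemes.
demo = [
    (True, "ROH"), (False, "ZIZ"),
    (True, "AR"), (False, "RED"),
    (True, "VAY"), (False, "LETS"),
    (True, "AR"), (False, "RED"),
]
regex = compile_pattern("Xx / Xx1 / Xx / Xx1")
print(bool(regex.search(encode(demo))))
```

Because stress digits never appear inside a phoneme string in this encoding, a match can only begin at a syllable boundary. That property is exactly what the real espeak -x output is said to lack, so a front end that splits it into syllables would still be needed.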
Jun8 about 13 years ago
Fantastic! This shows the possibilities of what can be created given the text on Gutenberg archives. Assuming all the fiction ever created is available on your laptop (quite feasible now, except, of course, for the small matter of copyright), what new expressions can be derived?

On a different note, I read the about section of the blog and saw that the OP, in addition to this great stuff, is a beekeeping, hacking attorney who also spins fire. Amazing!
talos about 13 years ago
for placing every moment of
the labourer's time and that of
his family at the
disposal of the
capitalist for the purpose of

greater quantity of labour
In addition to a measure
of its extension
ie duration
labour now acquires a measure

-Karl Marx
chronomex about 13 years ago
It may be interesting to adapt the TeX hyphenation methods to this problem.
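For readers unfamiliar with the TeX approach, here is a rough Python sketch of Liang-style pattern hyphenation. The pattern list is a tiny illustrative subset written in the style of TeX's English patterns, not the real pattern file, and the function names parse and hyphenate are hypothetical. Digits in a pattern score the gap between letters, the highest score at each gap wins, and odd scores mark allowed break points. The left_min and right_min defaults are meant to mirror TeX's usual minimum fragment lengths.

```python
import re

# Illustrative subset only; a real pattern file has thousands of entries.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]


def parse(pattern):
    """Split 'hy3ph' into letters 'hyph' and per-gap scores [0, 0, 3, 0, 0]."""
    letters = re.sub(r"\d", "", pattern)
    values = [0] * (len(letters) + 1)
    i = 0  # number of letters consumed so far, i.e. index of the current gap
    for ch in pattern:
        if ch.isdigit():
            values[i] = int(ch)
        else:
            i += 1
    return letters, values


def hyphenate(word, patterns=PATTERNS, left_min=2, right_min=3):
    """Return the fragments of word between the allowed break points."""
    dotted = f".{word.lower()}."          # dots mark the word boundaries
    points = [0] * (len(dotted) + 1)      # points[j] scores the gap before dotted[j]
    for letters, values in map(parse, patterns):
        start = dotted.find(letters)
        while start != -1:
            for k, value in enumerate(values):
                points[start + k] = max(points[start + k], value)
            start = dotted.find(letters, start + 1)
    pieces, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        if points[i + 1] % 2:             # odd score: a break is allowed before word[i]
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return pieces


print(hyphenate("hyphenation"))  # ['hy', 'phen', 'ation']
```

Note that hyphenation points are not the same thing as syllable boundaries: with these patterns the four-syllable word "hyphenation" yields only three fragments, so a syllable counter built this way would need extra rules or different patterns.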
mfringel about 13 years ago
Great stuff! Seeing the thought processes intertwined with the implementation is fascinating.
msutherl about 13 years ago
I am (the man) from Nantucket. Any other Nantucketers on HN?