Show HN: Experiments in AI-generation of crosswords

38 points by abstractbill 5 months ago

Hi HN, I've been experimenting on-and-off over the years trying to automatically generate crosswords [1]. Recently I've been feeling like my results are good enough that I want to share them and see what other people think. I'm not trying to claim that these could appear in, say, the NYT in their current state, but honestly the velocity of progress makes me feel like I will inevitably be able to automatically generate NYT-quality crosswords within just a year or so.

A write-up is here: https://abstractnonsense.com/crosswords.html

And you can play the crosswords here: https://crosswordracing.com (They should work well on both desktop and mobile, and there's a leader-board for each crossword if you want to leave your name when you solve one.)

[1]: Just in case anyone is interested, my very first attempt at this problem was way back in 2006! I used multiple wordlists (e.g. list of British monarchs, with reign dates), and wrote little functions to generate clues from each list (e.g. "British monarch who ruled from {date1} to {date2}"). Even with randomized synonym substitution and similar tricks, this approach was too labor-intensive, and the results too robotic, for it to work well. Can't complain though, that project led to me getting hired as the first engineer at Justin.TV!
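For readers curious what the 2006 template approach in footnote [1] might look like, here is a minimal sketch. It is not the original code: the `MONARCHS` sample, the `SYNONYMS` table, and the function names are all hypothetical, chosen only to illustrate a wordlist plus a small clue-generating function with randomized synonym substitution.

```python
import random

# Hypothetical sample wordlist: (answer, reign start, reign end).
MONARCHS = [
    ("VICTORIA", 1837, 1901),
    ("EDWARDVII", 1901, 1910),
    ("GEORGEV", 1910, 1936),
]

# Hypothetical synonym table used for randomized substitution.
SYNONYMS = {
    "monarch": ["monarch", "sovereign"],
    "ruled": ["ruled", "reigned"],
}


def clue_for_monarch(start, end, rng):
    """Fill the clue template, picking a random synonym for each slot."""
    noun = rng.choice(SYNONYMS["monarch"])
    verb = rng.choice(SYNONYMS["ruled"])
    return f"British {noun} who {verb} from {start} to {end}"


def generate_clues(seed=0):
    """Return a dict mapping each answer to one generated clue."""
    rng = random.Random(seed)
    return {
        answer: clue_for_monarch(start, end, rng)
        for answer, start, end in MONARCHS
    }


if __name__ == "__main__":
    for answer, clue in generate_clues().items():
        print(f"{answer}: {clue}")
```

Every new wordlist needs its own template function, which is consistent with the post's remark that the approach was labor-intensive and the output robotic.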

7 comments

vunderba 5 months ago

Not bad.

As someone who has dabbled in AI-generated crosswords, I found that providing samples of "good crossword clues" (which I curated from historical NYT Monday puzzles) as part of the LLM context helped tremendously in generating better clues.

There was also a Show HN for a generative AI crossword puzzle system a few months ago, so I'll include what I mentioned there:

Part of the deep satisfaction in solving a crossword puzzle is the specificity of the answer. It's far more gratifying to answer a question with something like "Hawking" than to answer with "scientist", or answering with "mandelbrot" versus "shape".

So ideally, you want to lean towards "specificity" wherever possible, and use "generics" as filler.

Link: https://news.ycombinator.com/item?id=41879754
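The idea of putting sample clues into the LLM context can be sketched as a small prompt builder. This is only an illustration under assumptions: the two example clues are made up for this sketch (they are not real NYT clues), `build_clue_prompt` is a hypothetical helper, and the call to an actual model is left out.

```python
# Made-up example clues for illustration; a real setup would use a curated set.
EXAMPLE_CLUES = [
    ("HAWKING", "Physicist who wrote 'A Brief History of Time'"),
    ("MANDELBROT", "Mathematician with a fractal set named after him"),
]


def build_clue_prompt(answer, examples=EXAMPLE_CLUES):
    """Build a few-shot prompt that ends where the model should write a clue."""
    lines = [
        "Write one crossword clue for the final answer.",
        "Prefer specific clues over generic ones.",
        "",
    ]
    for word, clue in examples:
        lines.append(f"Answer: {word}")
        lines.append(f"Clue: {clue}")
        lines.append("")
    lines.append(f"Answer: {answer.upper()}")
    lines.append("Clue:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_clue_prompt("fractal"))
```

The resulting string would be sent to whichever model is being tested; swapping the example list is the only change needed to try a different curated set.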

korymath 5 months ago

Great post.

Funny, I just posted this to X:

2025 GenAI challenge

Create a 5x5 crossword puzzle with two distinct solutions. Each clue must work for both solutions. Do not use the same word in both solutions. No black squares.

I try with each new model that lands. Still can’t get it.

super7ramp 4 months ago

Hi, thank you for the write-up.

> Once we have a grid, we try to fill it with words! I use simple backtracking search for that, with a timeout to stop the search on grids that are likely impossible to fill. In practice it's easy to generate a new filled grid from scratch about once every two minutes.

Have you explored other search techniques?

> After the grid is full of words, we use an LLM to generate some clues. I've iterated over many models and prompts for this.

Could you share the prompts and the models you tried?

Shameless plug: I've been interested in crossword generation for a while as well and made that toy: https://github.com/super7ramp/croiseur. No grid generation but automatic filling and clue generation. Clues are not really good, currently using gpt-4o-mini.
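For context, "simple backtracking search with a timeout" as quoted above could look roughly like the sketch below. It is an assumption-laden illustration, not the write-up's code: the slot representation (lists of `(row, col)` cells), the `fill_grid` name, and the no-repeated-words rule are all choices made here.

```python
import time


def fill_grid(slots, words, timeout_s=5.0):
    """Assign one word to each slot so that crossing cells agree.

    slots: list of slots, each a list of (row, col) cells.
    words: iterable of candidate words.
    Returns a list of words (one per slot), or None if no fill is
    found before the timeout expires.
    """
    deadline = time.monotonic() + timeout_s
    by_length = {}
    for word in words:
        by_length.setdefault(len(word), []).append(word)

    letters = {}                  # cell -> letter currently placed there
    chosen = [None] * len(slots)  # word picked for each slot
    used = set()                  # words already in the grid

    def fits(word, slot):
        # A word fits if it agrees with every letter already placed.
        return all(letters.get(cell, ch) == ch for cell, ch in zip(slot, word))

    def search(i):
        if time.monotonic() > deadline:
            raise TimeoutError
        if i == len(slots):
            return True
        slot = slots[i]
        for word in by_length.get(len(slot), []):
            if word in used or not fits(word, slot):
                continue
            # Remember which cells this word fills so we can undo them.
            new_cells = [cell for cell in slot if cell not in letters]
            for cell, ch in zip(slot, word):
                letters[cell] = ch
            used.add(word)
            chosen[i] = word
            if search(i + 1):
                return True
            # Backtrack: undo this choice and try the next candidate.
            used.discard(word)
            chosen[i] = None
            for cell in new_cells:
                del letters[cell]
        return False

    try:
        return chosen if search(0) else None
    except TimeoutError:
        return None
```

A common refinement, not shown here, is to pick the most constrained slot next instead of going in fixed order.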

furyofantares 5 months ago

I've tried to get o1 to generate Xordle puzzles.

Warning: post contains a spoiler for a recent Xordle.

Xordle is Wordle with two target words that share no letters in common. Additionally, there is a "free clue" given at the start, and all three words are thematically linked. It's not always a straightforward link, for example a recent puzzle had the starter word 'grief' and targets 'empty' and 'chair'. All puzzles today are selected from user submissions.

o1 is the first model that's been able to solve Xordles reliably, or to generate valid puzzles at all. It's well-known that these things are massively handicapped for this type of task due to tokenization.

But since o1 can in fact achieve it, I wanted to see if I could get it to make puzzles that are at all satisfying. Instead it makes very bland puzzles, with straightforward connections and extremely broad themes.

Prompting can swing the pendulum too far in the other direction, to puzzles where the connection is contrived and impossible to see even after it's solved. As I've often experienced with LLMs, being able to hit either side of a target with prompting does not necessarily mean you can get it to land in the middle, and in fact I have had no success in doing so with this task.

This is one of the most basic examples I know of lack of creativity or "taste" in an LLM. It is a little hard for a human to generate two 5-letter words with no overlap, but it is extremely easy for a human to look for a thematic connection among 2-3 words and say if it's satisfying. But so far I've been totally unable to make the LLM make satisfying puzzles.

edit: Nothin' like making a claim about LLMs to get one up off one's ass and try to prove it wrong immediately. I'm getting some much better results with better examples now.
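The mechanical half of the Xordle constraint (two 5-letter targets sharing no letters) is easy to check in code, which is one way to filter model output before judging the thematic link by hand. A minimal sketch, with hypothetical function names:

```python
from itertools import combinations


def is_valid_target_pair(a, b, length=5):
    """True if both words have the given length and share no letters."""
    a, b = a.lower(), b.lower()
    return len(a) == length and len(b) == length and not (set(a) & set(b))


def disjoint_pairs(words):
    """All pairs from the list that could serve as Xordle targets."""
    return [
        (a, b)
        for a, b in combinations(words, 2)
        if is_valid_target_pair(a, b)
    ]


if __name__ == "__main__":
    print(is_valid_target_pair("empty", "chair"))
    print(disjoint_pairs(["empty", "chair", "grief"]))
```

Whether a pair is thematically satisfying is the part this check cannot capture.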

corlinpalmer 5 months ago

Awesome! I have also dabbled in AI-generated crosswords, but I was more fascinated with the concept of generating the most efficient layout of an X-by-X grid from a given word set. It's a surprisingly difficult optimization problem because the combinatorics are insane. Here's an example output trying to find the most efficient layout of common Linux terminal commands:<pre><code> W P G H I S T O R Y E O R T Y U M L E S S P I O C A T U S E R A D D L T R D C </code></pre> Of course this is a pretty small grid and it gets more difficult with size. I've thought about making a competition from this sort of challenge. Would anyone be interested?

gowld 5 months ago

The "American" grids aren't American. An American grid almost always has 2 answers (both directions) per square.

dgreensp 5 months ago

I found this article a bit disappointing.

The link at the bottom doesn’t work.

The grids shown do not follow the well-known rules of (American) crosswords: every square is part of two words of three or more letters each.

Coming up with a pattern of black squares, and writing good clues, are two parts of making a crossword puzzle that are IMO fun and benefit from a human touch, and are not overly difficult. There are also databases of past clues used in crossword puzzles (e.g. every NY Times clue ever, and various crossword dictionaries) for reference and possible training. If you don’t care about originality (or copyright) and want quality clues, you can just pull clues from these. If you do care about all those things, you can surface the list of clues used in the past to the human constructor and let them write the final clue. Or you can try to perfect LLM clue-writing. In my experience, LLMs are terrible at clues. Like sometimes if I try to give it feedback about a clue, it will just work the feedback into the clue… it’s a little hard to describe without an example, but basically it doesn’t seem to understand the requirements of a clue and the process of a solver looking at a clue and trying to come up with an answer.

Coming up with an interlocking set of fun, high-quality words and phrases is the hard part. I agree that LLM wordlist curation is a great idea, and I started playing around with that once.

Beyond that, I don’t think LLMs can help with grid construction, which is a more classic combinatorial problem.
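The rule stated above (every white square is part of two words of three or more letters each) can be checked mechanically. The sketch below assumes a grid given as a list of equal-length strings with `#` for black squares; `is_american_grid` is a hypothetical name, and only this one rule is checked, not any other convention.

```python
def is_american_grid(rows, black="#", min_len=3):
    """True if every white square lies in an across run and a down run
    of at least min_len white squares."""
    height, width = len(rows), len(rows[0])

    def is_white(r, c):
        return 0 <= r < height and 0 <= c < width and rows[r][c] != black

    def run_length(r, c, dr, dc):
        # Count the white squares in the straight run through (r, c).
        n = 1
        rr, cc = r - dr, c - dc
        while is_white(rr, cc):
            n += 1
            rr, cc = rr - dr, cc - dc
        rr, cc = r + dr, c + dc
        while is_white(rr, cc):
            n += 1
            rr, cc = rr + dr, cc + dc
        return n

    for r in range(height):
        for c in range(width):
            if not is_white(r, c):
                continue
            across = run_length(r, c, 0, 1)
            down = run_length(r, c, 1, 0)
            if across < min_len or down < min_len:
                return False
    return True


if __name__ == "__main__":
    print(is_american_grid(["...", "...", "..."]))
    print(is_american_grid(["..#", "...", "..."]))
```

A generator could run a check like this on each candidate pattern of black squares before attempting to fill it.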
