Hi Puzzlers! Welcome back to another exciting installment of Natural Language Puzzling, where we use natural language processing to solve the NPR Sunday Puzzle. Let's look at this week's puzzle:
This week's challenge comes from listener Michael Schwartz, of Florence, Oregon. Take the name of a watercraft that contains an odd number of letters. Remove the middle letter and rearrange the remaining ones to name a body of water. What words are these?
It's a puzzle format as old as time: take an item from Class A (watercraft), apply a transformation (remove a letter and rearrange), and return an item from Class B (body of water). A potentially tricky part here is that these are both open classes---we'll never have a complete list of types of watercraft or bodies of water. New watercraft get invented, for example.
So how do we go about this? Here's what I need, how I'll get it and how I'll use it:
- C: a list of watercraft
- It's not clear whether this should be a list of common nouns, like raft and submarine, or of specific craft, like the Titanic
- After years of following the Sunday Puzzle, I suspect it's the former, so I'll start with that
- Between web search, Wikipedia, and asking ChatGPT, I cobbled together a list of 100+ different types of watercraft. I filtered this down to only those that are one word, because these puzzles tend to prefer that kind of simplicity. This left me with about 70 words, which I saved in my Python script
- W: a list of bodies of water
- Again, it's not clear if we want common nouns like lake and reservoir or proper nouns like Mediterranean Sea
- I am working under the assumption that we want the simple common nouns, because the puzzles tend to work that way unless they specify otherwise. Also, the proper nouns will typically be more than one word, which would make this a bit messy, and the puzzles tend to be cleaner than that.
- However, I'm not confident I can assemble a complete list here, so instead of starting with one, I'll use a language model to evaluate the candidate anagrams and see whether they make sense in a sentence as a body of water
- L: lexicon, a list of valid English spellings (see "script" section below)
- LLM: Large language model
- We'll use a large language model to evaluate the probability (via perplexity) of sentences in which each candidate word appears in context as a body of water.
- I'll be using the GPT-2 implementation from the Python transformers library, which is more than powerful enough for our needs here. I'll start with a sentence frame (or template):
- As the largest body of water in the area, the _____ is a popular summer destination.
- So for the watercraft steamer, we drop the middle letter to get stemer, and we find the anagrams meters and metres
- I get scores from the LLM for the resulting sentences:
- 3.91 As the largest body of water in the area, the metres is a popular summer destination.
- 3.87 As the largest body of water in the area, the meters is a popular summer destination.
- This is a perplexity score, so a lower score indicates a more likely sentence. Of course, we can't conclude much from these two examples alone. I took them from a larger list (which I can't post here because spoilers), and in that context these scores could probably be taken to indicate that the words are real and not incoherent in the context, though an LLM wouldn't find them particularly likely.
- For comparison, I tried this with carburetor and lake:
- 3.67 As the largest body of water in the area, the carburetor is a popular summer destination.
- 2.94 As the largest body of water in the area, the lake is a popular summer destination.
- script:
- We need a script that can iterate through our list of watercraft and apply some functions:
- Keep only those with an odd number of letters
- Remove the middle letter
- Find all the anagrams
- We'll simply generate all permutations of the remaining letters, then keep only those that appear in an English lexicon (L)
- Slot these candidate strings into the sentence frame and pass each sentence to the LLM for a score
- Keep only the sentences that score below a reasonable perplexity threshold
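To make the word-wrangling steps concrete, here's a minimal sketch of the candidate-generation side: filter to odd-length craft, drop the middle letter, and find anagrams by checking permutations against the lexicon. The function names and toy lexicon are mine for illustration, not necessarily what the actual script uses.

```python
from itertools import permutations

def remove_middle(word: str) -> str:
    """Drop the middle letter of an odd-length word, e.g. steamer -> stemer."""
    mid = len(word) // 2
    return word[:mid] + word[mid + 1:]

def anagrams(letters: str, lexicon: set[str]) -> set[str]:
    """All rearrangements of `letters` that are valid words in the lexicon."""
    return {"".join(p) for p in permutations(letters)} & lexicon

def candidates(watercraft: list[str], lexicon: set[str]) -> dict[str, set[str]]:
    """Map each odd-length craft name to its valid post-removal anagrams."""
    return {
        craft: anagrams(remove_middle(craft), lexicon)
        for craft in watercraft
        if len(craft) % 2 == 1  # odd number of letters only
    }

# Toy example with a tiny stand-in lexicon:
lex = {"meters", "metres", "lake"}
print(candidates(["steamer", "raft"], lex))
```

Brute-force permutations are fine at this scale: the one-word craft names are short, and a lexicon lookup via set membership keeps the filtering cheap.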
From there, if I've done this correctly and set an appropriate perplexity threshold, I should get a manageable number of scored sentences back. This is a human-in-the-loop (HITL) approach. That's me, I'm the human. I'll review the results, and ideally the correct solution will have a relatively low perplexity score and be fairly easy to spot.
And in fact, that's exactly what I did, and I got a solution that I'll be back to share with you after NPR's Thursday submission deadline.
In the meantime, if you'd like to try my script or modify my approach, you can find it here. Make sure you have the torch and transformers libraries installed for Python, and you'll also need to download this lexicon file.