Natural Language Puzzling: December 2024

Happy Monday funday, Puzzlers! Welcome back to Natural Language Puzzling, the blog where we use natural language processing, linguistics and data science to solve the weekly Sunday Puzzle from NPR. Let's take a look at this week's puzzle:

This week's challenge comes from the crossword constructor and editor Peter Gordon. Think of a classic television actor — first and last names. Add a long-E sound at the end of each name and you'll get two things that are worn while sleeping. What are they?

Let's break this down into the tools and resources that we'll need:

A: a list of classic television actors

I'll start with this list of actors that we've compiled for previous puzzles.
It's not strictly classic television actors because it includes contemporary names as well as actors known primarily for movies, but that shouldn't be a problem for us.

P: a pronunciation dictionary

I'll be using this Python implementation: https://github.com/Kyubyong/g2p
It queries the CMU Pronouncing Dictionary and uses a grapheme-to-phoneme model to predict pronunciations for out-of-vocabulary (OOV) items.
Unfortunately, this implementation allows us to query a word to get its pronunciation but doesn't allow the reverse; i.e., it's g2p but not p2g. We'll also need to query a pronunciation to get an orthographic word. This is because we need to add "long-E" (/i/) to the end of the pronunciation to see if it results in a word that is a "thing worn while sleeping". So in this case we'll need to load the CMU dictionary file and work with it.
**See my note below
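
Here's a minimal sketch of that reverse lookup, assuming the plain-text layout of the `cmudict-0.7b` file (comment lines starting with `;;;`, alternate pronunciations marked like `WORD(1)`, stress digits on vowels). The function name `parse_cmudict` and the sample lines are my own illustration, not part of g2p. I strip the stress digits so that a name plus IY can match a word regardless of stress.

```python
import re
from collections import defaultdict

def parse_cmudict(lines):
    """Build a reverse index: stress-free phoneme tuple -> set of spellings."""
    reverse = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):   # skip blanks and comments
            continue
        word, *phones = line.split()
        word = re.sub(r"\(\d+\)$", "", word).lower()   # WORD(1) -> word
        key = tuple(p.rstrip("012") for p in phones)   # IY0 -> IY
        reverse[key].add(word)
    return reverse

# A few illustrative lines in the assumed file format.
sample = [
    ";;; comment line",
    "BOBBY  B AA1 B IY0",
    "BOBBIE  B AA1 B IY0",
    "HOPI  HH OW1 P IY0",
]
reverse = parse_cmudict(sample)
print(sorted(reverse[("B", "AA", "B", "IY")]))
```

To run this on the real dictionary, you would pass an open file handle instead of `sample`; as far as I know the 0.7b file needs `encoding="latin-1"` when opened.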

S: list of things worn while sleeping

~~It seems like this should be a fairly short list, especially given that we're only looking for items that end in /i/ ("long E").~~
~~I'm not so confident that I could brainstorm a comprehensive list, so~~ we'll use an LLM approach here instead.

LLM: We'll use the Python transformers implementation of GPT-2 to score sleepwear candidates.

The algorithm is this, roughly:

We're searching for the actor name that gives us two items of sleepwear after we add /i/ to the end of the first and last names.
So we take each actor, get the first name and last name, convert each to pronunciation, and add /i/ to the pronunciation:

Bob Hope --> B AA B . HH OW P .
B AA B . HH OW P . --> B AA B IY . HH OW P IY .

Next, we use the pronunciation dictionary to try to map each pronunciation back to a word:

B AA B IY . --> [ 'bobby', 'bobbie' ]
HH OW P IY . --> [ 'hopi' ]
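
The two steps above can be sketched roughly like this. The two small dictionaries are toy stand-ins that I made up for the Bob Hope example; a real run would get the name pronunciations from g2p and the reverse mapping from the full CMU dictionary. The function names are hypothetical too.

```python
from itertools import product

# Toy stand-ins (hypothetical data): a real run would use g2p output and
# a reverse index built from the full CMU dictionary.
NAME_TO_PHONES = {
    "bob": ("B", "AA", "B"),
    "hope": ("HH", "OW", "P"),
}
PHONES_TO_WORDS = {
    ("B", "AA", "B", "IY"): ["bobby", "bobbie"],
    ("HH", "OW", "P", "IY"): ["hopi"],
}

def add_long_e(phones):
    """Append the long-E phoneme (ARPAbet IY) to a pronunciation."""
    return tuple(phones) + ("IY",)

def candidate_pairs(actor, name_to_phones, phones_to_words):
    """Return every (first+IY word, last+IY word) pair for an actor name."""
    parts = actor.lower().split()
    first, last = parts[0], parts[-1]
    options = []
    for name in (first, last):
        phones = name_to_phones.get(name)
        if phones is None:
            return []                 # no pronunciation available
        words = phones_to_words.get(add_long_e(phones), [])
        if not words:
            return []                 # name + IY is not a known word
        options.append(words)
    return list(product(*options))

print(candidate_pairs("Bob Hope", NAME_TO_PHONES, PHONES_TO_WORDS))
```

Returning an empty list as soon as either name fails to map to a word keeps the candidate set small, which matters because every surviving pair gets scored by the language model later.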

Then we use an LLM to evaluate the probability of these sleepwear candidates in the context of a relevant sentence:

"I usually sleep in my pajamas or my __1__, but last night I slept in my __2__."
"I usually sleep in my pajamas or my bobby, but last night I slept in my hopi."

We sort all these candidate sentences by their LLM perplexity scores, and we expect to find the correct solution among those with the lowest perplexity. The "bobby" and "hopi" sentence would likely get a high perplexity score, as it doesn't really make sense.
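
For the curious, here's a rough sketch of the scoring and sorting step, not my exact script. It assumes the `torch` and `transformers` packages and the public `gpt2` checkpoint; the helper names (`make_gpt2_scorer`, `rank_candidates`) are hypothetical. Perplexity here is the exponential of the mean per-token cross-entropy that the model returns when the input IDs are also passed as labels.

```python
import math

TEMPLATE = ("I usually sleep in my pajamas or my {first}, "
            "but last night I slept in my {second}.")

def make_gpt2_scorer(model_name="gpt2"):
    """Return a function mapping a sentence to its GPT-2 perplexity."""
    # Imported lazily so the ranking helper below works without these packages.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)
    model.eval()

    def perplexity(sentence):
        enc = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            # With labels supplied, the model returns the mean cross-entropy loss.
            loss = model(**enc, labels=enc["input_ids"]).loss
        return math.exp(loss.item())

    return perplexity

def rank_candidates(pairs, scorer):
    """Fill the template for each word pair and sort by ascending score."""
    scored = []
    for first, second in pairs:
        sentence = TEMPLATE.format(first=first, second=second)
        scored.append((scorer(sentence), first, second, sentence))
    return sorted(scored)
```

Something like `rank_candidates(pairs, make_gpt2_scorer())` would then return the lowest-perplexity sentences first. Keeping the scorer as a plain function argument also makes it easy to swap in a different model.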

Oh boy, this one took me several hours to implement, but I finally have an approach that produces the correct solution!

**Oddly enough, one of the sleepwear items is not present in the CMU Pronouncing Dictionary. This is surprising because there are just over 135,000 unique word types there, and the word we are hunting is not particularly rare. I figured out the solution manually while working on this one, and when I checked the CMU dict, I found the word missing. I added the word and its pronunciation to my own copy of the dictionary to make this approach work. This has me thinking that I should start maintaining my own addendum to the CMU Pronouncing Dictionary that I could share on my GitHub.

Anyway, here's what some of my output looks like, omitting the correct solution of course. I'm pleased to say that the correct solution has the lowest perplexity of all candidate solutions. I'll be back after the Thursday NPR deadline for submissions to share that solution with you. Good luck!

3.404   Rose Byrne   rosie    bernie    I usually sleep in ...
3.655   Jack Black   jackie   blackie   I usually sleep in ...
3.741   Paul Rudd    pauli    ruddy     I usually sleep in ...
3.893   Will Smith   willie   smithee   I usually sleep in ...

Update & Solution

The deadline for submissions has passed, so click below if you want to see my answer. See you next week!

Monday, December 02, 2024

Classic TV actor, sleepwear
