This week's challenge comes from the crossword constructor and editor Peter Gordon. Think of a classic television actor — first and last names. Add a long-E sound at the end of each name and you'll get two things that are worn while sleeping. What are they?
Let's break this down into the tools and resources that we'll need:
- A: a list of classic television actors
- I'll start with this list of actors that we've compiled for previous puzzles
- It's not strictly classic television actors because it includes contemporary names as well as actors known primarily for movies, but that shouldn't be a problem for us.
- P: a pronunciation dictionary
- I'll be using this Python implementation: https://github.com/Kyubyong/g2p
- It queries the CMU Pronouncing Dictionary and uses a grapheme-to-phoneme model to predict pronunciations for out-of-vocabulary (OOV) items
- Unfortunately, this implementation allows us to query a word to get its pronunciation but doesn't allow the reverse; i.e., it's g2p but not p2g. We'll also need to query a pronunciation to get an orthographic word. This is because we need to add "long-E" (/i/) to the end of the pronunciation to see if it results in a word that is a "thing worn while sleeping". So in this case we'll need to load the CMU dictionary file and work with it.
- **See my note below
- S: list of things worn while sleeping
- It seems like this should be a fairly short list, especially given that we're only looking for items that end in /i/ ("long E").
- I'm not so confident that I could brainstorm a comprehensive list, so we'll use an LLM approach here instead.
- LLM: We'll use the Python transformers implementation of GPT2 to score sleepwear candidates.
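The reverse lookup described above (pronunciation to spelling) can be sketched roughly like this. This is a minimal sketch, not necessarily how my own script does it: it assumes the dictionary file uses the classic `WORD  PH1 PH2 ...` line format with `;;;` comment lines and `(1)`-style variant markers, and the names `parse_cmudict` and `words_for` are made up for illustration. The sample lines are only there to show the format.

```python
import re
from collections import defaultdict


def parse_cmudict(lines):
    """Build a reverse index: stress-free phoneme tuple -> set of spellings."""
    index = defaultdict(set)
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(";;;"):
            continue
        parts = line.split()
        # Drop variant markers such as "(1)" and normalise case.
        word = re.sub(r"\(\d+\)$", "", parts[0]).lower()
        # Drop stress digits so "IY0" and "IY1" both become "IY".
        phones = tuple(re.sub(r"\d", "", p) for p in parts[1:])
        index[phones].add(word)
    return index


def words_for(index, phones):
    """Return all spellings recorded for a pronunciation, sorted alphabetically."""
    return sorted(index.get(tuple(phones), ()))


# Illustrative sample in the assumed file format (not the full dictionary).
SAMPLE = """\
;;; sample lines
BOBBY  B AA1 B IY0
BOBBIE  B AA1 B IY0
HOPI  HH OW1 P IY0
HOPE  HH OW1 P
""".splitlines()

index = parse_cmudict(SAMPLE)
print(words_for(index, ["B", "AA", "B", "IY"]))  # ['bobbie', 'bobby']
```

With the real file, you would pass the open file handle to `parse_cmudict` instead of `SAMPLE`. Any words missing from the dictionary could be appended as extra lines in the same format before parsing.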
The algorithm is this, roughly:
- We're searching for the actor name that gives us two items of sleepwear after we add /i/ to the end of the first and last names.
- So we take each actor, get the first name and last name, convert each to pronunciation, and add /i/ to the pronunciation:
- Bob Hope --> B AA B . HH OW P .
- B AA B . HH OW P . --> B AA B IY . HH OW P IY .
- Next, we use the pronunciation dictionary to try to map each pronunciation back to a word:
- B AA B IY . --> [ 'bobby', 'bobbie' ]
- HH OW P IY . --> [ 'hopi' ]
- Then we use an LLM to evaluate the probability of these sleepwear candidates in the context of a relevant sentence:
- "I usually sleep in my pajamas or my __1__, but last night I slept in my __2__."
- "I usually sleep in my pajamas or my bobby, but last night I slept in my hopi."
- We sort all these candidate sentences by their LLM perplexity scores, and we expect to find the correct solution among those with the lowest scores. The "bobby" and "hopi" sentence would likely get a high perplexity score, since it doesn't really make sense.
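Here is a rough sketch of the steps above, under some loud assumptions: `pronounce` stands in for whatever grapheme-to-phoneme lookup is used and is assumed to return stress-free ARPAbet phonemes, `reverse_index` is a plain dict from phoneme tuples to spellings, and all function names are hypothetical. The GPT-2 scorer shows one common way to compute perplexity with `transformers` (exponentiated mean token loss); my actual scoring may differ in the details. It imports lazily so the rest of the sketch runs without the model installed.

```python
from itertools import product

TEMPLATE = ("I usually sleep in my pajamas or my {}, "
            "but last night I slept in my {}.")


def add_long_e(phones):
    """Append the long-E phoneme (ARPAbet IY) to a pronunciation."""
    return tuple(phones) + ("IY",)


def candidate_sentences(actor, pronounce, reverse_index):
    """Yield (first_word, last_word, sentence) for every spelling pair."""
    names = actor.split()
    first, last = names[0], names[-1]  # middle names are ignored
    firsts = reverse_index.get(add_long_e(pronounce(first)), [])
    lasts = reverse_index.get(add_long_e(pronounce(last)), [])
    for a, b in product(firsts, lasts):
        yield a, b, TEMPLATE.format(a, b)


def rank(actors, pronounce, reverse_index, score):
    """Score every candidate sentence; lowest (best) score comes first."""
    rows = []
    for actor in actors:
        for a, b, sentence in candidate_sentences(actor, pronounce, reverse_index):
            rows.append((score(sentence), actor, a, b))
    return sorted(rows)


def gpt2_perplexity_scorer(model_name="gpt2"):
    """Return a sentence -> perplexity function backed by GPT-2."""
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)
    model.eval()

    def score(sentence):
        ids = tokenizer(sentence, return_tensors="pt").input_ids
        with torch.no_grad():
            loss = model(ids, labels=ids).loss  # mean cross-entropy per token
        return float(torch.exp(loss))

    return score


# Toy stand-ins so the sketch runs without g2p, the CMU dict, or GPT-2.
TOY_PRON = {"bob": ("B", "AA", "B"), "hope": ("HH", "OW", "P")}
TOY_INDEX = {
    ("B", "AA", "B", "IY"): ["bobbie", "bobby"],
    ("HH", "OW", "P", "IY"): ["hopi"],
}

# len() is a placeholder score; swap in gpt2_perplexity_scorer() for real use.
ranked = rank(["Bob Hope"], lambda w: TOY_PRON[w.lower()], TOY_INDEX, score=len)
for row in ranked:
    print(row)
```

Passing the scorer in as a function keeps the candidate generation separate from the model, which makes it easy to test the pipeline with a cheap placeholder before loading GPT-2.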
**Oddly enough, one of the sleepwear items is not present in the CMU Pronouncing Dictionary, even though the dictionary has just over 135,000 unique word types and the word we are hunting is not particularly rare. I figured out the solution manually while attempting to solve this one, and when I checked the CMU dict, I found the word missing. I added the word and its pronunciation to my own copy of the dictionary to make this approach work. This has me thinking that I should start maintaining my own addendum to the CMU Pronouncing Dictionary that I could share on my GitHub.
3.404 Rose Byrne rosie bernie I usually sleep in ...
3.655 Jack Black jackie blackie I usually sleep in ...
3.741 Paul Rudd pauli ruddy I usually sleep in ...
3.893 Will Smith willie smithee I usually sleep in ...
Update & Solution
The deadline for submissions has passed, so click below if you want to see my answer. See you next week!