This week's challenge comes from listener Greg VanMechelen, of Berkeley, Calif. Name a state capital. Inside it in consecutive letters is the first name of a popular TV character of the past. Remove that name, and the remaining letters in order will spell the first name of a popular TV game show host of the past. What is the capital and what are the names?
I'll break this puzzle down as usual. Here's what we need to solve it:
- C: a list of state capitals
- Easy peasy: we love starting with a closed class (i.e., there is a fixed list, so we don't need to worry about whether our list contains all the right candidates)
- get_substrings(): We need some kind of (Python) function that iterates through each state capital string and applies the puzzle transformation.
- In this case, that means extracting a substring, squishing the remaining letters together into a string, and returning both the substring and the remaining string.
- I suggest we allow each substring to be between 2 and 6 letters long. This is a somewhat arbitrary decision, but my intuition tells me that we're unlikely to find a sequence of 7 or more letters embedded in a state capital that happens to spell out a person's name while the rest of the letters still spell another name. We can of course modify this limit as needed.
- LM: We need a (large) language model to evaluate candidates.
- I'm using a pretrained GPT-2 model from the Hugging Face transformers library.
- I'll iterate through the candidates, inserting them into a sentence template and getting a perplexity score from the LLM.
- For example, we could imagine: Madison --> di, mason
- And we use them in the sentence template: "Di was a popular TV character and Mason was a popular TV game show host."
- Note that capitalization matters here. Without capitalizing the names, the correct solution is ranked quite poorly, but with capitalization it emerges among the top 8 candidates.
- We return a list of these sentences ranked by perplexity scores, and theoretically, the correct solution should be among those with the lowest perplexity.
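Here's a minimal sketch of the first two ingredients plus the sentence template, in plain Python. It is not necessarily how my script does it, and a few things are assumptions: the capitals are hard-coded and normalized by removing spaces and lowercasing (which seems consistent with how "Littlerock" shows up in the output below), `get_substrings()` uses a hypothetical signature with the 2 to 6 letter window described above, the remainder is required to be non-empty, and `TEMPLATE` and `candidate_sentences()` are illustrative names.

```python
# Hypothetical sketch: candidate generation for the state-capital puzzle.
# Names and signatures are illustrative, not the original script's.

# Closed class: the 50 U.S. state capitals, in alphabetical order by state.
CAPITALS = (
    "Montgomery", "Juneau", "Phoenix", "Little Rock", "Sacramento",
    "Denver", "Hartford", "Dover", "Tallahassee", "Atlanta",
    "Honolulu", "Boise", "Springfield", "Indianapolis", "Des Moines",
    "Topeka", "Frankfort", "Baton Rouge", "Augusta", "Annapolis",
    "Boston", "Lansing", "Saint Paul", "Jackson", "Jefferson City",
    "Helena", "Lincoln", "Carson City", "Concord", "Trenton",
    "Santa Fe", "Albany", "Raleigh", "Bismarck", "Columbus",
    "Oklahoma City", "Salem", "Harrisburg", "Providence", "Columbia",
    "Pierre", "Nashville", "Austin", "Salt Lake City", "Montpelier",
    "Richmond", "Olympia", "Charleston", "Madison", "Cheyenne",
)

# Same wording as the example sentence in the list above.
TEMPLATE = ("{character} was a popular TV character and "
            "{host} was a popular TV game show host.")


def get_substrings(word, min_len=2, max_len=6):
    """Yield (substring, remainder) pairs.

    substring: a run of min_len..max_len consecutive letters from the word.
    remainder: the letters left over after removing it, still in order.
    """
    word = word.replace(" ", "").lower()  # "Little Rock" -> "littlerock"
    for length in range(min_len, max_len + 1):
        for start in range(len(word) - length + 1):
            end = start + length
            rest = word[:start] + word[end:]
            if rest:  # assumption: both names need at least one letter
                yield word[start:end], rest


def candidate_sentences(capitals=CAPITALS):
    """Return (capital, character, host, sentence) tuples for every split.

    Names are capitalized because, as noted above, capitalization makes a
    big difference to how the language model ranks the correct answer.
    """
    rows = []
    for capital in capitals:
        for sub, rest in get_substrings(capital):
            character, host = sub.capitalize(), rest.capitalize()
            sentence = TEMPLATE.format(character=character, host=host)
            rows.append((capital, character, host, sentence))
    return rows
```

For example, `("di", "mason") in get_substrings("Madison")` is `True`, which reproduces the Madison example above, and a 7-letter capital like Madison yields 20 (substring, remainder) pairs in total.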
As I mentioned above, my approach did lead me to the solution. You can try my script here. Here's a look at some of the output (not including the solution).
- (Columns: perplexity score | capital | TV character | game show host | sentence from the template "__1__ is a TV character and __2__ is a TV game show host")
- 3.54 | Jackson | Son | Jack | Son is a TV character and Jack is ...
- 3.69 | Annapolis | Anna | Polis | Anna is a TV character and Polis is ...
- 3.71 | Honolulu | Hono | Lulu | Hono is a TV character and Lulu is ...
- 3.80 | Salem | Le | Sam | Le is a TV character and Sam is ...
- 3.82 | Helena | He | Lena | He is a TV character and Lena is ...
- 3.86 | Littlerock | Ock | Littler | Ock is a TV character and Littler is ...
- 3.87 | Hartford | Hart | Ford | Hart is a TV character and Ford is ...
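The scoring step from the list above (fill in the template, get a score from GPT-2, sort) might look roughly like the sketch below. It assumes the Hugging Face `transformers` library and PyTorch are installed, uses the public `gpt2` checkpoint, and the helper names (`load_gpt2`, `gpt2_loss`, `perplexity`, `rank_sentences`) are hypothetical. Perplexity here is the exponential of the mean per-token negative log-likelihood, PPL = exp(-(1/N) Σ log p(w_i | w_<i)). Since exp is monotonic, ranking by the loss or by its exponential gives the same order.

```python
import math


def load_gpt2(name="gpt2"):
    """Load a pretrained GPT-2 model and tokenizer from the Hugging Face hub.

    Imports are local so the ranking helper below works without transformers.
    """
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained(name)
    model = GPT2LMHeadModel.from_pretrained(name)
    model.eval()  # disable dropout so scores are deterministic
    return model, tokenizer


def gpt2_loss(sentence, model, tokenizer):
    """Mean per-token cross-entropy (natural log) of the sentence under GPT-2."""
    import torch

    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # With labels=input_ids the model shifts the labels internally and
        # returns the mean next-token negative log-likelihood as .loss.
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()


def perplexity(sentence, model, tokenizer):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(gpt2_loss(sentence, model, tokenizer))


def rank_sentences(sentences, score):
    """Return (score, sentence) pairs, lowest (most plausible) score first."""
    return sorted((score(s), s) for s in sentences)
```

Usage would be along the lines of `model, tokenizer = load_gpt2()` followed by `rank_sentences(sentences, lambda s: perplexity(s, model, tokenizer))`, then printing the top few rows. Passing the scorer in as a function keeps the ranking logic independent of the model, so a different checkpoint or scoring rule can be swapped in.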
Good luck! I'll be back after NPR's Thursday deadline for submissions to share my solution.
Update & Solution
The deadline for submissions has passed, so click below if you want to see my answer. See you next week!