Sunday, November 24, 2024

State Capital, TV character, TV game show host

Happy Puzzle Day, friends! Let's take a crack at today's Sunday Puzzle from NPR:

This week's challenge comes from listener Greg VanMechelen, of Berkeley, Calif. Name a state capital. Inside it in consecutive letters is the first name of a popular TV character of the past. Remove that name, and the remaining letters in order will spell the first name of a popular TV game show host of the past. What is the capital and what are the names?

I'll break this puzzle down as usual. Here's what we need to solve it:

  • C: a list of state capitals
    • easy peasy: we love starting with a closed class (i.e., there is a fixed list, so we don't need to worry about whether our list contains all the right candidates)
  • get_substrings(): We need a Python function that iterates through each state capital string and applies the puzzle transformation
    • In this case this means extracting a substring, squishing the remaining letters together into a string, and returning the substring and the remaining string.
    • I suggest we do this in a way that allows each substring to be between 2 and 6 letters long. This is a somewhat arbitrary decision, but my intuition tells me that we're unlikely to find a sequence of 7 letters embedded in a state capital that happen to spell out a person's name and still allow the rest of the letters to spell another name. We can of course modify this limit as needed.
  • LM: We need a (large) language model to evaluate candidates.
    • I'm using the pretrained GPT-2 model from the Hugging Face transformers library.
    • I'll iterate through the candidates, inserting them into a sentence template and getting a perplexity score from the LLM.
    • For example, we could imagine: Madison --> di, mason
    • And we use them in the sentence template: "Di was a popular TV character and Mason was a popular TV game show host."
      • Note that capitalization matters here. Without capitalizing the names, the correct solution ranks quite poorly, but with capitalization it emerges among the top 8 candidates.
    • We return a list of these sentences ranked by perplexity scores, and theoretically, the correct solution should be among those with the lowest perplexity.
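The extraction and templating steps above can be sketched in Python. This is a minimal sketch rather than my actual script: the name get_substrings() and the 2-to-6-letter bounds come from the plan above, and make_sentence() is a helper I'm introducing here for illustration.

```python
def get_substrings(capital, min_len=2, max_len=6):
    """Return (inner, remainder) pairs: every consecutive substring of
    min_len..max_len letters, plus the leftover letters squished together."""
    word = capital.lower()
    pairs = []
    for length in range(min_len, max_len + 1):
        for start in range(len(word) - length + 1):
            inner = word[start:start + length]
            remainder = word[:start] + word[start + length:]
            if remainder:  # both names must be non-empty
                pairs.append((inner, remainder))
    return pairs

def make_sentence(inner, remainder):
    # Capitalization matters for the ranking, so capitalize both names.
    return (f"{inner.capitalize()} was a popular TV character and "
            f"{remainder.capitalize()} was a popular TV game show host.")
```

For Madison, one of the returned pairs is ('di', 'mason'), which plugs into the template to give exactly the example sentence above.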
As I mentioned above, my approach did lead me to the solution. You can try my script here. Here's a look at some of the output (not including the solution).
  • (__1__ is a TV character and __2__ is a TV game show host)
  • 3.54 | Jackson | Son | Jack | Son is a TV character and Jack is ...
  • 3.69 | Annapolis | Anna | Polis | Anna is a TV character and Polis is ...
  • 3.71 | Honolulu | Hono | Lulu | Hono is a TV character and Lulu is ...
  • 3.80 | Salem | Le | Sam | Le is a TV character and Sam is ...
  • 3.82 | Helena | He | Lena | He is a TV character and Lena is ...
  • 3.86 | Littlerock | Ock | Littler | Ock is a TV character and Littler is ...
  • 3.87 | Hartford | Hart | Ford | Hart is a TV character and Ford is ...
Good luck! I'll be back after NPR's Thursday deadline for submissions to share my solution.

Update & Solution

The deadline for submissions has passed, so click below if you want to see my answer. See you next week!



Sunday, November 03, 2024

Experiment place and person

It's Sunday Funday here on Natural Language Puzzling, where we use natural language processing and linguistics to solve the Sunday Puzzle from NPR! Let's look at this week's puzzle:

This week's challenge comes from listener Mark Maxwell-Smith. Name a place where experiments are done (two words). Drop the last letter of each word. The remaining letters, reading from left to right, will name someone famously associated with experiments. Who is it?

Classic Sunday Puzzle, right? Take thing from Class A, apply a string transformation, get thing from Class B.

Here's how we'll solve the puzzle:

  • P: (list) places where experiments are done (two words)
    • We need this list to iterate through in search of the solution
    • We can ask any available LLM to generate a list to get us started
      • I started with such a list, then broke it into a list of first words and a list of second words, which my python script recombines into a longer list, so:
        • [science lab, physics room] --> 
        • [science lab, science room, physics lab, physics room]
    • This is an open class, but the clue is so restrictive I think we can cover all the possibilities this way
  • S: (list) someone famously associated with experiments
    • Again, we can ask an LLM to provide us a list, but we can't be certain we'll cover the solution because this is a fairly vague clue and definitely a very open class
    • Alternatively, we can simply iterate through P, apply the transformation, and plug the new string into an LLM to get a score
  • transform(str): (function) We need to define a function that can take in a string like "science laboratory" and return the transformed string: "Scienclaborator"
    • Note that we'll capitalize the string as we're expecting a name
  • LLM: We'll use the python transformers library and its pretrained GPT2 model
    • We'll start with a sentence frame like this:
      • '[BLANK] was a famous scientist known for experiments.'
    • We insert each transformed string into the sentence frame, pass it to the LLM, and return a perplexity score
    • We rank these candidates by perplexity
      • The solution should be among the candidates with the lowest perplexity score.
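Put together, the recombination and transform steps above might look like this. A sketch under the assumptions stated in the plan: the word lists here are placeholder fragments, not my full LLM-generated candidate lists.

```python
from itertools import product

first_words = ["science", "physics"]        # placeholder fragments
second_words = ["laboratory", "room"]

# Recombine every first word with every second word into a longer list of places.
places = [f"{a} {b}" for a, b in product(first_words, second_words)]

def transform(place):
    """Drop the last letter of each word, join what's left in order,
    and capitalize the result, since we're expecting a name."""
    joined = "".join(word[:-1] for word in place.split())
    return joined.capitalize()
```

As in the example above, transform("science laboratory") returns "Scienclaborator", which then goes into the sentence frame for scoring.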

And that's how I solved it! If you have alternative approaches, drop a note in the comments. I'll be back right here after NPR's Thursday deadline to share my solution. Good luck!

Update & Solution

The deadline for submissions has passed, so click below if you want to see my answer. Click here for the python script on GitHub. See you next week!


Director, anagram, film award

Welcome back to Natural Language Puzzling, the blog where we use natural language processing and linguistics to solve the Sunday Puzzle from...