Natural Language Puzzling: February 2024

Monday, February 26, 2024

"Genie"

Hello Puzzler friends, let's dig into this week's Sunday Puzzle from NPR.

This week's challenge: This week's challenge comes to us from listener Eric Berlin of Milford, Connecticut. Take the word SETS. You can add a three-letter word to this twice to get a common phrase: SPARE PARTS. Can you now do this with the word GENIE, add a three-letter word to it twice to get a common phrase. Again, start with GENIE, insert a three-letter word twice, get a common phrase.

How are we feeling about this? Seems like a pretty straightforward Sunday puzzle:

~Take a ThingA, apply the given transformation, get a thing from Category B.

So what resources do we need to solve this?

N: A list of three letter words to iterate through

I suggest we start with wordlist of English words ranked by frequency, like the top 100k most frequent English words, then keep only the three letter words

A script that will:

Generate C, a list of all the combinations of two positions in the word GENIE, like

_G_ENIE, _GE_NIE, ... GE_N_IE, GE_NI_E
(Although the above is just an illustration and we'll probably want this as a list of integers representing indices in the string, like [(0,2), (0,3), ...])

Iterate through the list of positions, then:

iterate through the list of 3-letter words and insert them into these positions

iterate through the positions of the new string to insert a word break

pass the new string to a language model to return a score
we want something like a perplexity score, which tells us how unlikely our string is (i.e., how perplexed is our language model to see this string?)
I'll be using a pretrained GPT2 model from the Hugging Face transformers library for python.

print a list of possible solutions that score below the perplexity threshold for us to review manually.

This is a brute force approach and certain to take a lot of time. If N is our list of 3-letter words, and C is the list of combinations of 2 positions within the word "genie", and S is the list of positions where we can split the resulting string into two words, we end up with a time complexity something like:

len(N) * len(C) * len(S)

We know that the length of C is 15 and the length of S is 7, so we're looking at a lot of operations per candidate 3-letter word.

I've found my solution, so I'll be back after the Thursday submission deadline to share it. In the meantime, you can take a closer look at my solver script here, and see if you can get it running to output the solution.

Update: Okay, here's my solution:

See you next week!

Monday, February 19, 2024

American literature, American history

Another week, another puzzle!

Let's take a crack at this week's NPR Sunday Puzzle using natural language processing.

This week's challenge: It comes to us from listener Andrew Chaikin of San Francisco, also known as the singer Kid Beyond. Think of a famous character in American literature. Change each letter in that character's name to its position in the alphabet — A=1, B=2, etc. — to get a famous year in American history. Who is this person and what is the year?

Interesting! This one is a little different from the usual format of take a thing from class A, apply the given transformation, get a thing from class B. Well, it's kind of similar but we do this transformation from letters to numbers.

What do we need to solve this puzzle?

C: a list of famous characters from American literature

open class
We can search the web for an existing list
We can ask a chatbot for a list
We can think up our own
We can scrape one from Wikipedia

Y: a list of "famous years in American history"

semi-open class

what counts as a "famous year"?
presumably within range of 1492-2024

Again, we can search the web, ask a chatbot, think up our own

Presumably this is things like 1776, 1863, 1929, 2001...

p: a function to turn a letter into its position in the alphabet

Let's think about this for a minute. Y seems to be very restrictive. It can only begin with 1 or 2. In fact, we can limit it to even these first two digits: 14, 15, 16 ,17, 18, 19, 20.

This, in turn, limits the possible starting letters for the character's name. Take caution, however, because there will be multiple ways to tokenize a 4-digit year. For example, let's take 1863 (the year of the Emancipation Proclamation). We can tokenize the digits into any combination that yields numbers in the range of 1-26. So for 1863:

1, 8, 6, 3
18, 6, 3

Note that "1, 86, 3" isn't valid because there is no 86th position in the alphabet.

What other conclusions can we draw from Y?

Clearly the only character names in C that can match with a year inY will have a length of 2, 3 or 4 letters. A 4 letter name will only work if each of the 4 letters is in the first 9 letters of the alphabet (because each letter must translate to a single digit number). And 3 letter names could have only 1 letter occurring after letter 9. So the takeaway is this, the name will likely contain mostly letters that occur within the first 9 letters of the alphabet: abcdefghi. Also, I think we can assume that the name must be recognizable on its own; Jad or Deb aren't particularly meaningful names in the context of characters from American literature. We probably need a more iconic name like Huck (but no, Huck doesn't work --> 8,21,3,11).

Given how restrictive this puzzle is, we could probably come up with the answer fairly easily by just thinking through it and maybe searching the web a bit for inspiration regarding character names and significant years. But, this isn't that kind of blog! You can see my approach using python in this script. I'll be back after the Thursday submission deadline to share my solution.

Update: Now that the deadline for submissions has passed, I'll share my solution here:

Did you get it too?

Monday, February 12, 2024

Movie stars and the movies they star in

I have a little time on my hands lately so let's solve some puzzles!

*Update: For the solution to this puzzle, scroll to the bottom of this post.

This week's Sunday Puzzle from NPR:

This week's challenge: Is from our puzzler friend A.J. Jacobs. Start with the name of a blockbuster movie star. Remove the first letter of the first name and last two letters of the last name to get the types of movies he almost never stars in. Who is this?

This is a classic puzzle form: Apply the given transformation to one thing to get another thing. More formally, we could say that there exists a movie star (m) for which the function z returns a type (t).

Better yet, I think we could express this as:

∃ m ∈ M, ∃ t ∈ T : z(m) = t

Roughly:

There exists some m which is a member of M,
AND there exists some t which is a member of T,
Such that the function z applied to m results in t.

Of course z is already defined for us: Remove the first letter of the first name and last two letters of the last name.

M is our set of blockbuster movie star names, and "he" in the clue tells us this is a list of men.

T is our set of "types of movies".

How would you approach this puzzle?

As a computational linguist and veteran puzzler, here's my approach:

Get a cleaned up list of the top male movie actors (M)

Note that this is not a closed class, like "state capitals" or "Best Actor Oscar Winners", so we'll never be certain that our list is comprehensive and includes the correct candidate. In a case like this, I'll aim for a list of at least 100 candidates.
We could ask our favorite chatbot, but I found a list here we can copy and clean:

https://www.listchallenges.com/150-famous-male-actors

Get a cleaned up list of types of movies (T)

We could ask a chatbot or scrape this from Wikipedia.

Create a python script:

create string transformation function
iterate through M

apply transformation function
compare resulting string with T for a match

And that's essentially what I've done in this python script I've shared on GitHub. I've hard coded both lists (M and T) into the script, which you may want to copy and paste into your own python approach to the puzzle. Good luck, and let's check in after the Thursday submission deadline with the solution.

Natural Language Puzzling