Monday, April 21, 2025

Animal (5) --> bird (7) --> animal (9)

It's Monday Funday here at Natural Language Puzzling, the blog where we explore natural language processing (NLP), artificial intelligence (AI), language modeling and linguistics for the purposes of solving the weekly challenge from NPR's Sunday Puzzle. Welcome back! Let's take a crack at this week's challenge:

This week's challenge comes from Philip Goodman, of Binghamton, N.Y. Name an animal in five letters. Add two letters and rearrange the result to name a bird in seven letters. Then add two letters to that and rearrange the result to name another animal in nine letters. What creatures are these?

How should we approach this one? It basically follows the classic Sunday Puzzle pattern:

Take a thing from Class A and apply a transformation to yield a thing from Class B.

In this case, it's a little more complicated, but really we're just adding a second transformation from Class B to Class C (which is similar to Class A, but the strings are nine letters instead of five).

As you can probably see, this one is relatively light on NLP, AI, and linguistics. We really just need good lists of birds and animals to start with, and it'll all be string and list manipulation from there.

Here's the approach I plan to take. I'll need these things:

  • A: A big ol' list of animals
    • I've cobbled one together from lists on the web
    • You could also ask your favorite LLM chatbot
    • When we load the list to work with it in Python, we'll split it into a list of five-letter words and another list of nine-letter words
  • B: A big ol' list of birds
    • Again, I found multiple different lists on the web and combined them, then removed duplicates, lowercased everything, etc.
    • We'll filter this one and keep only the birds with seven-letter names
  • check_letters: We need some kind of function to compare the letters in each of the strings in the above lists
    • We can brute force this as it's not super computationally heavy and it's a one-off solution anyway.
    • I'll take each five-letter animal (i.e., "for m in five_letter_animals:")
      • split it into a list of letters
      • take each (seven-letter) bird
        • split it into a list of letters
        • take each letter from the animal and try to remove it from the list of bird letters
        • if only two letters remain in the bird letter list, we keep this pair as a possible solution
    • Then we can more or less repeat this process, starting with the bird list and the nine-letter animal list:
      • Again, if we can remove all the letters from the shorter word and we're left with two letters from the longer word, we have a possible solution.
      • We'll know the solution when we have a bird that is a possible solution in both of these comparisons
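The letter comparison described above can be sketched roughly as follows. This is a minimal sketch rather than my actual script: the function names `extends_by` and `find_chains` are hypothetical, it uses `collections.Counter` instead of literally removing letters from a list (the effect is the same), and the sample strings are made-up placeholders, not real creatures, so there are no spoilers here.

```python
from collections import Counter

def extends_by(shorter, longer, extra=2):
    """True if `longer` contains every letter of `shorter` plus exactly `extra` more."""
    if len(longer) - len(shorter) != extra:
        return False
    # Counter subtraction drops zero and negative counts, so an empty
    # result means every letter of `shorter` could be removed from `longer`.
    return not (Counter(shorter) - Counter(longer))

def find_chains(animals, birds):
    """Return (animal5, bird7, animal9) triples that satisfy both comparisons."""
    five = [a for a in animals if len(a) == 5]
    nine = [a for a in animals if len(a) == 9]
    seven = [b for b in birds if len(b) == 7]
    chains = []
    for bird in seven:
        starts = [a for a in five if extends_by(a, bird)]
        ends = [a for a in nine if extends_by(bird, a)]
        # A bird only matters if it works in both directions.
        chains.extend((s, bird, e) for s in starts for e in ends)
    return chains

# Placeholder strings standing in for the real word lists.
animals = ["abcde", "zzzzz", "abcdefghi"]
birds = ["abcdefg", "qrstuvw"]
print(find_chains(animals, birds))
```

Because the length check comes first, a seven-letter bird that passes `extends_by` against a five-letter animal is guaranteed to have exactly two letters left over, which is the condition in the bullets above.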
I'm happy to report that the approach above worked for me and I have the solution! I'll be keeping it (and my Python script) to myself until after the Thursday 3pm ET deadline for submitting solutions to NPR, as I wouldn't want to get on host Will Shortz's naughty list for giving spoilers. But please check back after then if you'd like to see exactly how I solved it. Good luck, Puzzlers!


The deadline for NPR submissions has passed, so click the spoiler button below to see my solution, and click here to see my script.

Monday, April 14, 2025

European tourist site, planting medium

It's Monday, and you Puzzlers know what that means. Welcome back to Natural Language Puzzling, the blog where we use natural language processing (NLP), language models and tools, and good old-fashioned linguistics to solve the weekly Sunday Puzzle from NPR. Here is this week's puzzle:

This week's challenge comes from listener Jessica Popp, of Indiana, Pa. Name a famous European tourist site in nine letters. Rearrange its last four letters to name something that its first five letters can be planted in.

It's been a few weeks since we've had a "normal" puzzle like this one. It follows the pattern of many Sunday Puzzles: take a thing from Category A, apply a prescribed transformation, get a thing from Category B.

In these kinds of puzzles, we often start with lists: a list for Category A and/or a list for Category B. Then we need some kind of function to apply the transformation. Finally, we either check to see if the result is in our list for Category B or, if we don't have a list for Category B, use some kind of language model to evaluate each transformed candidate in an appropriate context and see how well it fits.

This time around, we'll need:

  • E: a list of European tourist sites
    • we can search the web for existing lists
    • we can ask a chatbot LLM to generate a list for us
    • we can try to cobble one together ourselves
    • we'll need to filter this list to only nine-letter entries
      • Note that LLMs are still not very good at returning words of specified length
  • P: a list of things that some unknown thing can be planted in
    • this one probably makes sense to ask a chatbot LLM for
    • we could also query a non-chatbot LLM to fill in blanks for us
    • again, we'll need to filter this by length--for 4-letter words this time
  • functions/tools:
    • from my reading of the puzzle, the first five letters of the European tourist site should be something that can be planted; thus, the first five letters alone should be a common noun, so we could start with a function that takes the first five letters of each item in list E and checks to see if they make a valid word.
    • if we find a valid word, we can:
      • generate all the permutations of the last four letters
      • keep only those that are valid words
      • use a sentence template or templates with an LLM to see how likely the candidates are in context, e.g.:
        • I know the perfect [BLANK] to plant the new [BLANK] in.
      • In theory, the solution would be among the candidate pairs that result in the lowest perplexity score from the above sentence.
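The splitting and permutation steps above might look something like this. It's a minimal sketch under a few assumptions: `planting_candidates` is a hypothetical name, `vocabulary` stands in for whatever word list you use to decide what counts as a valid word, and I'm assuming spaces and punctuation in a site name don't count toward its nine letters. The sample data is made up so that nothing is spoiled; the LLM scoring step is not shown.

```python
from itertools import permutations

def planting_candidates(sites, vocabulary):
    """Return (site, first_five, rearranged_last_four) triples worth scoring."""
    vocab = {w.lower() for w in vocabulary}
    results = []
    for site in sites:
        # Assumption: only letters count toward the nine-letter length.
        letters = "".join(ch for ch in site.lower() if ch.isalpha())
        if len(letters) != 9:
            continue
        head, tail = letters[:5], letters[5:]
        if head not in vocab:
            continue  # the first five letters must be a word on their own
        # All orderings of the last four letters, kept only if they are words.
        anagrams = {"".join(p) for p in permutations(tail)} & vocab
        results.extend((site, head, word) for word in sorted(anagrams))
    return results

# Placeholder data: not real sites or words.
sites = ["Abcde Fghi", "Short", "Zzzzzzzzz"]
vocabulary = ["abcde", "ihgf", "gfhi", "other"]
print(planting_candidates(sites, vocabulary))
```

Four letters give at most 24 permutations, so brute force is cheap here; the surviving pairs are what you would then drop into the sentence template.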
This is the approach I'll be taking. How would you do things differently? Share your thoughts in the comments. I'll be back with an update after the Thursday NPR submission deadline (no spoilers here!) to share my scripts and my solution (if I manage one). Good luck!

The deadline for NPR submissions has passed, so click the spoiler button below to see my solution.

Thursday, April 10, 2025

Milk and Beef

Hello Puzzlers. We're almost at this week's NPR submission deadline, but let's get a quick solution posted for posterity.

If you're new to the blog, welcome! Join us here each week as we use Natural Language Processing (NLP), language modeling and linguistics to solve the Sunday Puzzle from NPR. Here's this week's puzzle:

This week's challenge comes from listener Andrew Chaikin, of San Francisco. Think of an 11-letter word that might describe milk. Change one letter in it to an A, and say the result out loud. You'll get a hyphenated word that might describe beef. What is it?

For this puzzle, the solution pretty quickly jumped out at me, and maybe it did for you too. Regardless, let's walk through how we can use our tools and skills to get a solution. Here's what we need:

  • M, a list of words that might describe milk
    • We can ask an LLM directly for this list, or
    • We can query an LLM to fill in blanks for us with the top candidates, e.g.:
      • "Some milk is more BLANK than other milk."
      • "Milk sold in stores must be BLANK."
    • (I asked an LLM to give me a list, and I got a list of about 90 words)
    • We'll need to filter this to contain only 11-letter words (because LLMs are terrible about that)
  • B, a list of words that might describe beef
    • Ideally, we'll have a list for this side of the equation, too, then we'll check to see if we can apply the transformation to a word in M and match it to a word in B
    • Alternatively, we can simply take each candidate in M, then apply the transformation ("change one letter in it to an A and say the result out loud") and evaluate the result in the context where it is describing beef.
      • Of course this is tricky, because we're not matching strings, we're matching pronunciations; see the next point.
  • transform(): We need a function to apply the prescribed transformation: "change one letter in it to an A and say the result out loud"
    • How would we do this programmatically? We might need some guidelines to keep this manageable.
    • Firstly, I think we probably want to restrict our replacement targets to other vowels.
    • Secondly, the puzzle asks us to replace a letter with another letter (A), but we're only interested in the resulting pronunciation, not the resulting spelling. So we need to operate on pronunciations. We can use the CMU Pronouncing Dictionary as we have in past puzzles.
    • The CMU dictionary uses ARPAbet symbols, so we need a list of all the ARPAbet symbols that an "A" can correspond to. Then we can generate new pronunciations by replacing each vowel sound in the pronunciation one by one with these various "A" vowel sounds.
      • A few ARPAbet examples:
      • AE T : "at"
      • EY T : "ate"
      • TH EY T AH : "theta"
    • Next we need a way to convert the pronunciation symbols back into a spelling ("phoneme-to-grapheme"). In our case, we'll probably simply want to query the CMU Dictionary to see if the new pronunciation exists in the dictionary, then we pull the corresponding spelling.
    • One more twist: Note that the puzzle specifies we'll get a hyphenated word. The hyphenated word is probably not going to have its own entry in the pronunciation dictionary. This means we'll need to try tokenizing the new pronunciation; in other words, we need to split it in two, then try finding each of the two pronunciations in the dictionary. We could simply brute force this, trying the split at each possible location (between syllables).  
  • evaluation: Let's imagine the above steps have gone well, and we've found a number of potential spellings for the "hyphenated word that might describe beef". Now we need a language model to evaluate each of the candidates in context. As we've done in the past, we can insert each candidate into a set of sentence templates and get a perplexity score for each, and ideally, the correct solution should have the lowest or near lowest perplexity. Our templates might be something like these:
    • "That was some of the best [BLANK] beef I have ever had."
    • "[BLANK] beef is expensive but usually high quality."
    • "The supermarket is having a special on [BLANK] beef this week."
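Here is a rough sketch of the transform() idea described above: swap one vowel sound for an "A" sound, then try splitting the new pronunciation into two dictionary words. Several things here are assumptions: the choice of which ARPAbet vowels count as "A" sounds, the hypothetical function names, the stripping of stress digits, and the tiny hand-made `toy_dict` standing in for the CMU Pronouncing Dictionary. It also tries a split at every position rather than only between syllables, and the example word is deliberately not the puzzle's answer.

```python
ARPABET_VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
                  "EY", "IH", "IY", "OW", "OY", "UH", "UW"}
# Assumption: vowel sounds that a written "A" commonly stands for.
A_SOUNDS = ("AE", "EY", "AA", "AH")

def a_variants(phones):
    """Yield copies of `phones` with exactly one vowel swapped for an A sound."""
    for i, phone in enumerate(phones):
        if phone in ARPABET_VOWELS:
            for a in A_SOUNDS:
                if a != phone:
                    yield phones[:i] + (a,) + phones[i + 1:]

def two_word_splits(phones, pron_to_words):
    """Yield (left, right) word pairs for each cut where both halves are known."""
    for cut in range(1, len(phones)):
        for left in pron_to_words.get(phones[:cut], ()):
            for right in pron_to_words.get(phones[cut:], ()):
                yield left, right

def hyphenated_candidates(phones, pron_to_words):
    """Return sorted hyphenated spellings reachable by one vowel-to-A swap."""
    found = set()
    for variant in a_variants(tuple(phones)):
        for left, right in two_word_splits(variant, pron_to_words):
            found.add(f"{left}-{right}")
    return sorted(found)

# Toy pronunciation-to-spelling lookup (stress digits removed).
toy_dict = {
    ("S", "AH", "N"): ["sun", "son"],
    ("S", "AE", "T"): ["sat"],
}
# "sunset" is roughly S AH N S EH T; swapping EH for AE gives "sun" + "sat".
print(hyphenated_candidates(("S", "AH", "N", "S", "EH", "T"), toy_dict))
```

With the real dictionary you would build `pron_to_words` by inverting the word-to-pronunciation entries, then pass the surviving hyphenated candidates to the sentence templates for scoring.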
That's how I'd go about approaching this one systematically. In truth, it would be overkill for this easier-than-usual puzzle. Alternatively, we could simply start with M (our list of milk words) and from there, most people would spot the solution right away. I went ahead and did just that; if you'd like to give it a shot, or even flesh out the full approach described above, you can start by running my script to get a list of 11-letter milk-related words. It only returns 10 words, so I'm confident you'll recognize the solution.


The deadline for NPR submissions has passed, so click the spoiler button below to see my solution.

Director, anagram, film award

Welcome back to Natural Language Puzzling, the blog where we use natural language processing and linguistics to solve the Sunday Puzzle from...