Friday, September 06, 2024

A TV Personality of the past and a creature of the past

Happy Friday, Puzzlers! I'm later in the week than usual, but I've got a puzzle breakdown and solution for you. Let's look at this week's NPR Sunday Puzzle:

Yes, it comes from listener Ethan Kane (ph) of Albuquerque, N.M. Name a famous TV personality of the past. Drop the second letter of this person's last name, and phonetically, the first and last names together will sound like a creature of the past. What celebrity is this?
So again, a famous TV personality of the past. Drop the second letter of this person's last name, and say it out loud. The first and last names together will sound like a creature of the past. What celebrity is it?

This is a familiar format: take an item from Class A, apply a transformation, and return an item from Class B.

In this case, we have an extra layer of representation to deal with--the speech sounds of the two items in question. In other words, we're dealing with both orthography and phonology.
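To make that concrete, here's a toy example in Python. The mini ARPAbet dictionary is hand-rolled for illustration; the real lookups would use a full pronunciation dictionary.

```python
# Two layers of representation: orthography (spelling) and phonology (sound).
# Hand-rolled mini ARPAbet dictionary, just for this example.
PRON = {
    "bob": "B AA1 B",
    "hope": "HH OW1 P",
}

# Orthographically, "Bob Hope" and its transcription share nothing;
# the comparison has to happen in the phonetic representation.
name = "Bob Hope"
phones = " ".join(PRON[w] for w in name.lower().split())
print(phones)  # B AA1 B HH OW1 P
```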

So here's what we need to solve this puzzle:

  • T: a list of candidate names for the TV personality of the past
  • C: a list of candidate "creatures of the past"
  • pronunciation dictionary: we'll have the spelling of the personalities and creatures, but we need this resource to get the phonetic spellings so we can compare them
  • a script, which will:
    • load the data (list of TV personalities, list of creatures, pronunciation dictionary)
    • split personalities and creatures into separate strings that we can query in the pron dictionary;
      • "Bob Hope" --> ["bob", "hope"]
      • "Saber-toothed tiger" --> ["saber", "toothed", "tiger"]
      • "Dodo" --> ["dodo"]
    • query these separate strings for pronunciations:
      • "bob" --> "B AA1 B"
      • "hope" --> "HH OW1 P"
      • "saber" --> "S EY1 B ER0"
      • "toothed" --> "T UW1 TH T"
      • "tiger" --> "T AY1 G ER0"
    • rejoin the pronunciation strings and normalize (remove the numeric stress markers and ensure correct spacing):
      • ["B AA1 B", "HH OW1 P"] --> "B AA B HH OW P"
      • ["S EY1 B ER0", "T UW1 TH T", "T AY1 G ER0"] --> "S EY B ER T UW TH T T AY G ER"
      • Note: we do this because stress can be quite variable anyway, and we probably want to relax the constraint here.
        • I'll start with this approach and revisit if necessary. For example, I can see that we might want to reduce the double /T T/ to a single /T/ in the phonetic spelling of "saber-toothed tiger," since a contextual phonotactic rule would normally apply there in running speech anyway. So this is something to revisit if I'm striking out.
    • Iterate through the person pronunciations, then through the creature pronunciations, looking for a match and printing out any results.
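The steps above can be sketched in a few lines of Python. The mini pronunciation dictionary and the one-item candidate lists below are stand-ins for illustration (the real script loads a full dictionary and the longer lists described next), so don't expect this sketch to print the answer as-is:

```python
import re

# Stand-in data for illustration; the real script loads a full
# pronunciation dictionary and much longer candidate lists.
PRON = {
    "bob": "B AA1 B",
    "hope": "HH OW1 P",
    "saber": "S EY1 B ER0",
    "toothed": "T UW1 TH T",
    "tiger": "T AY1 G ER0",
    "dodo": "D OW1 D OW2",
}
PERSONALITIES = ["Bob Hope"]
CREATURES = ["Saber-toothed tiger", "Dodo"]

def tokens(name):
    # "Saber-toothed tiger" -> ["saber", "toothed", "tiger"]
    return re.split(r"[\s\-]+", name.lower())

def normalize(pron):
    # "S EY1 B ER0" -> "S EY B ER": stress is too variable to match on.
    return re.sub(r"\d", "", pron)

def phonetic(words):
    # Look up each word and rejoin; None if any word is out of vocabulary.
    prons = [PRON.get(w) for w in words]
    if None in prons:
        return None
    return normalize(" ".join(prons))

def solve():
    # Map each creature's normalized pronunciation back to its name.
    creatures = {phonetic(tokens(c)): c for c in CREATURES}
    for person in PERSONALITIES:
        words = tokens(person)
        last = words[-1]
        words[-1] = last[0] + last[2:]  # drop the second letter of the last name
        key = phonetic(words)
        if key is not None and key in creatures:
            print(person, "->", creatures[key])

solve()
```

With the toy data above nothing matches (dropping the second letter of "Hope" yields the out-of-vocabulary "hpe"), but run over the full candidate lists and dictionary, this loop is what surfaces the solution.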
I've done a bit of a "draw the rest of the owl" trick here, of course, because much of the challenge is simply coming up with T and C, our lists of candidate TV personalities and creatures. I tried using BERT in masking mode to fill in blanks like "The paleontologists discovered a rare complete [MASK] skeleton last month" in hopes of getting a list of suitable creatures, but I found it really difficult to tune these prompts in a way that generated good answers without a lot of bad ones. We'd had success with BERT's masked mode before, as in this puzzle, but I pivoted this time. For the creatures list, I found a few lists online and cobbled together a list of only about 30 creatures; for the TV personalities, a few ChatGPT queries eventually yielded a list of about 250 names.

You can see my work on GitHub; you'll also want my list of TV personalities (the list of creatures is much shorter and hard-coded). As these puzzles are always one-off solutions, I rarely optimize the script for efficiency--it's simply a race to get a solution. This script could benefit from some TLC (feel free to push your changes), but I'm happy we got a solution. The deadline for submissions has passed, so click below if you just want to see my answer. See you next week!



Director, anagram, film award

Welcome back to Natural Language Puzzling, the blog where we use natural language processing and linguistics to solve the Sunday Puzzle from...