Natural Language Puzzling: February 2021

Friday, February 26, 2021

Philosophers and Foods (Solution)

Welcome back, Puzzlers! Whew--I solved it (with a little help)! Spoilers ahead!

Once again, here's the latest Sunday Puzzle from NPR:

This week's challenge comes from listener Andrew Chaikin, of San Francisco. Think of a famous philosopher — first and last names. Change one letter in the first name to get a popular dish. Drop two letters from the last name and rearrange the result to get the kind of cuisine of this dish. What is it?

In my preview post I described what we need to solve this puzzle, and some notes:

Lists:

P: a list of philosophers with first and last names;
D: a list of popular dishes;
C: a list of cuisines;

Functions:

letter_swap(string, alphabet):

for every position i in string:

for every letter a in alphabet:

newstring = string[:i] + a + string[i+1:]

in other words, generate all possible newstrings where one letter is replaced

letter_drop(string, n):

find every combination of n positions within string and drop the letters in those positions
we'll use this to drop 2 letters from each philosopher's last name, then compare the results with C (the list of cuisines) to find a match;

is_anagram(x, y):

for strings x and y, convert to sorted strings:

e.g., "italian" --> "aaiilnt"
we'll use this to compare the output of letter_drop with C;
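Here is a minimal Python sketch of those three functions. It's my own illustration, not a copy of the solver script. It swaps by position rather than with str.replace, which would change every occurrence of a repeated letter at once.

```python
from itertools import combinations
import string


def letter_swap(word, alphabet=string.ascii_lowercase):
    """Return every string made by changing exactly one position of word."""
    results = set()
    for i, original in enumerate(word):
        for letter in alphabet:
            if letter != original:
                # Slice around position i so only that one letter changes.
                results.add(word[:i] + letter + word[i + 1:])
    return results


def letter_drop(word, n):
    """Return every string made by deleting n positions from word."""
    results = set()
    for positions in combinations(range(len(word)), n):
        kept = (ch for i, ch in enumerate(word) if i not in positions)
        results.add("".join(kept))
    return results


def is_anagram(x, y):
    """Two strings are anagrams if their sorted letters are identical."""
    return sorted(x) == sorted(y)
```

For a last name of length k, letter_drop with n=2 produces at most k*(k-1)/2 strings, so this stays cheap even for a list of 1000+ philosophers.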

...

The trouble with the lists in this puzzle is that they are all open classes. That is, there is no complete list of philosophers (even famous ones), or popular dishes, or cuisines. We're often dealing with closed classes, like states of the USA or cities with an NFL team, etc. We can easily collect the complete list for closed classes. With these open class lists, however, it's often a matter of simply dumping as many possible candidates as we can find into a list. Remember, we're more interested in recall than precision; for example, if we're looking for dishes and we include beverages (or any non-dishes) in our list, that's preferable to having a strict list that might not include our target word or words.

P, the list of philosophers, is probably the easiest here. It's taking some work, but I've scraped over 1000 philosopher names from Wikipedia and other websites. From there, it's non-trivial but relatively straightforward to convert that list into something we can work with--first name and last name for each, and no punctuation or funky (non-ascii) letters. There's a little bit of work involved in handling some "two name" last names; e.g., for "hermann von helmholtz", I want my final list to include "hermann vonhelmholtz" and "hermann helmholtz".
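The name cleanup might look something like the sketch below. The particle list (PARTICLES) and the function names are my assumptions for illustration, and the accent stripping only handles letters that decompose into a base letter plus an accent mark.

```python
import unicodedata

# Assumed, incomplete list of surname particles; extend as needed.
PARTICLES = {"von", "van", "de", "der", "la", "le", "du", "di", "da"}


def to_ascii_lower(name):
    """Lowercase, strip accents, and keep only ASCII letters and spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Letters with no decomposition (e.g. the Danish o-slash) are simply dropped here.
    return "".join(
        ch for ch in no_marks.lower()
        if ch.isascii() and (ch.isalpha() or ch == " ")
    )


def name_variants(full_name):
    """Return a set of (first, last) pairs, with and without particles."""
    tokens = to_ascii_lower(full_name).split()
    if len(tokens) < 2:
        return set()
    first, rest = tokens[0], tokens[1:]
    # Note: middle names end up glued onto the last name in this sketch.
    variants = {(first, "".join(rest))}
    core = [t for t in rest if t not in PARTICLES]
    if core:
        variants.add((first, "".join(core)))
    return variants


print(name_variants("Hermann von Helmholtz"))
```

That call yields the two pairs described above: ("hermann", "vonhelmholtz") and ("hermann", "helmholtz").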

D, dishes, is probably the hardest here. First, many dishes have multi-word names, and it's not clear how those should be handled. Consider this list:

nachos
fried chicken
shrimp scampi

What do we do with this? I think we just generate all reasonable possibilities (and then some), getting:

nachos
friedchicken
fried
chicken
shrimpscampi
shrimp
scampi

Heck, we might even try a quick and dirty approach to include singular and plural forms: if the word ends in an 's' try deleting it; if the word doesn't end in 's', try adding an 's'. Obviously this doesn't work perfectly: "-y" --> "-ies"; "panino" --> "panini", etc.
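As a rough sketch of that expansion (the function name dish_variants is mine, and the plural rule is exactly the quick and dirty one described above):

```python
def dish_variants(dish):
    """Expand one dish name into a recall-friendly set of candidate strings."""
    words = dish.lower().split()
    # Each word on its own, plus all the words run together.
    candidates = set(words)
    candidates.add("".join(words))

    expanded = set()
    for candidate in candidates:
        expanded.add(candidate)
        if candidate.endswith("s"):
            expanded.add(candidate[:-1])   # naive singular
        else:
            expanded.add(candidate + "s")  # naive plural
    return expanded


for dish in ["nachos", "fried chicken", "shrimp scampi"]:
    print(dish, "->", sorted(dish_variants(dish)))
```

This happily produces junk like "frieds" and "shrimps", which is fine when recall matters more than precision.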

Finally, C, cuisines. This is possibly the messiest. What even is a cuisine? Basically any name for an ethnic group or nationality can be a cuisine. But might it also include words like vegetarian, barbecue, pastry? I'm not sure, but as usual, I think it's best to err on the least restrictive side.

I struggled with this one for a while. I tossed it to my regular trivia team and a couple of them looked over some philosophers and came up with the solution. So big shoutout to Hot Ham Water and especially Alyssa and Aaron! Once I had the solution, I discovered that I was indeed missing the target dish in my list of dishes. After adding the dish, I confirmed that my script works. Not the ideal outcome, but I'll take it.

I've posted my solver script to the naturallanguaglepuzzling GitHub repo. It has lots of comments to make clear what it's doing at each step.

So, after all that, what's the solution?

Well, as usual, my script overgenerates. Can you spot the solution among the script output?

These are supposed to be in the form:

philosopher name <--> dish, cuisine

SOLUTION: alain badiou <--> alaen, odia

SOLUTION: friedrich nietzsche <--> friedrice, chinese

SOLUTION: friedrich schelling <--> friedrice, english

SOLUTION: mark sainsbury <--> dark, russian

SOLUTION: mark sainsbury <--> mare, russian

SOLUTION: mark sainsbury <--> mars, russian

Looks like Russian mare is back on the menu, boys!

Thanks for stopping by, Puzzler. See you on the next Sunday Puzzle!

--Levi King

Tuesday, February 23, 2021

Philosophers and Foods (Preview)

Welcome, Puzzlers! Let's dig into the latest Sunday Puzzle from NPR:

This week's challenge comes from listener Andrew Chaikin, of San Francisco. Think of a famous philosopher — first and last names. Change one letter in the first name to get a popular dish. Drop two letters from the last name and rearrange the result to get the kind of cuisine of this dish. What is it?

Oh boy, this one is kind of complicated. In fact, I've been working on it off and on for two days and haven't found a solution. This may be the first puzzle all year to stump me! I'm determined to keep chipping away at it, however.

Let's look at what we need:

Lists:

P: a list of philosophers with first and last names;
D: a list of popular dishes;
C: a list of cuisines;

Functions:

letter_swap(string, alphabet):

for every position i in string:

for every letter a in alphabet:

newstring = string[:i] + a + string[i+1:]

in other words, generate all possible newstrings where one letter is replaced

letter_drop(string, n):

find every combination of n positions within string and drop the letters in those positions
we'll use this to drop 2 letters from each philosopher's last name, then compare the results with C (the list of cuisines) to find a match;

is_anagram(x, y):

for strings x and y, convert to sorted strings:

e.g., "italian" --> "aaiilnt"
we'll use this to compare the output of letter_drop with C;

It doesn't look so bad, right?

I think I have a working script... That is, when I add carefully crafted fake names to my list of philosophers, I get solutions, but I'm not finding any real solutions.

For example, I've added "Pavini Andoswitch" and "Falazel Koreegi", and I get "panini sandwich" and "falafel greek", so I think that some element is missing from one of my lists--a philosopher name, a cuisine or a dish.
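To make that sanity check concrete, here's a small self-contained sketch of the whole pipeline run on just the two fake names. It is not the actual solver: instead of generating every two-letter drop, it uses collections.Counter to ask whether the cuisine's letters all come out of the last name with exactly two left over, which gives the same answer. The tiny lists are made up for the demo.

```python
from collections import Counter


def one_letter_apart(a, b):
    """True if a and b have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def drops_to(last_name, cuisine, n=2):
    """True if deleting n letters from last_name leaves an anagram of cuisine."""
    if len(last_name) - len(cuisine) != n:
        return False
    # Counter subtraction keeps only positive counts, so an empty result
    # means every letter of cuisine is available in last_name.
    return not (Counter(cuisine) - Counter(last_name))


def solve(philosophers, dishes, cuisines):
    hits = []
    for first, last in philosophers:
        for dish in dishes:
            if not one_letter_apart(first, dish):
                continue
            for cuisine in cuisines:
                if drops_to(last, cuisine):
                    hits.append((first, last, dish, cuisine))
    return hits


philosophers = [("pavini", "andoswitch"), ("falazel", "koreegi")]
dishes = ["panini", "falafel", "nachos"]
cuisines = ["sandwich", "greek", "italian"]

for hit in solve(philosophers, dishes, cuisines):
    print(hit)
```

Both fake names come back as hits. Note that nothing here checks whether the dish actually belongs to the cuisine, so a real run will overgenerate.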

D, dishes, is probably the hardest here. First, many dishes have multi-word names, and it's not clear how those should be handled. Consider this list:

nachos
fried chicken
shrimp scampi

What do we do with this? I think we just generate all reasonable possibilities (and then some), getting:

nachos
friedchicken
fried
chicken
shrimpscampi
shrimp
scampi

Well, good luck this week! I'll be back before Sunday to post a solution or hang my head in shame. :-)

--Levi

Friday, February 19, 2021

Actors and Books of the Bible (Solution)

Welcome back, Puzzlers. Spoilers ahead! It's time to solve the Sunday Puzzle:

This week's challenge comes from listener Samuel Mace of Smyrna, Del. Name a famous actor whose first name is a book of the Bible and whose last name is an anagram of another book of the Bible. Who is it?

On Monday, I posted this breakdown:

Note one possible pitfall before we jump in: actor. This sometimes refers to men only (as opposed to actresses), but these days it is often used to include men and women. Let's use the less restrictive definition here. We know what can happen when we make assumptions about the puzzle.

Okay, to solve this we need:

Lists:

A: a list of actors;

open list; I'm using this list of 1000;

B: a list of books of the Bible;

closed list; I'm using this list from the standard King James Version; we could of course expand this by looking at other versions;

N: a subset of B, where all items are personal names;

Function:

is_anagram(x, y)

remove nonletters ("O'Neal" --> "ONeal")
lowercase ("ONeal" --> "oneal")
sort letters ("oneal" --> "aelno")
compare x, y
return True or False

The process from here is pretty straightforward. We iterate through A, checking each actor's first name to see if it appears in N. If we find a match, then we take the actor's last name and iterate through B, using the anagram function to look for a match.
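In sketch form, that loop looks roughly like this. The three-actor list and the handful of books are stand-ins I typed in for the demo, not the real lists of 1000 actors and the full KJV table of contents.

```python
def normalize(text):
    """Remove non-letters, lowercase, and sort: "O'Neal" -> "aelno"."""
    return "".join(sorted(ch for ch in text.lower() if ch.isalpha()))


def is_anagram(x, y):
    return normalize(x) == normalize(y)


def solve(actors, books, name_books):
    """Find actors whose first name is in N and whose last name anagrams to a book in B."""
    hits = []
    for actor in actors:
        parts = actor.split()
        first, last = parts[0], parts[-1]
        if first not in name_books:
            continue
        for book in books:
            # The puzzle asks for *another* book, so skip the first name itself.
            if book != first and is_anagram(last, book):
                hits.append((actor, first, book))
    return hits


actors = ["John Hurt", "Tom Hanks", "Shaquille O'Neal"]   # stand-in for A
books = ["Genesis", "Ruth", "John", "Mark", "Esther"]      # small slice of B
name_books = {"Ruth", "John", "Mark", "Esther"}            # N, a subset of B

print(solve(actors, books, name_books))
```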

And that's exactly what I did. I've posted my solver script to the companion GitHub repository for this blog. The script is annotated with comments to explain what each step is doing.

I guess it's time for the big reveal. Did you get a solution? Here's what I came up with:

John Hurt <--> John, Ruth

I would not be surprised if there are multiple solutions, but only one actor on my list of 1000 fit the bill.

Thanks for reading! I'll catch you on the next puzzle.

--Levi

Monday, February 15, 2021

Actors and Books of the Bible (Preview)

It's Monday Funday, so let's check out the latest Sunday Puzzle:

This week's challenge comes from listener Samuel Mace of Smyrna, Del. Name a famous actor whose first name is a book of the Bible and whose last name is an anagram of another book of the Bible. Who is it?

Great, we know how to handle this kind of puzzle!

Okay, to solve this we need:

Lists:

A: a list of actors;

open list; I'm using this list of 1000;

B: a list of books of the Bible;

closed list; I'm using this list from the standard King James Version; we could of course expand this by looking at other versions;

N: a subset of B, where all items are personal names;

Function:

is_anagram(x, y)

remove nonletters ("O'Neal" --> "ONeal")
lowercase ("ONeal" --> "oneal")
sort letters ("oneal" --> "aelno")
compare x, y
return True or False

Think you can handle it? I'll see you back here Friday with my solution!

--Levi

Friday, February 12, 2021

Names from U.S. history (Solution)

Welcome back, Puzzlers! Let's solve the latest Sunday Puzzle:

This week's challenge comes from listener Ed Pegg Jr., who runs mathpuzzle.com. Think of someone who has been in the news this year in a positive way. Say this person's first initial and last name out loud. It will sound like an important person in U.S. history. Who is it?

Plus, an additional clue--the "someone in the news this year" is "an official" of some kind.

I spelled out a hypothetical NLP approach in my preview post, but also mentioned that this one is pretty easy to solve mentally, provided that you follow the news somewhat closely. I suggested that we start with a list of important historical Americans, then filter out any names that don't start with a phoneme that sounds like an English letter (i.e., an initial).

I found a list of 100 such names from Smithsonian Magazine. Here's a mostly random subset I pulled; it does contain the target:

George Washington
John Muir
Sacagawea
Neil Armstrong
Michael Jordan
Martin Luther King Jr.
Ulysses S. Grant
Robert E. Lee
John Brown
Frederick Douglass
Henry Ford
Susan B. Anthony
Tecumseh
Thomas Jefferson
Franklin Delano Roosevelt
Oprah Winfrey
Al Capone
Frank Lloyd Wright
Andy Warhol
Brigham Young
Roger Williams
Mark Twain
Abraham Lincoln
Bob Dylan
Jimi Hendrix
Marilyn Monroe
Frank Sinatra
Louis Armstrong
John D. Rockefeller
Walt Disney
Bill Gates
Babe Ruth
Elvis Presley
Muhammad Ali
Jackie Robinson
Billie Jean King

If we eliminate any of those names that don't start off sounding like a letter, we're left with these five:

Ulysses S. Grant ("U")
Franklin Delano Roosevelt / "FDR" ("F")
Oprah Winfrey ("O")
Abraham Lincoln / Abe Lincoln ("A")
Elvis Presley ("L")

And then we can try to reason out what the corresponding name would be (an official making positive news this year; a first initial and last name):

Ulysses S. Grant ("U"):

~ U. Lissiesessgrant, U. Lissies Essgrant, U. Liss Izessgrant; I don't think so.

Franklin Delano Roosevelt / "FDR" ("F"):

~ F. Diar, F. Dearre, F. DeRosa Velt; Doesn't ring any bells.

Oprah Winfrey ("O"):

~ O. Prawinfrey, O. Prahwen Fris, O. Praw Infree, etc.; Nah.

Abraham Lincoln / Abe Lincoln ("A"):

~ A. Braham Lincoln, A. Brah Hamlinken, A. Blinken; Hmm...

Elvis Presley ("L"):

~ L. Vispresly, L. Vispreslis, L. Visp Ressley, L. Vispress Lee, etc.; Nope.

Got it?

If not, take a look at President Biden's cabinet picks.

Thanks for reading! See you for the next Sunday Puzzle!

--Levi King

Monday, February 08, 2021

Names from U.S. history (Preview)

Happy Monday, Puzzlers! Let's take a look at the latest Sunday Puzzle:

This week's challenge comes from listener Ed Pegg Jr., who runs mathpuzzle.com. Think of someone who has been in the news this year in a positive way. Say this person's first initial and last name out loud. It will sound like an important person in U.S. history. Who is it?

FYI, it's always a good idea to listen to the puzzle, too. In this case, Puzzlemaster Will Shortz dropped an important clue on the air that didn't make it into the write-up. While repeating the puzzle, he rephrased "someone who has been in the news this year" as "an official who has been in the news this year." Good to know.

Let's break this puzzle down a little.

We need:

O: a list of officials making positive news this year
H: a list of important persons in U.S. history

We could also use a dictionary of common U.S. name spellings and their pronunciations; i.e., orthographic spelling to phonemic spelling. That might be hard to come by, so we could instead use a pronunciation model that can predict the phonemic spelling for previously unseen words (or in this case, names). This is commonly referred to as a grapheme-to-phoneme model, or g2p.

How do we come up with O, the list of officials making positive news this year? We're not likely to find such a list ready to use online. The thorough but hard way would be to use a web scraping tool like Beautiful Soup to hoover up loads of articles from a few non-paywalled news sites, making sure to limit it to articles from 2021. I'm just spitballing here, but I think next we would run a sentiment analysis tool. I'd probably use Stanza, the Python version of the Stanford CoreNLP toolkit, but other good options are NLTK or TextBlob. These tools typically take some text and generate a score between -1.0 and 1.0 indicating where the sentiment lies on a continuum from very negative to very positive. We'd keep all the positive news articles. Then we'd run a named entity recognizer (NER) tool to extract a list of persons mentioned in the positive articles. In past puzzles, we've used the Stanza NER tool, which is easy to use out of the box. We could refine this a little, but that would probably do the trick to get us a decent list for O. We should go ahead and update these names to the form we need, with first initial and last name, e.g., Barack Obama --> B. Obama.

We could also try to scrape a list for H; if we had a lot of U.S. history text documents, we could just use NER to find names, and choose the top 100 (or 500, etc.) most frequent names. For this list, we can probably just find a list online. Here's a nice looking list of 100 from Smithsonian Magazine.

At this point, I would go ahead and apply whatever phonemic resource we're using to get a phonemic spelling (or spellings, if alternate pronunciations apply) for each name in H.

We also need a list of the pronunciations of the "names" of the 26 letters of the alphabet (actually, 25; we can skip the oddly named "w"). For example, in IPA:

Letter	IPA
a	eI
b	bi:
c	si:

From here, I would first make a quick pass through H and eliminate any name that starts with an IPA pronunciation that is not on our list of letter name pronunciations. For example, Babe Ruth would be eliminated here because the first phonemes in the pronunciation (beI) do not match anything in our list of letter names. Jane Addams, however, would not be eliminated, because the letter J and the name Jane both start with dzeI. (Of course this name would later fail, as there is no "J. Nadams" in our list O.)
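Here's a toy version of that first pass. Everything in it is hand-typed for illustration: the letter table is partial (a real run needs all 25 letter names), and the pronunciations are my own rough transcriptions in the same ASCII-ish notation as above, not output from any g2p model.

```python
# Partial table of letter-name pronunciations (assumed, incomplete).
LETTER_NAMES = {"a": "eI", "b": "bi:", "c": "si:", "j": "dzeI"}

# Hypothetical, hand-written pronunciations for three names from H.
PRONUNCIATIONS = {
    "Babe Ruth": "beIb ru:T",
    "Jane Addams": "dzeIn ad@mz",
    "Abe Lincoln": "eIb lINk@n",
}


def leading_letters(pronunciation):
    """Return the letters whose spoken name is a prefix of the pronunciation."""
    return [
        letter
        for letter, sound in LETTER_NAMES.items()
        if pronunciation.startswith(sound)
    ]


def keep_candidates(pronunciations):
    """Keep only names that begin with the sound of some letter name."""
    kept = {}
    for name, pron in pronunciations.items():
        letters = leading_letters(pron)
        if letters:
            kept[name] = letters
    return kept


print(keep_candidates(PRONUNCIATIONS))
```

Babe Ruth drops out, while Jane Addams survives as a "J" candidate and Abe Lincoln as an "A" candidate.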

From there, I'd probably just have the script print out all the remaining names. From just skimming the list manually, this would only leave about 15 of the 100 names we started with. That's surely a small enough number that we can read through them out loud to see if anything clicks.

But let's assume for some reason we really want to get the solution from our script. Instead of just printing out the remaining names in H, we would use our phonemic resources to generate pronunciations for all the names in O, allowing for multiple alternate pronunciations when needed. Then we simply iterate through H, looking for any matches in O.

I'm currently working on installing and familiarizing myself with some phonemic tools and models, and I promise I'll go all the way on such an approach in the near future when we get a puzzle with an interesting pronunciation aspect. I think it's overkill here, so I'm going to hold off this time.

In fact, this week's puzzle is fairly easy, so long as you consume a decent amount of news and current events. I solved it by taking the first five or ten important persons from U.S. history that came to mind, then mentally applying the same logic as in my Babe Ruth / Jane Addams examples above until I came to a match. I'm sure you can do the same!

Good luck, and I'll see you Friday with a solution!

-- Levi King

Friday, February 05, 2021

State borders and scrambled spelling (Solution)

Happy Friday, Puzzlers! It's time to solve the puzzle. Spoilers ahead.

Let's take another look at the current Sunday Puzzle:

This week's challenge comes from listener Derrick Niederman, of Charleston, S.C. Starting in Montana, you can drive into South Dakota and then into Iowa. Those three states have the postal abbreviations MT, SD, and IA — whose letters can be rearranged to spell AMIDST. The challenge is to do this with four connected states to make an eight-letter word. That is, start in a certain state, drive to another, then another, and then another. Take the postal abbreviations of the four states you visit, mix the letters up, and use them to spell a common eight-letter word. Derrick and I know of only one answer. Can you do this?

As I mentioned in the preview, we need two lists:

B: a list of state borders; I used this one;
E: an English lexicon; I used this one;

As always, I've posted my solver script to GitHub. The script includes some notes throughout to help explain what it does.

First, my script reduces E to O (for "octo-"?), a list containing only the 8-letter words from the lexicon. E contains 370,103 words. O contains 51,627 words.

Next, it iterates through B (for borders) and produces a new list Q (for "quads"?) of all the sets of four US states that one can drive through consecutively, as described above. There are 967 such four state paths. I think we need at least 2 vowels to spell an 8 letter word, so my script eliminates the four state paths with 0 or 1 vowels, leaving 901 paths.
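For the curious, here's one way the path-building step could be sketched. It is not lifted from the solver script, and it bakes in two assumptions of mine: a path never revisits a state, and a path and its reverse count as the same path. The BORDERS dictionary is a tiny hand-picked slice of the real border list, so don't expect it to reproduce the counts above.

```python
# Tiny, hand-picked slice of the state border list (assumption: demo only).
BORDERS = {
    "MO": ["IA"],
    "IA": ["MO", "SD"],
    "SD": ["IA", "ND", "MT"],
    "ND": ["SD", "MT"],
    "MT": ["ND", "SD"],
}


def four_state_paths(borders):
    """Enumerate drives through four distinct, consecutively bordering states."""
    paths = set()

    def extend(path):
        if len(path) == 4:
            # Store one orientation so A-B-C-D and D-C-B-A count once.
            paths.add(min(path, path[::-1]))
            return
        for neighbor in borders.get(path[-1], ()):
            if neighbor not in path:
                extend(path + (neighbor,))

    for state in borders:
        extend((state,))
    return sorted(paths)


def vowel_count(path):
    """Count the vowels among the eight abbreviation letters of a path."""
    return sum(ch in "AEIOU" for ch in "".join(path))


paths = [p for p in four_state_paths(BORDERS) if vowel_count(p) >= 2]
print(paths)
```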

Finally, it compares every item in Q with every item in O to see if they contain the same 8 letters.
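Comparing everything with everything works, but a common shortcut is to index the words by their sorted letters so each path needs only one dictionary lookup. The sketch below shows that alternative; it's an illustration with a made-up four-word lexicon, not the method the solver script uses.

```python
from collections import defaultdict


def letter_key(text):
    """Sorted lowercase letters, e.g. "diamonds" -> "addimnos"."""
    return "".join(sorted(text.lower()))


def build_index(words):
    """Map each sorted-letter key to the 8-letter words that share it."""
    index = defaultdict(list)
    for word in words:
        if len(word) == 8:
            index[letter_key(word)].append(word)
    return index


def match_paths(paths, words):
    """Return {path: [matching words]} for paths whose letters spell a word."""
    index = build_index(words)
    matches = {}
    for path in paths:
        key = letter_key("".join(path))
        if key in index:
            matches[path] = index[key]
    return matches


words = ["diamonds", "ornament", "amidst", "notebook"]
paths = [("MO", "IA", "SD", "ND"), ("AR", "TN", "MO", "NE"), ("MT", "SD", "ND", "MN")]
print(match_paths(paths, words))
```

The first two paths match "diamonds" and "ornament"; the vowel-free third path matches nothing.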

Remember that the host said he was aware of only one solution. Well, my approach found 67 matches! However, some of these are very rare word forms and others are proper nouns---there's a lot of stuff in that lexicon file. That said, I found at least 15 good solutions among the 67. Here they are:

diamonds, mo-ia-sd-nd;

ornament, ar-tn-mo-ne;

nominate, ia-ne-mo-tn;

animator, ar-tn-mo-ia;

condemns, co-ne-sd-mn;

flagrant, ar-tn-ga-fl;

dioramas, ar-mo-ia-sd;

moleskin, il-mo-ne-ks;

eduction, id-ut-co-ne;

nonmetal, al-tn-mo-ne;

ransomed, ar-mo-ne-sd;

tangrams, ar-ms-tn-ga;

moralism, il-mo-ar-ms;

magneton, ga-tn-mo-ne;

nomadism, mn-sd-ia-mo;

I think we can consider this one a success.

Thanks for reading. I'll see you back here for the next Sunday Puzzle!

--Levi

Monday, February 01, 2021

State borders and scrambled spelling (Preview)

Welcome back, Puzzlers! Let's take a look at the new Sunday Puzzle:

This week's challenge comes from listener Derrick Niederman, of Charleston, S.C. Starting in Montana, you can drive into South Dakota and then into Iowa. Those three states have the postal abbreviations MT, SD, and IA — whose letters can be rearranged to spell AMIDST. The challenge is to do this with four connected states to make an eight-letter word. That is, start in a certain state, drive to another, then another, and then another. Take the postal abbreviations of the four states you visit, mix the letters up, and use them to spell a common eight-letter word. Derrick and I know of only one answer. Can you do this?

On the surface, this looks complicated, but let's break it down and do some brainstorming.

What resources do we need here?

B: a list of state borders (preferably using two-letter postal abbreviations)
E: an English lexicon

That's it, really. This puzzle won't really involve much NLP, just some string manipulation. Assuming we have a list B (for borders) of state borders, we need a bit of scripting to iterate through that list and produce a new list Q (for "quads"?) of all the sets of four US states that one can drive through consecutively, as described above.

Next, we can reduce E to O (for "octo-"?), a list containing only the 8-letter words from the lexicon.

Finally, we iterate through Q, comparing the 8 letters in the state abbreviations to the 8 letters in each word, until we find a match. This is sure to be a lot of computation, but off the top of my head I don't have a guess for the length of Q or O. We may want to think of some ways to prune here. For example, if I recall correctly, there is no letter "q" in any state abbreviation, so we can remove any words with "q" from O. We could also remove from Q any set that does not contain at least 1 vowel. Or maybe 2 vowels?

Where do we find B, our list of borders, by the way? Naturally, I checked Wikipedia, but I didn't quite find what I wanted. Web search turned up this page, however, which lists the states and their bordering states.

And where do we find E, an English lexicon? I found a GitHub repo hosting multiple versions of English lexicons; the documentation there notes that this version contains only words that have only letters, no numbers or symbols, so I'll definitely be using that one.

Good luck this week! I'll see you back here for the solution.

--Levi
