Saturday, January 02, 2021

First challenge: 12/20/2020: Christmas Wishes

Let's kick off this project by looking at the most recent challenge, from Sunday, December 20, 2020. The deadline for submitting solutions has already passed, but let's see if we can find one anyway.

This week's challenge: This week's challenge comes from listener Dan Pitt, of Palo Alto, Calif. Take the name BUENOS AIRES. Remove one letter. The remaining letters can be rearranged to name two things that many people wish for around this time of year. What are they?

Okay, so we're looking for two words, and these are things that people wish for around the Christmas (or Hanukkah?) season. I suppose these could be actual gifts like "toys" or "chocolates" or "socks", or they could be more abstract like "peace" for the new year, or "wealth", or "snow" for a white Christmas.

BUENOS AIRES has 11 letters, and we have to drop 1. So we need two words containing 10 letters total. I think it's safe to assume that neither word will be one letter or two letters, so our two words will be one of these combinations of letters:

  • 3+7
  • 4+6
  • 5+5

Here's a rough sketch of the approach I'm thinking of:
  1. Start with a list of candidate words, W;
  2. Remove any word in W that cannot be spelled with BUENOSAIRES;
  3. For each word wd_a in W, find any other word wd_b in W for which the length of wd_a + length of wd_b = 10; store pairs as list P
  4. Remove any pair in P for which wd_a and wd_b cannot be spelled with BUENOSAIRES;
  5. Print remaining pairs and find solution among them.
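
The filtering steps above can be sketched with a letter-count check. The word list here is a tiny hypothetical stand-in for the real candidate list; the actual script may differ:

```python
from collections import Counter
from itertools import combinations

LETTERS = Counter("buenosaires")

def spellable(word, pool=LETTERS):
    """True if `word` can be spelled from the letters in `pool`."""
    need = Counter(word.lower())
    return all(pool[ch] >= n for ch, n in need.items())

def find_pairs(words, total=10):
    """Return word pairs whose combined length is `total` and whose
    combined letters fit within BUENOSAIRES."""
    candidates = [w for w in words if spellable(w)]
    pairs = []
    for a, b in combinations(candidates, 2):
        if len(a) + len(b) == total and spellable(a + b):
            pairs.append(sorted([a, b]))
    return pairs

# A stand-in word list, just to exercise the filter:
words = ["bonus", "raise", "snow", "peace", "bones", "bruises", "one"]
print(find_pairs(words))
```

Note that `spellable(a + b)` checks the *combined* letter counts, so a pair like "bonus"/"bones" is rejected even though each word is individually spellable (there's only one B to go around).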

Sounds great, but... how do we get a list of candidate words in step 1 above? Similar challenges might give us more to go on here, such as "women's names" or "hobbies," for example. In this case, we only know that the two words are things wished for around Christmas time. We may be able to use a masked language model like BERT to generate a list of possibilities.

BERT can be used in a "masking" mode. In this mode, the tool infers the most likely words to fill in a blank. The example given on the HuggingFace page is:

"Paris is the [MASK] of France."

The top prediction is "capital", but other predictions include "center", "heart", "city", etc.

Perhaps now you can see where I'm going with this. We can construct a few queries to collect our candidate words:
  • "I'm wishing for [MASK] this Christmas"
  • "Classic Christmas gifts like [MASK]"
  • "I was so happy when I got [MASK] for Christmas"
  • etc.
Then we can continue with our approach, filtering for length and spelling.
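
The query step might look something like this sketch using HuggingFace's fill-mask pipeline. The model choice and `top_k` value here are assumptions for illustration; the real script may differ:

```python
# Candidate generation via masked-language-model predictions.
# Requires the `transformers` package (downloads a model on first run).
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

queries = [
    "I'm wishing for [MASK] this Christmas.",
    "I was so happy when I got [MASK] for Christmas.",
]

candidates = set()
for q in queries:
    for pred in unmasker(q, top_k=100):
        candidates.add(pred["token_str"].strip())
```

Each prediction is a dict containing a `token_str` and a `score`; collecting the token strings across all queries gives us the candidate word list W for the filtering steps.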

I've implemented all of this in a Python script. You can find it at my companion GitHub Repo for this blog, linked here.

The functions I've implemented there are annotated to describe what each does, so it should be fairly easy to follow. I admit it's not an optimal implementation--I could probably simplify it or remove some redundant bits, but it works and produces a solution in just a few seconds, so I'm leaving well enough alone.

The deadline has passed and the solution will air tomorrow, but if you want to work this out for yourself, here's your warning that I'll be spoiling the solution later on this page!

That is to say--my approach worked.

Here is the full set of queries I ran through the BERT masking model:

  • All I want for Christmas is [MASK]
  • All I want for Christmas is a [MASK]
  • [MASK] are selling out this Christmas
  • sold out of [MASK] this Christmas
  • I'm wishing for [MASK] this Christmas
  • I'm wishing for a [MASK] this Christmas
  • I got [MASK] for Christmas
  • I got a [MASK] for Christmas
  • Classic Christmas gifts like [MASK] and 
  • [MASK] is the hottest Christmas gift
  • I wish for [MASK] this Christmas
  • asked Santa Claus for [MASK]
  • asked Santa for [MASK]
  • asked Santa Claus for a [MASK]
  • asked Santa for a [MASK]
  • best Christmas gift was [MASK]
  • Christmas gift was a [MASK]
  • gave me a [MASK] for Christmas
  • gave me [MASK] for Christmas
  • the gift of [MASK] this Christmas

I didn't investigate exactly which of these queries turned up the words in the solution, but doing so would be a fairly simple modification to the code.

In this task, we're mostly concerned with recall rather than precision; we don't mind turning up some bogus solutions so long as the correct solution is among them.

I found that by setting k, the number of predictions per query, to 100, I get exactly one solution--the correct one.

If I set k to 1500, I get the 17 pairs below. Most of the wrong answers here don't make sense, but I think beans and euros would make pretty good Christmas gifts!

Spoiler alert: The correct response is in this list.
  • ['air', 'bonuses']
  • ['aires', 'bones']
  • ['aires', 'bonus']
  • ['are', 'bonuses']
  • ['ari', 'bonuses']
  • ['beans', 'euros']
  • ['beans', 'rosie']
  • ['beau', 'sirens']
  • ['bee', 'russian']
  • ['been', 'russia']
  • ['ben', 'serious']
  • ['bone', 'raises']
  • ['bone', 'russia']
  • ['bones', 'raise']
  • ['bonus', 'raise']
  • ['bruises', 'neo']
  • ['bruises', 'one']
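
Any pair in this list can be sanity-checked with a quick letter count: the pair should use up all but exactly one letter of BUENOS AIRES. Using the beans/euros pair as an example:

```python
from collections import Counter

pool = Counter("buenosaires")                 # the 11 source letters
pair = Counter("beans") + Counter("euros")    # letters used by the pair

leftover = pool - pair    # letters of BUENOS AIRES the pair doesn't use
missing = pair - pool     # letters the pair needs but the pool lacks

print(leftover, missing)
```

For a valid pair, `leftover` contains exactly one letter (the dropped one) and `missing` is empty.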
Thanks for reading! See you for the next puzzle.
--Levi K
