Monday, April 22, 2024

Major American corporation of the past

Welcome back, Puzzlers! And welcome back to host Will Shortz!

Let's take a look at this week's challenge from the NPR Sunday Puzzle:

This week's challenge comes from listener Jim Vespe, of Mamaroneck, N.Y. Think of a major American corporation of the past (two words, 15 letters altogether). Change the last three letters in the second word and the resulting phrase will name something that will occur later this year. What is it?

Oof, this one feels like a total freebie, and not to brag, but I got this right away after considering just a few possibilities.

The "major American corporation of the past" clue doesn't narrow things down much on its own, but knowing that the answer must be two words, 15 letters altogether, would definitely help us filter down a list if we had one handy.

The "something that will occur later this year" clue feels like the best place to start, because I can't think of many possibilities. I don't think it would be something that occurs every year.

That leaves special events for 2024. It's safe to assume that this would be something that your average NPR listener would be aware of---not some obscure event or gathering.

The next World Cup is in 2026, so that's not relevant. Likewise, we already had a major eclipse.

The Summer Olympics take place this year in Paris, and this is a major election year in the USA. Can you think of variations on one of these that might give us a solution? How about other major, newsworthy events happening this year?

I ran this challenge by Microsoft Copilot (using GPT-3.5, I believe), and while it made some wrong assumptions, it got close enough that it would lead a reasonable person to the correct answer. I also passed it to Google Gemini (formerly Bard), but the results were nowhere near the correct answer.

I'm going to pass on writing a full script for solving this one since it's so easy, but let's walk through a hypothetical NLP approach here.

This is a pretty standard word puzzle format for the Sunday Puzzle.

There exists some string (2 words, 15 letters) among List A (American corporations of the past) such that when the last 3 letters are changed, the resulting string can be found among List B (something that will occur later this year). In other words, take a thing from List A, transform it, get a thing from List B.

In an NLP approach, the main challenge would be gathering List A and List B and ensuring that they are reasonably complete. These are both open classes, meaning there is no complete list, because we could always keep digging and find something else. This is in contrast to closed classes, which are things like U.S. states or Academy Award Best Picture winners.

I would suggest that we search the web for a ready-made list of American Corporations, and/or ask an LLM to provide us a list. Alternatively, as we've done in the past, we can use a pretrained (L)LM like BERT or GPT-2 (both available in python libraries) to fill in a blank for us with the 100 or 200 most likely candidates: "In the past, major American corporations like ________ provided employees with a wide range of benefits." We know these items need to be 2 words, 15 letters total, so we remove any items that don't fit that pattern. A little cleaning and formatting and we have List A.
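The length filter at the end of that step is easy to sketch. The corporation names below are filler chosen just to exercise the pattern check, not hints at the answer:

```python
def fits_pattern(name: str) -> bool:
    # Two words, 15 letters altogether (ignoring the space).
    words = name.split()
    return len(words) == 2 and sum(len(w) for w in words) == 15

# Filler candidates for illustration only.
raw = ["Northwest Orient", "Standard Oil", "Pan American World Airways"]
list_a = [n for n in raw if fits_pattern(n)]
# Only "Northwest Orient" (9 + 6 = 15 letters) survives the filter.
```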

For List B, I would turn to the web again. Possibly we could do a web search and find a list of major events taking place this year. We could ask Copilot. We could also just brainstorm our own list (but in doing so the solution would probably be so obvious we don't need an NLP approach). Let's assume we manage to find or assemble a list of plausible candidates. Again, we'll want to filter our list to only include strings that give us 2 words, 15 letters total.

The next element is the transformation--change the last three letters in the second word. A brute-force approach could iterate through all 26^3 possible three-letter replacements, but that would be unnecessary at this stage. Instead, we should iterate through List A, then iterate through List B, looking for a string match between ItemA[:-3] and ItemB[:-3]. In other words, we're comparing the strings but ignoring the final 3 letters (which works here because the second word ends the phrase). If we find such a match, our python script prints it out, at which point we can see how the 3 letters were changed.
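A minimal sketch of that comparison, with made-up strings standing in for the real lists:

```python
def stem(name: str) -> str:
    # Normalize case and spacing, then drop the final three letters.
    return name.lower().replace(" ", "")[:-3]

def find_matches(list_a, list_b):
    # Pair up items that are identical except for their last three letters.
    return [(a, b) for a in list_a for b in list_b if stem(a) == stem(b)]

# Toy filler data, not a hint at the answer:
matches = find_matches(["Acme Producers"], ["Acme Producing", "Solar Eclipse"])
# [('Acme Producers', 'Acme Producing')]
```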

I suspect we'd only find one solution, but let's take this a step farther. Let's assume, for the sake of challenge, that this would result in thousands of potential solutions--too many to review manually. How could we sort through those potential solutions?

Again, I would suggest we use a language model here. This time, instead of asking the model to fill in a blank for us, we're going to have the script fill in a blank with each of our potential solutions and then rank them according to a probability (or perplexity) score. In fact, we used such an approach just last week. A good sentence template to use here might be: "Many people are excited because the ________ will take place later this year." So our script will use the remaining items from List B and tell us how well each one fits in the context of our template.
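Here's a rough sketch of that ranking step. In the real script, the scoring function would wrap a model like GPT-2 from the transformers library and return an average negative log-likelihood or perplexity; to keep this sketch self-contained, the scorer is pluggable, and the one in the example is a dummy:

```python
TEMPLATE = "Many people are excited because the {} will take place later this year."

def rank_candidates(candidates, score_fn):
    # score_fn maps a sentence to a score where lower = more probable,
    # e.g. average negative log-likelihood under GPT-2.
    scored = [(cand, score_fn(TEMPLATE.format(cand))) for cand in candidates]
    return sorted(scored, key=lambda pair: pair[1])

# Dummy scorer for illustration: here, shorter sentences rank as "more likely".
ranked = rank_candidates(["Summer Olympics", "Quarterly Audits"], len)
```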

What do you think of this hypothetical approach? Did you also find this puzzle easier than usual? I'll be back after the Thursday submission deadline to share my solution.

--Levi King

Update

Here's my solution: 


See you next week!

Monday, April 15, 2024

Online service

Let's see what natural language processing can do for this week's Sunday Puzzle from NPR:

This week's challenge comes from Bruce DeViller, of Brookfield, Ill. Think of a popular online service. Change the first letter to a Y and rearrange the result to get what this service provides. What is it?


Sunday Puzzle regulars will recognize this familiar puzzle formula:

  1. take item from Class A
  2. apply transformation as described
  3. get item from Class B

Another thing to note here is that for this puzzle, Class A and Class B are going to be open classes, meaning we'll never have a complete list because we could always think of more. This is in contrast with a closed class, like US state capitals or Academy Award Best Picture winners.

So how can we use NLP to solve this problem?

  • First we need a list of online services (our Class A). I think it's a safe bet to assume that the solution will be within a list of the top 50 or 100 online services.
    • We might get lucky and find a readymade list on the web, so first I'll simply search the web for "list of top 100 online companies" or similar.
    • We could also ask Copilot or ChatGPT to give us a list of the most popular online services and see what we get.
    • We could return to an old favorite and use the huggingface python implementation of BERT or a similar language model and ask it to return the top 100 results for filling in the blank in a sentence like "Online services like _____ are popular among customers around the country."
  • Assuming we have the list of online services, we'll hard code it into our python script (since it's only 100 items or fewer). We'll create a function in the python script that can iterate through the list and change the first letter to a Y.
    • I'll have the script print these out for us to review visually, and if anything pops out, we may have the solution already. I think this step is necessary because we don't know from the puzzle whether "what this service provides" will be a single word or possibly tokenized into multiple words, and this makes automating the check a bit more complicated.
    • But let's say we want to brute force our way through this with an automated approach. In this case, we'll take all our candidates, like ("amazon" -->) "ymazon". Let's first assume the "what this service provides" string is going to be one or two words; we can expand to three or four words later if needed. We can use the python itertools library to generate all the sequences (permutations) of y, m, a, z, o, n. For each sequence, we can also iterate through each position to insert a space and tokenize this into two strings. 
      • Now our script has a big, ugly list of (mostly gibberish) strings. We'll use an old favorite here to reduce the load: we can use nltk.corpus.words to toss out any candidate that contains a string that isn't a valid English word.
      • And now we can circle back to BERT and use a loss function to score the candidates in a template sentence. For example, we'll plug "amazon" and "many oz" into a template like "The online service amazon is popular for providing many oz to customers around the country".
      • We can sort these candidate sentences from most probable to least probable (or from lowest perplexity to highest perplexity). Then we instruct our script to print them out in this order, and if all goes well, our solution should be among the first entries in that ranked list.
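The permutation-and-split steps above can be sketched like this. To keep the sketch self-contained, I've swapped nltk.corpus.words for a tiny hardcoded vocabulary; the real script would use the full word list:

```python
from itertools import permutations

# Tiny stand-in for nltk.corpus.words, for illustration only.
VOCAB = {"many", "oz", "zany", "mom", "yon", "am", "no"}

def candidates(letters: str):
    # Every reordering of the letters, kept whole or split into two tokens.
    out = set()
    for perm in permutations(letters):
        s = "".join(perm)
        out.add((s,))
        for i in range(1, len(s)):
            out.add((s[:i], s[i:]))
    return out

def keep_valid(cands, vocab=VOCAB):
    # Toss out any candidate containing a token that isn't a real word.
    return [c for c in cands if all(tok in vocab for tok in c)]

survivors = keep_valid(candidates("ymazon"))  # includes ('many', 'oz')
```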
Good luck! If you're stuck, take a look at my script. I'll be back to share my solution later in the week!


April 19, 2024 Update

The Thursday deadline has passed, so here's my solution: 


Here's the output of my script, showing only one-word solutions, and sorted from most to least likely, per the pretrained GPT2 model in the transformers python library:


See you next week!

Wednesday, April 10, 2024

A tool from the Bible

I've been away for a few days and it's already Wednesday so let's tackle this week's Sunday Puzzle from NPR:

This week's challenge comes from listener Steve Baggish of Arlington, Massachusetts. Think of a nine-letter word naming a kind of tool that is mentioned in the Bible. Remove the second and sixth letters and the remaining letters can be rearranged to spell two new words that are included in a well known biblical passage and are related to the area in which the tool is used. What are the three words?

Curious! Let's break this down a little:
  • a nine-letter tool from the Bible
    • seems like a very specific thing, like a word we find in the Bible but rarely elsewhere
    • It may surprise some people to know that we linguists tend to know a lot about the Bible. A lot of early linguistics was motivated by religious groups' desires to translate the Bible into as many languages as possible to reach potential converts. As natural language processing (NLP) developed, the Bible proved to be a good resource for building translation models, because the clear numbering of book and verse means it's easy to align the sentences across languages and use statistical methods to determine which words and morphemes correspond. In these days of web-scale data, the Bible would be considered a tiny dataset for testing or training, but it served as a starting point for a lot of tasks in the history of NLP. And that brings me to some relevant stats here.
      • word tokens (total word count) in the Bible: 930,243
      • word types (unique words) in the Bible: 14,564
      • ~15,000 word types is really not that many to sort through. We only want those that are exactly 9 letters, and that's likely to eliminate 80% or more. If even 2,000 words are left, that's very manageable for a variety of approaches we might take here.
      • We could even use a simple language model to get perplexity scores for each 9-letter word to see if it makes sense in the context of a tool. For example, how well would it fit in this sentence?
      • 'The _______ improved worker productivity in ancient times.'
  • remove the second and sixth letters; rearrange remaining letters to spell two new words
    • Easy-peasy. This is a straightforward task and we can write a python function to do this and generate all the possible character sequences broken into two words.
  • two new words that are included in a well known biblical passage and are related to the area in which the tool is used
    • This part is a bit tricky and doesn't give us much to go on semantically. I'm thinking we could use our python function to generate all the candidates, then: 
      • First, ensure that the two words are indeed English words and not gibberish 
      • Next, check whether there exists some verse in the Bible that contains both words.
      • Finally, I think we'll need to mentally confirm that the passage somehow relates to the area in which the tool is used.
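The letter-removal and rearrangement steps above might look like this (shown on a dummy nine-letter string, not the answer):

```python
from itertools import permutations

def two_word_candidates(word: str):
    # Drop the 2nd and 6th letters (indices 1 and 5) of a nine-letter word,
    # then rearrange the remaining seven letters into every two-word split.
    assert len(word) == 9
    letters = [c for i, c in enumerate(word) if i not in (1, 5)]
    out = set()
    for perm in permutations(letters):
        s = "".join(perm)
        for i in range(1, len(s)):
            out.add((s[:i], s[i:]))
    return out

# 7! permutations x 6 split points, minus duplicates.
cands = two_word_candidates("abcdefghi")
```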
Ultimately, I skipped the final step and handled it mentally. You can see the script I used here, but note you'll also need to install the pythonbible library and download this text file I created to index all the verses. First I simply pulled every unique 9-letter word from the Bible--about 1,000 words. Then I plugged them into sentences where the context calls for a tool, ran them through the BERT language model, and printed them out from most to least likely. This helped me manually filter out any non-tools from the list, leaving only about a dozen words. From there, I just applied a little brain power to see if I could rearrange the letters as described.

April 12, 2024 Update

The Thursday deadline has passed, so here's my solution: 


See you next week!

Monday, April 01, 2024

Waiter and Waitress, etc.

Welcome back, Natural Language Puzzlers! Here is this week's Sunday Puzzle from NPR:

This week's challenge: In honor of women's history month, all our challenge contributors in March have been women. To close out the month, I have this related challenge. 

The English language developed in a patriarchal society, so many words in our language were traditionally assumed to be male, and turned into female versions by adding a prefix or suffix. Waiter and waitress, comedian and comedienne — those are just two examples of the many stereotypically "male" words that become new "female words" by adding a suffix.

There is a common English word that works the opposite way. What is the common English word that is generally used to refer exclusively to women, but which becomes male when a two-letter suffix is added?

This one is more like a trivia question than a puzzle, unfortunately, and if you speak English and spend more than three seconds thinking about this you're probably going to get it. The solution came to mind readily; it seems this week the challenge will be constructing an NLP approach that can produce the same solution.

In fact, any of the big chatbots available to us might have a good chance at solving this one, given that it's relatively straightforward and a question of knowledge rather than any kind of complicated puzzle processing. Let's try it.

Chatbot Showdown!

Microsoft Copilot:


Screenshot of Copilot getting the correct solution

I've redacted the solution for now, but Copilot got this correct. It also added some nonsense about transforming "spinster" into "bachelor", but we'll just ignore that part.


Google Bard Gemini (or whatever they're calling it this week):

What is it about chatbots and "spinsters"?


In keeping with recent trends, a total failure here by Google Gemini.

Score: Microsoft Copilot 1, Google Gemini 0.


But let's imagine an approach to get the answer without a chatbot...

I think we could back off to a less sophisticated but still very powerful language model, like a huggingface BERT model or even the smaller GPT-2. In this case, I think we could get pairs of these "male" and "female" words by creating some sentence templates and asking the model to fill them in. For example:

  • In this industry, a male worker is referred to as a ______ and a female worker is referred to as a ______.
  • I don't think society should treat a male ______ any differently than a female ______.
  • Sometimes there are different versions of a word, like ______ refers to a man and ______ refers to a woman.
For each sentence, we take the k most probable word pairs according to the model. The puzzle tells us we add a two-letter suffix to the female word to get the male word, so we can simply iterate through our list of word pairs and see if they meet our conditions:

for pair in word_pairs:
    # pair = (male_word, female_word); the male word should be the
    # female word plus a two-letter suffix.
    if len(pair[0]) == len(pair[1]) + 2 and pair[0][:-2] == pair[1]:
        print("SOLUTION:", pair)

Let's hope for a "real" puzzle next week! In the meantime, I'll be back to share my solution after the Thursday deadline.

April 6, 2024 Update

The Thursday deadline has passed, so here's my solution: 


See you next week!

Director, anagram, film award

Welcome back to Natural Language Puzzling, the blog where we use natural language processing and linguistics to solve the Sunday Puzzle from...