Monday, April 15, 2024

Online service

Let's see what natural language processing can do for this week's Sunday Puzzle from NPR:

This week's challenge comes from Bruce DeViller, of Brookfield, Ill. Think of a popular online service. Change the first letter to a Y and rearrange the result to get what this service provides. What is it?


Sunday Puzzle regulars will recognize this familiar puzzle formula:

  1. take item from Class A
  2. apply transformation as described
  3. get item from Class B

Another thing to note here is that for this puzzle, Class A and Class B are going to be open classes, meaning we'll never have a complete list because we could always think of more. This is in contrast with a closed class, like US state capitals or Academy Award Best Picture winners.

So how can we use NLP to solve this problem?

  • First we need a list of online services (our Class A). I think it's a safe bet to assume that the solution will be within a list of the top 50 or 100 online services.
    • We might get lucky and find a readymade list on the web, so first I'll simply search the web for "list of top 100 online companies" or similar.
    • We could also ask Copilot or ChatGPT to give us a list of the most popular online services and see what we get.
    • We could return to an old favorite: use the Hugging Face Python implementation of BERT (or a similar language model) and ask it for the top 100 fill-in-the-blank predictions in a sentence like "Online services like _____ are popular among customers around the country."
  • Assuming we have the list of online services, we'll hard-code it into our Python script (it's only 100 items or fewer). We'll write a function that iterates through the list and changes the first letter of each name to a Y.
    • I'll have the script print these out for us to review visually, and if anything pops out, we may have the solution already. I think this step is necessary because we don't know from the puzzle whether "what this service provides" will be a single word or split into multiple words, which makes automating this a bit more complicated.
    • But let's say we want to brute-force our way through this with an automated approach. In this case, we'll take each of our candidates, like "amazon" --> "ymazon". Let's first assume the "what this service provides" string is one or two words; we can expand to three or four words later if needed. We can use the Python itertools library to generate all the permutations of y, m, a, z, o, n (720 for six distinct letters). For each permutation, we also try inserting a space at each interior position to split it into two strings.
      • Now our script has a big, ugly list of (mostly gibberish) strings. We'll use an old favorite here to reduce the load: we can use nltk.corpus.words to toss out any candidate that contains a string that isn't a valid English word.
      • And now we can circle back to BERT and use a loss function to score the candidates in a template sentence. For example, we'll plug "amazon" and "many oz" into a template like "The online service amazon is popular for providing many oz to customers around the country".
      • We can sort these candidate sentences from most probable to least probable (or from lowest perplexity to highest perplexity). Then we instruct our script to print them out in this order, and if all goes well, our solution should be among the first entries in that ranked list.
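
The brute-force steps above (swap the first letter, permute, split into one or two words, discard anything that isn't made of real words) can be sketched with just the standard library. This is a minimal illustration, not the script linked in this post: the function names, the two-item service list, and the tiny vocabulary are all hypothetical stand-ins. A real run would presumably use the full list of services and a dictionary such as nltk.corpus.words.

```python
from itertools import permutations

# Hypothetical stand-ins for illustration only: a real run would use a
# list of ~100 services and a full dictionary (e.g. nltk.corpus.words).
SERVICES = ["amazon", "netflix"]
VOCAB = {"many", "oz"}


def swap_first_letter(name, letter="y"):
    """Replace the first character of `name` with `letter`."""
    return letter + name[1:]


def candidate_phrases(letters):
    """Return every distinct one- or two-word arrangement of `letters`."""
    phrases = set()
    for perm in set(permutations(letters)):
        s = "".join(perm)
        phrases.add(s)  # one-word reading
        for i in range(1, len(s)):  # every interior split point
            phrases.add(s[:i] + " " + s[i:])  # two-word reading
    return phrases


def all_words_valid(phrase, vocab):
    """True if every space-separated token of `phrase` is in `vocab`."""
    return all(word in vocab for word in phrase.split())


def solve(services, vocab):
    """Map each service to the sorted list of phrases that survive the filter."""
    results = {}
    for name in services:
        swapped = swap_first_letter(name.lower())
        hits = sorted(
            p for p in candidate_phrases(swapped) if all_words_valid(p, vocab)
        )
        if hits:
            results[name] = hits
    return results


if __name__ == "__main__":
    print(solve(SERVICES, VOCAB))
    # → {'amazon': ['many oz', 'oz many']}
```

Using a set for the permutations avoids duplicate work when a name has repeated letters. Note that the number of permutations grows factorially with name length, so long service names may need a smarter approach than enumerating everything.
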
Good luck! If you're stuck, take a look at my script. I'll be back to share my solution later in the week!


April 19, 2024 Update

The Thursday deadline has passed, so here's my solution: 


Here's the output of my script, showing only one-word solutions, and sorted from most to least likely, per the pretrained GPT2 model in the transformers python library:
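
A ranking of this kind could be produced roughly as follows. This is a sketch under assumptions, not the actual script: the helper names rank_candidates and make_gpt2_scorer are hypothetical, the template is the example sentence from earlier in this post, and the score is the mean per-token cross-entropy loss that GPT2LMHeadModel returns when labels are supplied (lower loss means the model finds the sentence more probable). Running the GPT-2 part requires the torch and transformers packages and a model download.

```python
TEMPLATE = (
    "The online service {service} is popular for providing "
    "{phrase} to customers around the country."
)


def rank_candidates(pairs, score_fn):
    """Return (loss, service, phrase) tuples sorted from lowest to highest loss.

    `pairs` is an iterable of (service, phrase); `score_fn` maps a sentence
    to a number where lower means more probable.
    """
    scored = [
        (score_fn(TEMPLATE.format(service=service, phrase=phrase)), service, phrase)
        for service, phrase in pairs
    ]
    return sorted(scored)


def make_gpt2_scorer(model_name="gpt2"):
    """Build a scorer backed by pretrained GPT-2 (assumes torch + transformers)."""
    # Imported lazily so the ranking logic can be used without these packages.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)
    model.eval()

    def score(sentence):
        enc = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            # Passing the input ids as labels makes the model return the
            # mean cross-entropy loss over the sentence's tokens.
            out = model(**enc, labels=enc["input_ids"])
        return out.loss.item()

    return score


if __name__ == "__main__":
    scorer = make_gpt2_scorer()
    candidates = [("amazon", "many oz"), ("amazon", "oz many")]
    for loss, service, phrase in rank_candidates(candidates, scorer):
        print(f"{loss:.3f}\t{service}\t{phrase}")
```

Because the loss is averaged per token, exponentiating it gives the sentence's perplexity, so sorting by loss and sorting by perplexity give the same order.
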


See you next week!
