Monday, February 26, 2024

"Genie"

Hello Puzzler friends, let's dig into this week's Sunday Puzzle from NPR.

This week's challenge comes to us from listener Eric Berlin of Milford, Connecticut. Take the word SETS. You can add a three-letter word to this twice to get a common phrase: SPARE PARTS. Can you now do this with the word GENIE, adding a three-letter word to it twice to get a common phrase? Again, start with GENIE, insert a three-letter word twice, and get a common phrase.

In the example, the three-letter word is PAR: S + PAR + E + PAR + TS gives SPARE PARTS. How are we feeling about this? Seems like a pretty straightforward Sunday puzzle:

Take a thing from Category A, apply the given transformation, get a thing from Category B.

So what resources do we need to solve this?

  • N: a list of three-letter words to iterate through
    • I suggest we start with a word list of English words ranked by frequency, such as the top 100k most frequent English words, and keep only the three-letter words
  • A script that will:
    • Generate C, a list of all the combinations of two positions in the word GENIE, like
      • _G_ENIE, _GE_NIE, ... GE_N_IE, GE_NI_E
      • (Although the above is just an illustration and we'll probably want this as a list of integer pairs representing insertion points in the string, like [(0, 1), (0, 2), ...])
    • Iterate through the list of positions, then:
      • iterate through the list of 3-letter words and insert them into these positions
        • iterate through the positions of the new string to insert a word break
          • pass the new string to a language model to return a score
          • we want something like a perplexity score, which tells us how unlikely our string is (i.e., how perplexed is our language model to see this string?); lower perplexity means a more plausible phrase
          • I'll be using a pretrained GPT-2 model from the Hugging Face transformers library for Python.
      • print a list of possible solutions that score below the perplexity threshold for us to review manually.
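
The candidate-generation steps in the list above can be sketched in plain Python. This is a minimal illustration, not the actual solver script: the function names (three_letter_words, gap_pairs, insert_twice, split_phrases, candidates) are made up for this example, insertion points are treated as gaps numbered 0 through len(word), and every internal word break is tried. Scoring is left out; each phrase yielded here would be handed to the language model.

```python
from itertools import combinations


def three_letter_words(words):
    """Keep alphabetic three-letter entries, lowercased, in their original order."""
    seen = set()
    kept = []
    for w in words:
        w = w.strip().lower()
        if len(w) == 3 and w.isalpha() and w not in seen:
            seen.add(w)
            kept.append(w)
    return kept


def gap_pairs(base):
    """All pairs of distinct gaps in `base`; gap i sits just before base[i]."""
    return list(combinations(range(len(base) + 1), 2))


def insert_twice(base, word, pair):
    """Insert `word` at both gaps; indices refer to the original `base`."""
    i, j = pair  # combinations() guarantees i < j
    return base[:i] + word + base[i:j] + word + base[j:]


def split_phrases(s):
    """Every way to break `s` into two non-empty words."""
    return [s[:k] + " " + s[k:] for k in range(1, len(s))]


def candidates(base, words):
    """Yield every two-word candidate phrase for `base`."""
    short_words = three_letter_words(words)
    for pair in gap_pairs(base):
        for word in short_words:
            yield from split_phrases(insert_twice(base, word, pair))


# Sanity check on the example from the puzzle, with a toy word list.
found = list(candidates("sets", ["the", "par", "spare", "cat"]))
print("spare parts" in found)  # True
```

Trying every internal break gives 10 splits for the 11-letter strings built from GENIE, so the real script may well restrict the breaks further than this sketch does.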

This is a brute-force approach and is certain to take a lot of time. If N is our list of 3-letter words, C is the list of combinations of 2 positions within the word "genie", and S is the list of positions where we can split the resulting string into two words, we end up with a number of strings to score (and so a running time) that grows something like:

len(N) * len(C) * len(S)

We know that the length of C is 15 (GENIE has 6 insertion points, and 6 choose 2 is 15) and the length of S is 7, so we're looking at 15 * 7 = 105 strings to score per candidate 3-letter word.
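
To get a feel for the total, here is a small back-of-the-envelope sketch. The value of 15 is computed from the six insertion points in GENIE, the value of 7 is simply taken from the paragraph above, and the word-list sizes are hypothetical, since the real count depends on which frequency list is used.

```python
from math import comb

base = "genie"
n_pairs = comb(len(base) + 1, 2)  # pairs of distinct insertion points: C(6, 2)
n_splits = 7                      # length of S, taken as given from the text
per_word = n_pairs * n_splits     # strings to score per three-letter word


def total_scored(n_words):
    """Total language-model calls for a word list of the given size."""
    return n_words * per_word


for n_words in (500, 1000, 2000):  # hypothetical sizes for N
    print(n_words, total_scored(n_words))
```

With around a thousand three-letter words, that is on the order of a hundred thousand calls to the language model, which is why the run time adds up.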

I've found my solution, so I'll be back after the Thursday submission deadline to share it. In the meantime, you can take a closer look at my solver script here, and see if you can get it running to output the solution.
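
If you want to experiment with the scoring step on its own, the following is a rough sketch of perplexity scoring with GPT-2, and it is not a copy of the solver script. It assumes the torch and transformers packages are installed; the helper names (perplexity_from_logprobs, load_gpt2, gpt2_perplexity, rank) and the threshold value in the usage comment are made up for illustration.

```python
import math


def perplexity_from_logprobs(token_logprobs):
    """Perplexity is exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def load_gpt2(name="gpt2"):
    """Load tokenizer and model once (weights are downloaded on first use)."""
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained(name)
    model = GPT2LMHeadModel.from_pretrained(name)
    model.eval()
    return tokenizer, model


def gpt2_perplexity(text, tokenizer, model):
    """Perplexity of `text` under GPT-2: exp of the mean next-token loss."""
    import torch

    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=ids makes the model return the mean cross-entropy.
        loss = model(ids, labels=ids).loss
    return math.exp(loss.item())


def rank(phrases, score, threshold):
    """Return (score, phrase) pairs below `threshold`, most plausible first."""
    scored = [(score(p), p) for p in phrases]
    return sorted((s, p) for s, p in scored if s < threshold)


# Usage (commented out because it downloads the model):
# tokenizer, model = load_gpt2()
# shortlist = rank(
#     ["spare parts", "spar eparts"],
#     lambda p: gpt2_perplexity(p, tokenizer, model),
#     threshold=1000.0,  # hypothetical cutoff; tune by inspecting real scores
# )
```

A string that tokenizes to a single token has no next-token loss to average, so its score comes out as NaN and it is dropped by the threshold comparison.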

Update: Okay, here's my solution: 

See you next week!

