This week's challenge comes to us from listener Eric Berlin of Milford, Connecticut. Take the word SETS. You can add a three-letter word to it twice to get a common phrase: SPARE PARTS (S + PAR + E + PAR + TS). Can you do the same with the word GENIE? Add a three-letter word to it twice to get a common phrase. Again: start with GENIE, insert a three-letter word twice, and get a common phrase.
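To pin down exactly what "add a word twice" means, here is a minimal Python check of the SETS example. The helper `insert_twice` is a hypothetical name of my own, not from the puzzle. It inserts the same word into two gaps of the base word, where gap k is the spot just before letter k.

```python
def insert_twice(base: str, word: str, i: int, j: int) -> str:
    """Insert `word` into `base` at gap i and gap j (0 <= i < j <= len(base)).

    Gap k is the position just before base[k]; gap len(base) is the end.
    """
    return base[:i] + word + base[i:j] + word + base[j:]


joined = insert_twice("SETS", "PAR", 1, 2)  # S + PAR + E + PAR + TS
phrase = joined[:5] + " " + joined[5:]      # add the word break
print(joined, "->", phrase)                 # SPAREPARTS -> SPARE PARTS
```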
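How are we feeling about this? It seems like a pretty straightforward Sunday puzzle, roughly:
Take Thing A, apply the given transformation, get a thing from Category B. Here Thing A is the word GENIE, the transformation is inserting the same three-letter word twice, and Category B is common phrases.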
So what resources do we need to solve this?
- N: a list of three-letter words to iterate through
  - I suggest we start with a list of English words ranked by frequency, such as the 100k most frequent English words, and keep only the three-letter words
- A script that will:
  - Generate C, a list of all combinations of two insertion positions in the word GENIE, like
    - _G_ENIE, _GE_NIE, ..., GE_N_IE, GE_NI_E
    - (The above is just an illustration. In code we'll want a list of index pairs, where index k is the gap just before letter k and index 5 is the end of the word, like [(0, 1), (0, 2), ...]. GENIE has six gaps, so there are 15 pairs.)
  - Iterate through the list of positions, then:
    - iterate through the list of three-letter words and insert each one into both positions
    - iterate through the positions of the new string to insert a word break (one break, assuming a two-word phrase like SPARE PARTS)
    - pass the new string to a language model to get a score
      - we want something like a perplexity score, which tells us how unlikely our string is (i.e., how perplexed is our language model to see this string?); lower perplexity means a more plausible phrase
      - I'll be using a pretrained GPT-2 model from the Hugging Face transformers library for Python.
  - Print a list of candidate solutions that score below a perplexity threshold, for us to review manually.
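The candidate-generation half of the script could look like the sketch below. It rests on a few assumptions of mine: the word list is any iterable of strings (loading the frequency-ranked list is left out, and no particular file is implied), the two copies go into distinct gaps (so the word is never placed twice side by side in the same gap), and every phrase gets exactly one word break. The function names `three_letter_words`, `insertion_pairs`, and `candidates` are hypothetical.

```python
from itertools import combinations


def three_letter_words(words):
    """Keep alphabetic three-letter entries, uppercased, deduplicated, in input order."""
    seen = set()
    out = []
    for w in words:
        w = w.strip().upper()
        if len(w) == 3 and w.isalpha() and w not in seen:
            seen.add(w)
            out.append(w)
    return out


def insertion_pairs(base):
    """All gap pairs (i, j) with i < j; gap k sits just before base[k]."""
    return list(combinations(range(len(base) + 1), 2))


def candidates(base, words):
    """Yield (word, phrase) for every word, gap pair, and single word break."""
    for word in words:
        for i, j in insertion_pairs(base):
            s = base[:i] + word + base[i:j] + word + base[j:]
            # One space at every interior position gives a two-word phrase.
            for k in range(1, len(s)):
                yield word, s[:k] + " " + s[k:]
```

For GENIE there are 15 gap pairs, and each resulting 11-letter string has 10 break positions, so every three-letter word produces 150 candidate phrases to score.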
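The scoring half could use GPT-2 roughly as follows. This is a sketch rather than a tested recipe. It assumes `torch` and `transformers` are installed and uses the standard `gpt2` checkpoint. It lowercases the input, which is my own assumption (all-caps words tend to split into more, rarer tokens), and it prepends GPT-2's `<|endoftext|>` token so the first real token is scored too. Passing `labels=input_ids` makes the model return the mean per-token cross-entropy as `loss`, and perplexity is `exp(loss)`; `perplexity_from_logprobs` spells out the same formula in plain Python. The helper names and the threshold value are hypothetical; the threshold is a free parameter you would tune by eye.

```python
import math


def perplexity_from_logprobs(token_logprobs):
    """Perplexity = exp(-mean natural-log probability of the tokens)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def load_gpt2(name="gpt2"):
    """Load a pretrained GPT-2 model and tokenizer (downloads on first use)."""
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained(name)
    model = GPT2LMHeadModel.from_pretrained(name)
    model.eval()  # inference mode: disables dropout
    return model, tokenizer


def gpt2_perplexity(text, model, tokenizer):
    """Perplexity of `text` under GPT-2; lower means less surprising."""
    import torch

    # Prepend <|endoftext|> so the first real token is predicted as well.
    ids = [tokenizer.bos_token_id] + tokenizer.encode(text.lower())
    input_ids = torch.tensor([ids])
    with torch.no_grad():
        # With labels=input_ids, .loss is the mean cross-entropy (natural log)
        # over every predicted token, i.e. -mean(log p).
        loss = model(input_ids, labels=input_ids).loss
    return math.exp(loss.item())


def below_threshold(scored, threshold):
    """Keep (phrase, perplexity) pairs under `threshold`, most plausible first."""
    return sorted((p for p in scored if p[1] < threshold), key=lambda p: p[1])
```

Load the model once with `model, tok = load_gpt2()` and reuse it for every candidate phrase; reloading it per phrase would dominate the runtime.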
See you next week!