This week's challenge comes to us from listener Eric Berlin of Milford, Connecticut. Take the word SETS. You can add a three-letter word to this twice to get a common phrase: SPARE PARTS. Can you now do this with the word GENIE? Add a three-letter word to it twice to get a common phrase. Again, start with GENIE, insert a three-letter word twice, and get a common phrase.
How are we feeling about this? Seems like a pretty straightforward Sunday puzzle:
Take a thing from Category A (here, a word), apply the given transformation, and get a thing from Category B (a common phrase).
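To make the transformation concrete, here is a minimal Python sketch. The helper names `insert_twice` and `split_at` are my own (hypothetical, not from any library). Gaps in a word are numbered from 0 (before the first letter) to len(word) (after the last letter). The filler goes into two gaps, and then a space turns the result into two words. Checked against the puzzle's own example: SETS with PAR in gaps 1 and 2 gives SPAREPARTS, which splits into SPARE PARTS.

```python
def insert_twice(word: str, filler: str, i: int, j: int) -> str:
    """Insert `filler` into gap i and gap j of `word` (gaps run 0..len(word), i < j)."""
    if not 0 <= i < j <= len(word):
        raise ValueError("need 0 <= i < j <= len(word)")
    return word[:i] + filler + word[i:j] + filler + word[j:]


def split_at(s: str, k: int) -> str:
    """Insert a single space before index k, turning one string into two words."""
    return s[:k] + " " + s[k:]


joined = insert_twice("SETS", "PAR", 1, 2)
print(joined)               # SPAREPARTS
print(split_at(joined, 5))  # SPARE PARTS
```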
So what resources do we need to solve this?
- N: a list of three-letter words to iterate through
  - I suggest we start with a list of English words ranked by frequency, such as the top 100k most frequent English words, and then keep only the three-letter words.
- A script that will:
  - Generate C, a list of all the combinations of two insertion points in the word GENIE, like
    - _G_ENIE, _GE_NIE, ... GE_N_IE, GE_NI_E
    - (The above is just an illustration; in the script we'll probably want this as a list of index pairs into the string, like [(0, 1), (0, 2), ...].)
  - Iterate through the list of position pairs, then:
    - iterate through the list of 3-letter words and insert each one at both positions
    - iterate through the positions of the new string to insert a word break
    - pass each resulting phrase to a language model to get a score
      - we want something like a perplexity score, which tells us how unlikely our string is (i.e., how perplexed is our language model to see this string?), so a lower score means a more plausible phrase
      - I'll be using a pretrained GPT-2 model from the Hugging Face transformers library for Python.
  - Print a list of possible solutions that score below a perplexity threshold, for us to review manually.
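The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not my actual solver script: the function names, the `top_n` cutoff and the `threshold` parameter are hypothetical, and the scoring part assumes `torch` and `transformers` are installed and the public `gpt2` checkpoint can be downloaded. Perplexity here is exp of GPT-2's mean per-token cross-entropy loss, with `<|endoftext|>` prepended so the first word of each phrase gets scored too.

```python
from itertools import combinations
from math import exp
from typing import Callable, Iterable, Iterator


def three_letter_words(ranked_words: Iterable[str], top_n: int = 100_000) -> list[str]:
    """Keep unique alphabetic 3-letter words from the top_n entries of a frequency-ranked list."""
    seen: dict[str, None] = {}  # dict preserves first-seen (most frequent first) order
    for rank, w in enumerate(ranked_words):
        if rank >= top_n:
            break
        w = w.strip().lower()
        if len(w) == 3 and w.isalpha():
            seen.setdefault(w, None)
    return list(seen)


def candidates(word: str, fillers: list[str]) -> Iterator[str]:
    """Yield every phrase made by inserting a filler into two gaps, then adding one space."""
    word = word.lower()
    gap_pairs = list(combinations(range(len(word) + 1), 2))  # C: gap pairs with i < j
    for i, j in gap_pairs:
        for filler in fillers:
            s = word[:i] + filler + word[i:j] + filler + word[j:]
            for k in range(1, len(s)):  # S: split points leaving two non-empty words
                yield s[:k] + " " + s[k:]


def perplexity_from_mean_nll(mean_nll: float) -> float:
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return exp(mean_nll)


def make_gpt2_scorer(model_name: str = "gpt2") -> Callable[[str], float]:
    """Return a phrase -> perplexity function (lazy imports: only this needs torch/transformers)."""
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)
    model.eval()

    def score(phrase: str) -> float:
        # Prepend <|endoftext|> so the first real token is also predicted and scored.
        ids = [tokenizer.bos_token_id] + tokenizer.encode(phrase)
        input_ids = torch.tensor([ids])
        with torch.no_grad():
            # With labels == input_ids the model shifts labels internally and
            # returns the mean cross-entropy over the predicted tokens.
            loss = model(input_ids, labels=input_ids).loss
        return perplexity_from_mean_nll(loss.item())

    return score


def solve(word: str, ranked_words: Iterable[str],
          score: Callable[[str], float], threshold: float) -> list[tuple[float, str]]:
    """Score every candidate phrase; return (score, phrase) pairs below threshold, best first."""
    fillers = three_letter_words(ranked_words)
    hits = ((score(p), p) for p in candidates(word, fillers))
    return sorted(h for h in hits if h[0] < threshold)
```

A run would look something like `solve("GENIE", open("ranked_words.txt"), make_gpt2_scorer(), threshold=T)`, where the word-list file name and the threshold `T` are placeholders you would pick yourself, for example by inspecting the scores of a few known phrases. Scoring one phrase per forward pass is slow; batching candidates would be the obvious next optimization.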
This is a brute-force approach and is certain to take a lot of time. If N is our list of 3-letter words, C is the list of combinations of 2 insertion points within the word "genie", and S is the list of positions where we can split the resulting string into two words, we end up with a time complexity of roughly:
len(N) * len(C) * len(S)
We know that C has 15 entries (6 insertion points in GENIE, choose 2) and, since every candidate string is 11 letters long, S has 10 possible split points. That's 150 language-model evaluations for every candidate 3-letter word.
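As a quick sanity check on those sizes, the counts can be computed directly. This sketch assumes the two insertion points are distinct gaps in GENIE (6 gaps, so C(6, 2) pairs) and that a word break may go anywhere that leaves two non-empty words in the 11-letter result; the constant names are my own.

```python
from itertools import combinations
from math import comb

WORD = "GENIE"
FILLER_LEN = 3  # every candidate filler is a three-letter word

# C: pairs of distinct gaps (0 = before the first letter, len(WORD) = after the last)
C = list(combinations(range(len(WORD) + 1), 2))
assert len(C) == comb(len(WORD) + 1, 2)

# S: places to put a single space in the result so that both words are non-empty
result_len = len(WORD) + 2 * FILLER_LEN
S = list(range(1, result_len))

print(len(C), len(S), len(C) * len(S))  # 15 10 150
```

So each three-letter word in N costs 150 language-model calls, and the whole search costs 150 × len(N).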
I've found my solution, so I'll be back after the Thursday submission deadline to share it. In the meantime, you can take a closer look at my solver script here, and see if you can get it running to output the solution.
Update: Okay, here's my solution:
See you next week!