This week's challenge comes from listener Mike Selinker, of Renton, Wash. Think of something to drink whose name is a compound word. Delete the first letter of the first part and you'll get some athletes. Delete the first letter of the second part and you'll get where these athletes compete. What words are these?
This one is not terribly different from last week's puzzle. Here's how I plan to approach this problem:
- Beverages: We need a sizable list of beverages
- I've pieced together a list of about 200 beverages including cocktails, coffee and tea drinks, etc.
- We know the solution should be a single string and it should be a compound word (e.g., "houseboat", "sunshine", etc.)
- After we drop a letter from each part of the compound word, we'll still have a word for athletes and a word for a place where they compete
- These two strings probably need to be a minimum of 3 letters each, which means each part of the compound word has at least 4 letters before we drop one
- So we should start with a beverage word that is at least 8 letters long
- Compound word function: Ideally, we need a function or a tool that can tell us whether a string is a compound word, and if so, what its component words are
- I haven't found a readily available solution for this, so I effectively had to create my own
- Splitting function: To create our own compound word checker, we'll need to split each beverage word in two at every position that leaves a minimum of 3 letters in each of the two strings; we can do this in Python
- Lexicon: With each beverage split into a pair or multiple pairs, we need to check whether each half of the split is a real English word.
- There are many ways to approach this and numerous wordlists we could choose from
- I chose to use NLTK to extract the 6000 most frequent words from the Brown Corpus
- We can find much larger lexicons, but they tend to be too permissive
- LLM: Now that we know which pairs are valid words, we can plug them into a sentence template and pass each sentence to an LLM to get a perplexity score. If we rank the candidates according to their scores, the correct solution should be among the sentences with the lowest perplexity.
- My sentence template:
- We went to the __1__ to see the __0__ compete in the tournament.
- I'm using GPT-2 as implemented in the transformers Python library.
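
A minimal sketch of how the steps above could fit together, not necessarily how the actual script does it. All function names here (`split_pairs`, `compound_splits`, `brown_lexicon`, `candidate_sentences`, `gpt2_perplexity`, `rank_candidates`) are made up for illustration, and so are the lowercasing and alphabetic filtering in the lexicon step. The template uses named fields instead of `__0__` and `__1__`: `athletes` is the first part minus its first letter, `place` is the second part minus its first letter. The 3-letter minimum mirrors the splitting step; raising `min_len` to 4 would match the 8-letter reasoning more strictly. The NLTK and transformers imports sit inside their functions so the rest runs without them.

```python
import math

TEMPLATE = "We went to the {place} to see the {athletes} compete in the tournament."

def split_pairs(word, min_len=3):
    """Every (left, right) split where both halves have at least min_len letters."""
    return [(word[:i], word[i:]) for i in range(min_len, len(word) - min_len + 1)]

def compound_splits(word, lexicon, min_len=3):
    """Keep only the splits whose two halves are both in the lexicon."""
    return [(a, b) for a, b in split_pairs(word, min_len)
            if a in lexicon and b in lexicon]

def brown_lexicon(size=6000):
    """Most frequent Brown Corpus words (assumes nltk.download('brown') was run)."""
    import nltk
    from nltk.corpus import brown
    # Lowercasing and dropping non-alphabetic tokens are assumptions of this sketch.
    freq = nltk.FreqDist(w.lower() for w in brown.words() if w.isalpha())
    return {w for w, _ in freq.most_common(size)}

def candidate_sentences(beverages, lexicon, min_len=3):
    """Yield (beverage, sentence) after dropping the first letter of each half."""
    for drink in beverages:
        for left, right in compound_splits(drink.lower(), lexicon, min_len):
            athletes, place = left[1:], right[1:]
            yield drink, TEMPLATE.format(place=place, athletes=athletes)

def perplexity_from_logprobs(token_logprobs):
    """Perplexity is exp of the negative mean token log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def gpt2_perplexity(sentence, model_name="gpt2"):
    """Score one sentence with GPT-2 (reloads the model on every call, for brevity)."""
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast
    tokenizer = GPT2TokenizerFast.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)
    model.eval()
    input_ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels supplied, .loss is the mean cross-entropy per predicted token.
        loss = model(input_ids, labels=input_ids).loss
    return math.exp(loss.item())

def rank_candidates(candidates, score=gpt2_perplexity):
    """Sort (beverage, sentence) pairs from lowest to highest score."""
    return sorted(candidates, key=lambda pair: score(pair[1]))
```

With a beverage list and a lexicon in hand, something like `rank_candidates(candidate_sentences(beverages, brown_lexicon()))` would put the most plausible sentences first. Passing a different `score` function makes it easy to try the ranking without downloading a model.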
This approach worked for me. I'll be back to share my solution after NPR's Thursday submission deadline. In the meantime, feel free to examine my script on GitHub and drop your suggestions in the comments. Good luck, Puzzlers!
Update & Solution
The deadline for submissions has passed, so click below if you want to see my answer. See you next week!