It's Monday Funday on Natural Language Puzzling, the puzzle blog where use natural language processing, corpus linguistics, computational linguistics, language modeling and other tools of the trade to take on the weekly Sunday Puzzle from Weekend Edition on NPR.
This week's challenge: Take the phrase NEW TOWELS. Rearrange its nine letters to get the brand name of a product that you might buy at a supermarket.
What approach would you take here?
Here's how I will set this problem up:
- C: a list of candidate anagrams (strings)
- We shouldn't assume that the solution will be a single word. We'll need to allow for multiword solutions, so long as they include the same 9 tokens as NEW TOWELS.
- We can use our own approach to generating anagrams, as we did in this puzzle from two weeks ago (script here);
- I'm using a more efficient implementation that I found in this script on Github, credit to Raghav Gurung;
- Ideally we'll set this up recursively, so that for each word we find that can be spelled with some subset of NEWTOWELS, we then look for additional spellable words within the remaining letters of NEWTOWELS.
- In fact, I'll likely brute force it by simply rerunning the function 3 or 4 times on the remaining letters until none remain or no spellable words exist. If this works, we can approach this as an exercise in recursive programming and come back to reimplement--sounds like fun!
- B: a list of product brand names you might find at a supermarket
- We might be able to find a decent list already on the web somewhere;
- We could probably get something workable from an LLM if we run enough queries;
- This is an open class, so we'll never have a complete list
- LM: a language model (LM) that we can use to evaluate the probability of a sentence.
- In fact, I will be using the GPT2 implementation in the Python Transformers library, as this is probably adequate and we can run it easily on a laptop.
- I'll plug each candidate into a sentence template like this:
- "I prefer to buy name brands like [BLANK] that can be found in supermarkets everywhere."
- Then the sentence is passed to the LM, which returns a perplexity score
- I'll sort the sentences by this score; the solution should be the one with the lowest or near lowest perplexity.
That's the approach I've taken in my script, and I'll be back after NPR's Thursday deadline for submissions to share my solution. Good luck!
Update & Solution
The deadline for submissions has passed, so click below if you want to see my answer. See you next week!
No comments:
Post a Comment