Wednesday, October 02, 2024

Breakfast cereal characters

Welcome back to Natural Language Puzzling, the only blog on the web that teaches you how to solve complex word puzzles using natural language processing (NLP), computational linguistics, corpus linguistics, language modeling, and computer programming.

Let's take a shot at this week's Sunday Puzzle from NPR:

This week's challenge comes from listener Curtis Guy, of Buffalo, N.Y. Name a certain breakfast cereal character. Remove the third, fifth, and sixth letters and read the result backward. You'll get a word that describes this breakfast cereal character. What is it?

This seems like an easy one, right? You can probably come up with the solution in your head within a few minutes, like I did. But let's imagine we need to solve it programmatically. What would we need in order to do so?

  • C: a list of breakfast cereal characters
    • This is going to be a pretty short list, which makes this an easy puzzle
    • We can brainstorm one and/or do a little web searching
  • transform(c): a function to transform each breakfast cereal character (c) as prescribed in the puzzle (remove letters and reverse the string)
    • We'll use Python for this
  • evaluate(w): a function to evaluate the likelihood of each transformed string as "a word that describes this breakfast cereal character"
    • lexicon: given the restrictive nature of this puzzle (i.e., there won't be a lot of candidates to consider), we can probably simply rely on a lexicon--any transformed string that isn't found in the English lexicon is rejected, so we only need to manually read through the remaining candidates to find the solution.
    • language model: As we've done in the past, we can load an LLM, slot the character and the transformed string into a sentence frame, and score the candidates according to the perplexity score provided by our model.
    • I'm going to use both a lexicon and an LLM. First, if the transformed string isn't in the lexicon, it's immediately rejected. Any transformed strings that are in the lexicon are then passed to the LLM as described above. I'll use the GPT-2 model from the Python transformers library.
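The transform step and the lexicon filter can be sketched in a few lines of Python. How to handle spaces, apostrophes, and capitalization in a character's name is an assumption on my part (here I strip them and lowercase before counting letters), and `lexicon` is assumed to be a set of lowercase English words loaded from whatever word list you have on hand:

```python
def transform(c):
    """Remove the 3rd, 5th, and 6th letters (1-indexed), then reverse.

    Assumption: spaces and apostrophes are stripped and the name is
    lowercased before letters are counted.
    """
    s = c.lower().replace(" ", "").replace("'", "")
    kept = [ch for i, ch in enumerate(s, start=1) if i not in (3, 5, 6)]
    return "".join(reversed(kept))

def lexicon_filter(characters, lexicon):
    """Keep only (character, transformed) pairs whose transformed
    string appears in the lexicon."""
    pairs = [(c, transform(c)) for c in characters]
    return [(c, w) for c, w in pairs if w in lexicon]
```

Running `transform` over the whole character list and filtering against the lexicon leaves a short list of candidates to score.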
If this approach works, we'll get a list of sentences ranked by perplexity, and we'd expect the correct solution to be among the sentences with the lowest perplexity. Good luck! I'll be back after the Thursday NPR deadline to share my solution. In the meantime, you can see my Python script here. If you need help with your list of cereal characters, you can find mine hardcoded in that script.
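One way to keep the scoring plumbing simple is to separate the sentence frame and the ranking from the model itself. The frame wording below is my own guess, and `score` can be any callable that returns a perplexity-like number (lower = more plausible). In my script it's GPT-2 perplexity from the transformers library, roughly: tokenize the sentence, pass it to the model as both input and labels, and exponentiate the resulting language-modeling loss.

```python
def frame(character, word):
    # Hypothetical sentence frame; the exact wording is an assumption.
    return f"{character} is a breakfast cereal character who could be described as {word}."

def rank_by_perplexity(pairs, score):
    """Rank (character, word) pairs by the score of their framed
    sentence, lowest (most plausible) first."""
    scored = [(score(frame(c, w)), c, w) for c, w in pairs]
    return sorted(scored)
```

Plugging in the GPT-2 scorer and printing the sorted list gives the ranked sentences described above.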

Update & Solution

The deadline for submissions has passed, so click below if you want to see my answer. See you next week!



