Monday, March 03, 2025

Singer/actress, dangerous object

Greetings, Puzzlers, and welcome back to Natural Language Puzzling, the blog where we use natural language processing, linguistics and data science to solve the Sunday Puzzle from NPR!

Here's this week's puzzle:

This week's challenge comes from listener Dennis Burnside, of Lincoln, Neb. Think of a famous singer and actress, first and last names, two syllables each. The second syllable of the last name followed by the first syllable of the first name spell something that can be dangerous to run into. What is it?

Let's break this down and determine how we'll approach solving the puzzle.

This is what I call a "Classic format" Sunday Puzzle: take a thing from Class A (singer/actress), apply a transformation, and yield a thing from Class B (something dangerous to run into).

So here's what we'll need to solve it:
  • A: list of singer/actresses
    • This is an open class, so we'll need robust coverage
    • We could try finding an existing list online, or cobbling one together from various lists and listicles
    • We could ask an LLM chatbot such as ChatGPT or Gemini to give us a list
  • B: list of things that are dangerous to run into
    • Also open class
    • The category is a little vague: is the thing inherently dangerous, or only dangerous if you run into it?
      • e.g., knife vs tree
    • Is there wordplay here? (probably not)
      • e.g., ex-spouse
    • We don't actually need a list here a priori. We can iterate through list A and evaluate the resulting candidates for "something dangerous to run into".
      • We simply need an LLM that can provide us with perplexity scores for sentences
      • We use a sentence template:
        • "Stay alert because running into [BLANK] can be very dangerous."
      • We plug the string formed by combining the two syllables from the singer/actress's name into the blank, and get a score for how (im)probable the resulting sentence is (i.e., its perplexity; lower perplexity means a more probable sentence).
      • Ideally, the correct solution will be among those ranked as most probable here.
  • Functions:
    • Syllabification:
      • We need the ability to divide each string into syllables. This is for counting and for forming the "something dangerous" string.
      • This seems simple but it can be messy...
      • We're dealing with names, which can have highly variable orthography and phonology, and pronunciation dictionaries are certain to be missing lots of less frequent names.
      • It's not always easy to determine syllables from spelling alone: a sequence of vowel letters could be a single syllable (a diphthong, as in train) or multiple syllables (a hiatus, as in trivia).
      • I'm going to rely on the CMU Pronouncing Dictionary, which we can call from within NLTK in Python. Because the vowel of each syllable is marked with 0, 1 or 2 for stress, we can simply count the digits in the pronunciation. For example, the pronunciation for trivia is given as ['T', 'R', 'IH1', 'V', 'IY0', 'AH0'], which contains three digits and therefore three syllables.
      • For strings that are out of vocabulary (OOV), I'll back off to a rule-based approach that tries to count syllables. I plan to use the syllapy library, as described here.
    • String transformation:
      • We also need a simple function to recombine the syllables as described: second syllable of the last name followed by the first syllable of the first name.
      • We'll handle this in Python as well.
    • Perplexity scoring:
      • As mentioned above, we'll need an LLM that can give us perplexity scores for sentences. I'm going to use a Hugging Face transformers implementation of a GPT model. (But stay tuned: I'm building implementations of several other LLMs, and we'll be evaluating those in future puzzling!)
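
Here is a minimal sketch of the syllable-counting step, assuming the pronunciation dictionary is passed in as a plain mapping from lowercase words to lists of phone lists (the shape that NLTK's cmudict.dict() returns). The vowel-run counter naive_syllable_count is a hypothetical stand-in for a rule-based fallback like syllapy, not syllapy itself, and all function names here are my own.

```python
import re
from typing import Mapping, Sequence


def count_from_phones(phones: Sequence[str]) -> int:
    # In CMU transcriptions only vowel phones carry a stress digit (0, 1 or 2),
    # so the syllable count is the number of phones ending in a digit.
    return sum(1 for ph in phones if ph[-1].isdigit())


def naive_syllable_count(word: str) -> int:
    # HYPOTHETICAL crude fallback (stand-in for a library such as syllapy):
    # count runs of vowel letters, then discount a final silent "e"
    # (but not "-le", as in "table").
    w = word.lower()
    count = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def count_syllables(word: str, pron_dict: Mapping[str, Sequence[Sequence[str]]]) -> int:
    # Prefer the dictionary; back off to the rule-based count for OOV words.
    prons = pron_dict.get(word.lower())
    if prons:
        # Use the first listed pronunciation when a word has several.
        return count_from_phones(prons[0])
    return naive_syllable_count(word)


demo = {"trivia": [["T", "R", "IH1", "V", "IY0", "AH0"]]}
print(count_syllables("trivia", demo), naive_syllable_count("trivia"))  # → 3 2
```

To use the real CMU data, run nltk.download("cmudict") once and pass cmudict.dict() from nltk.corpus. Note that the crude fallback counts trivia as two syllables because it treats "ia" as a single vowel run, which is exactly the diphthong-versus-hiatus problem mentioned above; the dictionary lookup gets it right whenever the word is in vocabulary.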
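
The recombination step might look like the sketch below. It assumes syllabification has already happened upstream and, purely for illustration, that each name in list A is stored with hyphens between syllables (e.g., the made-up name "Lo-la Bax-ter"). The helpers split_hyphenated, recombine and candidates are hypothetical. Names that aren't two syllables each are skipped, which also enforces the puzzle's constraint.

```python
from typing import Iterable, List, Optional, Sequence, Tuple


def split_hyphenated(full_name: str) -> Tuple[List[str], List[str]]:
    """Split 'Lo-la Bax-ter' into (['Lo', 'la'], ['Bax', 'ter']).

    Assumes the first and last space-separated tokens are the first and last names.
    """
    parts = full_name.split()
    return parts[0].split("-"), parts[-1].split("-")


def recombine(first_sylls: Sequence[str], last_sylls: Sequence[str]) -> Optional[str]:
    """Second syllable of the last name followed by the first syllable of the first name."""
    # The puzzle requires exactly two syllables in each name.
    if len(first_sylls) != 2 or len(last_sylls) != 2:
        return None
    return (last_sylls[1] + first_sylls[0]).lower()


def candidates(names: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (name, candidate string) pairs for every name that fits the pattern."""
    out = []
    for name in names:
        cand = recombine(*split_hyphenated(name))
        if cand is not None:
            out.append((name, cand))
    return out


print(candidates(["Lo-la Bax-ter", "Ann Smith-er-son"]))
# → [('Lo-la Bax-ter', 'terlo')]
```

Returning None rather than raising keeps the loop over a large, messy list of names simple: anything that doesn't fit the two-plus-two pattern just drops out.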
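
For the scoring step, here's a minimal sketch assuming GPT-2 loaded through Hugging Face transformers' AutoModelForCausalLM (the post only says "a GPT model", so the "gpt2" checkpoint is my assumption). Perplexity is the exponential of the mean negative log-likelihood per token; passing labels=input_ids makes the model return that mean as its loss. The names make_gpt2_scorer, rank_candidates and perplexity_from_logprobs are hypothetical.

```python
import math
from typing import Callable, Iterable, List, Sequence, Tuple

# Sentence template from the post; the candidate string fills the blank.
TEMPLATE = "Stay alert because running into {} can be very dangerous."


def perplexity_from_logprobs(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def rank_candidates(
    candidates: Iterable[str], sentence_ppl: Callable[[str], float]
) -> List[Tuple[float, str]]:
    """Score each candidate in the template; lowest perplexity (most probable) first."""
    scored = [(sentence_ppl(TEMPLATE.format(c)), c) for c in candidates]
    return sorted(scored)


def make_gpt2_scorer(model_name: str = "gpt2") -> Callable[[str], float]:
    # ASSUMPTION: GPT-2 via transformers. Imports are deferred so the helpers
    # above work without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    def sentence_ppl(sentence: str) -> float:
        enc = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            # With labels=input_ids, out.loss is the mean cross-entropy
            # over the predicted (shifted) tokens.
            out = model(**enc, labels=enc["input_ids"])
        return math.exp(out.loss.item())

    return sentence_ppl
```

Something like rank_candidates(strings, make_gpt2_scorer())[:20], where strings is the list of recombined candidate strings, would give a shortlist. Because perplexity averages over tokens, candidates that break into several subword pieces can shift the score, so the top of the ranking is best treated as a shortlist to check by eye rather than a final answer.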

That more or less covers my approach. Would you handle this differently? Do you think an LLM could solve this puzzle outright? (So far, the ones I've tried have not managed a solution.)

Please leave your comments and insights (but no spoilers please). I'll be back after NPR's Thursday deadline for submissions to share my solution and my implementation. Good luck!

Update & Solution

The deadline for submissions has passed, so click below if you want to see my answer. See you next week!
