Monday, April 19, 2021

Actor, bird, mammal (preview)

I'm back after a week off, Puzzlers. Let's get into the latest Sunday Puzzle:

This week's challenge comes from listener Theodore Regan, of Scituate, Mass. Name a famous actor — 4 letters in the first name, 7 letters in the last. You can change the first letter of the actor's first name to name a bird. And you can change the first letter of the actor's last name to name a mammal. Who's the actor?

Let's break this down. Here's what we need:

  1. A: A list of actors (4+7 letters);
  2. B: A list of birds (4 letters);
  3. M: A list of mammals (7 letters);
  4. f_swap: A function that takes a word/name and replaces the first letter with every other letter in the alphabet.

This seems pretty straightforward. I only see one potential pitfall: these days "actors" often includes both men and women, so the list of actors should cover both. I suspect I can find the three lists we need without having to scrape them together myself.
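Here is a minimal sketch of how f_swap and the three lists could fit together. It assumes the lists are plain Python lists of strings; the solve helper and the toy entries (including the made-up "Qawk Qorilla") are placeholders for illustration, not real data or the puzzle's answer.

```python
import string


def f_swap(word):
    """Yield every variant of `word` with its first letter replaced by a different letter."""
    first, rest = word[0].lower(), word[1:].lower()
    for letter in string.ascii_lowercase:
        if letter != first:
            yield letter + rest


def solve(actors, birds, mammals):
    """Return (actor, bird, mammal) triples that satisfy the puzzle's constraints."""
    bird_set = {b.lower() for b in birds}
    mammal_set = {m.lower() for m in mammals}
    hits = []
    for name in actors:
        parts = name.split()
        # Keep only two-part names with 4 letters in the first name and 7 in the last.
        if len(parts) != 2 or len(parts[0]) != 4 or len(parts[1]) != 7:
            continue
        first, last = parts
        for bird in f_swap(first):
            if bird not in bird_set:
                continue
            for mammal in f_swap(last):
                if mammal in mammal_set:
                    hits.append((name, bird, mammal))
    return hits


# Toy data only: "Qawk Qorilla" is an invented placeholder, not a real actor.
toy_actors = ["Qawk Qorilla", "Tom Hanks"]
toy_birds = ["hawk", "wren"]
toy_mammals = ["gorilla"]
print(solve(toy_actors, toy_birds, toy_mammals))
```

Turning B and M into sets keeps each lookup cheap, so the cost is roughly 25 x 25 checks per actor at most.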

Good luck! See you later this week with my solution (I hope!).

Thursday, April 01, 2021

Two Bird Words, One Function (Solution)

Welcome back, Puzzle Gang! I think I put together a pretty cool process to solve this one. Once again, here's the puzzle:

This week's challenge comes from listener Greg VanMechelen, of Berkeley, Calif. Name something birds do. Put the last sound of this word at the start and the first sound at the end, and phonetically you'll name something else birds do. What are these things?

As discussed in the puzzle preview post, we need:

  1. B, a list of "something birds do" words;
  2. P, a pronunciation dictionary;
  3. f_swap, a function to swap the initial and final phonemes.

For B, I ended up using word2vec (specifically the Gensim implementation). I started with a small list of seed words:

['flying', 'flapping', 'nesting', 'brooding', 'pecking', 'roosting', 'tweeting', 'cawing', 'crowing']

For each seed word, I took the top 100 word2vec suggestions. These suggestions are words that appear in similar contexts to the seed word. I chose "-ing" progressive verb forms in hopes that would return more verbs; "nesting" vs "nest" should do that, right? The suggestions from word2vec take many word forms, so I then ran everything through a lemmatizer to get the base form of each word. That gives us list B, with a few hundred candidate "bird words." Let's set that aside for now.
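As a rough sketch of the seed-expansion step: Gensim's KeyedVectors objects have a most_similar(word, topn=...) method that returns (word, score) pairs, so the collection loop can be written against anything with that shape. The collect_candidates helper and the ToyModel stand-in below are hypothetical names for illustration; keeping the seeds themselves in the set is my assumption, and the lemmatization step is left out.

```python
def collect_candidates(model, seeds, topn=100):
    """Gather each seed word plus its `topn` nearest neighbours into one set."""
    candidates = set()
    for seed in seeds:
        candidates.add(seed)  # assumption: the seeds themselves stay in the list
        try:
            neighbours = model.most_similar(seed, topn=topn)
        except KeyError:
            continue  # the seed is not in the model's vocabulary
        candidates.update(word.lower() for word, _score in neighbours)
    return candidates


class ToyModel:
    """Stand-in with the same most_similar() shape as a loaded word2vec model."""

    def __init__(self, table):
        self.table = table

    def most_similar(self, word, topn=10):
        return self.table[word][:topn]


# Invented neighbours and scores, only to show the data flow.
toy = ToyModel({"flying": [("soaring", 0.8), ("Gliding", 0.7)]})
print(sorted(collect_candidates(toy, ["flying", "cawing"], topn=2)))
```

With a real model, the ToyModel instance would be replaced by the loaded vectors, and the resulting set would then go through the lemmatizer.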

For a pronunciation dictionary, I used the Carnegie Mellon University Pronouncing Dictionary. It contains over 134,000 words and their pronunciations (nice!) and for phoneme symbols, it uses ARPAbet (yuck!). It's available in a plain text file; here's an excerpt:

missourians M AH0 Z UH1 R IY0 AH0 N Z 
misspeak M IH0 S S P IY1 K
misspeak(2) M IH0 S P IY1 K
misspell M IH0 S S P EH1 L
misspell(2) M IH0 S P EH1 L
misspelled M IH0 S S P EH1 L D

We don't need to worry too much about the details of ARPAbet or the CMU dictionary, but here are the relevant points. Every vowel (i.e., every syllable nucleus) carries a stress marker of 0, 1, or 2. If we split each line on whitespace, we can take the first piece as the word (after removing any variant marker such as "(2)" or, presumably, "(3)") and the remaining pieces as the pronunciation. Working through the pronunciation, we keep removing phoneme symbols from the left side until we hit a vowel (a symbol containing 0, 1, or 2); what we removed is the onset of the first syllable. We do the same from the right side to get the coda of the final syllable. Then we swap the onset and coda and recombine them with the rest of the pronunciation. That's our f_swap function.
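The description above can be sketched roughly as follows. This is an illustration rather than the code from my solver script; parse_line and is_vowel are helper names made up for this example, and the input lines are assumed to follow the plain-text format shown in the excerpt.

```python
import re


def parse_line(line):
    """Split one dictionary line into (word, list of phoneme symbols)."""
    parts = line.split()
    word = re.sub(r"\(\d+\)$", "", parts[0])  # drop a variant marker like "(2)"
    return word, parts[1:]


def is_vowel(phone):
    """ARPAbet vowels end in a stress digit: 0, 1, or 2."""
    return phone[-1] in "012"


def f_swap(phones):
    """Swap the onset of the first syllable with the coda of the last one."""
    vowel_positions = [i for i, p in enumerate(phones) if is_vowel(p)]
    if not vowel_positions:
        return list(phones)  # no vowel found, so there is nothing to swap around
    first, last = vowel_positions[0], vowel_positions[-1]
    onset = phones[:first]          # consonants before the first vowel
    middle = phones[first:last + 1]  # first vowel through last vowel
    coda = phones[last + 1:]        # consonants after the last vowel
    return coda + middle + onset


word, phones = parse_line("misspeak(2) M IH0 S P IY1 K")
print(word, f_swap(phones))
```

Note that the onset and coda are treated as whole clusters, so a coda like S T moves to the front as a unit.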

I put my solver script on GitHub, and it contains plenty of notes so you can follow along. (Note that if you want to try my script, you'll need to download word2vec, Stanza and the CMU dictionary.) The whole thing works like this:

  1. Query word2vec with seed words to get B (bird words);
  2. Lemmatize B;
  3. For each word w in B, query w's pronunciations from the CMU dictionary;
    1. For each pronunciation (let's call each p_w), swap onset and coda, call this p_ws (for "pron_w_swapped");
      1. For each other word x in B (x not equal to w; yes, we're iterating through B again), query x's pronunciations;
        1. For each x pronunciation (p_x):
          1. If p_x equals p_ws, we have a solution (w & x).

And what do you know: I had to try a couple of different word2vec models and experiment with the number of suggestions for each seed word, but I got the solution! Here are the two candidate pairs my system produced, one valid and one not:

  • fussed stuff
  • perch chirp
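
The nested search in the numbered steps above can be sketched like this. It is a simplified illustration, not the script itself: find_pairs and swap_onset_coda are names chosen for this example, and the small pronunciation table is typed by hand in the CMU style rather than loaded from the dictionary file.

```python
def swap_onset_coda(phones):
    """Swap the first syllable's onset with the last syllable's coda."""
    vowels = [i for i, p in enumerate(phones) if p[-1] in "012"]
    if not vowels:
        return list(phones)
    first, last = vowels[0], vowels[-1]
    return phones[last + 1:] + phones[first:last + 1] + phones[:first]


def find_pairs(words, pron_dict):
    """Return (w, x) pairs where swapping w's onset and coda yields a pronunciation of x."""
    solutions = []
    for w in words:
        for p_w in pron_dict.get(w, []):
            p_ws = swap_onset_coda(p_w)
            for x in words:  # second pass over B
                if x == w:
                    continue
                for p_x in pron_dict.get(x, []):
                    if p_x == p_ws:
                        solutions.append((w, x))
    return solutions


# Hand-typed toy entries; each word maps to a list of pronunciations.
toy_prons = {
    "perch": [["P", "ER1", "CH"]],
    "chirp": [["CH", "ER1", "P"]],
    "fussed": [["F", "AH1", "S", "T"]],
    "stuff": [["S", "T", "AH1", "F"]],
    "nest": [["N", "EH1", "S", "T"]],
}
print(find_pairs(list(toy_prons), toy_prons))
```

Each match shows up twice, once in each direction, because the swap is its own inverse. For a few hundred candidate words the quadratic loop is fine; indexing pronunciations in a dict keyed by tuple would be one way to speed it up on a larger list.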

Director, anagram, film award

Welcome back to Natural Language Puzzling, the blog where we use natural language processing and linguistics to solve the Sunday Puzzle from...