It's Monday, Puzzlers, and you know what that means! Welcome to Natural Language Puzzling, the blog where we use natural language processing (NLP), linguistics, data science, language modeling, programming and our own noggins to solve the weekly Sunday Puzzle from NPR. Here's this week's puzzle:
This week's challenge comes from listener Dan Asimov, of Berkeley, Calif. In English the two-letter combination TH can be pronounced in two different ways: one as in the word "booth," the other as in "smooth." What is the only common English word, other than "smooth," that ends in the letters TH as pronounced in smooth?
Well, this is certainly a departure from the usual Sunday Puzzle format, which involves taking one string of text, applying a prescribed transformation and generating another string of text.
So what do we need to solve this puzzle?
Really, only one thing: a robust pronunciation dictionary. For that, we can turn to an old friend, the Carnegie Mellon University (CMU) Pronouncing Dictionary. You can read more about it on the project website, or simply download the file from GitHub.
Once the dictionary is downloaded, we can navigate to its directory from the command line and query the pronunciations of "booth" and "smooth".
These entries show how the two sounds are represented in the CMU Pronouncing Dictionary, which uses ARPAbet symbols: "booth" ends with the symbol "TH", while "smooth" ends with "DH". So our target word must also end with "DH".
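For anyone who would rather do the lookup in Python than at the shell, here is a minimal sketch. It assumes a local copy of the dictionary saved as `cmudict.dict` (change the path to match your download), one entry per line with the word first, and alternate pronunciations marked `word(2)`, `word(3)`, and so on. `lookup` is a hypothetical helper, not part of any library.

```python
import re
from pathlib import Path

# Matches an alternate-pronunciation marker such as "(2)" at the end of a headword.
VARIANT = re.compile(r"\(\d+\)$")


def lookup(lines, word):
    """Return every pronunciation (as a list of ARPAbet phonemes) listed for `word`."""
    results = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop inline comments, if any
        if not line:
            continue
        head, *phonemes = line.split()
        if VARIANT.sub("", head) == word:
            results.append(phonemes)
    return results


if __name__ == "__main__":
    path = Path("cmudict.dict")  # assumed name/location of the local copy
    if path.exists():
        entries = path.read_text(encoding="utf-8").splitlines()
        for w in ("booth", "smooth"):
            print(w, lookup(entries, w))
```

The last phoneme of each returned list is the one that matters here: "TH" for the "booth" sound, "DH" for the "smooth" sound.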
The next step is to search the dictionary file for every pronunciation that ends in "DH". For reference, each line of the file holds a word followed by its phonemes:
carousel K EH1 R AH0 S EH2 L
carousing K ER0 AW1 Z IH0 NG
carow K AE1 R OW0
carozza K ER0 AA1 Z AH0
carp K AA1 R P
carpal K AA1 R P AH0 L
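Each line is a headword followed by its ARPAbet phonemes, separated by spaces, and vowels carry a stress digit (0 for no stress, 1 for primary, 2 for secondary). A minimal Python parser for that layout, run on the six lines above, might look like this. The name `parse_entry` is illustrative, and the handling of `word(2)`-style variant markers and trailing `#` comments is an assumption about the file's conventions.

```python
import re

# An alternate pronunciation is marked with a numeric suffix, e.g. "word(2)".
VARIANT = re.compile(r"\(\d+\)$")

SAMPLE = """\
carousel K EH1 R AH0 S EH2 L
carousing K ER0 AW1 Z IH0 NG
carow K AE1 R OW0
carozza K ER0 AA1 Z AH0
carp K AA1 R P
carpal K AA1 R P AH0 L"""


def parse_entry(line):
    """Split one non-empty dictionary line into (headword, list of phonemes)."""
    body = line.split("#", 1)[0]  # ignore any trailing comment
    head, *phonemes = body.split()
    return VARIANT.sub("", head), phonemes


for line in SAMPLE.splitlines():
    word, phonemes = parse_entry(line)
    print(f"{word:10} last phoneme: {phonemes[-1]}")
```

Because the final phoneme is simply the last token on the line, checking for "DH" reduces to looking at `phonemes[-1]`.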
We can use grep with the "$" anchor to match "DH" only at the end of a line.
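The same end-of-line match can be written in Python. This sketch compares the final phoneme token instead of using a regular expression; since "DH" is the only ARPAbet symbol ending in those two letters, the two approaches should agree on well-formed lines. The helper name `ends_with_phoneme` is hypothetical, and the sample lines are written in the dictionary's format for illustration rather than copied from the file.

```python
def ends_with_phoneme(lines, final="DH"):
    """Yield (headword, phonemes) for every entry whose last phoneme is `final`.

    This mirrors grep's "DH$" anchor but matches whole phoneme tokens.
    Variant markers such as "(2)" are left on the headword.
    """
    for line in lines:
        body = line.split("#", 1)[0]  # ignore any trailing comment
        tokens = body.split()
        if len(tokens) >= 2 and tokens[-1] == final:
            yield tokens[0], tokens[1:]


# Illustrative lines in the dictionary's format.
sample = ["smooth S M UW1 DH", "booth B UW1 TH", "carp K AA1 R P"]
for word, phonemes in ends_with_phoneme(sample):
    print(word, " ".join(phonemes))
```

Run over the whole file, this produces the same kind of list as the grep query, one match per pronunciation.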
I truncated the output so as not to spoil the solution. The matches I kept all end in "-the", whereas the target word must end in "-th". In fact, only one word (other than "smooth") in the output of the grep query fits the bill, so that must be the solution. I'll be back after NPR's Thursday deadline for submissions to share it. Good luck, Puzzlers!
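The spelling check can be automated as well. A minimal sketch, assuming the same line format: keep entries whose last phoneme is "DH", strip any `(2)`-style variant marker, then keep only headwords spelled with a final "th" (which rules out the "-the" words) other than "smooth". The function name and the sample lines are hypothetical; on the real file the result may still need a glance, since "common" is a judgment the dictionary can't make.

```python
import re

VARIANT = re.compile(r"\(\d+\)$")  # e.g. the "(2)" in "word(2)"


def th_words_ending_in_dh(lines, exclude=("smooth",)):
    """Return sorted headwords spelled with a final "th" whose pronunciation ends in DH."""
    found = set()
    for line in lines:
        tokens = line.split("#", 1)[0].split()
        if len(tokens) < 2 or tokens[-1] != "DH":
            continue
        word = VARIANT.sub("", tokens[0])
        # "-th" spelling only: words like "bathe" end in "the" and are skipped.
        if word.endswith("th") and word not in exclude:
            found.add(word)
    return sorted(found)


# Hand-written sample lines in the dictionary's format.
sample = [
    "bathe B EY1 DH",
    "booth B UW1 TH",
    "smooth S M UW1 DH",
    "soothe S UW1 DH",
]
print(th_words_ending_in_dh(sample))  # []
```

Using a set means a word listed with several DH-final variants is reported only once.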
The deadline for NPR submissions has passed, so here is my solution.
The CMU Pronouncing Dictionary gives four pronunciations for "with", and two of them end in "DH", so "with" fits the bill.
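To count how many of a word's pronunciations end in a given phoneme, the entries can be grouped by headword with the `(2)`, `(3)` markers removed. A minimal sketch; `group_pronunciations` and `variants_ending_in` are hypothetical helpers, and the sample entries use a made-up word purely to show the grouping, not the dictionary's actual lines for "with".

```python
import re
from collections import defaultdict

VARIANT = re.compile(r"\(\d+\)$")  # numeric suffix marking an alternate pronunciation


def group_pronunciations(lines):
    """Map each headword to all of its pronunciations, merging word, word(2), ..."""
    groups = defaultdict(list)
    for line in lines:
        tokens = line.split("#", 1)[0].split()
        if len(tokens) >= 2:
            groups[VARIANT.sub("", tokens[0])].append(tokens[1:])
    return groups


def variants_ending_in(lines, word, final="DH"):
    """Return (pronunciations ending in `final`, total pronunciations) for `word`."""
    prons = group_pronunciations(lines)[word]
    return sum(p[-1] == final for p in prons), len(prons)


# Hypothetical entries, used only to show how variants are grouped.
sample = ["zeth Z EH1 DH", "zeth(2) Z EH1 TH", "zeth(3) Z IH0 DH"]
print(variants_ending_in(sample, "zeth"))  # (2, 3)
```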