This week's challenge comes from listener Rawson Scheinberg, of Northville, Mich. Name a U.S. state capital. Then name a world capital. Say these names one after the other and phonetically you'll get an expensive dinner entrée. What is it?
As usual, let's begin with a breakdown of this problem.
We're concerned with three classes here:
- S: state capitals; this is a closed class of just 50 items, so we can easily obtain this list.
- W: world capitals; also a closed list of ~200 items, easy to obtain.
- X: expensive dinner entrees; this is an open list, so theoretically harder to obtain.
- However, I suspect if we simply came up with the top 10 or 20 possibilities, we'd have the solution covered.
We also need some phonetic resources:
- p: pronunciation dictionary function
- Note: This assumes we have an adequate list for X (expensive dinner entrees)
- We'll use p to obtain the phonetic spelling of each string in S, W, and X:
- p('albany') returns 'AO L B AH0 N IY0'
- p('berlin') returns 'B ER0 L IH1 N'
- p('steak') returns 'S T EY1 K'
- This is the ARPAbet phoneme representation used in CMUdict, the CMU Pronouncing Dictionary
- We've used CMUdict before by loading it directly as a text file and parsing it ourselves, but this time I'm experimenting with an existing Python implementation. It includes a pretrained g2p (grapheme-to-phoneme) model; if a spelling isn't found in the dictionary, the model will predict the most likely way to pronounce that word.
- m: a matching function
- We need to be able to compare the entrée phonetic spelling with the combined state capital + world capital phonetic spelling
- m('S T EY1 K', 'AO L B AH0 N IY0 B ER0 L IH1 N') returns False
- To avoid rejecting the correct solution, this will almost certainly need to allow some "fuzzy" matching
- We'll definitely want to drop the stress markers (0, 1, 2).
- We may even want to drop vowels, as vowel pronunciations are highly context-dependent; when we combine the state capital and world capital strings, we're changing the context and the stress, so vowel changes are quite likely.
- Ideally, we'd have a tool that can take two words, find their phonetic spellings and then return a phonetic similarity score using an edit distance or Levenshtein distance.
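Putting the pieces above together, here's a minimal sketch of p and m. The dictionary entries are hard-coded illustrative samples (a real run would parse the full CMUdict text file or use a g2p package), and the 0.7 similarity threshold is just a starting guess to tune:

```python
import re
from difflib import SequenceMatcher

# Illustrative CMUdict-style entries (matching the examples above);
# a real run would load the full dictionary or fall back to a g2p model.
CMUDICT_SAMPLE = {
    "albany": "AO L B AH0 N IY0",
    "berlin": "B ER0 L IH1 N",
    "steak": "S T EY1 K",
}

def p(word):
    """Pronunciation lookup: word -> ARPAbet phoneme string."""
    return CMUDICT_SAMPLE[word.lower()]

def strip_stress(phones):
    """Drop the 0/1/2 stress digits and split into a phoneme list."""
    return re.sub(r"\d", "", phones).split()

def similarity(phones_a, phones_b):
    """Edit-distance-style similarity (0 to 1) over stress-free phoneme lists."""
    return SequenceMatcher(None, strip_stress(phones_a),
                           strip_stress(phones_b)).ratio()

def m(entree_phones, combined_phones, threshold=0.7):
    """Fuzzy match: True if the phoneme strings are similar enough."""
    return similarity(entree_phones, combined_phones) >= threshold

print(m(p("steak"), p("albany") + " " + p("berlin")))  # False
```

With the threshold at 0.7, near-misses like 'S T EY1 K' vs. 'S T EH1 K' still match (three of four phonemes line up), while unrelated strings score near zero.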
Alternatively, I'd also like to try another approach, just for fun. What if we couldn't obtain a reasonable list for X, the expensive dinner entrees? How else could we approach the problem?
I think we'd have to iterate through the combinations of state and world capitals, generating pronunciations for each, most of which would not be real words. Then we'd need a p2g, or phoneme-to-grapheme model, to take each string of phonemes and map it to the most likely orthographic spelling of real English text. The resulting spelling would be treated as a candidate for the "expensive dinner entrée". From there, we'd query a language model to judge the likelihood of the candidate; in the past, we've used Python implementations of BERT/SBERT and GPT-2 for this evaluation. In this case, we might ask the model to score this sentence with each candidate inserted: "The [blank] is the most expensive entrée on the menu, but very highly reviewed by food critics."
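Even without a p2g model wired up, the pairing step can be sketched. The pronunciations below are hard-coded illustrative entries, and the p2g and language-model stages are left as comments, since those depend on which pretrained models we pick:

```python
from itertools import product

# Hard-coded ARPAbet entries for a tiny illustrative subset;
# the real run covers all 50 state capitals x ~200 world capitals.
PRON = {
    "albany": "AO L B AH0 N IY0",
    "austin": "AO1 S T AH0 N",
    "berlin": "B ER0 L IH1 N",
    "lima": "L IY1 M AH0",
}

def candidate_phonemes(state_capitals, world_capitals):
    """Yield (state capital, world capital, combined phoneme string) for every pairing."""
    for s, w in product(state_capitals, world_capitals):
        yield s, w, PRON[s] + " " + PRON[w]

candidates = list(candidate_phonemes(["albany", "austin"], ["berlin", "lima"]))
# Each combined phoneme string would then go to a p2g model for a spelling,
# and that spelling into the fill-in-the-blank sentence for LM scoring.
```

Note the combinatorics are manageable: 50 × ~200 is only about 10,000 candidates, so scoring every one with a language model is feasible.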
Let's see how far we get with the first approach. I'll be back after the Thursday deadline for NPR submissions (so we don't spoil it for listeners) to share my solution. In the meantime, you can see the approach I'm tinkering with in this script. Good luck!
Levi King
Update & Solution
The deadline for submissions has passed, so click below if you want to see my answer. See you next week!