Let's take a crack at this week's NPR Sunday Puzzle using natural language processing.
This week's challenge: It comes to us from listener Andrew Chaikin of San Francisco, also known as the singer Kid Beyond. Think of a famous character in American literature. Change each letter in that character's name to its position in the alphabet — A=1, B=2, etc. — to get a famous year in American history. Who is this person and what is the year?
Interesting! This one is a little different from the usual format: take a thing from class A, apply the given transformation, and get a thing from class B. The structure is similar, but here the transformation maps letters to numbers.
What do we need to solve this puzzle?
- C: a list of famous characters from American literature
  - open class
  - We can search the web for an existing list
  - We can ask a chatbot for a list
  - We can think up our own
  - We can scrape one from Wikipedia
- Y: a list of "famous years in American history"
  - semi-open class
  - what counts as a "famous year"?
  - presumably within range of 1492-2024
  - Again, we can search the web, ask a chatbot, think up our own
  - Presumably this is things like 1776, 1863, 1929, 2001...
- p: a function to turn a letter into its position in the alphabet
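Here is a minimal sketch of what p could look like in Python. The helper name `encode` is my own choice for illustration; it simply applies p to every letter of a name and glues the results together.

```python
import string


def p(letter: str) -> int:
    """Return the 1-based alphabet position of one ASCII letter (A=1 ... Z=26)."""
    return string.ascii_lowercase.index(letter.lower()) + 1


def encode(name: str) -> str:
    """Concatenate the alphabet positions of the ASCII letters in name.

    Anything that is not an ASCII letter (spaces, hyphens, ...) is skipped.
    """
    return "".join(str(p(ch)) for ch in name if ch in string.ascii_letters)


print(encode("Huck"))  # 821311
```

Note that the output is a string of digits, not a list, so the information about where one letter ends and the next begins is lost.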
Let's think about this for a minute. Y seems to be very restrictive. A year in that range can only begin with 1 or 2. In fact, we can limit its first two digits to 14, 15, 16, 17, 18, 19, or 20.
This, in turn, limits the possible starting letters for the character's name: A (1), B (2), N through S (14-19), or T (20). Be careful, however, because there can be multiple ways to tokenize a 4-digit year. For example, let's take 1863 (the year of the Emancipation Proclamation). We can tokenize the digits into any sequence of numbers that each fall in the range 1-26. So for 1863:
- 1, 8, 6, 3
- 18, 6, 3
Note that "1, 86, 3" and "18, 63" aren't valid because the alphabet has no 86th or 63rd position.
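The tokenization above can be sketched as a small recursive function. This is just one way to do it, and the names `segmentations` and `decode` are mine. It also rejects any token that starts with 0, since no letter maps to 0 or to something like "01".

```python
def segmentations(digits: str) -> list[list[int]]:
    """Every way to split a digit string into numbers 1..26 (no leading zeros)."""
    if not digits:
        return [[]]  # one way to split nothing: the empty split
    results = []
    for width in (1, 2):
        chunk = digits[:width]
        # Skip if there are not enough digits left or the token starts with 0.
        if len(chunk) < width or chunk[0] == "0":
            continue
        value = int(chunk)
        if 1 <= value <= 26:
            for rest in segmentations(digits[width:]):
                results.append([value] + rest)
    return results


def decode(year: int) -> list[str]:
    """Turn each valid segmentation of a year back into a string of letters."""
    return [
        "".join(chr(ord("A") + n - 1) for n in seg)
        for seg in segmentations(str(year))
    ]


print(segmentations("1863"))  # [[1, 8, 6, 3], [18, 6, 3]]
print(decode(1863))           # ['AHFC', 'RFC']
print(decode(2001))           # []
```

A year like 2001 produces no letter strings at all: whichever way the digits are split, one token ends up starting with 0.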
What other conclusions can we draw from Y?
Clearly, the only character names in C that can match a year in Y have a length of 2, 3 or 4 letters, because each letter contributes either one or two digits to a 4-digit year. A 4-letter name will only work if each of the 4 letters is among the first 9 letters of the alphabet (because each letter must translate to a single-digit number). A 3-letter name must have exactly one letter that occurs after letter 9, and a 2-letter name needs both letters to occur after letter 9. So the takeaway is this: the name will likely contain mostly letters from the first 9 letters of the alphabet, abcdefghi. Also, I think we can assume that the name must be recognizable on its own; Jad or Deb aren't particularly meaningful names in the context of characters from American literature. We probably need a more iconic name like Huck (but no, Huck doesn't work: it becomes 8, 21, 3, 11).
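To make those constraints concrete, here is a rough filter that checks whether a name could map to a year at all. The function name `is_candidate` and the default range 1492-2024 are my assumptions, taken from the guess about Y above. "RFC" is not a character name; it is only there to show a string that passes.

```python
import string


def is_candidate(name: str, first_year: int = 1492, last_year: int = 2024) -> bool:
    """Check whether a name's letter positions spell a 4-digit year in range.

    The year range is an assumption about what counts as American history.
    """
    letters = [ch for ch in name.lower() if ch in string.ascii_lowercase]
    # Each letter yields one or two digits, so only 2-4 letters can give 4 digits.
    if not 2 <= len(letters) <= 4:
        return False
    digits = "".join(str(string.ascii_lowercase.index(ch) + 1) for ch in letters)
    return len(digits) == 4 and first_year <= int(digits) <= last_year


for name in ["Huck", "Jad", "Deb", "RFC"]:
    print(name, is_candidate(name))
# Huck False  (821311 has six digits)
# Jad False   (1014 is outside the range)
# Deb False   (452 has only three digits)
# RFC True    (18, 6, 3 gives 1863)
```

Passing this filter only means the digits form a plausible year; whether that year is "famous" still has to be checked against Y.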
Given how restrictive this puzzle is, we could probably come up with the answer fairly easily by just thinking through it and maybe searching the web a bit for inspiration regarding character names and significant years. But this isn't that kind of blog! You can see my approach using Python in this script. I'll be back after the Thursday submission deadline to share my solution.
Update: Now that the deadline for submissions has passed, I'll share my solution here:
Did you get it too?