Monday, January 11, 2021

Names in the News (Preview)

As we prepare to crack our third puzzle, I want to reiterate what this blog is about and welcome any newcomers.

Every Sunday on National Public Radio (NPR), the program Weekend Edition Sunday airs a segment called Sunday Puzzle. It's been a recurring segment since the 1980s, I believe. Will Shortz, the editor of the New York Times crossword puzzle and "NPR Puzzlemaster," hosts the segment. Each week, there is a "listener challenge" puzzle, and listeners have until Thursday to submit the answer. A winner is drawn from the correct answers, and that person gets to call in and play an on-air challenge with Will Shortz (and Weekend Edition Sunday host Lulu Garcia-Navarro) during the subsequent Sunday Puzzle.

The puzzle is typically some kind of word puzzle. In this blog, I focus on applying natural language processing (NLP) to find the solution, hence "Natural Language Puzzling." I've listened to the puzzle for years, often brainstorming what NLP methods or tools I would use. Now I've committed to working out these puzzles on this blog. I keep the writing fairly non-technical, so I'm hoping it will be accessible to a broad range of puzzlers. I also upload any scripts I put together to solve a puzzle so those who want a closer look can download them. I post a preview on Sunday or Monday, discussing the new puzzle and possible approaches. Then, I post my approach and the solution on Friday (after the Thursday deadline so as not to spoil it).

This week's puzzle is another of the f_x(string1) = string2 variety. That is, we're looking for string1 and string2, where each is a word or phrase; we're given function x, and when we apply function x to string1, the result is string2. Our function x is, as always, some kind of transformation, like "remove two letters" or "swap the first and last letters."

Here's this week's puzzle:

This week's challenge comes from listener Michael Shteyman, of Freeland, Md. Name a person in 2011 world news in eight letters. Remove the third, fourth and fifth letters. The remaining letters, in order, will name a person in 2021 world news. What names are these?

I confess that I solved this one already, sans NLP. This was one of those rare times when the answer popped right out at me. Nonetheless, I want to see if I can put together a script using NLP approaches that will spit out the correct solution.

So what do we need to solve this one? We know we need two "persons" from world news. The wording here seems deliberate; we don't know if these are full names, last names, maybe even nicknames. Crucially, we don't know if we should expect spaces within the strings.
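Here is a minimal sketch of this week's transformation in Python. It assumes we sidestep the question of spaces by lowercasing each name and keeping only its letters before counting to eight. The function names are my own placeholders, and the strings in the example are dummy values, not real candidates.

```python
def normalize(name):
    """Lowercase a name and keep only its letters (drops spaces, hyphens, etc.)."""
    return "".join(ch for ch in name.lower() if ch.isalpha())


def remove_letters_3_to_5(name):
    """Apply the puzzle's transformation to an eight-letter name.

    Returns the remaining five letters, or None if the name does not
    have exactly eight letters once normalized.
    """
    letters = normalize(name)
    if len(letters) != 8:
        return None
    # Letters 3, 4 and 5 are indices 2, 3 and 4 in zero-based Python terms.
    return letters[:2] + letters[5:]


# Placeholder strings, not real candidates:
print(remove_letters_3_to_5("ABCDEFGH"))    # abfgh
print(remove_letters_3_to_5("ab cd efgh"))  # abfgh
print(remove_letters_3_to_5("short"))       # None
```

Normalizing first means a two-word name and a one-word name are treated the same way, which seems like the safest reading of "eight letters."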

Like the previous two problems, we'll want to start with two lists of candidate words (names), then iterate through list 1 to see if the transformation function produces a word from list 2.
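The loop itself can be sketched generically, so the same code works for any puzzle of this variety. This is only an illustration under my own assumptions: `find_pairs` and `drop_3_to_5` are hypothetical names, and the two lists are made-up placeholders that are assumed to be lowercased with spaces already removed.

```python
def find_pairs(list1, list2, transform):
    """Return (string1, string2) pairs where transform(string1) is in list2."""
    targets = set(list2)  # set lookup keeps each membership check fast
    pairs = []
    for string1 in list1:
        string2 = transform(string1)
        if string2 is not None and string2 in targets:
            pairs.append((string1, string2))
    return pairs


def drop_3_to_5(s):
    """This week's function x: remove letters 3-5 of an 8-letter string."""
    return s[:2] + s[5:] if len(s) == 8 else None


# Made-up placeholder lists, not real candidates:
names_2011 = ["abcdefgh", "zzzzzzzz", "toolongname"]
names_2021 = ["abfgh", "qqqqq"]
print(find_pairs(names_2011, names_2021, drop_3_to_5))  # [('abcdefgh', 'abfgh')]
```

Passing the transformation in as an argument means only the small function x has to change from week to week.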

Where can we get these candidate word lists?

  1. Search the web: Maybe we can find lists that others have posted for "names in the news" for 2011 and 2021.
  2. News text: For example, if we had a corpus of Wall Street Journal articles from 2011, we could extract a list of names using a Named Entity Recognition (NER) tool, like the one included in the Stanford CoreNLP package. It might be hard to come by such datasets for free, however.
  3. Twitter: I suspect we might be able to find datasets that contain a large random sample of Tweets, possibly divided into months or years in a way we can use. The NER approach would work there too. It might be less accurate, because news text follows strict style guidelines and is thus relatively more predictable than Twitter text, which follows no guidelines.
  4. Wikipedia: I saved my preferred approach for last. Preferred because it should be the fastest and easiest. Wikipedia has a page for each year, listing major events from that year. I think there are also more specific year-in-review type articles that focus on entertainment news, politics news, sports news, etc. It would be fairly straightforward to collect the text from these articles and pipe it through an NER tool to get our candidate lists.

With candidate lists in hand, we need to script the transformation function and iterate.
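To give a feel for the candidate-gathering step, here is a rough Python sketch. Note that it does not use a real NER tool: as a crude stand-in, it grabs runs of capitalized words with a regular expression, which a proper NER model would do far more accurately. The function name and sample sentence are hypothetical, and the names in the sentence are invented.

```python
import re

# One or more capitalized words in a row, e.g. "Jane Example".
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def candidate_names(text):
    """Collect normalized candidates from capitalized word runs in text."""
    candidates = set()
    for match in CAPITALIZED_RUN.finditer(text):
        words = match.group().split()
        # Keep the full run plus each word alone, since we don't know whether
        # the puzzle wants a full name, a last name, or a nickname.
        for phrase in [" ".join(words)] + words:
            # Lowercase and remove spaces so letters can be counted directly.
            candidates.add("".join(phrase.lower().split()))
    return candidates


# Invented sentence with invented names:
sample = "The committee praised Jane Example and Sam Placeholder."
print(sorted(candidate_names(sample)))
```

This approach is noisy (sentence-initial words like "The" slip through), but for this puzzle the lists only need to contain the right names, and the eight-letter and five-letter constraints should filter out most of the junk.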

That's my plan. I'll be back on Friday to walk you through it.

--Levi King
