Natural Language Puzzling: Two Companies (Preview)

I hope all my Puzzlers are having a great week! Let's take a look at the latest Sunday Puzzle from NPR:

This week's challenge comes from Joseph Young of St. Cloud, Minn. I'm looking for the names of two companies. One of them has a two-part name (5,5). The other has a three-part name (5,7,5). The last five-letter part of the two names is the same. And the first five-letter part of the first company's name is something the second company wants. What is it?

OK. This looks tricky, but let's break it down.

What resources do we need?

Lists:

C: list of companies;

We can try scraping text from the web then running a named entity recognizer (NER) tool to extract all the corporation names;
More likely, we can find a list of company names already on the web;
Stock market listings might be helpful;

Functions:

filter_names(C,[word lengths]): This function should take our list C and a list of integers representing the length of each word in the company name; it should return only the company names from C that match those word lengths.

For example, to find companies matching the second company's word lengths, we'd use the function like this: filter_names(C,[5,7,5])
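Here's a minimal sketch of how filter_names might look in Python. The details are my own assumptions rather than a finished design: it splits names on whitespace, counts only letters (so trailing punctuation doesn't throw off the length), and skips tokens with no letters at all, such as "&". The company names in the example are made up for illustration.

```python
def filter_names(companies, word_lengths):
    """Return the names whose words have exactly the given lengths, in order."""
    target = list(word_lengths)
    matches = []
    for name in companies:
        # Count only letters in each token, so "Acme," counts as 4.
        counts = [sum(ch.isalpha() for ch in token) for token in name.split()]
        # Drop tokens with no letters at all (e.g. "&" or "-").
        lengths = [n for n in counts if n > 0]
        if lengths == target:
            matches.append(name)
    return matches


# Made-up names for illustration only; these are not puzzle candidates.
C = ["Alpha Bravo", "Delta Whiskey Bravo", "Echo Foxtrot", "Tango Uniform Sierra"]

two_part = filter_names(C, [5, 5])
three_part = filter_names(C, [5, 7, 5])
print(two_part)
print(three_part)
```

Whether punctuation and symbols should count toward a word's length is a judgment call; a real company list may need a different rule.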

proposition_probability(some_sentence): This function should generate a score between 0 and 1 to represent how likely the sentence is to be true.

We'll use this to solve the last part of the puzzle. We know that Company A, or ca, is two words, (5,5); let's call these cax and cay. Let's call the (5,7,5) company Company B, or cb. If we have a rich language model and/or a knowledge graph, we can take our lists of companies with matching word lengths, keep the pairs that share the same final 5-letter word, and iterate through them to find the most likely propositions.
For example, we'll take each cb candidate, then iterate through each ca candidate whose final word matches; we'll use a sentence template to construct propositions that we can evaluate with a model. An example might be: "[cb] is looking to acquire more [cax]". (Most likely, we'd want to use a few variations on this and take the average.) So we'll use this function to evaluate all our candidates; we can rank them by probability, then manually skim through the most probable propositions to find the solution.
What resource can we use here? I've been using BERT for various language modeling tasks lately, so I plan to do a little reading and see if it seems suitable. I'm not sure if it really captures the kind of knowledge we need involving specific companies.
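Whichever model ends up doing the scoring, the pairing and ranking loop can be sketched independently of it. In the sketch below, rank_candidates and toy_score are hypothetical names of my own, the second template is an assumed variation on the one above, and toy_score is only a stand-in for a real proposition_probability backed by a language model. The company names are again made up.

```python
from statistics import mean

# The first template is the one described above; the second is an assumed variation.
TEMPLATES = [
    "{cb} is looking to acquire more {cax}.",
    "{cb} wants more {cax}.",
]


def rank_candidates(two_part, three_part, score_sentence, templates=TEMPLATES):
    """Pair (5,5) and (5,7,5) names that share a final word, then rank the pairs.

    score_sentence is any function mapping a sentence to a score in [0, 1].
    """
    ranked = []
    for ca in two_part:
        cax, cay = ca.split()
        for cb in three_part:
            # Keep only pairs whose last words match, as the puzzle requires.
            if cb.split()[-1].lower() != cay.lower():
                continue
            sentences = [t.format(cb=cb, cax=cax.lower()) for t in templates]
            # Average over the templates to smooth out template-specific quirks.
            avg = mean(score_sentence(s) for s in sentences)
            ranked.append((avg, ca, cb))
    # Highest average score first.
    return sorted(ranked, reverse=True)


def toy_score(sentence):
    """Stand-in scorer for illustration only; a real one would query a model."""
    return 0.9 if "alpha" in sentence else 0.1


# Made-up names for illustration only; these are not puzzle candidates.
two_part = ["Alpha Bravo", "Gamma Bravo", "Omega Sigma"]
three_part = ["Delta Whiskey Bravo"]

for score, ca, cb in rank_candidates(two_part, three_part, toy_score):
    print(f"{score:.2f}  {ca}  /  {cb}")
```

Passing the scorer in as an argument means the same loop works whether the scores come from BERT, a knowledge graph, or something else.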

That's my plan. Do you have ideas or suggestions?

Thanks for stopping by. I'll see you later this week with a solution (I hope)!

--Levi

Tuesday, March 02, 2021