Saturday, March 06, 2021

Two Companies (Solution - UNSOLVED!)

 Welp. I'm finally stumped! Did you solve it?

Let's take another look at the Sunday Puzzle:

This week's challenge comes from Joseph Young of St. Cloud, Minn. I'm looking for the names of two companies. One of them has a two-part name (5,5). The other has a three-part name (5,7,5). The last five-letter part of the two names is the same. And the first five-letter part of the first company's name is something the second company wants. What is it?

In the preview post, I shared this breakdown:



    • Lists:
      • C: list of companies;
        • We can try scraping text from the web then running a named entity recognizer (NER) tool to extract all the corporation names;
        • More likely, we can find a list of company names already on the web;
        • Stock market listings might be helpful;

    • Functions:
      • filter_names(C,[word lengths]): This function should take our list C and a list of integers representing the length of each word in the company name; it should return only the company names from C that match those word lengths.
        • For example, to find companies matching the second company's word lengths, we'd use the function like this: filter_names(C,[5,7,5])
      • proposition_probability(some_sentence): This function should generate a score between 0 and 1 to represent the likelihood of a sentence. 
        • We'll use this to solve the last part of the puzzle. We know that Company A is two words, (5,5). Let's call these cax and cay. Let's call the (5,7,5) company Company B, or cb. So if we have a rich language model and/or a knowledge graph, we can take our lists of companies with matching word lengths and the matching final 5-letter word, and iterate through them to find the most likely propositions.
        • For example, we'll take each cb candidate, then iterate through each ca candidate; we'll use a sentence template to construct propositions that we can evaluate with a model. An example might be: "[cb] is looking to acquire more [cax]". (Most likely, we'd want to use a few variations on this and take the average.) So we'll use this function to evaluate all our candidates; we can rank them by probability, then manually skim through the most probable propositions to find the solution.
        • What resource can we use here? I've been using BERT for various language modeling tasks lately, so I plan to do a little reading and see if it seems suitable. I'm not sure if it really captures the kind of knowledge we need involving specific companies.
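The two functions in the breakdown above can be sketched roughly as follows. This is only an illustration, not the script I actually ran: rank_propositions, TEMPLATES, and the score_fn argument are names made up for this sketch, and the second template is an invented variation. score_fn stands in for proposition_probability, meaning any callable that maps a sentence to a score between 0 and 1 (for example, one backed by a language model); the sketch deliberately does not commit to a particular model API.

```python
from statistics import mean
from typing import Callable, Iterable, List, Sequence, Tuple


def filter_names(companies: Iterable[str], word_lengths: Sequence[int]) -> List[str]:
    """Keep only the names whose words have exactly the given lengths, in order."""
    wanted = list(word_lengths)
    return [name for name in companies
            if [len(word) for word in name.split()] == wanted]


# The first template comes from the breakdown above; the second is a
# hypothetical variation added so there is something to average over.
TEMPLATES = (
    "{cb} is looking to acquire more {cax}.",
    "{cb} wants {cax}.",
)


def rank_propositions(
    pairs: Iterable[Tuple[str, str]],
    score_fn: Callable[[str], float],
    templates: Sequence[str] = TEMPLATES,
) -> List[Tuple[float, str, str]]:
    """Score each (Company A, Company B) pair and sort from best to worst.

    score_fn is an assumption: a stand-in for proposition_probability that
    returns a score between 0 and 1 for a sentence.
    """
    ranked = []
    for ca, cb in pairs:
        cax = ca.split()[0].lower()  # first word of Company A
        scores = [score_fn(t.format(cb=cb, cax=cax)) for t in templates]
        ranked.append((mean(scores), ca, cb))
    ranked.sort(key=lambda row: row[0], reverse=True)
    return ranked
```

With this sketch, filter_names(C, [5, 5]) would give the Company A candidates and filter_names(C, [5, 7, 5]) the Company B candidates; the ranked output is still meant to be skimmed by hand rather than trusted blindly.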



      So that's pretty much what I've tried so far. I cobbled together a huge list of companies -- 28,605 to be exact. I then do a lot of pre-processing to expand the list with alternate versions of each name; for example, I look for generic words like "corp" and "inc" and drop them. Next, I find all the Company A (5,5) candidates and all the Company B (5,7,5) candidates, and keep every pair where Company A's second word matches Company B's third word (the final five-letter part the two names share). Finally, I use some templates to create sentences and score them with BERT, which gives me a list sorted by score.
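Here is a minimal sketch of the pre-processing and pairing steps just described, under a few assumptions. GENERIC_WORDS, name_variants, has_lengths, and candidate_pairs are names invented for this illustration; only "corp" and "inc" come from my description above, and the rest of the generic-word list is a guess. The pairing compares the last word of each name, which is the shared five-letter part the puzzle describes.

```python
import re
from collections import defaultdict

# Only "corp" and "inc" are taken from the post; the others are assumed.
GENERIC_WORDS = {"corp", "corporation", "inc", "incorporated",
                 "co", "company", "ltd", "llc"}


def name_variants(name):
    """Return the cleaned-up name plus a version with generic words dropped."""
    # Keeps runs of ASCII letters only, so punctuation acts as a separator.
    words = re.findall(r"[A-Za-z]+", name)
    full = " ".join(words)
    trimmed = " ".join(w for w in words if w.lower() not in GENERIC_WORDS)
    return {v for v in (full, trimmed) if v}


def has_lengths(name, lengths):
    """True if the words of name have exactly these lengths, in order."""
    return [len(w) for w in name.split()] == list(lengths)


def candidate_pairs(companies):
    """Pair every (5,5) name with every (5,7,5) name sharing its last word."""
    variants = set()
    for name in companies:
        variants |= name_variants(name)

    # Index the Company B candidates by their final word.
    by_last_word = defaultdict(list)
    for v in sorted(variants):
        if has_lengths(v, (5, 7, 5)):
            by_last_word[v.split()[-1].lower()].append(v)

    pairs = []
    for v in sorted(variants):
        if has_lengths(v, (5, 5)):
            for cb in by_last_word.get(v.split()[-1].lower(), []):
                pairs.append((v, cb))
    return pairs
```

One design caveat: treating every non-letter as a word break is a blunt choice. A name containing an ampersand, apostrophe, hyphen, or accented letter would be split or altered, which changes its word lengths, so a tokenizer like this could silently drop a valid candidate. That is the kind of string-handling detail worth checking once the answer is known.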

      I didn't really find anything that makes sense. Also, most of the companies that fit 5,5 or 5,7,5 are companies I've never heard of, and I suspect the companies in the solution will sound familiar. A couple of famous names pop up as Company A candidates:

      • Shake Shack
      • Exxon Mobil
      • Jamba Juice
      But Exxon and Jamba don't really make sense as something Company B wants, and I can't find a 5,7,shack candidate to match Shake Shack.

      When I have the solution, I'll revise my script to get it working; I'm mostly curious to learn whether or not I have the company names in my list, and if so, what is wrong with my string handling that results in not finding the solution?

