This week's challenge comes from listener Steve Baggish of Arlington, Massachusetts. Think of a nine-letter word naming a kind of tool that is mentioned in the Bible. Remove the second and sixth letters and the remaining letters can be rearranged to spell two new words that are included in a well known biblical passage and are related to the area in which the tool is used. What are the three words?
Curious! Let's break this down a little:
- a nine-letter tool from the Bible
- seems like a very specific thing, like a word we find in the Bible but rarely elsewhere
- It may surprise some people to know that we linguists tend to know a lot about the Bible. A lot of early linguistics was motivated by religious groups' desires to translate the Bible into as many languages as possible to reach potential converts. As natural language processing (NLP) developed, the Bible proved to be a good resource for building translation models, because the clear numbering of book and verse means it's easy to align the sentences across languages and use statistical methods to determine which words and morphemes correspond. In these days of web-scale data, the Bible would be considered a tiny dataset for testing or training, but it served as a starting point for a lot of tasks in the history of NLP. And that brings me to some relevant stats here.
- word tokens (total word count) in the Bible: 930,243
- word types (unique words) in the Bible: 14,564
- ~15,000 word types is really not that many to sort through. We only want those that are exactly 9 letters long, and that's likely to eliminate 80% or more. Even if 2,000 words are left, that's very manageable for a variety of approaches we might take here.
- We could even use a simple language model to get perplexity scores for each 9-letter word to see if it makes sense in the context of a tool. For example, how well would it fit in this sentence?
- 'The _______ improved worker productivity in ancient times.'
- remove the second and sixth letters; rearrange remaining letters to spell two new words
- Easy-peasy. This is a straightforward task: we can write a Python function that removes the two letters and generates every possible arrangement of the remaining seven, broken into two words.
- two new words that are included in a well known biblical passage and are related to the area in which the tool is used
- This part is a bit tricky and doesn't give us much to go on semantically. I'm thinking we could use our Python function to generate all the candidates, then:
- First, ensure that the two words are indeed English words and not gibberish
- Next, check whether there exists some verse in the Bible that contains both words.
- Finally, I think we'll need to mentally confirm that the passage somehow relates to the area in which the tool is used.
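Here's a rough Python sketch of the pipeline in the bullets above. It's a sketch under a few assumptions, not a tested solution: the Bible is available both as a list of word tokens and as a list of verse strings, and there's an English word list to check candidates against. All the function names are hypothetical. Rather than literally permuting all 5,040 orderings of the seven remaining letters and cutting each one in two, it chooses which letters go into the first word and looks each group up in an anagram index, which finds the same word pairs with far less work. It leaves out the language-model scoring idea, since that would need a pretrained model.

```python
import re
from collections import defaultdict
from itertools import combinations


def nine_letter_types(tokens):
    """Unique, lowercased, purely alphabetic word types exactly 9 letters long."""
    return sorted({t.lower() for t in tokens if t.isalpha() and len(t) == 9})


def remove_letters(word, positions=(2, 6)):
    """Drop the letters at the given 1-based positions (2nd and 6th by default)."""
    drop = {p - 1 for p in positions}
    return "".join(ch for i, ch in enumerate(word) if i not in drop)


def anagram_index(vocabulary):
    """Map each sorted-letter key to the words spelled with exactly those letters."""
    index = defaultdict(set)
    for w in vocabulary:
        index["".join(sorted(w.lower()))].add(w.lower())
    return index


def two_word_splits(letters, index):
    """All unordered pairs of known words that together use every letter once.

    Instead of permuting every ordering and cutting it in two, pick which letter
    positions form the first word; the rest form the second. Looking each group
    up by its sorted letters finds the same pairs.
    """
    n = len(letters)
    pairs = set()
    for k in range(1, n):  # length of the first word; both words are non-empty
        for chosen in combinations(range(n), k):
            first = "".join(sorted(letters[i] for i in chosen))
            second = "".join(sorted(letters[i] for i in range(n) if i not in chosen))
            for a in index.get(first, ()):
                for b in index.get(second, ()):
                    pairs.add(tuple(sorted((a, b))))  # dedupe (a, b) vs (b, a)
    return sorted(pairs)


def verses_with_both(pair, verses):
    """Verses containing both words of `pair` as whole words (case-insensitive)."""
    a, b = pair
    hits = []
    for verse in verses:
        words = set(re.findall(r"[a-z]+", verse.lower()))
        if a in words and b in words:
            hits.append(verse)
    return hits


def find_candidates(tokens, vocabulary, verses):
    """Assumed inputs: Bible tokens, a set of English words, Bible verses as strings.

    Filters in order: 9-letter types -> letter removal -> word pairs -> shared verse.
    """
    index = anagram_index(vocabulary)
    results = []
    for word in nine_letter_types(tokens):
        for pair in two_word_splits(remove_letters(word), index):
            hits = verses_with_both(pair, verses)
            if hits:
                results.append((word, pair, hits))
    return results
```

To run it for real, you'd pass in the tokenized Bible text, an English word list loaded as a set, and the verses as strings. The whole-word match is strict, so archaic forms ending in -eth (or plurals) won't count as containing the base word; some stemming or a looser match might be needed. The last step, judging whether the passage relates to where the tool is used, still has to happen in our heads.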
April 12, 2024 Update
The Thursday deadline has passed, so here's my solution:
See you next week!