
LDS General Conference: The Software Guesses the Speaker

[Image: Mormon Tabernacle Choir at general conference, via Wikipedia]

Let's say that you stripped the author and photo from the online transcript of any given LDS General Conference talk. Let's also say that you didn't see or hear that talk delivered. Could you tell who gave the talk just by reading it?

My computer can. Here's how:

It turns out that modern text-processing software is getting pretty good at this kind of thing. You already know that your inbox can tell, with fair accuracy, whether a random email is spam. But have you used cutting-edge software like Zemanta, which classifies and categorizes your blog post as you type it? Or seen the software some companies use to preprocess incoming customer-feedback emails and decide whether the sender is happy with the product?

Even the most basic approaches can be very accurate in some domains; the example here is LDS General Conference talks.

The first thing the computer needs is a set of training texts. These are like giving the computer the test questions and the answer key, which it can use to learn the subject matter. While "teaching to the test" might not be best for our student population, it works great for computers, which can make appropriate inferences from the smallest details.

The way I got my training text was to use a web crawler that would go to www.lds.org, download general conference talks, scrape off all the HTML, and save the raw, unformatted text of each talk into a separate file. Each file's name included the name of the speaker, which is where the software would look to check its answers.
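For illustration, here is a minimal sketch of that crawl-and-scrape step in Python. The URL, the way the speaker is determined, and the filename scheme are all placeholders; the real crawler discovered the talk links on lds.org itself:

import os
import re
import requests
from bs4 import BeautifulSoup

# Hypothetical talk URL; the real crawler walked lds.org to find these.
TALK_URLS = ["https://www.lds.org/general-conference/1999/10/example-talk"]

os.makedirs("talks", exist_ok=True)

for url in TALK_URLS:
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    text = soup.get_text(separator=" ")   # scrape off all the HTML
    speaker = "Eyring"                    # in practice, parsed from the page
    slug = re.sub(r"\W+", "-", url.rsplit("/", 1)[-1])
    # Put the answer key (the speaker) in the filename.
    with open(os.path.join("talks", f"{speaker}-{slug}.txt"), "w") as f:
        f.write(text)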

The second thing the computer needs to know is which features of the text are important to you. The most basic approach is to have it note individual words in the document. For example, one feature might be "the article has the word 'commandments' in it". Another might be "the article has the word 'scriptures' in it". There are many more ways to look at a document than just its words, though. How about "this article uses the passive voice", or "this article has long sentences", or "this article references '2 Corinthians'"? It simply depends on how much effort you put into teaching the computer what to look for.

At the most basic level, though, it's not even that involved. The program I wrote simply gathers all the words in each document in the training set and treats each unique word as a feature. It builds the feature set automatically, so we get "this article has 'humble'" as a feature right alongside "this article has 'seemed'".
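In code, that automatic feature set can be as simple as this (a sketch; the function names are mine, not from the original program):

def build_vocabulary(documents):
    # Every unique word across the training documents becomes a feature.
    vocab = set()
    for text in documents:
        vocab.update(text.lower().split())
    return vocab

def document_features(text, vocabulary):
    # One boolean feature per vocabulary word: "this article has <word>".
    words = set(text.lower().split())
    return {f"contains({w})": (w in words) for w in vocabulary}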

Third, the computer needs to know what it is trying to guess at. It needs a list of possible answers.

In our case, the answers are "Eyring," "Monson," "Uchtdorf," etc.
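Because the answer key is baked into each filename, pulling a label back out is a one-liner. This assumes the hypothetical "Speaker-title.txt" naming scheme from the crawler sketch above:

import os

def label_for(path):
    # "talks/Eyring-example-talk.txt" -> "Eyring"
    return os.path.basename(path).split("-")[0]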

Fourth, the computer needs to search and tally those features in the training text.

The approach that I used is called naive Bayes. The idea is simple. For each feature, tally each document in the training text (for which we do have the answer key) into one of four buckets (there's a code sketch of the tally right after this list):

1. This document has word X AND it is a talk by speaker Y
2. This document doesn't have word X AND it is a talk by speaker Y
3. This document has word X AND it is not a talk by speaker Y
4. This document doesn't have word X AND it is not a talk by speaker Y
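Here is a minimal sketch of that tally for a single word/speaker pair; again, the names are mine:

from collections import Counter

def tally(documents, labels, word, speaker):
    # Count documents into the four buckets above for one feature
    # ("has word X") and one candidate answer ("is by speaker Y").
    counts = Counter()
    for text, label in zip(documents, labels):
        has_word = word in text.lower().split()
        is_speaker = (label == speaker)
        counts[(has_word, is_speaker)] += 1
    # Keys: (True, True) = bucket 1, (False, True) = bucket 2,
    #       (True, False) = bucket 3, (False, False) = bucket 4.
    return counts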

Now, with all of that tallied, the computer can estimate how likely each word is to show up in each speaker's talks, and Bayes' rule turns those estimates into odds for each speaker when we hand it text it has never seen before. Given enough data to work with, even this simple approach can be very accurate.

In fact, with my corpus of the last ten years of Eyring, Hinckley, and Monson talks, my computer is 88% accurate!
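Here is a rough sketch of the training-and-scoring step, assuming NLTK (whose NaiveBayesClassifier prints feature tables in exactly the style shown below). The helper functions come from the sketches above, and the 90/10 train/test split is illustrative:

import glob
import random
import nltk

# Load (text, speaker) pairs; the speaker label comes from the filename.
docs = []
for path in glob.glob("talks/*.txt"):
    with open(path) as f:
        docs.append((f.read(), label_for(path)))

random.shuffle(docs)
cutoff = len(docs) // 10                 # hold out ~10% for testing
test_docs, train_docs = docs[:cutoff], docs[cutoff:]

# Build the vocabulary from the training talks only, then featurize.
vocab = build_vocabulary(text for text, _ in train_docs)
train_set = [(document_features(text, vocab), who) for text, who in train_docs]
test_set = [(document_features(text, vocab), who) for text, who in test_docs]

classifier = nltk.NaiveBayesClassifier.train(train_set)
print(nltk.classify.accuracy(classifier, test_set))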

Here is some nifty data that it found:

Feature                          Comparison          Odds
contains(prophets) = True        Eyring : Hinckley   15.1 : 1.0
contains(wants) = True           Eyring : Hinckley   12.4 : 1.0
contains(evidence) = True        Eyring : Monson     11.6 : 1.0
contains(promised) = True        Eyring : Hinckley   11.1 : 1.0
contains(seemed) = True          Eyring : Hinckley    9.8 : 1.0
contains(qualify) = True         Eyring : Hinckley    9.8 : 1.0
contains(commandments) = True    Eyring : Hinckley    9.8 : 1.0
contains(start) = True           Eyring : Monson      9.1 : 1.0
contains(commandments) = False   Hinckley : Eyring    8.6 : 1.0
contains(answers) = True         Eyring : Hinckley    8.5 : 1.0
contains(gifts) = True           Eyring : Hinckley    8.5 : 1.0
contains(lifetime) = True        Eyring : Hinckley    8.5 : 1.0
contains(chose) = True           Eyring : Hinckley    8.5 : 1.0
contains(simple) = True          Eyring : Hinckley    8.3 : 1.0
contains(memory) = True          Eyring : Monson      7.9 : 1.0
contains(whatsoever) = True      Eyring : Monson      7.9 : 1.0
contains(resurrected) = True     Eyring : Monson      7.9 : 1.0


This table shows what the computer found to be the most helpful features (in our case, words) for determining who gave a talk. The first column is the feature; the second is the pair of speakers being compared, A : B; and the third is the odds that the talk is by speaker A rather than speaker B, given that the feature is satisfied.
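Incidentally, the layout of that table matches what NLTK's naive Bayes classifier prints, so if you assume the NLTK-based sketch above, the whole thing comes out of a single call:

# Print the 17 most informative features, in the format shown above.
classifier.show_most_informative_features(17)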

(Note to self: Do not show this to brother-in-law lest he decide that he can use these odds to place bets on the April general conference address.)

What do you see that's interesting to you? I think it's really interesting to see "basic" words mixed in with the religious terms. It's also interesting that most of the odds involve President Eyring. While I haven't dug into it yet, I would guess this simply means that Eyring's vocabulary is more easily distinguished from the other two's. That said, on my first go-round I used only President Monson's and President Hinckley's talks, and I was still at about 85% accuracy.


