
LDS General Conference: The Software Guesses the Speaker

Let's say that you stripped the author's name and photo from the online transcript of any given LDS General Conference talk. Let's also say that you didn't see or hear that talk delivered. Could you tell who gave the talk just by reading it?

My computer can. Here's how:

It turns out that modern text-processing software is getting pretty good at this kind of thing. You already know that your inbox can tell, with fair accuracy, whether or not a random email is spam. Have you used cutting-edge software like Zemanta, though, which classifies and categorizes your blog post as you type it? Or seen the software some companies use to preprocess incoming customer-feedback email and decide whether the sender is happy with the product?

Even the most basic approaches can be very accurate in some domains; in this example, the domain is LDS General Conference talks.

The first thing the computer needs is a set of training text. Training text is like handing the computer the test questions along with the answer key, which it can use to learn the subject matter. While "teaching to the test" might not be the best for our student population, it works great for computers, which can make appropriate inferences from the smallest details.

The way I got my training text was to write a web crawler that would go to www.lds.org, download general conference talks, scrape off all the HTML, and save the raw, unformatted text of each talk into a separate file. Each file's name contained the name of the speaker, which is where the software would look to check its answers.
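If you're curious what that looks like in Python, here's a minimal sketch. The directory layout and filename convention below are just for illustration; all that matters is that each file pairs a talk's text with its speaker:

import os

# Load the scraped corpus: one plain-text talk per file, with the
# speaker's name at the front of the filename. The naming scheme
# ("eyring_2008-10.txt") is illustrative, not necessarily what I used.
def load_talks(directory="talks"):
    talks = []
    for filename in os.listdir(directory):
        speaker = filename.split("_")[0]  # the built-in answer key
        path = os.path.join(directory, filename)
        with open(path, encoding="utf-8") as f:
            talks.append((f.read(), speaker))
    return talks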

The second thing the computer needs to know is which features of the text are important to you. The most basic approach is to have it note individual words in the document. For example, one feature might be "the article has the word 'commandments' in it". Another might be "the article has the word 'scriptures' in it". But there are many more ways to look at a document than just its words. How about "this article uses the passive voice", or "this article has long sentences", or "this article references '2 Corinthians'"? It simply depends on how much effort you put into teaching the computer what to look for.

At the most basic level, though, it's not quite that involved. The program that I wrote simply gathers all the words in each document in the training set and treats each unique word as a feature. It builds the feature set automatically, so we get "this article has 'humble'" as well as "this article has 'seemed'" as features.
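In Python, that feature extractor can be a few lines; the contains(...) naming here matches the output you'll see in the table further down:

# Turn a talk into a feature dictionary: one boolean feature per
# unique word, named contains(<word>).
def talk_features(text):
    return {f"contains({word})": True
            for word in set(text.lower().split())}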

Third, the computer needs to know what it is trying to guess at. It needs a list of possible answers.

In our case, the answers are "Eyring," "Monson," "Uchtdorf," etc.

Fourth, the computer needs to search for and tally those features in the training text.

The approach that I used is called naive Bayes. The idea is simple. For each feature, sort each document in the training text (for which we do have the answer key) into one of four categories and keep a tally:

1. This document has word X AND it is a talk by speaker Y
2. This document doesn't have word X AND it is a talk by speaker Y
3. This document has word X AND it is not a talk by speaker Y
4. This document doesn't have word X AND it is not a talk by speaker Y

Now, with all of that tallied, the computer can estimate how likely each word is to appear in each speaker's talks, and Bayes' rule flips that around into how likely each speaker is given the words. At that point we can give it text that it hasn't seen before. Given enough data to work with, even this simple approach can be very accurate.

In fact, with my corpus of the last ten years of Eyring, Hinckley, and Monson talks, my computer is 88% accurate!
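If you want to try this yourself, NLTK's naive Bayes classifier (whose output format matches the table below) gets you the whole training-and-testing step in a few lines. This sketch reuses the load_talks and talk_features helpers from above; the 90/10 train/test split is an arbitrary choice for illustration:

import random
import nltk

# Pair each talk's features with its speaker, then hold some talks
# out so we can measure accuracy on text the classifier hasn't seen.
labeled = [(talk_features(text), speaker) for text, speaker in load_talks()]
random.shuffle(labeled)
cutoff = len(labeled) // 10
test_set, train_set = labeled[:cutoff], labeled[cutoff:]

classifier = nltk.NaiveBayesClassifier.train(train_set)
print(nltk.classify.accuracy(classifier, test_set))

# Ask which features were most decisive; this produces the table below.
classifier.show_most_informative_features(17)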

Here is some nifty data that it found:

Feature                           Comparison          Odds
contains(prophets) = True         Eyring : Hinckley   15.1 : 1.0
contains(wants) = True            Eyring : Hinckley   12.4 : 1.0
contains(evidence) = True         Eyring : Monson     11.6 : 1.0
contains(promised) = True         Eyring : Hinckley   11.1 : 1.0
contains(seemed) = True           Eyring : Hinckley    9.8 : 1.0
contains(qualify) = True          Eyring : Hinckley    9.8 : 1.0
contains(commandments) = True     Eyring : Hinckley    9.8 : 1.0
contains(start) = True            Eyring : Monson      9.1 : 1.0
contains(commandments) = False    Hinckley : Eyring    8.6 : 1.0
contains(answers) = True          Eyring : Hinckley    8.5 : 1.0
contains(gifts) = True            Eyring : Hinckley    8.5 : 1.0
contains(lifetime) = True         Eyring : Hinckley    8.5 : 1.0
contains(chose) = True            Eyring : Hinckley    8.5 : 1.0
contains(simple) = True           Eyring : Hinckley    8.3 : 1.0
contains(memory) = True           Eyring : Monson      7.9 : 1.0
contains(whatsoever) = True       Eyring : Monson      7.9 : 1.0
contains(resurrected) = True      Eyring : Monson      7.9 : 1.0

This table shows the features, in our case words, that the computer found most helpful in determining who gave a talk. The first column is the feature, the second is the pair of speakers being compared (A : B), and the third is the odds of the talk being speaker A's rather than speaker B's, given that the feature is satisfied.
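To make the first row concrete (assuming NLTK-style output, where the odds are the ratio of the feature's likelihood under each label), the 15.1 : 1.0 means roughly:

P(talk contains "prophets" | Eyring) / P(talk contains "prophets" | Hinckley) ≈ 15.1

In other words, "prophets" shows up in Eyring's talks about fifteen times as often as in Hinckley's, so seeing it pushes the classifier's guess toward Eyring by about that factor.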

(Note to self: Do not show this to brother-in-law lest he decide that he can use these odds to place bets on the April general conference address.)

What do you see that's interesting to you? I think it's really interesting to see "basic" words mixed in with religious terms. It's also interesting that most of the odds involve President Eyring. While I haven't dug into it yet, my guess is that this simply means Eyring's vocabulary is the most easily distinguished of the three. That said, on my first go-round I used only President Monson's and President Hinckley's talks, and I was still at about 85% accuracy.


