LDS General Conference: The Software Guesses the Speaker

Let's say that you stripped the author and photo from the online transcript of any given LDS General Conference talk. Let's also say that you didn't see or hear that talk delivered. Could you tell who gave the talk just by reading it?

My computer can. Here's how:

It turns out that modern text processing software is getting pretty good at this kind of thing. You've probably noticed that your inbox can tell, with fair accuracy, whether or not a random email is spam. Have you used cutting-edge software like Zemanta, though, which classifies and categorizes your blog post as you type it? Or how about the companies whose software preprocesses incoming customer feedback emails to decide whether you are happy with their product?

Even the most basic approaches can be very precise in some domains; in this example, the domain is LDS General Conference talks.

The first thing that the computer needs is a set of training text. This text is like giving the computer the test questions and the answer key, which the computer can use to try to learn the subject matter. While "teaching to the test" might not be the best for our student population, it works great for computers that can make appropriate inferences from the smallest details.

The way I got my training text was to use a web crawler that would go to www.lds.org, download general conference talks, strip off all the HTML, and save the raw, unformatted text of each talk into a separate file. Each file's name contained the name of the speaker, which is where the software would look to check its answers.
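I didn't keep the crawler handy, but the two pieces it needed are easy to sketch: strip the markup down to raw text, and encode the answer key in the filename. Here's a minimal stand-in (the filename scheme `speaker-date-title.txt` is my made-up example, not necessarily what I used):

```python
import re

def strip_html(html):
    """Remove tags and collapse whitespace, leaving raw talk text."""
    text = re.sub(r"<[^>]+>", " ", html)
    text = re.sub(r"\s+", " ", text)
    return text.strip()

def speaker_from_filename(filename):
    """Recover the answer key from a name like 'eyring-2008-10-faith.txt'."""
    return filename.split("-")[0]

sample = "<p>My dear <b>brothers and sisters</b> ...</p>"
print(strip_html(sample))                                  # My dear brothers and sisters ...
print(speaker_from_filename("eyring-2008-10-faith.txt"))   # eyring
```

A real crawler would also fetch the pages and follow links, but the text cleanup above is the part the classifier actually depends on.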

The second thing that the computer needs is to know what features in the text are important to you. The most basic approach is to ask it to note individual words in the document. For example, one feature might be "the article has the word 'commandments' in it". Another might be "the article has the word 'scriptures' in it". There are a lot more ways to look at a document than just the words. How about "this article uses the passive voice" or "this article has long sentences" or "this article references '2 Corinthians'". It simply depends on how much effort you put into teaching the computer what to look for.

It's not quite as involved as that, though, at the most basic level. For the program that I wrote, it simply gets all the words in each document in the training set and treats each unique word as a feature. So, it builds the feature set automatically, and we get "this article has 'humble'" as well as "this article has 'seemed'" as features.
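That automatic word-as-feature step is only a few lines. This is a sketch of the idea, not my exact code: build the vocabulary from the training set, then turn any document into one boolean feature per vocabulary word.

```python
def document_features(document_words, vocabulary):
    """One boolean 'contains(word)' feature per word in the training vocabulary."""
    words = set(document_words)
    return {f"contains({w})": (w in words) for w in vocabulary}

# The vocabulary would normally be every unique word in the training set.
vocab = {"humble", "seemed", "commandments"}
feats = document_features("be humble and keep the commandments".split(), vocab)
print(feats["contains(humble)"])        # True
print(feats["contains(seemed)"])        # False
print(feats["contains(commandments)"])  # True
```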

Third, the computer needs to know what it is trying to guess at. It needs a list of possible answers.

In our case, the answers are "Eyring," "Monson," "Uchtdorf," etc.

Fourth, the computer needs to search and tally those features in the training text.

The approach that I used is called Naive Bayes. The idea is simple. For each feature, tally each document in the training text (for which we do have the answer key) into one of four categories:

1. This document has word X AND it is a talk by speaker Y
2. This document doesn't have word X AND it is a talk by speaker Y
3. This document has word X AND it is not a talk by speaker Y
4. This document doesn't have word X AND it is not a talk by speaker Y

Now, with all of that tallied, we can give it text that it hasn't seen before: the classifier combines the per-feature tallies and picks the speaker whose tallies make the new text most probable. Given enough data to work with, even this simple approach can be very accurate.
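The whole train-and-classify loop fits in a page of Python. This is a toy sketch of the technique with made-up two-line "talks", not my actual program or corpus; the `+1`/`+2` smoothing keeps a never-before-seen word/speaker pair from zeroing out a score:

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    """Tally, per speaker, how many training talks contain each word."""
    doc_counts = Counter()              # talks per speaker
    word_counts = defaultdict(Counter)  # speaker -> word -> talks containing it
    vocabulary = set()
    for words, speaker in labeled_docs:
        doc_counts[speaker] += 1
        for w in set(words):
            word_counts[speaker][w] += 1
            vocabulary.add(w)
    return doc_counts, word_counts, vocabulary

def classify(words, doc_counts, word_counts, vocabulary):
    """Score each speaker by log P(speaker) + sum of log P(feature | speaker)."""
    present = set(words)
    total = sum(doc_counts.values())
    best, best_score = None, float("-inf")
    for speaker, n in doc_counts.items():
        score = math.log(n / total)
        for w in vocabulary:
            # Laplace smoothing so unseen word/speaker pairs don't zero out.
            p = (word_counts[speaker][w] + 1) / (n + 2)
            score += math.log(p if w in present else 1 - p)
        if score > best_score:
            best, best_score = speaker, score
    return best

training = [
    ("qualify for the promised blessings".split(), "Eyring"),
    ("commandments answers gifts chose".split(), "Eyring"),
    ("pioneers journey temple dedicate".split(), "Hinckley"),
    ("temple pioneers build dedicate".split(), "Hinckley"),
]
model = train(training)
print(classify("keep the commandments and qualify".split(), *model))  # Eyring
```

The logs are just there to keep the product of many small probabilities from underflowing; the "naive" part is assuming every word feature is independent of the others given the speaker.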

In fact, with my corpus of the last ten years of Eyring, Hinckley, and Monson talks, my computer is 88% accurate!

Here is some nifty data that it found:

Feature                         Comparison         Odds
contains(prophets) = True       Eyring : Hinckley  15.1 : 1.0
contains(wants) = True          Eyring : Hinckley  12.4 : 1.0
contains(evidence) = True       Eyring : Monson    11.6 : 1.0
contains(promised) = True       Eyring : Hinckley  11.1 : 1.0
contains(seemed) = True         Eyring : Hinckley   9.8 : 1.0
contains(qualify) = True        Eyring : Hinckley   9.8 : 1.0
contains(commandments) = True   Eyring : Hinckley   9.8 : 1.0
contains(start) = True          Eyring : Monson     9.1 : 1.0
contains(commandments) = False  Hinckley : Eyring   8.6 : 1.0
contains(answers) = True        Eyring : Hinckley   8.5 : 1.0
contains(gifts) = True          Eyring : Hinckley   8.5 : 1.0
contains(lifetime) = True       Eyring : Hinckley   8.5 : 1.0
contains(chose) = True          Eyring : Hinckley   8.5 : 1.0
contains(simple) = True         Eyring : Hinckley   8.3 : 1.0
contains(memory) = True         Eyring : Monson     7.9 : 1.0
contains(whatsoever) = True     Eyring : Monson     7.9 : 1.0
contains(resurrected) = True    Eyring : Monson     7.9 : 1.0

This table shows what the computer found to be the most helpful features, in our case 'words', in determining who gave the talk. The first column is the feature; the second is the pair of speakers, A : B, being compared; and the third is the odds that the talk is by speaker A rather than speaker B, given that the feature is satisfied.
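Those odds fall straight out of the tallies from earlier: the ratio of the smoothed probability that a talk contains the word given speaker A to the same probability given speaker B. The counts below are invented for illustration, not my real corpus numbers:

```python
# Odds = P(contains(w) = True | Eyring) / P(contains(w) = True | Hinckley),
# using Laplace-smoothed document frequencies. Counts are hypothetical.
eyring_talks, eyring_with_word = 60, 30
hinckley_talks, hinckley_with_word = 60, 2

p_eyring = (eyring_with_word + 1) / (eyring_talks + 2)        # 0.5
p_hinckley = (hinckley_with_word + 1) / (hinckley_talks + 2)  # ~0.048
print(round(p_eyring / p_hinckley, 1))  # 10.3
```

So a word that shows up in half of one speaker's talks but almost none of another's yields odds in the same double-digit range as the table above.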

(Note to self: Do not show this to brother-in-law lest he decide that he can use these odds to place bets on the April general conference address.)

What do you see that's interesting to you? I think that it's really interesting to see "basic" words mixed in with religious terms. I think that it's also interesting that most of the odds involve President Eyring. While I haven't dug into it yet, I would guess that this simply means that President Eyring's vocabulary is the most easily distinguished of the three. That said, on my first go-round I used only President Monson's and President Hinckley's talks, and I was still at about 85% accuracy.

