LDS General Conference: The Software Guesses the Speaker

Let's say that you stripped the author and photo from the online transcript of any given LDS General Conference talk. Let's also say that you didn't see or hear that talk delivered. Could you tell who gave the talk just by reading it?

My computer can. Here's how:

It turns out that modern text-processing software is getting pretty good at this kind of thing. You already know that your inbox can tell, with a fair amount of accuracy, whether or not a random email is spam. But have you used cutting-edge software like Zemanta, which classifies and categorizes your blog post as you are typing it? Or how about the companies whose software preprocesses incoming customer feedback emails to decide whether you are happy with the product?

Even the most basic approaches can be very precise in some domains; in this example, the domain is LDS General Conference talks.

The first thing that the computer needs is a set of training text. This text is like giving the computer the test questions and the answer key, which the computer can use to try to learn the subject matter. While "teaching to the test" might not be the best for our student population, it works great for computers that can make appropriate inferences from the smallest details.

The way I got my training text was to write a web crawler that would go to www.lds.org, download general conference talks, scrape off all the HTML, and save the raw, unformatted text of each talk into a separate file. Each file's name included the name of the speaker, which is where the software would look to check its answer.
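
I won't paste the whole crawler here, but the scraping step is only a few lines. Here is a minimal sketch in Python, assuming the requests and BeautifulSoup libraries; the URL and the file-naming scheme are made-up placeholders, not the real lds.org layout:

    # A sketch of the crawl-and-scrape step. The URL below and the
    # "Speaker-title.txt" naming scheme are hypothetical placeholders.
    import os
    import requests
    from bs4 import BeautifulSoup

    def save_talk(url, speaker, title):
        html = requests.get(url).text
        soup = BeautifulSoup(html, "html.parser")
        text = soup.get_text(separator=" ", strip=True)   # scrape off all the HTML
        os.makedirs("talks", exist_ok=True)
        # The speaker's name goes into the file name; that's the answer key.
        with open("talks/%s-%s.txt" % (speaker, title), "w") as out:
            out.write(text)

    save_talk("https://www.lds.org/general-conference/example-talk",
              "Eyring", "example-talk")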

The second thing that the computer needs is to know what features in the text are important to you. The most basic approach is to ask it to note individual words in the document. For example, one feature might be "the article has the word 'commandments' in it". Another might be "the article has the word 'scriptures' in it". There are a lot more ways to look at a document than just the words, though. How about "this article uses the passive voice" or "this article has long sentences" or "this article references '2 Corinthians'"? It simply depends on how much effort you put into teaching the computer what to look for.

It's not quite as involved as that, though, at the most basic level. For the program that I wrote, it simply gets all the words in each document in the training set and treats each unique word as a feature. So, it builds the feature set automatically, and we get "this article has 'humble'" as well as "this article has 'seemed'" as features.
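
In code, that automatic feature-building is tiny. Here is a sketch; the tokenization is simple-minded, and the original program may well have split words differently:

    # Build the feature set automatically: every unique word in the training
    # corpus becomes a boolean "contains(word)" feature.
    def build_vocabulary(texts):
        vocab = set()
        for text in texts:
            vocab.update(text.lower().split())   # naive tokenization
        return vocab

    def document_features(text, vocab):
        words = set(text.lower().split())
        return dict(("contains(%s)" % w, w in words) for w in vocab)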

Third, the computer needs to know what it is trying to guess at. It needs a list of possible answers.

In our case, the answers are "Eyring," "Monson," "Uchtdorf," etc.

Fourth, the computer needs to search and tally those features in the training text.

The approach that I used is called naive Bayes. The idea is simple. For each feature and each possible speaker, tally every document in the training text (for which we do have the answer key) into one of four categories (a sketch of the tally follows the list):

1. This document has word X AND it is a talk by speaker Y
2. This document doesn't have word X AND it is a talk by speaker Y
3. This document has word X AND it is not a talk by speaker Y
4. This document doesn't have word X AND it is not a talk by speaker Y
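
Spelled out as code, that tally is just a two-by-two count for every word-speaker pair. A library like NLTK does this bookkeeping for you, but a hand-rolled sketch shows everything that's happening underneath:

    from collections import defaultdict

    # For each (word, speaker) pair, hold the four buckets listed above:
    # [has word and is speaker, lacks word and is speaker,
    #  has word and is not speaker, lacks word and is not speaker]
    counts = defaultdict(lambda: [0, 0, 0, 0])

    def tally(training_docs, vocab, speakers):
        # training_docs is a list of (text, author) pairs; author is the answer key
        for text, author in training_docs:
            words = set(text.lower().split())
            for word in vocab:
                for speaker in speakers:
                    bucket = (0 if word in words else 1) + (0 if author == speaker else 2)
                    counts[(word, speaker)][bucket] += 1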

Now, with all of that tallied, the computer can estimate how likely each word is to show up in a talk by each speaker. When we give it text that it hasn't seen before, it combines those per-word odds via Bayes' rule and picks the most probable speaker. Given enough data to work with, even this simple approach can be very accurate.

In fact, with my corpus of the last ten years of Eyring, Hinckley, and Monson talks, my computer is 88% accurate!
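
For the curious: the odds table below is in the same format that NLTK's NaiveBayesClassifier prints, so a run like mine would look something like the following. This is a reconstruction rather than my original script; it reuses the build_vocabulary and document_features helpers sketched above, and the 80/20 train-test split is an assumption:

    import glob
    import os
    import random
    import nltk

    # Load the scraped talks; the speaker's name is the part of the file
    # name before the first dash (see the crawler sketch above).
    docs = []
    for path in glob.glob("talks/*.txt"):
        author = os.path.basename(path).split("-")[0]
        with open(path) as f:
            docs.append((f.read(), author))

    vocab = build_vocabulary(text for text, author in docs)
    labeled = [(document_features(text, vocab), author) for text, author in docs]

    random.shuffle(labeled)
    cutoff = int(len(labeled) * 0.8)   # hold some talks out for testing
    train_set, test_set = labeled[:cutoff], labeled[cutoff:]

    classifier = nltk.NaiveBayesClassifier.train(train_set)
    print(nltk.classify.accuracy(classifier, test_set))
    classifier.show_most_informative_features(17)   # prints a table like the one below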

Here is some nifty data that it found:

Feature                           Comparison          Odds
contains(prophets) = True         Eyring : Hinckley   15.1 : 1.0
contains(wants) = True            Eyring : Hinckley   12.4 : 1.0
contains(evidence) = True         Eyring : Monson     11.6 : 1.0
contains(promised) = True         Eyring : Hinckley   11.1 : 1.0
contains(seemed) = True           Eyring : Hinckley    9.8 : 1.0
contains(qualify) = True          Eyring : Hinckley    9.8 : 1.0
contains(commandments) = True     Eyring : Hinckley    9.8 : 1.0
contains(start) = True            Eyring : Monson      9.1 : 1.0
contains(commandments) = False    Hinckley : Eyring    8.6 : 1.0
contains(answers) = True          Eyring : Hinckley    8.5 : 1.0
contains(gifts) = True            Eyring : Hinckley    8.5 : 1.0
contains(lifetime) = True         Eyring : Hinckley    8.5 : 1.0
contains(chose) = True            Eyring : Hinckley    8.5 : 1.0
contains(simple) = True           Eyring : Hinckley    8.3 : 1.0
contains(memory) = True           Eyring : Monson      7.9 : 1.0
contains(whatsoever) = True       Eyring : Monson      7.9 : 1.0
contains(resurrected) = True      Eyring : Monson      7.9 : 1.0


This table shows what the computer found to be the most helpful features (in our case, words) for determining who gave the talk. The first column is the feature; the second is the pair of speakers, A : B, being compared; and the third is the odds that the talk is by speaker A rather than speaker B, given that the feature is satisfied. For example, the first row says that a talk containing the word "prophets" is about 15 times more likely to be President Eyring's than President Hinckley's.

(Note to self: Do not show this to brother-in-law lest he decide that he can use these odds to place bets on the April general conference address.)

What do you see that's interesting to you? I think that it's really interesting to see "basic" words mixed in with religious terms. It's also interesting that most of the odds involve President Eyring. While I haven't dug into it yet, my guess is that this simply means that President Eyring's vocabulary is the most easily distinguished of the three. That said, on my first go-round, I used only President Monson's and President Hinckley's talks, and I was still at about 85% accuracy.


