Automatically Classifying the Mood of Blog Posts

Here’s one for your reading list.

Hardly a week goes by that we don’t read about new services that use blogs and the blogosphere to help measure what people are talking about (i.e., buzz). Today, I came across a new paper by Gilad Mishne, a grad student at the University of Amsterdam. It reports on exploratory research that uses machine learning to automatically classify blog text by the mood of its author.

Title: Experiments with Mood Classification in Blog Posts (PDF; 8 pages)

From the abstract:

We present preliminary work on classifying blog text according to the mood reported by its author during the writing. Our data consists of a large collection of blog posts - online diary entries - which include an indication of the writer’s mood. We obtain modest, but consistent improvements over a baseline; our results show that further increasing the amount of available training data will lead to an additional increase in accuracy. Additionally, we show that the classification accuracy, although low, is not substantially worse than human performance on the same task. Our main finding is that mood classification is a challenging task using current text analysis methods.

As the author goes on to point out:

Mood classification is useful for various applications…[and] can enable new textual access approaches, e.g., filtering search results by mood, identifying communities, clustering, and so on.
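
To make the task a little more concrete, here is a minimal sketch of what "classifying a post by mood" and "comparing against a baseline" can look like. It is not the paper's method or data: the learner is a plain bag-of-words naive Bayes classifier chosen only because it fits in a few lines, the baseline is a simple "always guess the most common mood" rule, and the five training posts, the mood labels, and every function name are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(posts):
    """Count posts per mood and words per mood from (text, mood) pairs."""
    mood_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, mood in posts:
        mood_counts[mood] += 1
        for tok in tokenize(text):
            word_counts[mood][tok] += 1
            vocab.add(tok)
    return mood_counts, word_counts, vocab


def classify(model, text):
    """Return the mood with the highest naive Bayes log score."""
    mood_counts, word_counts, vocab = model
    total_posts = sum(mood_counts.values())
    # Words never seen in training carry no information, so skip them.
    tokens = [t for t in tokenize(text) if t in vocab]
    best_mood, best_score = None, float("-inf")
    for mood in sorted(mood_counts):
        score = math.log(mood_counts[mood] / total_posts)
        # Add-one (Laplace) smoothing avoids log(0) for unseen words.
        denom = sum(word_counts[mood].values()) + len(vocab)
        for tok in tokens:
            score += math.log((word_counts[mood][tok] + 1) / denom)
        if score > best_score:
            best_mood, best_score = mood, score
    return best_mood


def majority_baseline(posts):
    """Baseline: always predict the most frequent mood in the training set."""
    return Counter(mood for _, mood in posts).most_common(1)[0][0]


# Invented toy data: each post carries the mood its (hypothetical) author reported.
TRAIN = [
    ("so happy today, the sun is out and life is great", "happy"),
    ("great news, I got the job and I am thrilled", "happy"),
    ("what a wonderful happy weekend with friends", "happy"),
    ("feeling sad and lonely, nothing went right", "sad"),
    ("I miss her so much, crying all night", "sad"),
]

model = train(TRAIN)
prediction = classify(model, "crying and lonely tonight")
baseline = majority_baseline(TRAIN)
print(prediction, baseline)
```

On this toy data the classifier labels the new post "sad" while the baseline always answers "happy". Filtering search results by mood, one of the applications mentioned above, would then amount to keeping only the posts whose predicted mood matches the one requested.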

If you’re interested in blog search and/or mining the blogosphere, this paper is worth a look. It also has an excellent bibliography.

“Experiments with Mood Classification in Blog Posts” was presented at the SIGIR conference earlier this month.