uk.ac.cam.juliet.twitter.analysis
Class BayesClassifier

java.lang.Object
  extended by uk.ac.cam.juliet.twitter.analysis.BayesClassifier

public class BayesClassifier
extends java.lang.Object

performs sentiment analysis on tweets using the naive Bayes algorithm

To use this class:
After you have trained the algorithm with BayesLearner, pass your tweet (a Tweet object) to getSentimentScore(Tweet), or its text to getSentimentScore(String), and it will calculate and return the sentiment score.

Author:
Unknown
See Also:
BayesLearner

Field Summary
(package private)  IDatabase db
          the database from which to get the hypothesis and other information
(package private)  BayesUtil util
          the BayesUtil object that provides the text processing and other utilities needed by the BayesClassifier class
 
Constructor Summary
BayesClassifier(IDatabase db)
          constructs a classifier that reads its hypothesis and other information from the given database
 
Method Summary
 double calculateScore(java.util.List<java.lang.String> features)
          calculates, from the list of features, the probability of the tweet being positive (p) and negative (n), then returns p / (p + n), a value between 0.0 and 1.0
 java.util.List<java.lang.String> extractFeatures(java.lang.String processedSentence)
          extracts the words in the sentence that are considered features; to decide whether a word is a feature, this method uses the chi-squared test of independence between the word and the classes.
Note: the input sentence must already be processed
 double getConditionalProbability(java.lang.String feature, Classification c)
          calculates the conditional probability P(feature | c)
 double getSentimentScore(java.lang.String text)
          calculates a sentiment score for the text between 0.0 and 1.0; the closer the score is to 1.0, the more positive the text is expected to be, and vice versa
 double getSentimentScore(Tweet tweet)
          calculates a sentiment score for the tweet between 0.0 and 1.0; the closer the score is to 1.0, the more positive the tweet is expected to be, and vice versa
 double isExempted(Tweet tweet)
          determines whether the tweet should be exempted from the full classification procedure; a possible reason for exemption is that the tweet comes from a trusted origin
static void main(java.lang.String[] args)
          main method for testing
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

db

IDatabase db
the database from which to get the hypothesis and other information


util

BayesUtil util
the BayesUtil object that provides the text processing and other utilities needed by the BayesClassifier class

Constructor Detail

BayesClassifier

public BayesClassifier(IDatabase db)
constructs a classifier that reads its hypothesis and other information from the given database

Parameters:
db - the database from which to get the hypothesis and other information
Method Detail

getSentimentScore

public double getSentimentScore(Tweet tweet)
calculates a sentiment score for the tweet between 0.0 and 1.0; the closer the score is to 1.0, the more positive the tweet is expected to be, and vice versa

Parameters:
tweet - the tweet to analyse
Returns:
the sentiment score calculated

getSentimentScore

public double getSentimentScore(java.lang.String text)
calculates a sentiment score for the text between 0.0 and 1.0; the closer the score is to 1.0, the more positive the text is expected to be, and vice versa

Parameters:
text - the text to analyse
Returns:
the sentiment score calculated
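This page does not spell out how getSentimentScore combines the other methods. A minimal sketch, assuming it first consults isExempted (a non-negative result short-circuits classification) and otherwise runs extractFeatures followed by calculateScore. SentimentPipelineSketch and its constructor are hypothetical; the three stages are passed in as functions, and the exemption check takes the text rather than a Tweet to keep the example self-contained.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.ToDoubleFunction;

/**
 * Hypothetical sketch of a getSentimentScore-style pipeline. The three stages
 * are injected as functions because the real implementations are not shown.
 */
final class SentimentPipelineSketch {
    // Stand-in for isExempted: a score in [0.0, 1.0] if exempted, negative otherwise.
    private final ToDoubleFunction<String> exemption;
    // Stand-in for extractFeatures.
    private final Function<String, List<String>> extractor;
    // Stand-in for calculateScore.
    private final ToDoubleFunction<List<String>> scorer;

    SentimentPipelineSketch(ToDoubleFunction<String> exemption,
                            Function<String, List<String>> extractor,
                            ToDoubleFunction<List<String>> scorer) {
        this.exemption = exemption;
        this.extractor = extractor;
        this.scorer = scorer;
    }

    /** If every stage honours its contract, the result lies in [0.0, 1.0]. */
    double getSentimentScore(String processedText) {
        double exempt = exemption.applyAsDouble(processedText);
        if (exempt >= 0.0) {
            return exempt; // exempted: skip feature extraction and scoring
        }
        List<String> features = extractor.apply(processedText);
        return scorer.applyAsDouble(features);
    }
}
```

Injecting the stages keeps the control flow testable on its own; in the real class they would presumably be the instance methods listed above.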

isExempted

public double isExempted(Tweet tweet)
determines whether the tweet should be exempted from the full classification procedure; a possible reason for exemption is that the tweet comes from a trusted origin

Parameters:
tweet - the tweet in question
Returns:
a score between 0.0 and 1.0 if the tweet is exempted, or a negative value otherwise
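One way the "trusted origin" exemption could work, sketched under the assumption that trust is decided by the tweet's author name and that each trusted author maps to a fixed score. ExemptionSketch, NOT_EXEMPTED and the author-to-score map are hypothetical; only the return convention (a score in [0.0, 1.0] when exempted, a negative value otherwise) comes from this page.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical exemption rule: tweets from trusted authors get a fixed score. */
final class ExemptionSketch {
    /** Negative sentinel meaning "not exempted", following the documented contract. */
    static final double NOT_EXEMPTED = -1.0;

    private final Map<String, Double> trustedAuthors; // hypothetical: author name -> fixed score

    ExemptionSketch(Map<String, Double> trustedAuthors) {
        // Reject scores that would break the [0.0, 1.0] contract.
        for (Map.Entry<String, Double> e : trustedAuthors.entrySet()) {
            double s = e.getValue();
            if (s < 0.0 || s > 1.0) {
                throw new IllegalArgumentException("score out of range for " + e.getKey());
            }
        }
        this.trustedAuthors = new HashMap<>(trustedAuthors);
    }

    /** Returns the author's fixed score if trusted, NOT_EXEMPTED otherwise. */
    double isExempted(String author) {
        Double score = trustedAuthors.get(author);
        return score != null ? score : NOT_EXEMPTED;
    }
}
```

Returning a sentinel rather than a boolean lets a single call both decide exemption and supply the score, which matches the double return type documented above.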

calculateScore

public double calculateScore(java.util.List<java.lang.String> features)
calculates, from the list of features, the probability of the tweet being positive (p) and negative (n), then returns p / (p + n), a value between 0.0 and 1.0

Parameters:
features - the list of string features of the tweet (including the username and the important words and digrams in the text)
Returns:
a value between 0.0 and 1.0 (the closer the value is to 1.0, the more positive the tweet is expected to be, and vice versa)
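The p / (p + n) computation can be sketched as follows, assuming two classes whose priors sum to 1 and conditional probabilities such as those returned by getConditionalProbability. Multiplying many small probabilities underflows quickly, so this version sums logarithms and uses the identity p / (p + n) = 1 / (1 + exp(log n - log p)). NaiveBayesScoreSketch and the priorPositive parameter are hypothetical.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Hypothetical two-class naive Bayes scorer returning p / (p + n). */
final class NaiveBayesScoreSketch {
    /**
     * p = P(pos) * prod P(f | pos), n = P(neg) * prod P(f | neg), with
     * P(neg) = 1 - P(pos) (assumption). Products are kept as sums of logs.
     */
    static double calculateScore(List<String> features,
                                 double priorPositive,
                                 ToDoubleFunction<String> probGivenPositive,
                                 ToDoubleFunction<String> probGivenNegative) {
        double logP = Math.log(priorPositive);
        double logN = Math.log(1.0 - priorPositive);
        for (String f : features) {
            logP += Math.log(probGivenPositive.applyAsDouble(f));
            logN += Math.log(probGivenNegative.applyAsDouble(f));
        }
        // p / (p + n) == 1 / (1 + n / p) == 1 / (1 + exp(logN - logP))
        return 1.0 / (1.0 + Math.exp(logN - logP));
    }
}
```

For example, with a single feature, equal priors, P(f | positive) = 0.3 and P(f | negative) = 0.1, the score is 0.15 / (0.15 + 0.05) = 0.75.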

getConditionalProbability

public double getConditionalProbability(java.lang.String feature,
                                        Classification c)
calculates the conditional probability

    P(feature | c)

that is, the probability that this feature appears in a tweet given that the tweet belongs to classification c (either positive or negative). This probability is estimated from previously provided training data.

Parameters:
feature - a string username, word or digram that is a feature (attribute) of the tweet
c - positive or negative; the classification on which the probability is conditioned
Returns:
the conditional probability
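How the training data is turned into P(feature | c) is not documented here. A common choice is a count-based estimate with Laplace (add-one) smoothing, (count(f, c) + 1) / (total(c) + |V|), so that a feature never seen with class c does not force the whole product in calculateScore to zero. The sketch below assumes that approach; ConditionalProbabilitySketch, its Label enum (standing in for Classification) and observe are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical count-based estimator of P(feature | c) with add-one smoothing. */
final class ConditionalProbabilitySketch {
    /** Stand-in for the Classification type. */
    enum Label { POSITIVE, NEGATIVE }

    private final Map<Label, Map<String, Integer>> featureCounts = new HashMap<>();
    private final Map<Label, Integer> totalCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();

    /** Records one occurrence of a feature in a training tweet labelled c. */
    void observe(String feature, Label c) {
        featureCounts.computeIfAbsent(c, k -> new HashMap<>()).merge(feature, 1, Integer::sum);
        totalCounts.merge(c, 1, Integer::sum);
        vocabulary.add(feature);
    }

    /** (count(feature, c) + 1) / (total(c) + |V|), where |V| includes the queried feature. */
    double getConditionalProbability(String feature, Label c) {
        int count = featureCounts.getOrDefault(c, Map.of()).getOrDefault(feature, 0);
        int total = totalCounts.getOrDefault(c, 0);
        int vocabularySize = vocabulary.size() + (vocabulary.contains(feature) ? 0 : 1);
        return (count + 1.0) / (total + vocabularySize);
    }
}
```

Counting the queried feature in |V| keeps the denominator positive even before any training data has been observed.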

extractFeatures

public java.util.List<java.lang.String> extractFeatures(java.lang.String processedSentence)
extracts the words in the sentence that are considered features; to decide whether a word is a feature, this method uses the chi-squared test of independence between the word and the classes.
Note: the input sentence must already be processed

Parameters:
processedSentence - the sentence from which to extract the features
Returns:
a list of words that are regarded as features
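The chi-squared test mentioned above can be sketched for the two-class case with a 2x2 contingency table of tweet counts per word. The sketch assumes a word is kept as a feature when its statistic exceeds 3.841, the critical value for one degree of freedom at the 5% significance level; the threshold and count source actually used by BayesClassifier are not stated. ChiSquaredSketch, the long[] {a, b, c, d} layout and the whitespace tokenisation are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical chi-squared feature selection for two classes. */
final class ChiSquaredSketch {
    /** Chi-squared critical value for 1 degree of freedom at the 5% level (assumed threshold). */
    static final double CRITICAL_95 = 3.841;

    /**
     * Chi-squared statistic of the 2x2 table of tweet counts:
     *               positive  negative
     * contain word      a         b
     * lack word         c         d
     */
    static double chiSquared(long a, long b, long c, long d) {
        double n = a + b + c + d;
        double denom = (double) (a + b) * (c + d) * (a + c) * (b + d);
        if (denom == 0.0) {
            return 0.0; // an empty row or column carries no evidence of dependence
        }
        double diff = (double) a * d - (double) b * c;
        return n * diff * diff / denom;
    }

    /** Keeps the words whose presence depends significantly on the class. */
    static List<String> extractFeatures(String processedSentence, Map<String, long[]> counts) {
        List<String> features = new ArrayList<>();
        for (String word : processedSentence.split("\\s+")) {
            long[] t = counts.get(word); // hypothetical layout {a, b, c, d}
            if (t != null && chiSquared(t[0], t[1], t[2], t[3]) > CRITICAL_95) {
                features.add(word);
            }
        }
        return features;
    }
}
```

A word that appears equally often in positive and negative tweets gets a statistic of 0 and is dropped, while a word that appears only in one class scores high and is kept.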

main

public static void main(java.lang.String[] args)
                 throws java.sql.SQLException,
                        java.lang.ClassNotFoundException
main method for testing

Parameters:
args - the command-line arguments
Throws:
java.lang.ClassNotFoundException
java.sql.SQLException