uk.ac.cam.juliet.twitter.database
Interface IDatabase

All Known Implementing Classes:
Database, MockDatabase

public interface IDatabase

Database interface. This should be implemented by MockDatabase for unit testing.

Author:
Ahmad Akra

Method Summary
 boolean addBadWord(java.lang.String badword)
          inserts the bad word in the badwords table with a default replacement of "[Censored]"; if the bad word is already in the table, the method does nothing and returns false, otherwise it returns true
 boolean addBadWord(java.lang.String badword, java.lang.String replacement)
          inserts badword in the badwords table alongside its clean replacement, if the badword is already in the table, the method does nothing and returns false, otherwise it returns true.
 boolean addRawTweet(Tweet tweet)
          attempts to insert the raw tweet (every field, all data, before analysis) into the raw_tweets table in the database; if the tweet is already in the database, the method does nothing and returns false.
When a tweet is inserted, it is marked as under_analysis by default, which implies that the database is considered inconsistent until finishedAnalysis(Tweet) is called on the same tweet to un-mark it.
 void addStopWord(java.lang.String stopword)
          adds the argument to the list of stop words in the database
 boolean addUser(Tweet tweet)
          attempts to add the user to the Users table in the database if they do not already exist in it
 void clearAllAnalysis()
          deletes all rows from the analysis tables: analysed_tweets and word_count
 IDatabase copy()
          creates a new IDatabase object connected to the same database, username and password
 void createTables()
          attempts to create the required tables in the database if they are not already created.
 void decrementCount(java.lang.String word, int decrementValue)
          decrements the count of word in the word_count table by amount decrementValue; if the word is not found in the table, the function does nothing and returns
 void deleteBadWord(java.lang.String badword)
          deletes this badword from the badwords table, if the badword was not found, the method returns with no side effects
 void finishedAnalysis(Tweet tweet)
          marks the parameter tweet in the database as completely analysed meaning that the analysis and statistics data is consistent with the raw_tweets table
 void flushDatabase()
          drops all the client specific tables from the database
Warning: this method is irreversible, and causes all the client specific tables to be lost
 java.util.List<java.lang.String> getAllBadWords()
          reads all the badword regular expressions from the database
 java.util.List<java.lang.String> getAllStopwords()
          gets the list of all stopwords in the database
 double getChiSquared(java.lang.String word, Classification c)
          calculates the chi-squared function, which measures the independence of the parameter word from the class c; the lower the score, the more independent the word is from this class
for a good explanation of the chi-squared function, check the 6th page of this document from Stanford University
 int getClassificationCount(Classification c)
          counts the number of tweets in the training data that belong to classification c
 int getCountPerClassification(java.lang.String feature, Classification c)
          counts the number of tweets in the training data that have the feature "feature" and have classification "c"
 int getNumberOfDays()
          gets the number of days to keep old tweets in the database
 java.util.List<Tweet> getRawTweets(java.util.Date olderThan)
          gets from the database a list of all tweets that were created before the specified time
 java.lang.String getSearchString()
          gets the search string that is stored in the database
 int getTrainingDataCount()
          counts the training data in the database
 java.lang.String getTwitterUsernameAndPass()
          gets the twitter username and password used for authentication from the database
 int getWordCount(java.lang.String word)
          gets the current count of the word in the word_count table in the database or 0 if the word is not present in the database.
 boolean hasCrashed()
          finds if any raw tweets are still marked as "under analysis"
 void incrementChiSquareWordCount(java.lang.String word, Classification c)
          inserts (word,0,0) in the table if word is not in the table already, and increments either n_p if c = positive or n_n if c = negative
 void incrementCount(java.lang.String word, int incrementValue)
          increments the count of word in the word_count table by amount incrementValue, if the word is not already in the table it is inserted with an initial count of incrementValue
 boolean insertAnalysis(java.math.BigInteger id, boolean isOffensive, double score)
          attempts to insert the tweet with identifier "id" into the analysed_tweets table with some analysis values; if the tweet id is already in the database, the method does nothing and returns false
 boolean insertTrainingTweet(Tweet tweet, Classification c)
          attempts to insert the given tweet in the database alongside its classification, if the insertion fails, the method doesn't do anything
 boolean isEmpty()
          checks if the database has any tables in it
 void refilterTweets(java.lang.String badRegex)
          goes over every tweet in the raw_tweets table and if any tweet contains that regular expression, it is marked as nsfw in the analysed_tweets table.
for example: if the input is "belgium", it will match every tweet in raw_tweets against ".*\bbelgium\b.*" and, if a tweet matches, it gets marked as nsfw
 void removeAnalysis(Tweet tweet)
          deletes from the analysed_tweets table the analysis of the argument tweet
 void removeInactiveUsers()
          removes from the users table every user who doesn't have a reference in the raw_tweets table
 void removeInsignificantWords()
          after cleanup, remove all word_counts where count <= 1
 void removeRawTweets(java.util.Date olderThen)
          deletes all raw_tweets that are older than the provided time
 void storeLastNumberOfDays(int numberOfDays)
          stores in the database the number of days for which to keep old tweets in the database
 void storeLastSearchString(java.lang.String searchString)
          stores the search string in a special client table
 void underAnalysis(java.util.Date olderThan)
          marks all tweets older than the "olderThan" parameter as under analysis, hence if the server crashes before they are unmarked, a crash can be detected.
 boolean updateBadWord(java.lang.String badword, java.lang.String replacement)
          updates the clean replacement of the badword in the badwords table
Warning: the old replacement in the database will be lost
 void updateIsOffensive(int id, boolean isOffensive)
          updates the analysed_tweets table to indicate whether or not the tweet with identifier "id" contains offensive content.
 void updateScore(int id, double score)
          updates the analysed_tweets table to set a new sentiment score for the tweet with identifier "id"
 

Method Detail

addRawTweet

boolean addRawTweet(Tweet tweet)
attempts to insert the raw tweet (every field, all data, before analysis) into the raw_tweets table in the database; if the tweet is already in the database, the method does nothing and returns false.
When a tweet is inserted, it is marked as under_analysis by default, which implies that the database is considered inconsistent until finishedAnalysis(Tweet) is called on the same tweet to un-mark it.

Parameters:
tweet - is the tweet object, as fetched from the Twitter API
Returns:
true if the insert was successful and false otherwise
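
The marking scheme described above can be illustrated with a small in-memory stand-in, of the kind a MockDatabase might use. This is only a sketch: the class RawTweetTableSketch is hypothetical, it keys tweets by a BigInteger id rather than by a Tweet object, and it does not reflect how the real Database class stores rows.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the raw_tweets table (not the real Database class).
class RawTweetTableSketch {
    // tweet id -> true while the tweet is still marked as under analysis
    private final Map<BigInteger, Boolean> underAnalysis = new HashMap<>();

    // Mirrors addRawTweet: a duplicate id is ignored and reported as false.
    boolean addRawTweet(BigInteger id) {
        if (underAnalysis.containsKey(id)) {
            return false;
        }
        underAnalysis.put(id, true); // new rows start out marked as under analysis
        return true;
    }

    // Mirrors finishedAnalysis: un-marks the tweet if it is present.
    void finishedAnalysis(BigInteger id) {
        if (underAnalysis.containsKey(id)) {
            underAnalysis.put(id, false);
        }
    }

    // Mirrors hasCrashed: true if any row is still marked.
    boolean hasCrashed() {
        return underAnalysis.containsValue(true);
    }
}
```

In this sketch, hasCrashed() keeps returning true between addRawTweet and the matching finishedAnalysis call, which is the window in which the database counts as inconsistent.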

insertAnalysis

boolean insertAnalysis(java.math.BigInteger id,
                       boolean isOffensive,
                       double score)
attempts to insert the tweet with identifier "id" into the analysed_tweets table with some analysis values; if the tweet id is already in the database, the method does nothing and returns false

Parameters:
id - the id of the tweet as returned by getRawTweetID and addRawTweet
isOffensive - a boolean value indicating whether the tweet contains offensive content
score - the sentiment analysis value between 0.0 and 1.0
Returns:
true if the insert was successful, and false otherwise

updateIsOffensive

void updateIsOffensive(int id,
                       boolean isOffensive)
updates the analysed_tweets table to indicate whether or not the tweet with identifier "id" contains offensive content.

Parameters:
id - the id of the tweet as returned by getRawTweetID and addRawTweet
isOffensive - a boolean value indicating whether the tweet contains offensive content

updateScore

void updateScore(int id,
                 double score)
updates the analysed_tweets table to set a new sentiment score for the tweet with identifier "id"

Parameters:
id - the id of the tweet as returned by getRawTweetID and addRawTweet
score - the sentiment analysis value between 0.0 and 1.0

addUser

boolean addUser(Tweet tweet)
attempts to add the user to the Users table in the database if they do not already exist in it

Parameters:
tweet -
Returns:
true if the insertion was successful, false if the user is present in the database

addBadWord

boolean addBadWord(java.lang.String badword)
inserts the bad word in the badwords table with a default replacement of "[Censored]"; if the bad word is already in the table, the method does nothing and returns false, otherwise it returns true

Parameters:
badword - the badword to be inserted
Returns:
true if the insert was successful, false otherwise

addBadWord

boolean addBadWord(java.lang.String badword,
                   java.lang.String replacement)
inserts badword in the badwords table alongside its clean replacement, if the badword is already in the table, the method does nothing and returns false, otherwise it returns true.

Parameters:
badword - the badword to be inserted
replacement - the clean replacement
Returns:
true if the insert was successful, false if the badword is already in the database

updateBadWord

boolean updateBadWord(java.lang.String badword,
                      java.lang.String replacement)
updates the clean replacement of the badword in the badwords table
Warning: the old replacement in the database will be lost

Parameters:
badword - the badword for which the replacement needs to be updated
replacement - the new replacement word
Returns:
true if the update was successful and false otherwise
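
The return values of the two addBadWord overloads and of updateBadWord can be sketched with a map-backed stand-in. The class BadWordTableSketch and the helper replacementFor are hypothetical. The sketch also assumes that updateBadWord returns false when the bad word is not in the table, which the description above does not state.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the badwords table.
class BadWordTableSketch {
    static final String DEFAULT_REPLACEMENT = "[Censored]";

    // bad word -> clean replacement
    private final Map<String, String> replacements = new LinkedHashMap<>();

    boolean addBadWord(String badword) {
        return addBadWord(badword, DEFAULT_REPLACEMENT);
    }

    boolean addBadWord(String badword, String replacement) {
        // putIfAbsent returns null only when no mapping existed before
        return replacements.putIfAbsent(badword, replacement) == null;
    }

    // Assumption: updating a bad word that is absent reports false.
    boolean updateBadWord(String badword, String replacement) {
        // replace returns the previous value, or null if there was no mapping
        return replacements.replace(badword, replacement) != null;
    }

    // Hypothetical helper, only here so the stored replacement can be inspected.
    String replacementFor(String badword) {
        return replacements.get(badword);
    }
}
```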

getAllBadWords

java.util.List<java.lang.String> getAllBadWords()
reads all the badword regular expressions from the database

Returns:
a list of all the bad regular expressions

incrementCount

void incrementCount(java.lang.String word,
                    int incrementValue)
increments the count of word in the word_count table by amount incrementValue, if the word is not already in the table it is inserted with an initial count of incrementValue

Parameters:
word - the word for which the count needs to be incremented
incrementValue - the number by which to increment the count of word

decrementCount

void decrementCount(java.lang.String word,
                    int decrementValue)
decrements the count of word in the word_count table by amount decrementValue; if the word is not found in the table, the function does nothing and returns

Parameters:
word - the word for which the count needs to be decremented
decrementValue - the number by which to decrement the count of word

getWordCount

int getWordCount(java.lang.String word)
gets the current count of the word in the word_count table in the database, or 0 if the word is not present in the database. This function is not case sensitive, e.g. "Hello" and "hello" return the same result

Parameters:
word - the word for which to get the count.
Returns:
the count of the word in the word_count table in the database
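
The three word_count operations can be sketched together as follows. WordCountTableSketch is a hypothetical in-memory stand-in. It assumes that words are normalised to lower case on every operation, whereas the text above only promises case insensitivity for getWordCount. It also leaves open what the real table does when a count would drop below zero.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory stand-in for the word_count table.
class WordCountTableSketch {
    private final Map<String, Integer> counts = new HashMap<>();

    // Assumption: every operation normalises the word to lower case.
    private static String key(String word) {
        return word.toLowerCase(Locale.ROOT);
    }

    // Inserts the word with incrementValue if absent, otherwise adds to its count.
    void incrementCount(String word, int incrementValue) {
        counts.merge(key(word), incrementValue, Integer::sum);
    }

    // Does nothing when the word is not in the table.
    void decrementCount(String word, int decrementValue) {
        counts.computeIfPresent(key(word), (k, v) -> v - decrementValue);
    }

    // Returns 0 for words that are not in the table.
    int getWordCount(String word) {
        return counts.getOrDefault(key(word), 0);
    }
}
```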

insertTrainingTweet

boolean insertTrainingTweet(Tweet tweet,
                            Classification c)
attempts to insert the given tweet in the database alongside its classification, if the insertion fails, the method doesn't do anything

Parameters:
tweet - the tweet to store in the database
c - the classification (Positive or Negative)
Returns:
true if the insertion was successful, false otherwise

getTrainingDataCount

int getTrainingDataCount()
counts the training data in the database

Returns:
the number of training tweets in the database

getClassificationCount

int getClassificationCount(Classification c)
counts the number of tweets in the training data that belong to classification c

Parameters:
c - the classification to count for
Returns:
the number of tweets in the training data that belong to classification c

getChiSquared

double getChiSquared(java.lang.String word,
                     Classification c)
calculates the chi-squared function, which measures the independence of the parameter word from the class c; the lower the score, the more independent the word is from this class
for a good explanation of the chi-squared function, check the 6th page of this document from Stanford University

Parameters:
word - the word to measure independence against
c - the classification to measure independence against
Returns:
chi-squared(word,c)
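
The document that the description refers to is not reproduced here, so the following sketch uses the standard chi-squared statistic for a 2x2 contingency table. Treating this as the formula behind getChiSquared is an assumption. The class ChiSquaredSketch and the parameter names n11, n10, n01 and n00 are hypothetical, as is the choice to return 0 for a degenerate table.

```java
// Hypothetical helper: chi-squared statistic for a word/class 2x2 contingency table.
class ChiSquaredSketch {
    /**
     * n11: training tweets that contain the word and belong to class c
     * n10: training tweets that contain the word and do not belong to class c
     * n01: training tweets that lack the word and belong to class c
     * n00: training tweets that lack the word and do not belong to class c
     */
    static double chiSquared(long n11, long n10, long n01, long n00) {
        double n = n11 + n10 + n01 + n00;
        double diff = (double) n11 * n00 - (double) n10 * n01;
        double denominator =
                (double) (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00);
        if (denominator == 0.0) {
            return 0.0; // assumption: an empty row or column is treated as independent
        }
        // N * (N11*N00 - N10*N01)^2 / ((N11+N01)(N11+N10)(N10+N00)(N01+N00))
        return n * diff * diff / denominator;
    }
}
```

Under this reading, n11 would come from getCountPerClassification(word, c) and n11 + n01 from getClassificationCount(c), with the remaining cells derived from the other classification and getTrainingDataCount(). A score of 0 means the word and the class look independent in the training data.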

getCountPerClassification

int getCountPerClassification(java.lang.String feature,
                              Classification c)
counts the number of tweets in the training data that have the feature "feature" and have classification "c"

Parameters:
feature - the feature to be counted
c - the classification to be counted
Returns:
the number of tweets in the training data that have the feature "feature" and have classification "c"

incrementChiSquareWordCount

void incrementChiSquareWordCount(java.lang.String word,
                                 Classification c)
inserts (word,0,0) in the table if word is not in the table already, and increments either n_p if c = positive or n_n if c = negative

Parameters:
word - the word to insert/update in the table
c - the column to increment (either positive or negative)
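
The insert-then-increment behaviour can be sketched with one (n_p, n_n) pair per word. ChiSquareWordCountSketch and its nested Classification enum are hypothetical stand-ins for the project's own types. Reading the counters back through getCountPerClassification is an assumption about how the two methods relate.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the table of per-class word counts.
class ChiSquareWordCountSketch {
    // Hypothetical stand-in for the project's Classification type.
    enum Classification { POSITIVE, NEGATIVE }

    // word -> {n_p, n_n}
    private final Map<String, int[]> rows = new HashMap<>();

    void incrementChiSquareWordCount(String word, Classification c) {
        // Inserts (word, 0, 0) first when the word is not in the table yet.
        int[] row = rows.computeIfAbsent(word, w -> new int[] {0, 0});
        if (c == Classification.POSITIVE) {
            row[0]++; // n_p
        } else {
            row[1]++; // n_n
        }
    }

    // Assumption: the per-class count is read from the same row.
    int getCountPerClassification(String word, Classification c) {
        int[] row = rows.get(word);
        if (row == null) {
            return 0;
        }
        return c == Classification.POSITIVE ? row[0] : row[1];
    }
}
```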

getAllStopwords

java.util.List<java.lang.String> getAllStopwords()
gets the list of all stopwords in the database

Returns:
the list of all stopwords

addStopWord

void addStopWord(java.lang.String stopword)
adds the argument to the list of stop words in the database

Parameters:
stopword - the stopword to add

flushDatabase

void flushDatabase()
drops all the client specific tables from the database
Warning: this method is irreversible, and causes all the client specific tables to be lost


createTables

void createTables()
attempts to create the required tables in the database if they are not already created.


isEmpty

boolean isEmpty()
checks if the database has any tables in it

Returns:
true if the database has no tables and false otherwise

getSearchString

java.lang.String getSearchString()
gets the search string that is stored in the database

Returns:
the search string that is stored in the database

storeLastSearchString

void storeLastSearchString(java.lang.String searchString)
stores the search string in a special table client table

Parameters:
searchString - the search string to store

finishedAnalysis

void finishedAnalysis(Tweet tweet)
marks the parameter tweet in the database as completely analysed meaning that the analysis and statistics data is consistent with the raw_tweets table

Parameters:
tweet - the tweet to mark as completely analysed

removeRawTweets

void removeRawTweets(java.util.Date olderThen)
deletes all raw_tweets that are older than the provided time

Parameters:
olderThen - delete all tweets older than this time

underAnalysis

void underAnalysis(java.util.Date olderThan)
marks all tweets older than the "olderThan" parameter as under analysis, hence if the server crashes before they are unmarked, a crash can be detected.

Parameters:
olderThan - mark all tweets older than this parameter

getRawTweets

java.util.List<Tweet> getRawTweets(java.util.Date olderThan)
gets from the database a list of all tweets that were created before the specified time

Parameters:
olderThan - get all tweets created before this time parameter
Returns:
all tweets created before this time parameter

removeAnalysis

void removeAnalysis(Tweet tweet)
deletes from the analysed_tweets table the analysis of the argument tweet

Parameters:
tweet - delete the analysis of this tweet

hasCrashed

boolean hasCrashed()
finds if any raw tweets are still marked as "under analysis"

Returns:
true if one or more raw tweets are marked as under analysis and false if no such raw tweets exist; it also returns true if the connection to the database fails

clearAllAnalysis

void clearAllAnalysis()
deletes all rows from the analysis tables: analysed_tweets and word_count


removeInactiveUsers

void removeInactiveUsers()
removes from the users table every user who doesn't have a reference in the raw_tweets table


refilterTweets

void refilterTweets(java.lang.String badRegex)
goes over every tweet in the raw_tweets table and if any tweet contains that regular expression, it is marked as nsfw in the analysed_tweets table.
for example: if the input is "belgium", it will match every tweet in raw_tweets against ".*\bbelgium\b.*" and, if a tweet matches, it gets marked as nsfw

Parameters:
badRegex - the bad regular expression to check against
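
The wrapping shown in the example can be reproduced with java.util.regex. RefilterSketch and its two methods are hypothetical names. The sketch matches case sensitively and does not span line breaks, because the text does not say which regex flags, if any, the real implementation sets.

```java
import java.util.regex.Pattern;

// Hypothetical helper showing the word-boundary wrapping from the example.
class RefilterSketch {
    // Builds ".*\b<badRegex>\b.*" so the expression must occur as a whole word.
    static Pattern wrap(String badRegex) {
        return Pattern.compile(".*\\b" + badRegex + "\\b.*");
    }

    // True when the whole tweet text matches the wrapped expression.
    static boolean isNsfw(String tweetText, String badRegex) {
        return wrap(badRegex).matcher(tweetText).matches();
    }
}
```

With this wrapping, "I love belgium!" matches the input "belgium", while "belgiumwaffles" does not, since there is no word boundary after the final "m".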

copy

IDatabase copy()
creates a new IDatabase object connected to the same database, username and password

Returns:
another IDatabase object connected to the same database, username and password

deleteBadWord

void deleteBadWord(java.lang.String badword)
deletes this badword from the badwords table, if the badword was not found, the method returns with no side effects

Parameters:
badword - the badword to delete

storeLastNumberOfDays

void storeLastNumberOfDays(int numberOfDays)
stores in the database the number of days for which to keep old tweets in the database

Parameters:
numberOfDays - the number to store

getNumberOfDays

int getNumberOfDays()
gets the number of days to keep old tweets in the database

Returns:
the number of days to keep old tweets in the database
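
One plausible use of this value is to derive the java.util.Date cutoff that methods such as removeRawTweets and getRawTweets accept. Pairing the two in this way is an assumption, and RetentionCutoffSketch is a hypothetical helper. The sketch treats a day as exactly 24 hours and ignores daylight saving changes.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: turns a retention period in days into a cutoff Date.
class RetentionCutoffSketch {
    static Date cutoff(Date now, int numberOfDays) {
        // Subtract numberOfDays * 24h, expressed in milliseconds.
        return new Date(now.getTime() - TimeUnit.DAYS.toMillis(numberOfDays));
    }
}
```

A caller holding an IDatabase instance named db (a hypothetical variable) could then write db.removeRawTweets(RetentionCutoffSketch.cutoff(new Date(), db.getNumberOfDays())).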

getTwitterUsernameAndPass

java.lang.String getTwitterUsernameAndPass()
gets the twitter username and password used for authentication from the database

Returns:
the twitter username and password in one string separated by a colon, like this: "username:password"
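
A caller has to split the returned string back into its two parts. The sketch below splits at the first colon, which assumes that the username never contains a colon while the password may. TwitterCredentialsSketch is a hypothetical helper and not part of the interface.

```java
// Hypothetical helper for unpacking the "username:password" string.
class TwitterCredentialsSketch {
    // Returns {username, password}, splitting at the first colon only.
    static String[] split(String usernameAndPass) {
        int colon = usernameAndPass.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected \"username:password\"");
        }
        return new String[] {
            usernameAndPass.substring(0, colon),
            usernameAndPass.substring(colon + 1)
        };
    }
}
```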

removeInsignificantWords

void removeInsignificantWords()
after cleanup, remove all word_counts where count <= 1