Goes over every tweet in the raw_tweets table; any tweet that matches the
given regular expression is marked as nsfw in the analysed_tweets table.
For example: if the input is "belgium", every tweet in raw_tweets is matched
against ".*\bbelgium\b.*", and each tweet that matches is marked as nsfw.
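
A minimal Python sketch of the matching step described above. The names `build_pattern` and `find_nsfw` are hypothetical, and the tweets are held in a plain dict rather than read from the raw_tweets table; the `re.DOTALL` flag (so that `.*` also spans newlines inside a tweet) is an assumption, not something the original states.

```python
import re


def build_pattern(word: str) -> "re.Pattern[str]":
    # Wrap the input as .*\bword\b.* ; re.escape keeps the word literal.
    # DOTALL is an assumption so multi-line tweets can still match.
    return re.compile(r".*\b" + re.escape(word) + r"\b.*", re.DOTALL)


def find_nsfw(tweets: dict, word: str) -> set:
    # Return the ids of tweets whose text matches the pattern; the real
    # method would mark these rows in analysed_tweets instead.
    pattern = build_pattern(word)
    return {tweet_id for tweet_id, text in tweets.items() if pattern.match(text)}


tweets = {
    1: "i love belgium waffles",
    2: "belgiums are nice",
    3: "nothing to see here",
}
flagged = find_nsfw(tweets, "belgium")
```

Because of the `\b` word boundaries, tweet 2 ("belgiums") is not flagged: only the whole word matches.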
Reloads the bad words from the database into the in-memory cache.
Since it is expensive to load the bad words from the database on every
call, the job of reloading them is delegated to the caller.
Note: this method is thread safe.
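
One way such a caller-driven, thread-safe reload could look, sketched in Python. `BadWordCache`, `reload`, `contains`, and the `loader` callable are all hypothetical names standing in for the real database query; the original does not say how thread safety is achieved, so the lock here is an assumption.

```python
import threading


class BadWordCache:
    def __init__(self, loader):
        # loader is a hypothetical callable that returns the bad words,
        # standing in for the database query.
        self._loader = loader
        self._lock = threading.Lock()
        self._words = frozenset()

    def reload(self):
        # Do the expensive load outside the lock, then swap the
        # immutable set in under the lock.
        fresh = frozenset(self._loader())
        with self._lock:
            self._words = fresh

    def contains(self, word):
        # Grab the current snapshot under the lock; lookups never hit
        # the database.
        with self._lock:
            words = self._words
        return word in words


cache = BadWordCache(lambda: ["foo", "bar"])
before = cache.contains("foo")  # cache is empty until the caller reloads
cache.reload()
after = cache.contains("foo")
```

The cache stays stale until the caller decides to call `reload()`, which is the trade-off the comment above describes.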
Repeated letters cause a problem for the Bayes algorithm.
This function takes a word like "huuuuuungrrrry" and reduces every run
of repeated letters to 2 letters only; the result in this case is
"huungrry". This reduces the number of different ways the word "hungry"
can be written.
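
The reduction described above can be sketched with a single regular expression in Python. The name `squeeze_repeats` is hypothetical, and this version collapses runs of any repeated character, not only letters, which is an assumption made for brevity.

```python
import re

# A character followed by two or more copies of itself (a run of 3+).
_REPEATS = re.compile(r"(.)\1{2,}")


def squeeze_repeats(word: str) -> str:
    # Replace each run of 3 or more identical characters with exactly 2.
    return _REPEATS.sub(r"\1\1", word)


squeezed = squeeze_repeats("huuuuuungrrrry")
```

Runs of one or two characters are left untouched, so ordinary words such as "good" or "hungry" pass through unchanged.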