Sentiment_model module

Sentiment_model.appendData()

Takes the data from the three datasets, cleans the text, and then uses the sentiment model to predict the sentiment of each tweet.

Returns: The sentiment prediction and the sentiment probability for each row in the DataFrame.

Sentiment_model.clean_text(text)

Takes a string and returns it with all non-alphanumeric characters removed and all words converted to lowercase.

Parameters: text – The text to be cleaned
Returns: The cleaned, lowercased string
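
The following is a minimal sketch of the behavior described above, not the module's actual implementation. It assumes that whitespace is preserved so words stay separated, and that "alphanumeric" means ASCII letters and digits.

```python
import re


def clean_text(text: str) -> str:
    """Lowercase the text and drop everything except letters, digits, and whitespace."""
    lowered = text.lower()
    # Assumption: whitespace is kept so that word boundaries survive the cleaning.
    return re.sub(r"[^a-z0-9\s]", "", lowered)


print(clean_text("Great day, isn't it?!"))  # prints: great day isnt it
```

Note that this pattern also strips accented and other non-ASCII letters; a Unicode-aware pattern would be needed if those must be kept.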

Sentiment_model.getData(twitterFname, redditFname)

Reads the two data files, combines them, and splits the combined data into training and test sets.

Parameters

twitterFname – The name of the file containing the Twitter data.
redditFname – The name of the file containing the Reddit data.

Returns

train_X, test_X, train_y, test_y
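
The read, combine, and split steps could look roughly like the sketch below. The file format (CSV), the column names `text` and `label`, the 80/20 split, and the function name `get_data_sketch` are all assumptions made for illustration; the documentation does not specify them.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def get_data_sketch(twitter_fname, reddit_fname, text_col="text", label_col="label"):
    """Read two CSV files, stack them, and split into training and test sets."""
    # Assumption: both files are CSVs that share the same (hypothetical) column names.
    twitter = pd.read_csv(twitter_fname)
    reddit = pd.read_csv(reddit_fname)
    combined = pd.concat([twitter, reddit], ignore_index=True)
    # Drop rows that are missing either the text or the label.
    combined = combined.dropna(subset=[text_col, label_col])
    # Assumption: test_size and random_state are illustrative values.
    train_X, test_X, train_y, test_y = train_test_split(
        combined[text_col], combined[label_col], test_size=0.2, random_state=42
    )
    return train_X, test_X, train_y, test_y
```

The return order matches the one listed above, which is also the order produced by scikit-learn's `train_test_split`.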

Sentiment_model.modelPredict(clf_svm, vectorizer)

Takes a trained model and a fitted vectorizer and returns the model's prediction for the text ‘How about this text?’

Parameters

clf_svm – The trained model.
vectorizer – The vectorizer that was fitted on the training data.

Returns

The predicted class of the input text.
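
A prediction step of this kind can be sketched as follows. This assumes scikit-learn's `TfidfVectorizer` and `SVC`, which the documentation does not confirm; the name `model_predict_sketch`, the `text` parameter, and the toy training data are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC


def model_predict_sketch(clf_svm, vectorizer, text="How about this text?"):
    """Vectorize a single text with the fitted vectorizer and return the predicted class."""
    # transform() expects an iterable of documents, so the text is wrapped in a list.
    vectors = vectorizer.transform([text])
    return clf_svm.predict(vectors)[0]


# Toy data, invented for illustration only.
texts = ["i love this text", "what a great day", "i hate this text", "what an awful day"]
labels = ["positive", "positive", "negative", "negative"]

vectorizer = TfidfVectorizer()
clf_svm = SVC(kernel="linear").fit(vectorizer.fit_transform(texts), labels)

prediction = model_predict_sketch(clf_svm, vectorizer)
print(prediction)
```

The key point is that the same fitted vectorizer must be reused at prediction time, so the input is mapped to the feature space the model was trained on.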

Sentiment_model.train_SVM(train_X_vectors, train_y, test_X_vectors, test_y)

Takes the vectorized training and test data and returns a trained SVM model.

Parameters

train_X_vectors – The vectorized training data.
train_y – The labels for the training data.
test_X_vectors – The vectorized test data.
test_y – The actual labels of the test set.

Returns

The trained model.
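
As a rough illustration, training could proceed as in the sketch below. It assumes scikit-learn's `SVC` with a linear kernel and assumes the test data is used only to report held-out accuracy; neither detail is stated in the documentation, and `train_svm_sketch` is a hypothetical name.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC


def train_svm_sketch(train_X_vectors, train_y, test_X_vectors, test_y):
    """Fit an SVM on the training vectors and report accuracy on the test vectors."""
    # Assumption: a linear-kernel SVC; the module's kernel and parameters are not documented.
    clf_svm = SVC(kernel="linear")
    clf_svm.fit(train_X_vectors, train_y)
    # Assumption: the test set is used only to report held-out accuracy.
    accuracy = clf_svm.score(test_X_vectors, test_y)
    print(f"Test accuracy: {accuracy:.2f}")
    return clf_svm


# Toy data, invented for illustration only.
train_texts = ["i love this", "great and happy", "i hate this", "awful and sad"]
train_labels = [1, 1, 0, 0]
test_texts = ["love and happy", "hate and sad"]
test_labels = [1, 0]

vectorizer = TfidfVectorizer()
train_vectors = vectorizer.fit_transform(train_texts)
test_vectors = vectorizer.transform(test_texts)
model = train_svm_sketch(train_vectors, train_labels, test_vectors, test_labels)
```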

Sentiment_model.train_vectorizer(train_X, test_X)

Takes two lists of strings (train_X and test_X) and returns a list of vectors for each (train_X_vectors and test_X_vectors).

Parameters

train_X – The training data (a list of strings).
test_X – The test data (a list of strings).

Returns

The fitted vectorizer.
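
The description mentions the two sets of vectors while the Returns entry mentions the vectorizer, so the sketch below returns all three; that return shape is an assumption, as is the use of scikit-learn's `TfidfVectorizer` and the name `train_vectorizer_sketch`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def train_vectorizer_sketch(train_X, test_X):
    """Fit a vectorizer on the training texts and transform both text lists."""
    # Assumption: TF-IDF features; the module may use a different vectorizer.
    vectorizer = TfidfVectorizer()
    # Fit on the training texts only, so no test-set vocabulary leaks into the features.
    train_X_vectors = vectorizer.fit_transform(train_X)
    # Reuse the fitted vocabulary for the test texts.
    test_X_vectors = vectorizer.transform(test_X)
    return train_X_vectors, test_X_vectors, vectorizer


train_vecs, test_vecs, fitted = train_vectorizer_sketch(
    ["good movie", "bad movie"], ["good film"]
)
print(train_vecs.shape, test_vecs.shape)  # prints: (2, 3) (1, 3)
```

Words that appear only in the test texts ("film" here) are ignored, because the vocabulary is fixed when the vectorizer is fitted.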