Sentiment_model module

Sentiment_model.appendData()

Takes the data from the three datasets, cleans the text, and uses the sentiment model to predict the sentiment of each tweet.

Returns

The sentiment prediction and the sentiment probability for each row in the dataframe.

Sentiment_model.clean_text(text)

Takes a string as input and returns it with all non-alphanumeric characters removed and all words converted to lowercase.

Parameters

text – The text to be cleaned

Returns

The cleaned string.
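
The cleaning step described above can be sketched with the standard library alone. This is a minimal illustration, not the module's actual implementation. It assumes that whitespace is kept so that words stay separated, and that runs of whitespace are collapsed afterwards; neither detail is stated in this reference.

```python
import re


def clean_text(text: str) -> str:
    """Lowercase the text and drop characters that are not letters, digits, or whitespace."""
    lowered = text.lower()
    # Assumption: whitespace is preserved so that word boundaries survive.
    stripped = re.sub(r"[^a-z0-9\s]", "", lowered)
    # Assumption: collapse whitespace runs left behind by removed characters.
    return re.sub(r"\s+", " ", stripped).strip()


print(clean_text("Hello, World!! It's 2024..."))  # prints: hello world its 2024
```

Note that this pattern also removes non-ASCII letters; a dataset with accented or non-Latin text would need a different character class.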

Sentiment_model.getData(twitterFname, redditFname)

Reads the two data files, combines them, and splits the combined data into training and test sets.

Parameters
  • twitterFname – The name of the file containing the Twitter data.

  • redditFname – The name of the file containing the Reddit data.

Returns

train_X, test_X, train_y, test_y
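
One way the read, combine, and split steps could look is sketched below with pandas and scikit-learn. The function name get_data_sketch, the CSV format, the column names "text" and "label", and the 25% test fraction are all assumptions made for illustration; this reference does not specify them.

```python
import os
import tempfile

import pandas as pd
from sklearn.model_selection import train_test_split


def get_data_sketch(twitter_fname, reddit_fname, test_size=0.25, seed=42):
    # Assumption: both files are CSVs with hypothetical "text" and "label" columns.
    twitter = pd.read_csv(twitter_fname)
    reddit = pd.read_csv(reddit_fname)
    combined = pd.concat([twitter, reddit], ignore_index=True)
    combined = combined.dropna(subset=["text", "label"])
    # train_test_split returns X_train, X_test, y_train, y_test, in that order.
    return train_test_split(
        combined["text"], combined["label"], test_size=test_size, random_state=seed
    )


# Usage with two tiny throwaway files, so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    twitter_path = os.path.join(tmp, "twitter.csv")
    reddit_path = os.path.join(tmp, "reddit.csv")
    pd.DataFrame(
        {"text": ["a b", "c d", "e f", "g h"], "label": [1, 0, 1, 0]}
    ).to_csv(twitter_path, index=False)
    pd.DataFrame(
        {"text": ["i j", "k l", "m n", "o p"], "label": [1, 0, 1, 0]}
    ).to_csv(reddit_path, index=False)
    train_X, test_X, train_y, test_y = get_data_sketch(twitter_path, reddit_path)

print(len(train_X), len(test_X))
```

Fixing random_state keeps the split reproducible between runs, which makes later accuracy figures comparable.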

Sentiment_model.modelPredict(clf_svm, vectorizer)

Takes a trained model and a vectorizer and returns the model's prediction for the text ‘How about this text?’.

Parameters
  • clf_svm – The trained model.

  • vectorizer – The vectorizer that was fitted on the training data.

Returns

The predicted class of the input text.
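
The prediction step can be sketched as follows, assuming a scikit-learn style model and vectorizer. The name model_predict_sketch, the choice of TfidfVectorizer and a linear SVC, and the toy training data are hypothetical and serve only to make the example runnable. The key point is that the single string has to be wrapped in a list before it is vectorized.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC


def model_predict_sketch(clf_svm, vectorizer, text="How about this text?"):
    # transform() expects an iterable of documents, so wrap the string in a list.
    vectors = vectorizer.transform([text])
    # predict() returns one label per document; take the only one.
    return clf_svm.predict(vectors)[0]


# Toy data (an assumption) so that a fitted model and vectorizer exist.
train_texts = ["I love this text", "this is great", "I hate this text", "this is awful"]
train_labels = ["positive", "positive", "negative", "negative"]

vectorizer = TfidfVectorizer()
clf_svm = SVC(kernel="linear").fit(vectorizer.fit_transform(train_texts), train_labels)

prediction = model_predict_sketch(clf_svm, vectorizer)
print(prediction)
```

The vectorizer passed in must be the same fitted object used for training; a freshly constructed one would have a different vocabulary, or none at all.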

Sentiment_model.train_SVM(train_X_vectors, train_y, test_X_vectors, test_y)

Takes the training and test data and returns a trained SVM model.

Parameters
  • train_X_vectors – The vectorized training data.

  • train_y – The labels for the training data.

  • test_X_vectors – The vectorized test data.

  • test_y – The actual labels of the test set.

Returns

The trained model.
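
A minimal sketch of such a training function is shown below, assuming scikit-learn's SVC with a linear kernel. The kernel choice, the name train_svm_sketch, and the use of the test set to print an accuracy figure are assumptions; this reference only states that the function receives both sets and returns the trained model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC


def train_svm_sketch(train_X_vectors, train_y, test_X_vectors, test_y):
    # Assumption: a linear kernel, a common choice for sparse text features.
    clf_svm = SVC(kernel="linear")
    clf_svm.fit(train_X_vectors, train_y)
    # Assumption: the test set is used only to report held-out accuracy.
    accuracy = clf_svm.score(test_X_vectors, test_y)
    print(f"test accuracy: {accuracy:.2f}")
    return clf_svm


# Toy data (hypothetical) to exercise the function.
train_texts = ["good great happy", "love wonderful nice", "bad awful sad", "hate terrible poor"]
train_labels = ["positive", "positive", "negative", "negative"]
test_texts = ["great nice", "awful poor"]
test_labels = ["positive", "negative"]

vec = TfidfVectorizer()
model = train_svm_sketch(
    vec.fit_transform(train_texts), train_labels, vec.transform(test_texts), test_labels
)
```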

Sentiment_model.train_vectorizer(train_X, test_X)

Takes a list of strings (train_X) and a list of strings (test_X) and returns the corresponding lists of vectors (train_X_vectors and test_X_vectors).

Parameters
  • train_X – The training data, as a list of strings.

  • test_X – The test data, as a list of strings.

Returns

The fitted vectorizer.
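
The fit-then-transform pattern behind this function can be sketched as below. TfidfVectorizer is an assumption, since this reference does not name the vectorizer class, and the sketch returns both vector sets together with the fitted vectorizer purely for illustration; the actual return value may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def train_vectorizer_sketch(train_X, test_X):
    vectorizer = TfidfVectorizer()
    # Learn the vocabulary and IDF weights from the training text only.
    train_X_vectors = vectorizer.fit_transform(train_X)
    # Reuse that vocabulary for the test text; unseen words are ignored.
    test_X_vectors = vectorizer.transform(test_X)
    return train_X_vectors, test_X_vectors, vectorizer


train_vecs, test_vecs, fitted = train_vectorizer_sketch(
    ["good movie", "bad movie"], ["good film"]
)
print(train_vecs.shape, test_vecs.shape)  # prints: (2, 3) (1, 3)
```

Fitting on the training text alone keeps information from the test set out of the learned vocabulary and weights.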