Sentiment_model module
- Sentiment_model.appendData()
It takes the data from the three datasets, cleans the text, and then uses the sentiment model to predict the sentiment of each tweet.
- Returns
The sentiment prediction and the sentiment probability for each row in the dataframe.
- Sentiment_model.clean_text(text)
It takes a string as input and returns it with all non-alphanumeric characters removed and all words converted to lowercase.
- Parameters
text – The text to be cleaned
- Returns
The cleaned string.
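As a rough illustration, the cleaning described above might look like the sketch below. The name `clean_text_sketch` is hypothetical, and two details are assumptions rather than documented behavior: whitespace between words is kept, and only ASCII letters and digits count as alphanumeric.

```python
import re


def clean_text_sketch(text: str) -> str:
    """Lowercase the text and drop characters that are not letters, digits, or whitespace."""
    lowered = text.lower()
    # Assumption: whitespace is preserved so that words stay separated.
    stripped = re.sub(r"[^a-z0-9\s]", "", lowered)
    # Collapse whitespace runs left behind by the removed characters.
    return re.sub(r"\s+", " ", stripped).strip()


cleaned = clean_text_sketch("Hello, World!! It's 2023 :)")
print(cleaned)
```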
- Sentiment_model.getData(twitterFname, redditFname)
It reads the two data files, combines them, and then splits the result into training and test sets.
- Parameters
twitterFname – The name of the file containing the Twitter data.
redditFname – The name of the file containing the Reddit data.
- Returns
train_X, test_X, train_y, test_y
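The read, combine, and split steps could be sketched as follows. This assumes pandas and scikit-learn are available, that both files are CSVs, and that they share `text` and `label` columns; the column names, the 80/20 split, the seed, and the name `get_data_sketch` are all hypothetical and not taken from the module.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split


def get_data_sketch(twitter_fname, reddit_fname, test_size=0.2, seed=42):
    """Read two CSV sources, stack them, and split into train and test sets."""
    twitter = pd.read_csv(twitter_fname)
    reddit = pd.read_csv(reddit_fname)
    # Assumption: both sources use the same 'text' and 'label' columns.
    combined = pd.concat([twitter, reddit], ignore_index=True)
    combined = combined.dropna(subset=["text", "label"])
    # train_test_split returns X_train, X_test, y_train, y_test in that order.
    return train_test_split(
        combined["text"], combined["label"], test_size=test_size, random_state=seed
    )


# In-memory CSVs stand in for the real files in this example.
twitter_csv = io.StringIO("text,label\n" + "\n".join(f"tweet {i},{i % 2}" for i in range(6)))
reddit_csv = io.StringIO("text,label\n" + "\n".join(f"post {i},{i % 2}" for i in range(4)))
train_X, test_X, train_y, test_y = get_data_sketch(twitter_csv, reddit_csv)
```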
- Sentiment_model.modelPredict(clf_svm, vectorizer)
It takes a trained model and a vectorizer and returns the model's prediction for the text ‘How about this text?’.
- Parameters
clf_svm – The trained model.
vectorizer – The vectorizer that was fitted on the training data.
- Returns
The predicted class of the input text.
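A minimal sketch of this prediction step is shown below, assuming a scikit-learn classifier and vectorizer. The toy training data, the choice of `TfidfVectorizer` and a linear `SVC`, and the name `model_predict_sketch` are illustrative assumptions; only the fixed input text comes from the description above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC


def model_predict_sketch(clf_svm, vectorizer, text="How about this text?"):
    """Vectorize one string with the fitted vectorizer and return the predicted class."""
    # transform (not fit_transform) reuses the vocabulary learned during training.
    vector = vectorizer.transform([text])
    return clf_svm.predict(vector)[0]


# Toy model so the example runs on its own; the labels are hypothetical.
texts = ["i love this text", "what a great day", "i hate this", "what an awful day"]
labels = ["positive", "positive", "negative", "negative"]
vectorizer = TfidfVectorizer()
clf_svm = SVC(kernel="linear").fit(vectorizer.fit_transform(texts), labels)

prediction = model_predict_sketch(clf_svm, vectorizer)
print(prediction)
```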
- Sentiment_model.train_SVM(train_X_vectors, train_y, test_X_vectors, test_y)
It takes the vectorized training and test data and returns a trained SVM model.
- Parameters
train_X_vectors – The vectorized training data.
train_y – The labels for the training data.
test_X_vectors – The vectorized test data.
test_y – The actual labels of the test set.
- Returns
The trained model.
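The training step might look roughly like the sketch below, assuming scikit-learn's `SVC` with a linear kernel. The kernel, the use of the test data purely to report accuracy, and the name `train_svm_sketch` are assumptions; the module's actual settings are not documented here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC


def train_svm_sketch(train_X_vectors, train_y, test_X_vectors, test_y):
    """Fit an SVM on the training vectors and report accuracy on the test vectors."""
    clf_svm = SVC(kernel="linear")
    clf_svm.fit(train_X_vectors, train_y)
    # Assumption: the test set is used only for evaluation, never for fitting.
    accuracy = clf_svm.score(test_X_vectors, test_y)
    print(f"test accuracy: {accuracy:.2f}")
    return clf_svm


# Toy vectors so the example is self-contained.
vec = TfidfVectorizer()
train_vecs = vec.fit_transform(["good great love", "bad awful hate", "great love", "awful hate"])
test_vecs = vec.transform(["love good", "hate bad"])
clf_svm = train_svm_sketch(train_vecs, [1, 0, 1, 0], test_vecs, [1, 0])
```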
- Sentiment_model.train_vectorizer(train_X, test_X)
It takes a list of strings (train_X) and a list of strings (test_X), fits a vectorizer on the training data, and converts both lists into vectors (train_X_vectors and test_X_vectors).
- Parameters
train_X – The training data.
test_X – The test data.
- Returns
The fitted vectorizer.
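One way to sketch this step is shown below, assuming scikit-learn's `TfidfVectorizer`; the module may use a different vectorizer. Because the description and the Returns entry differ on what is handed back, this hypothetical `train_vectorizer_sketch` returns the vectorizer together with both sets of vectors.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def train_vectorizer_sketch(train_X, test_X):
    """Fit a vectorizer on the training strings and vectorize both lists."""
    vectorizer = TfidfVectorizer()
    # Fit on training text only so the test vocabulary does not leak into the features.
    train_X_vectors = vectorizer.fit_transform(train_X)
    # Words unseen during fitting are ignored when transforming the test text.
    test_X_vectors = vectorizer.transform(test_X)
    return vectorizer, train_X_vectors, test_X_vectors


train_X = ["good movie", "bad movie", "great day"]
test_X = ["good day", "unknown words"]
vectorizer, train_vecs, test_vecs = train_vectorizer_sketch(train_X, test_X)
print(train_vecs.shape, test_vecs.shape)
```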