keywords_extract_ module

keywords_extract_.apply_hashtags(df)

Takes a dataframe as input and returns it with a new column called ‘hashtags’ containing the list of hashtags extracted from each tweet

Parameters

df – the dataframe of tweets to process

Returns

A dataframe with a new column called ‘hashtags’
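
A minimal usage sketch, assuming the tweets live in a ‘text’ column (the input column name is an assumption; only the output column ‘hashtags’ is documented):

import pandas as pd
import keywords_extract_

# Hypothetical input frame; the 'text' column name is an assumption.
df = pd.DataFrame({"text": ["Loving the new #phone from #BrandX!"]})
df = keywords_extract_.apply_hashtags(df)
print(df["hashtags"])  # each row now holds a list of hashtags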

keywords_extract_.get_hashtags(text)

Extracts hashtags from text after performing preprocessing such as lowercasing and punctuation removal

Parameters

text – the text you want to extract hashtags from

Returns

A list of hashtags
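
The description suggests a lowercase-then-match pipeline; a minimal re-implementation sketch of that technique (get_hashtags_sketch is a hypothetical stand-in, and the exact preprocessing steps are assumptions):

import re

def get_hashtags_sketch(text):
    # Lowercase first, as the description above indicates.
    text = text.lower()
    # A hashtag is '#' followed by word characters; punctuation elsewhere
    # in the text does not affect the match.
    return re.findall(r"#\w+", text)

get_hashtags_sketch("Great #Coffee at #CafeNine!")  # ['#coffee', '#cafenine']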

keywords_extract_.get_ht_dict(df)

For each brand, get the hashtags from the Twitter and Reddit posts

Parameters

df – the dataframe containing the Twitter and Reddit posts

Returns

A dictionary of hashtags for each brand and stream
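
The exact key layout of the returned dictionary is not documented; one plausible shape, assuming per-post ‘brand’, ‘stream’, and ‘hashtags’ columns (all three names are assumptions), keys it by (brand, stream) pairs:

# Sketch of building such a dictionary with pandas, under the
# column-name assumptions stated above.
ht_dict = {
    (brand, stream): sum(group["hashtags"], [])
    for (brand, stream), group in df.groupby(["brand", "stream"])
}
ht_dict[("BrandX", "twitter")]  # all hashtags for that brand/stream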

keywords_extract_.get_kw_kbnc_dict(df, kw_model, noun_chunks)

For each brand, extract the top 20 KeyBERT keywords from the text of each stream (Twitter and Reddit), using the supplied noun chunks as candidate keyphrases

Parameters
  • df – the dataframe with the text data

  • kw_model – the keyword extractor model

  • noun_chunks – a dictionary of noun chunks for each brand and stream

Returns

A dictionary of KeyBERT keywords for each brand and stream
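
The candidate-restricted extraction described here maps directly onto KeyBERT’s extract_keywords; a per-document sketch of that technique (the text and candidate list are illustrative):

from keybert import KeyBERT

kw_model = KeyBERT()  # defaults to the 'all-MiniLM-L6-v2' sentence-transformer
text = "The new flagship phone has a great camera and solid battery life."
candidates = ["flagship phone", "great camera", "battery life"]
# Score only the supplied noun-chunk candidates and keep the top 20;
# each result is a (keyword, similarity) pair.
keywords = kw_model.extract_keywords(text, candidates=candidates, top_n=20)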

keywords_extract_.get_kw_yake_dict(df)

For each brand, get the top 20 keywords from the YAKE keyword extractor for both Twitter and Reddit

Parameters

df – the dataframe with the text data

Returns

A dictionary of YAKE keywords for each brand and stream
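
A per-document sketch of the underlying YAKE call (all settings other than the top-20 cutoff are assumptions):

import yake

extractor = yake.KeywordExtractor(lan="en", top=20)
# Returns (keyword, score) pairs; lower YAKE scores mean more relevant.
keywords = extractor.extract_keywords(
    "The new flagship phone has a great camera and solid battery life."
)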

keywords_extract_.get_noun_chunks_dict(df)

Extract the spaCy noun chunks from the cleaned text posts

Parameters

df – the dataframe containing the cleaned text data

Returns

A dictionary of spaCy noun chunks for each brand and stream
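
Call shape, assuming the same (brand, stream) key layout as the hashtag dictionary (the tuple keys are an assumption); the result is what get_kw_kbnc_dict expects as its noun_chunks argument:

import keywords_extract_

noun_chunks = keywords_extract_.get_noun_chunks_dict(df)
noun_chunks[("BrandX", "reddit")]  # noun chunks for that brand/stream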

keywords_extract_.load_sentiment_emotion_data(path)

Takes a path to a directory of CSV files, reads them in, and drops and renames some of the columns

Parameters

path – the path to the folder containing the CSV files for each brand

Returns

A single concatenated dataframe with the preprocessed columns
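
A sketch of the read-and-concatenate pattern this describes (the folder path is illustrative, and no rename mapping is shown because the affected columns are not documented):

from pathlib import Path
import pandas as pd

# Read every per-brand CSV in the folder and stack them into one frame.
frames = [pd.read_csv(p) for p in Path("data/brands").glob("*.csv")]
df = pd.concat(frames, ignore_index=True)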

keywords_extract_.spacy_noun_chunks(doc_text, model='en_core_web_lg')

Extracts spaCy noun chunks from a given preprocessed document text.

Parameters
  • doc_text – the text you want to extract noun chunks from

model – the spaCy model to use for the NLP pipeline. Defaults to en_core_web_lg

Returns

A list of noun chunks
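
A minimal sketch of the spaCy technique this wraps, using the function’s default model:

import spacy

nlp = spacy.load("en_core_web_lg")
doc = nlp("The quick brown fox jumps over the lazy dog")
chunks = [chunk.text for chunk in doc.noun_chunks]
# ['The quick brown fox', 'the lazy dog']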

keywords_extract_.text_clean_yake(text)

Cleans the text by removing mentions, hashtags, web links, website names, punctuation, non-alphanumeric characters, and extra whitespace

Parameters

text – the text to be cleaned

Returns

The cleaned text
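
A re-implementation sketch of the cleaning steps listed above (text_clean_yake_sketch is a hypothetical stand-in; the exact regular expressions are assumptions):

import re

def text_clean_yake_sketch(text):
    text = re.sub(r"@\w+", " ", text)                    # mentions
    text = re.sub(r"#\w+", " ", text)                    # hashtags
    text = re.sub(r"https?://\S+", " ", text)            # weblinks
    text = re.sub(r"\b\w+\.(com|org|net)\b", " ", text)  # bare website names
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)          # punctuation / non-alphanumeric
    return re.sub(r"\s+", " ", text).strip()             # collapse extra whitespace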