keywords_extract_ module
- keywords_extract_.apply_hashtags(df)
It takes a dataframe as input and returns a dataframe with a new column called ‘hashtags’ that contains the list of hashtags for each tweet (see the usage sketch after this entry)
- Parameters
df – the dataframe
- Returns
A dataframe with a new column called ‘hashtags’
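A minimal usage sketch; the ‘text’ column name is an assumption, since the docs only say the input is a dataframe of tweets:

```python
import pandas as pd
import keywords_extract_

# Hypothetical sample data; the 'text' column name is an assumption.
df = pd.DataFrame({"text": ["Loving the new phone #tech #gadgets",
                            "Battery life could be better #tech"]})
df = keywords_extract_.apply_hashtags(df)
print(df["hashtags"])  # e.g. [['#tech', '#gadgets'], ['#tech']]
```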
- keywords_extract_.get_hashtags(text)
Extracts hashtags from text after performing preprocessing such as lowercasing and punctuation removal (a hedged sketch follows this entry)
- Parameters
text – the text you want to extract hashtags from
- Returns
A list of hashtags
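A sketch of what the extraction could look like; the exact preprocessing steps beyond lowercasing are assumptions:

```python
import re

def get_hashtags_sketch(text):
    # Lowercase first, then match hashtag tokens; '#' must survive any
    # punctuation removal for the pattern to find anything.
    return re.findall(r"#\w+", text.lower())

get_hashtags_sketch("Check THIS out #AI #NLProc!")  # ['#ai', '#nlproc']
```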
- keywords_extract_.get_ht_dict(df)
For each brand, collect the hashtags from the Twitter and Reddit posts (see the sketch after this entry)
- Parameters
df – the dataframe containing the data
- Returns
A dictionary of hashtags for each brand and stream
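A hedged sketch of the grouping logic, assuming ‘brand’ and ‘stream’ columns and a ‘hashtags’ column produced by apply_hashtags (all column names are assumptions):

```python
def get_ht_dict_sketch(df):
    # Key by (brand, stream), e.g. ('acme', 'twitter'), and flatten the
    # per-post hashtag lists for each group.
    ht_dict = {}
    for (brand, stream), group in df.groupby(["brand", "stream"]):
        ht_dict[(brand, stream)] = [tag for tags in group["hashtags"]
                                    for tag in tags]
    return ht_dict
```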
- keywords_extract_.get_kw_kbnc_dict(df, kw_model, noun_chunks)
For each brand, extract the top 20 KeyBERT keywords from the text of each stream (Twitter and Reddit), using the noun chunks as candidates (a sketch of the call follows this entry)
- Parameters
df – the dataframe with the text data
kw_model – the keyword extractor model
noun_chunks – a dictionary of noun chunks for each brand and stream
- Returns
A dictionary of KeyBERT keywords for each brand and stream
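The KeyBERT call presumably resembles the following; extract_keywords with candidates and top_n is KeyBERT's actual API and returns (keyword, score) pairs, while the column names and dictionary layout are assumptions:

```python
from keybert import KeyBERT

kw_model = KeyBERT()  # default sentence-transformers backbone

def get_kw_kbnc_sketch(df, kw_model, noun_chunks):
    kw_dict = {}
    for (brand, stream), group in df.groupby(["brand", "stream"]):
        text = " ".join(group["text"])
        # Restrict KeyBERT's candidate keywords to the group's noun chunks.
        kw_dict[(brand, stream)] = kw_model.extract_keywords(
            text, candidates=noun_chunks[(brand, stream)], top_n=20
        )
    return kw_dict
```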
- keywords_extract_.get_kw_yake_dict(df)
For each brand, get the top 20 keywords from the YAKE keyword extractor for both Twitter and Reddit (see the sketch after this entry)
- Parameters
df – dataframe with the text data
- Returns
A dictionary of YAKE keywords for each brand and stream
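A sketch using YAKE's public API (yake.KeywordExtractor with top=20 is the library's real interface; the column names and key layout are assumptions):

```python
import yake

def get_kw_yake_sketch(df):
    extractor = yake.KeywordExtractor(lan="en", top=20)
    kw_dict = {}
    for (brand, stream), group in df.groupby(["brand", "stream"]):
        text = " ".join(group["text"])
        # extract_keywords returns (keyword, score) pairs; lower scores
        # indicate more relevant keywords in YAKE.
        kw_dict[(brand, stream)] = extractor.extract_keywords(text)
    return kw_dict
```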
- keywords_extract_.get_noun_chunks_dict(df)
Extracts the spaCy noun chunks from the cleaned text posts (a sketch follows this entry)
- Parameters
df – the dataframe containing the cleaned text data
- Returns
A dictionary of spaCy noun chunks for each brand and stream
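Presumably a thin wrapper around spacy_noun_chunks (documented below); the ‘clean_text’ column name and the grouping columns are assumptions:

```python
import keywords_extract_

def get_noun_chunks_dict_sketch(df):
    nc_dict = {}
    for (brand, stream), group in df.groupby(["brand", "stream"]):
        text = " ".join(group["clean_text"])  # assumed column name
        nc_dict[(brand, stream)] = keywords_extract_.spacy_noun_chunks(text)
    return nc_dict
```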
- keywords_extract_.load_sentiment_emotion_data(path)
It takes a path to a directory of CSV files, reads them in, and drops and renames columns as preprocessing (a sketch follows this entry)
- Parameters
path – the path to the folder containing the CSV files for each brand
- Returns
A single concatenated dataframe with the preprocessed columns
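A hedged sketch of the loading step; which columns get dropped or renamed is not documented, so the rename shown here is purely illustrative:

```python
from pathlib import Path
import pandas as pd

def load_sentiment_emotion_sketch(path):
    frames = []
    for csv_file in Path(path).glob("*.csv"):  # one CSV per brand
        frame = pd.read_csv(csv_file)
        # Illustrative preprocessing; the real drop/rename lists are unknown.
        frame = frame.rename(columns=str.lower)
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)
```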
- keywords_extract_.spacy_noun_chunks(doc_text, model='en_core_web_lg')
Extracts spaCy noun chunks from a given preprocessed document text (an example on a single document follows this entry).
- Parameters
doc_text – the text you want to extract noun chunks from
model – the spaCy model to use. Defaults to en_core_web_lg
- Returns
A list of noun chunks
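An example of the underlying spaCy calls on a single document; the function body is a sketch, but doc.noun_chunks is spaCy's real API:

```python
import spacy

def spacy_noun_chunks_sketch(doc_text, model="en_core_web_lg"):
    nlp = spacy.load(model)  # loading per call is slow; cache in practice
    return [chunk.text for chunk in nlp(doc_text).noun_chunks]

spacy_noun_chunks_sketch("The quick brown fox jumps over the lazy dog.")
# ['The quick brown fox', 'the lazy dog']
```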
- keywords_extract_.text_clean_yake(text)
Cleans the text by removing mentions, hashtags, weblinks, websites, punctuation, non-alphanumeric characters, and extra whitespace (a sketch follows this entry)
- Parameters
text – The text to be processed.
- Returns
A cleaned text
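A sketch of the cleaning pipeline implied by the description; the regex patterns are assumptions that match the listed steps:

```python
import re

def text_clean_yake_sketch(text):
    text = re.sub(r"@\w+", " ", text)            # mentions
    text = re.sub(r"#\w+", " ", text)            # hashtags
    text = re.sub(r"https?://\S+", " ", text)    # weblinks
    text = re.sub(r"www\.\S+", " ", text)        # websites
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)  # punctuation / non-alphanumeric
    return re.sub(r"\s+", " ", text).strip()     # collapse extra whitespace
```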