messageanalyzer.extract_keywords

Functions

extract_keywords(→ List[List[str]])

Extracts the top keywords from a list of text messages using TF-IDF (Term Frequency-Inverse Document Frequency).

Module Contents

messageanalyzer.extract_keywords.extract_keywords(messages: List[str], num_keywords: int = 5) → List[List[str]]

Extracts the top keywords from a list of text messages using TF-IDF (Term Frequency-Inverse Document Frequency).

This function scores every word in a message by its TF-IDF weight, which is high for words that occur often in that message but rarely in the other messages of the given corpus, and returns the highest-scoring words for each message. Stop words are removed automatically.

Parameters:
  • messages (List[str]) – A list of text messages from which to extract keywords.

  • num_keywords (int, default = 5) – The number of top keywords to extract from each message.

Raises:
  • TypeError – If messages is not a list or contains non-string elements.

Returns:
  A list with one sublist per input message; each sublist contains the top keywords extracted from the corresponding message.

Return type:
  List[List[str]]

Examples

>>> messages = ["Learning Data Science at MDS is amazing!", "I prefer to work with Python than R"]
>>> extract_keywords(messages, num_keywords=3)
[['data', 'science', 'amazing'], ['python', 'prefer', 'work']]
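
The TF-IDF ranking described above can be illustrated with a minimal sketch that uses only the standard library. It is not the package's actual implementation: the name top_keywords_sketch, the tiny STOP_WORDS set, the regex tokenizer, and the smoothed IDF formula are all assumptions chosen for illustration, so its output will not necessarily match that of extract_keywords.

```python
import math
import re
from collections import Counter
from typing import List

# Tiny illustrative stop-word list (an assumption; a real list is much larger).
STOP_WORDS = {"a", "an", "and", "at", "i", "is", "than", "the", "to", "with"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def top_keywords_sketch(messages: List[str], num_keywords: int = 5) -> List[List[str]]:
    """Hypothetical stand-in that ranks the words of each message by TF-IDF."""
    # Mirror the documented TypeError contract.
    if not isinstance(messages, list) or not all(isinstance(m, str) for m in messages):
        raise TypeError("messages must be a list of strings")

    docs = [tokenize(m) for m in messages]
    n_docs = len(docs)

    # Document frequency: the number of messages that contain each term.
    df = Counter(term for doc in docs for term in set(doc))

    results = []
    for doc in docs:
        tf = Counter(doc)
        # Term frequency times a smoothed IDF (assumed formula):
        # idf(t) = ln((1 + N) / (1 + df(t))) + 1
        scores = {
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append(ranked[:num_keywords])
    return results
```

In this sketch a word shared by every message gets the minimum IDF of 1, so words unique to a message rank above it at equal term frequency. Ties are broken alphabetically only to keep the result deterministic; the real function may order ties differently.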