messageanalyzer.topic_modeling

Functions

topic_modeling(messages, n_topics=5, n_words=10, random_state=123) → Dict[str, List[str]]

Perform topic modeling using Non-negative Matrix Factorization (NMF).

Module Contents

messageanalyzer.topic_modeling.topic_modeling(messages: List[str], n_topics: int = 5, n_words: int = 10, random_state: int = 123) → Dict[str, List[str]]

Perform topic modeling using Non-negative Matrix Factorization (NMF).

Parameters:

messages (List[str]) – List of messages for topic modeling.
n_topics (int, optional) – Number of topics to extract, by default 5.
n_words (int, optional) – Number of top words to display per topic, by default 10.
random_state (int, optional) – Random seed for reproducibility, by default 123.

Returns:

A dictionary where each key is a topic label (e.g., “Topic 1”) and each value is a list of the most representative words for that topic.

Return type:

Dict[str, List[str]]
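
As a rough illustration of the return shape, the sketch below walks a dictionary of this form and renders one line per topic. The `topics` literal is a hand-written stand-in rather than a computed result, and `format_topics` is a hypothetical helper, not part of `messageanalyzer`.

```python
from typing import Dict, List


def format_topics(topics: Dict[str, List[str]]) -> List[str]:
    """Render each topic as 'label: word1, word2, ...' (hypothetical helper)."""
    return [f"{label}: {', '.join(words)}" for label, words in topics.items()]


# Hand-written stand-in for a topic_modeling() result; not computed here.
topics = {
    "Topic 1": ["mds", "science", "learning"],
    "Topic 2": ["work", "python", "prefer"],
}

for line in format_topics(topics):
    print(line)
```

Because the return value is a plain dictionary, it can also be passed directly to `json.dumps` or used to build a table without any conversion step.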

Raises:

TypeError – If messages is not a list of strings.
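
This page does not show how the type check is implemented. One plausible shape, offered purely as an assumption, is a guard like the following. The name `validate_messages` and the error text are hypothetical.

```python
from typing import Any, List


def validate_messages(messages: Any) -> List[str]:
    """Return messages unchanged, or raise TypeError if it is not a list of str.

    Hypothetical helper; the real function may validate its input differently.
    """
    if not isinstance(messages, list):
        raise TypeError("messages must be a list of strings")
    if not all(isinstance(m, str) for m in messages):
        raise TypeError("messages must be a list of strings")
    return messages
```

Under this sketch, both a bare string and a list containing a non-string element would be rejected before any vectorization takes place.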

Examples

>>> messages = ["Learning Data science at MDS is amazing!", "I prefer to work with Python than R"]
>>> topic_modeling(messages, n_topics=3, n_words=3)
{'Topic 1': ['mds', 'science', 'learning'], 'Topic 2': ['work', 'python', 'prefer'], 'Topic 3': ['amazing', 'data', 'learning']}