messageanalyzer.detect_language_patterns
Functions
- detect_language_patterns: Detects language patterns in a list of messages.
Module Contents
- messageanalyzer.detect_language_patterns.detect_language_patterns(messages: List[str], method: str = 'language', n: int = 2, top_n: int = 5) → List[str] | List[Tuple[str, int]]
Detects language patterns in a list of messages.
- Parameters:
messages (List[str]) – A list of text messages to analyze.
method (str, default = "language") – The method to use for pattern detection. Supported methods are:
  - "language": Detects the language of each message.
  - "ngrams": Extracts common n-grams.
  - "char_patterns": Analyzes common character patterns.
n (int, default = 2) – The number of words in each n-gram. Used only when method="ngrams".
top_n (int, default = 5) – The number of top patterns to return.
- Returns:
A list of detected patterns based on the chosen method:
  - For "language", a list of detected languages (e.g., ['en', 'fr']).
  - For "ngrams", a list of tuples (ngram, frequency).
  - For "char_patterns", a list of tuples (character, frequency).
- Return type:
Union[List[str], List[Tuple[str, int]]]
- Raises:
TypeError – If messages is not a list of strings.
ValueError – If method is unsupported.
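The two error conditions above can be sketched as a small input check. This is only an illustration of the documented behaviour, not the library's actual code: the helper name validate_inputs and the constant SUPPORTED_METHODS are hypothetical, and the error messages are assumptions.

```python
from typing import List

# Method names taken from the parameter description above.
SUPPORTED_METHODS = ("language", "ngrams", "char_patterns")


def validate_inputs(messages: List[str], method: str) -> None:
    """Hypothetical helper mirroring the documented TypeError/ValueError rules."""
    # messages must be a list, and every element must be a string.
    if not isinstance(messages, list) or not all(
        isinstance(message, str) for message in messages
    ):
        raise TypeError("messages must be a list of strings")
    # method must be one of the three documented options.
    if method not in SUPPORTED_METHODS:
        raise ValueError(f"unsupported method: {method!r}")
```

A caller that cannot guarantee clean input could wrap the call to detect_language_patterns in a try/except for these two exception types.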
Examples
>>> messages = ["Hello, how are you?", "Bonjour, comment ça va?", "Hola, ¿cómo estás?"]
Example 1: Detecting languages
>>> detect_language_patterns(messages, method="language")
['en', 'fr', 'es']  # English, French, Spanish
Example 2: Extracting common 2-grams
>>> detect_language_patterns(messages, method="ngrams", n=2, top_n=5)
[('how are', 1), ('are you', 1), ('comment ça', 1), ('ça va', 1), ('cómo estás', 1)]
Example 3: Analyzing common character patterns
>>> detect_language_patterns(messages, method="char_patterns", top_n=5)
[(' ', 8), ('o', 7), ('e', 6), ('a', 5), ('m', 3)]
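For readers who want a feel for what the "ngrams" and "char_patterns" methods compute, here is a minimal sketch built on collections.Counter. It is not the library's implementation. The function names top_ngrams and top_chars are hypothetical, and the tokenization (lowercasing, then keeping runs of word characters) is an assumption, so the counts and ordering may differ from what detect_language_patterns returns.

```python
import re
from collections import Counter
from typing import List, Tuple


def top_ngrams(messages: List[str], n: int = 2, top_n: int = 5) -> List[Tuple[str, int]]:
    """Hypothetical sketch: count word n-grams within each message."""
    counts: Counter = Counter()
    for message in messages:
        # Assumed tokenization: lowercase, keep runs of Unicode word characters.
        tokens = re.findall(r"\w+", message.lower())
        # Slide a window of n tokens across the message.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts.most_common(top_n)


def top_chars(messages: List[str], top_n: int = 5) -> List[Tuple[str, int]]:
    """Hypothetical sketch: count every character, including spaces."""
    counts: Counter = Counter()
    for message in messages:
        counts.update(message)  # a string updates the counter per character
    return counts.most_common(top_n)
```

Counter.most_common orders items with equal counts by first occurrence, so ties are resolved by input order in this sketch. N-grams are built per message and never span two messages.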