messageanalyzer.detect_language_patterns

Functions

detect_language_patterns(messages, method, n, top_n) → Union[List[str], ...]

Detects language patterns in a list of messages.

Module Contents

messageanalyzer.detect_language_patterns.detect_language_patterns(messages: List[str], method: str = 'language', n: int = 2, top_n: int = 5) → List[str] | List[Tuple[str, int]]

Detects language patterns in a list of messages.

Parameters:
  • messages (List[str]) – A list of text messages to analyze.

  • method (str, default = "language") – The method to use for pattern detection. Supported methods:
      - "language": detects the language of each message.
      - "ngrams": extracts the most common word n-grams.
      - "char_patterns": counts the most common individual characters.

  • n (int, default = 2) – The n-gram size, i.e. the number of consecutive words in each n-gram. Used only when method="ngrams".

  • top_n (int, default = 5) – The number of top patterns to return.

Returns:

A list of detected patterns, depending on the chosen method:
  • "language": a list of detected language codes, one per message (e.g., ['en', 'fr']).
  • "ngrams": a list of (ngram, frequency) tuples.
  • "char_patterns": a list of (character, frequency) tuples.

Return type:

Union[List[str], List[Tuple[str, int]]]

Raises:
  • TypeError – If messages is not a list of strings.

  • ValueError – If method is unsupported.
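Both documented errors can be checked up front, before any analysis runs. The sketch below shows one way to do that, assuming the three method names listed under Parameters are the complete set; validate_inputs and SUPPORTED_METHODS are hypothetical names, not part of messageanalyzer.

```python
# Hypothetical constant: the method names documented under Parameters.
SUPPORTED_METHODS = ("language", "ngrams", "char_patterns")


def validate_inputs(messages: object, method: str) -> None:
    """Raise the documented errors for bad input (hypothetical helper)."""
    # TypeError: messages must be a list and every item must be a str.
    if not isinstance(messages, list) or not all(isinstance(m, str) for m in messages):
        raise TypeError("messages must be a list of strings")
    # ValueError: method must be one of the documented names.
    if method not in SUPPORTED_METHODS:
        raise ValueError(
            f"Unsupported method {method!r}; expected one of {', '.join(SUPPORTED_METHODS)}"
        )
```

Checking the type of every element, not just the container, is what makes a list such as ["hi", 3] fail with TypeError rather than failing later inside the analysis.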

Examples

>>> messages = ["Hello, how are you?", "Bonjour, comment ça va?", "Hola, ¿cómo estás?"]

Example 1: Detecting languages

>>> detect_language_patterns(messages, method="language")  # English, French, Spanish
['en', 'fr', 'es']
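This page does not document how the "language" method identifies languages; dedicated detectors (for example the langdetect package) typically rely on statistical character n-gram profiles. Purely to illustrate the output shape (one code per message), here is a toy stopword-overlap heuristic. The word lists and the guess_languages name are hypothetical; this is not the library's method and it will misclassify most real text.

```python
import re
from typing import Dict, List, Set

# Hypothetical, deliberately tiny stopword lists for illustration only.
STOPWORDS: Dict[str, Set[str]] = {
    "en": {"hello", "how", "are", "you", "the", "and", "is"},
    "fr": {"bonjour", "comment", "ça", "va", "le", "et", "est"},
    "es": {"hola", "cómo", "estás", "el", "y", "es"},
}


def guess_languages(messages: List[str], unknown: str = "und") -> List[str]:
    """Return one language code per message by stopword overlap (toy heuristic)."""
    codes = []
    for message in messages:
        # Lowercased word tokens; \w is Unicode-aware, so accents are kept.
        tokens = set(re.findall(r"\w+", message.lower()))
        # Score each language by how many of its stopwords occur in the message.
        scores = {lang: len(tokens & words) for lang, words in STOPWORDS.items()}
        best = max(scores, key=scores.get)
        # Fall back to a placeholder code when no stopword matched at all.
        codes.append(best if scores[best] > 0 else unknown)
    return codes
```

On the three example messages this returns ['en', 'fr', 'es']; a message with no known stopwords gets "und", the BCP 47 code for an undetermined language.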

Example 2: Extracting common 2-grams

>>> detect_language_patterns(messages, method="ngrams", n=2, top_n=5)
[('how are', 1), ('are you', 1), ('comment ça', 1), ('ça va', 1), ('cómo estás', 1)]
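The page does not say how messages are tokenized before n-grams are counted. A minimal sketch of one plausible approach, assuming lowercased word tokens and counting with collections.Counter; top_ngrams is a hypothetical name, not part of messageanalyzer.

```python
import re
from collections import Counter
from typing import List, Tuple


def top_ngrams(messages: List[str], n: int = 2, top_n: int = 5) -> List[Tuple[str, int]]:
    """Return the top_n most frequent word n-grams across all messages (hypothetical helper)."""
    counts: Counter = Counter()
    for message in messages:
        # Assumption: lowercase the text and keep runs of word characters.
        # \w is Unicode-aware, so "ça" and "cómo" stay intact, while
        # punctuation such as ",", "?" and "¿" is dropped.
        tokens = re.findall(r"\w+", message.lower())
        # Slide a window of n tokens within this message only, so an
        # n-gram never joins the end of one message to the next.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i : i + n])] += 1
    # most_common orders by count; equal counts keep first-seen order.
    return counts.most_common(top_n)
```

Because this sketch also counts sentence-initial bigrams such as "hello how", its top five on the example messages can differ from the output shown in Example 2; which five of several tied bigrams are returned depends on the tie-breaking rule.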

Example 3: Analyzing common character patterns

>>> detect_language_patterns(messages, method="char_patterns", top_n=5)
[(' ', 8), ('o', 8), ('e', 4), ('a', 4), ('m', 3)]
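The (character, frequency) tuples for "char_patterns" can be produced with a plain character count. A minimal sketch, assuming case-sensitive counting that includes spaces and punctuation (the library's normalization is not documented); top_char_patterns is a hypothetical name.

```python
from collections import Counter
from typing import List, Tuple


def top_char_patterns(messages: List[str], top_n: int = 5) -> List[Tuple[str, int]]:
    """Return the top_n most frequent characters across all messages (hypothetical helper)."""
    # Assumption: count every character as-is: case-sensitive, spaces and
    # punctuation included, accented letters kept distinct from plain ones.
    counts = Counter(ch for message in messages for ch in message)
    # most_common orders by count; equal counts keep first-seen order.
    return counts.most_common(top_n)
```

On the three example messages this counts 8 spaces, 8 'o', 4 'e' and 4 'a'; several characters ('l', 'm', ',' and '?') tie at 3, so the fifth entry depends on how ties are broken.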