Introduction
As you may know, I have been absent for several weeks due to illness. As I am still recovering, I wanted a simple, fun project for my readers as my first project back. So I decided on a multilingual LeetSpeak encoder/decoder. In this post we will create this app and play around with it.
About 1337 Encoding (LeetSpeak)
1337 Speak, also known as LeetSpeak or 1337, is a form of symbolic writing in which letters are replaced with a combination of numbers, symbols, and other characters. The term “leet” is derived from the word “elite,” reflecting the language’s origins in online communities and hacker culture.
Basic LeetSpeak Conversions
Here are some common LeetSpeak substitutions:
- A -> 4
- B -> 8
- E -> 3
- G -> 9
- H -> |-|
- I -> 1
- L -> |
- O -> 0
- S -> 5
- T -> 7
- U -> |_|
These substitutions are often used to replace corresponding letters in words, creating a stylized and playful form of text.
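As a rough illustration, the list above can be written as a Python dictionary and applied one character at a time. This is only a minimal sketch: the names `BASIC_LEET` and `to_leet` are made up for this example and are not part of the project code shown later.

```python
# Hypothetical names for illustration only; the mapping mirrors the list above.
BASIC_LEET = {
    'A': '4', 'B': '8', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1',
    'L': '|', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|',
}


def to_leet(text: str) -> str:
    """Replace each letter that has a substitution; leave everything else alone."""
    return ''.join(BASIC_LEET.get(ch.upper(), ch) for ch in text)


print(to_leet("leet speak"))  # |337 5p34k
```

Letters without an entry, such as the "p" and "k" here, pass through unchanged.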
Purpose and Usage
LeetSpeak is primarily used for fun, as a way to obfuscate text, or as a form of identity within certain online communities. It has gained popularity in gaming, programming, and internet subcultures.
The multilingual LeetSpeak encoder/decoder project presented here allows users to apply and reverse LeetSpeak transformations in various languages, adding an extra layer of creativity and customization to text manipulation.
Multilingual Character Replacement Module: multilang.py
Overview
The multilang.py file serves as a module providing character replacement dictionaries for various languages, supporting text encoding and decoding. It includes language-specific dictionaries and language dictionary modules for decoding. Let’s break down the code step by step.
Language Data Dictionary
language_data = {
    # Keys repeat in the 'ar' table; in a dict literal the last value for a key wins.
    'ar': (
        {'ا': '4', 'ب': '8', 'ت': '7', 'ث': '6', 'ج': '9', 'ح': '|-|',
         'خ': 'x', 'د': '|)', 'ذ': '0', 'ر': '®', 'ز': '2', 'س': '5',
         'ش': '$', 'ص': '|_', 'ض': '|_', 'ط': '7', 'ظ': 'z', 'ع': '3',
         'غ': '9', 'ف': '|*', 'ق': 'q', 'ك': '|<', 'ل': '|', 'م': '|v|',
         'ن': '|\\|', 'ه': '|-|', 'و': '0', 'ي': '1', 'ؤ': '|_|', 'ة': 'h',
         'و': 'w', 'ج': 'j', 'د': 'd', 'ت': 't', 'ك': 'k', 'ل': 'l',
         'أ': 'a', 'ر': 'r', 'م': 'm', 'ي': 'y', 'س': 's', 'ح': 'h',
         'ف': 'f', 'ن': 'n', 'ء': '`', 'ق': 'q', 'ط': 't', 'ع': 'e',
         'ه': 'h', 'ئ': '}', 'و': 'o', 'ج': 'j', 'د': 'd', 'ة': 'a',
         'و': 'w', 'ج': 'j', 'د': 'd', 'ت': 't', 'ك': 'k', 'ل': 'l',
         'أ': 'a', 'ر': 'r', 'م': 'm', 'ي': 'y', 'س': 's', 'ح': 'h',
         'ف': 'f', 'ن': 'n', 'ء': '`', 'ق': 'q', 'ط': 't', 'ع': 'e',
         'ه': 'h', 'ئ': '}', 'ـ': ''},
        None),
    'da': ({'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
            'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}, None),
    'de': ({'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
            'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}, 'nltk'),
    'el': ({'Α': '4', 'Β': '8', 'Λ': '|', 'Ε': '3', 'Γ': '9', 'Η': '|-|',
            'Ι': '1', 'Ο': '0', 'Σ': '5', 'Τ': '7', 'Υ': '|_|'}, None),
    'en': ({'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
            'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}, 'nltk'),
    'es': ({'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
            'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}, 'nltk'),
    'fi': ({'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
            'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}, None),
    'fr': ({'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
            'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}, 'nltk'),
    'fr_ca': ({'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
               'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}, 'nltk'),
    'he': ({'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
            'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}, None),
    'hi': ({'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
            'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}, None),
    'haw': ({'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
             'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}, None),
    'id': ({'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
            'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}, None),
    'iu': ({'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
            'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}, None),
    'it': ({'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
            'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}, 'nltk'),
    'ja': ({'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
            'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}, 'nltk'),
    'ko': ({'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
            'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}, None),
    'ms': ({'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
            'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}, None),
    'nl': ({'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
            'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}, 'nltk'),
    'no': ({'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
            'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}, None),
    'pl': ({'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
            'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}, 'nltk'),
    'pt_br': ({'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
               'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}, 'nltk'),
    'ro': ({'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
            'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}, None),
    'ru': (
        {'А': '4', 'Б': '6', 'В': 'B', 'Г': 'r', 'Д': 'g', 'Е': 'e',
         'Є': '3', 'Ж': 'ж', 'З': '3', 'И': 'u', 'І': 'i', 'Ї': 'i',
         'Й': 'й', 'К': 'k', 'Л': 'l', 'М': 'M', 'Н': 'H', 'О': '0',
         'П': 'n', 'Р': 'p', 'С': 'c', 'Т': 'T', 'У': 'y', 'Ф': 'ф',
         'Х': 'x', 'Ц': 'u', 'Ч': '4', 'Ш': 'ш', 'Щ': 'щ', 'Ь': 'b',
         'Ю': '10', 'Я': '9'},
        'nltk'),
    'sv': ({'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
            'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}, None),
    'th': ({'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
            'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}, None),
    'tr': ({'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
            'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}, None),
    'uk': (
        {'А': '4', 'Б': '6', 'В': 'B', 'Г': 'r', 'Д': 'g', 'Е': 'e',
         'Є': '3', 'Ж': 'ж', 'З': '3', 'И': 'u', 'І': 'i', 'Ї': 'i',
         'Й': 'й', 'К': 'k', 'Л': 'l', 'М': 'M', 'Н': 'H', 'О': '0',
         'П': 'n', 'Р': 'p', 'С': 'c', 'Т': 'T', 'У': 'y', 'Ф': 'ф',
         'Х': 'x', 'Ц': 'u', 'Ч': '4', 'Ш': 'ш', 'Щ': 'щ', 'Ь': 'b',
         'Ю': '10', 'Я': '9'},
        'nltk'),
    'vi': ({'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
            'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}, None),
    'zh': ({'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
            'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}, 'nltk'),
}
The language_data dictionary uses language identifiers as keys, each associated with a tuple. The tuple consists of a character replacement dictionary and a language dictionary module indicator (None, ‘nltk’, or another dictionary module name) for decoding. An entry of None is meant to select the NLTK module by default. At the moment the language dictionary module is not used, but it will be used in a near-future version to aid in decoding encoded text.
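To make the tuple layout concrete, here is a small sketch that uses a two-entry stand-in for language_data, adds one more language, and unpacks each entry. The 'eo' entry and its 'Ŝ' substitution are invented for this example; they are not in the project.

```python
# Two-entry stand-in for language_data; the real table is much larger.
language_data = {
    'en': ({'A': '4', 'E': '3', 'O': '0'}, 'nltk'),
    'da': ({'A': '4', 'E': '3', 'O': '0'}, None),
}

# Adding a language means adding one (replacement dict, module name) tuple.
# 'eo' (Esperanto) is a made-up addition for illustration.
language_data['eo'] = ({'A': '4', 'E': '3', 'O': '0', 'Ŝ': '5'}, None)

# Each value unpacks into the replacement dict and the module indicator.
for lang_id, (replacements, module_name) in language_data.items():
    print(f"{lang_id}: {len(replacements)} substitutions, module={module_name!r}")
```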
get_character_replacement_dict Function
def get_character_replacement_dict(lang_id: str) -> dict:
    """
    Get the character replacement dictionary for a given language.

    Args:
        lang_id (str): Language identifier.

    Returns:
        dict: Character replacement dictionary for the specified language.

    Raises:
        ValueError: If the language ID is not supported.
    """
    if lang_id in language_data:
        char_replacement_dict, _ = language_data[lang_id]
        return char_replacement_dict
    else:
        raise ValueError(f"Unsupported language ID: '{lang_id}'.")
This function retrieves the character replacement dictionary for a given language identifier. If the language is not supported, it raises a ValueError.
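Callers that take the language ID from user input may prefer to handle the ValueError rather than let it propagate. The sketch below uses a two-entry stand-in for language_data and a hypothetical helper, `get_dict_or_default`, that falls back to English. Neither the helper nor the fallback policy comes from the project; they are assumptions made for this example.

```python
# Stand-in data with the same shape as language_data (trimmed for brevity).
language_data = {
    'en': ({'A': '4', 'E': '3'}, 'nltk'),
    'da': ({'A': '4', 'E': '3'}, None),
}


def get_character_replacement_dict(lang_id: str) -> dict:
    """Same lookup logic as the function above."""
    if lang_id in language_data:
        char_replacement_dict, _ = language_data[lang_id]
        return char_replacement_dict
    raise ValueError(f"Unsupported language ID: '{lang_id}'.")


def get_dict_or_default(lang_id: str, default: str = 'en') -> dict:
    """Hypothetical helper: fall back to a default language on unknown IDs."""
    try:
        return get_character_replacement_dict(lang_id)
    except ValueError as err:
        print(f"{err} Falling back to '{default}'.")
        return get_character_replacement_dict(default)


print(get_dict_or_default('xx'))
```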
create_reverse_dict Function
def create_reverse_dict(original_dict):
    return {v: k for k, v in original_dict.items()}
This function creates a reverse dictionary, mapping values to keys, which is useful for decoding LeetSpeak.
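One thing to keep in mind is that the reverse mapping loses information whenever two letters share a replacement, because a dictionary can hold only one value per key. The sketch below shows this with three entries taken from the 'ru' table; `find_collisions` is a hypothetical helper written for this example, not part of the module.

```python
from collections import defaultdict


def create_reverse_dict(original_dict):
    return {v: k for k, v in original_dict.items()}


def find_collisions(original_dict):
    """Hypothetical helper: group source letters that share a replacement."""
    groups = defaultdict(list)
    for letter, replacement in original_dict.items():
        groups[replacement].append(letter)
    return {rep: letters for rep, letters in groups.items() if len(letters) > 1}


# Excerpt of the 'ru' table: two letters share the replacement '4'.
forward = {'А': '4', 'О': '0', 'Ч': '4'}
reverse = create_reverse_dict(forward)

print(reverse)                      # {'4': 'Ч', '0': 'О'}
print(len(forward), len(reverse))   # 3 2
print(find_collisions(forward))     # {'4': ['А', 'Ч']}
```

Here a decoded "4" always comes back as "Ч", never as "А", so a round trip through encode and decode is not guaranteed to restore the original text.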
get_language_dictionary_module Function
def get_language_dictionary_module(lang_id: str) -> str:
    """
    Get the module to import for language dictionary based on the language identifier.

    Args:
        lang_id (str): Language identifier.

    Returns:
        str: Module to import for language dictionary.

    Raises:
        ValueError: If the language ID is not supported.
    """
    if lang_id in language_data:
        _, module_to_import = language_data[lang_id]
        if module_to_import == 'custom':
            print(f"Warning: The language '{lang_id}' requires a custom language dictionary for decoding.")
            print("Defaulting to language dictionary NLTK for decoding.")
            module_to_import = 'nltk'
    else:
        raise ValueError(f"Unsupported language ID: '{lang_id}'.")
    return module_to_import
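This function retrieves the module to import for the language dictionary based on the language identifier. As written, it substitutes NLTK (with a warning) only when the entry is the string 'custom'. An entry of None is returned unchanged, so a caller that wants NLTK as the default for None has to apply that default itself.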
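If the intended behavior is for an entry of None to select NLTK, one possible way to express that is a small resolver like the one below. This is a sketch under that assumption: `resolve_dictionary_module` and `DEFAULT_MODULE` are hypothetical names, not part of multilang.py.

```python
from typing import Optional

DEFAULT_MODULE = 'nltk'  # assumed default, taken from the prose description


def resolve_dictionary_module(module_name: Optional[str]) -> str:
    """Hypothetical helper: map None (or 'custom') to the default module."""
    if module_name is None or module_name == 'custom':
        return DEFAULT_MODULE
    return module_name


print(resolve_dictionary_module(None))      # nltk
print(resolve_dictionary_module('custom'))  # nltk
print(resolve_dictionary_module('nltk'))    # nltk
```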
decode_leet Function
def decode_leet(cypher_text: str, lang_id: str = 'en'):
    """
    Decode leet speak text into its original alphabetical form.

    Args:
        cypher_text (str): Leet speak text to decode.
        lang_id (str): ISO language indicator.

    Returns:
        str: Decoded text.
    """
    dictionary = get_character_replacement_dict(lang_id)
    reverse_lookup_dict = create_reverse_dict(dictionary)
    # Try the longest replacements first so '|-|' is not read as '|', '-', '|'.
    # Empty replacements are skipped: they would match without advancing i.
    candidates = sorted((k for k in reverse_lookup_dict if k), key=len, reverse=True)
    decoded_text = ''
    i = 0
    while i < len(cypher_text):
        found_multi_char = False
        for multi_char in candidates:
            if cypher_text[i:i + len(multi_char)].lower() == multi_char:
                decoded_text += reverse_lookup_dict[multi_char].lower()
                i += len(multi_char)
                found_multi_char = True
                break
        if not found_multi_char:
            # Nothing matched at this position: copy the character through.
            char = cypher_text[i].lower()
            if char in reverse_lookup_dict:
                decoded_text += reverse_lookup_dict[char].lower()
            else:
                decoded_text += char
            i += 1
    decoded_text = decoded_text.capitalize()
    return decoded_text
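This function decodes LeetSpeak text by reversing the specified language’s character replacement dictionary. It scans the text from left to right, tries the longest replacement first at each position, lowercases the output, and capitalizes the first character of the result.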
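The longest-match-first scan can be seen in isolation in the sketch below. It is a compact restatement for the English table only, not the module's own function: `EN`, `REVERSE`, `TOKENS`, and `decode` are names chosen for this example, and the matching uses `str.startswith` instead of slicing.

```python
# English table from language_data, inverted for decoding.
EN = {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
      'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}
REVERSE = {v: k for k, v in EN.items()}
TOKENS = sorted(REVERSE, key=len, reverse=True)  # longest tokens first


def decode(cypher_text: str) -> str:
    """Compact restatement of the longest-match-first scan (English only)."""
    out, i = [], 0
    while i < len(cypher_text):
        for token in TOKENS:
            if cypher_text.startswith(token, i):
                out.append(REVERSE[token].lower())
                i += len(token)
                break
        else:  # no token matched at position i: copy the character through
            out.append(cypher_text[i].lower())
            i += 1
    return ''.join(out).capitalize()


print(decode('|-|3||0'))   # Hello
print(decode('|_|53'))     # Use
print(decode('R00m 101'))  # Room ioi
```

The last call shows a limitation of any purely table-driven decoder: digits that were already in the clear text ("101") are indistinguishable from substitutions and come back as letters.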
encode_leet Function
def encode_leet(input_text: str, lang_id: str='en'):
    """
    Encode the input text using Leet (1337) speak.

    Parameters:
    - input_text (str): The input text to be encoded.
    - lang_id (str): The language ID.

    Returns:
    - output_text (str): The Leet encoded text.
    """
    dictionary = get_character_replacement_dict(lang_id)
    output_text = ""
    for char in input_text:
        # Look up the upper-case form; characters without an entry are kept as-is.
        if char.upper() in dictionary:
            output_text += dictionary[char.upper()]
        else:
            output_text += char
    return output_text
This function encodes the input text into LeetSpeak using the specified language’s character replacement dictionary.
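As a design alternative, the same per-character substitution can be expressed with Python's built-in `str.maketrans` and `str.translate`. The sketch below is an assumption-laden variant, not the module's implementation: `encode_with_translate` is a hypothetical name, and it is expected to match encode_leet only for plain ASCII input with the English table.

```python
EN = {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
      'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}

# Cover both cases, since encode_leet upper-cases each character before lookup.
TABLE = str.maketrans({**EN, **{k.lower(): v for k, v in EN.items()}})


def encode_with_translate(text: str) -> str:
    """Alternative sketch: substitute via a translation table."""
    return text.translate(TABLE)


print(encode_with_translate('Hello, World!'))  # |-|3||0, W0r|d!
```

Building the table once and reusing it avoids repeated string concatenation inside a loop, which may matter for large input files.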
Example Usage
The file includes an example of using the module in a main guard, demonstrating how to obtain character replacement dictionaries and language modules.
Leet Speak Encoder/Decoder Script: leetspeak.py
Overview
The leetspeak.py script serves as the main application for LeetSpeak encoding and decoding. It uses the multilang module for character replacement dictionaries and functions.
display_info Function
def display_info(text):
    """
    Display information about the text, including length in words and characters.

    Args:
        text (str): Text to analyze.
    """
    word_count = len(text.split())
    char_count = len(text)
    print(f'Text Length: {char_count} characters, {word_count} words.')
This function displays information about the text, including word count and character count.
Main Function
# Imports the script relies on (not shown in the original listing).
import argparse

from multilang import decode_leet, encode_leet


def main():
    """
    Main function to handle command-line arguments and execute leet speak encoding/decoding.
    """
    parser = argparse.ArgumentParser(description='Leet Speak Encoder/Decoder')
    parser.add_argument('-e', '--encode', action='store_true', help='Encode clear text into leet speak')
    parser.add_argument('-d', '--decode', action='store_true', help='Decode leet speak text into clear text')
    parser.add_argument('-lang', '--language', default='en', help='ISO language indicator (default: en)')
    parser.add_argument('-m', '--message', help='Text string to encode or decode')
    parser.add_argument('-i', '--input', help='File name to read encoded/decoded text')
    parser.add_argument('-o', '--output', help='File name to save encoded/decoded text')
    parser.add_argument('-v', '--verbose', action='store_true', help='Display additional information about the text')
    args = parser.parse_args()
    language_id = args.language.lower()

    # The text comes from -m if given, otherwise from the file named by -i.
    if args.message:
        input_text = args.message
    elif args.input:
        with open(args.input, 'r', encoding='utf-8') as file:
            input_text = file.read()
    else:
        raise ValueError("Please provide either a text string (-m) or an input file (-i).")

    if args.encode:
        output_text = encode_leet(input_text, language_id)
        if args.verbose:
            display_info(output_text)
        # Write to the output file if one was given; otherwise print once.
        if args.output:
            with open(args.output, 'w', encoding='utf-8') as file:
                file.write(output_text)
        else:
            print(f"Encoded Text: {output_text}")
    elif args.decode:
        decoded_text = decode_leet(input_text, language_id)
        if args.verbose:
            display_info(decoded_text)
        if args.output:
            with open(args.output, 'w', encoding='utf-8') as file:
                file.write(decoded_text)
        else:
            print(f"Decoded Text: {decoded_text}")


if __name__ == '__main__':
    main()
This script uses the argparse module to handle command-line arguments. It allows encoding or decoding LeetSpeak based on user input. The language, input text, and output options are customizable.
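In the script, -e and -d are independent flags, so nothing stops a user from passing both or neither. One possible refinement, sketched below, is argparse's mutually exclusive group. `build_parser` is a hypothetical function for this example and only reproduces a subset of the script's options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical variant of the parser in main() with -e/-d made exclusive."""
    parser = argparse.ArgumentParser(description='Leet Speak Encoder/Decoder')
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument('-e', '--encode', action='store_true')
    mode.add_argument('-d', '--decode', action='store_true')
    parser.add_argument('-lang', '--language', default='en')
    parser.add_argument('-m', '--message')
    return parser


# parse_args accepts an explicit list, which is handy for trying options out.
args = build_parser().parse_args(['-e', '-lang', 'EN', '-m', 'Hello'])
print(args.encode, args.decode, args.language.lower(), args.message)
# True False en Hello
```

With this variant, passing both -e and -d (or neither) makes argparse print a usage error and exit instead of silently picking one branch.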
Conclusion
The multilingual LeetSpeak encoder/decoder application is a comprehensive solution for encoding and decoding text using LeetSpeak conventions. The modular design allows for easy addition of language support, and the script provides a user-friendly interface through command-line arguments.
The development process involved creating a flexible character replacement module, incorporating support for multiple languages, and designing a user-friendly script for encoding and decoding LeetSpeak. The final result is a robust and extensible application for language enthusiasts and those interested in text manipulation.
Resources
Repository Link
The complete source code for the multilingual LeetSpeak encoder/decoder project can be found on the GitHub repository:
Feel free to explore the code, contribute, or open issues if you have suggestions or encounter any problems.
Additional Reading
- LeetSpeak – Wikipedia
- LeetSpeak Translator: An online tool to convert text to LeetSpeak and vice versa.
- The Jargon File – Leet: Historical perspective on LeetSpeak from The Jargon File.