Bharatwaj C · bharatwaj.hashnode.dev · Dec 15, 2024
Linux Command - Text Processing
cat - Concatenate Files and Print on Standard Output. Using cat as a primitive word processor. You can enter the below command, type your text, press ENTER to finish the line, and then press CTRL-D to indicate the end-of-file. bharatwaj@comp:~$ c...
Tag: Linux

Anix Lynch · gozeroshot.dev · Oct 4, 2024
20 Huggingface Tokenizers concepts with Examples
1. Installing Hugging Face Tokenizers 📦 Boilerplate Code: pip install tokenizers. Use Case: Install the Hugging Face Tokenizers library to tokenize text data efficiently. Goal: Set up the tokenizers library to quickly tokenize and process large text...
Tag: tokenizer

Edward Oboh · edwardoboh.hashnode.dev · Jun 29, 2024
Comparing fnmatch and regex
Pattern matching is a fundamental aspect of text processing, enabling powerful searches and manipulations in various applications. Two common methods for pattern matching are fnmatch and regex. Each has its strengths and limitations, and understandin...
Tag: fnmatch

David Oshare · davidoshare.hashnode.dev · Apr 30, 2024
Regular Expressions for Text Processing: Mastering Patterns and Manipulation
In the realm of text processing, regular expressions, often abbreviated as regex, stand as a powerful tool for manipulating and extracting specific patterns from textual data. This article delves into the world of regex, equipping intermediate Python...
Tag: Python

Uffa Modey · fafa.codes · Feb 22, 2024
10 Python Techniques for Text Manipulation
1. Processing a string one character at a time. To process a string one character at a time, you can use the map built-in function in Python to make every string character be processed using a predefined function. For example, if we have a function ca...
10 likes · 47 reads
Tag: manipulating text

Mohamad Mahmood · textprocessing.hashnode.dev · Feb 9, 2024
What is Part-of-Speech (POS) Tagging?
Part-of-speech (POS) tagging, also known as grammatical tagging or word category disambiguation, is the process of assigning a grammatical category or part-of-speech label to each word in a sentence or text. The goal of POS tagging is to determine th...
Tag: text processing

Mohamad Mahmood · textprocessing.hashnode.dev · Feb 9, 2024
What is Stemming and Lemmatization?
Stemming and lemmatization are techniques used in natural language processing (NLP) to reduce words to their base or root forms, thereby normalizing the text. Both stemming and lemmatization aim to handle variations of words and improve text analysis...
Tag: text processing

Mohamad Mahmood · textprocessing.hashnode.dev · Feb 9, 2024
What is Stop Word Removal?
Stop word removal is a text preprocessing technique that involves eliminating common words, known as stop words, from a piece of text. Stop words are words that frequently occur in a language but typically do not carry significant meaning or contribu...
Tag: text processing

Mohamad Mahmood · textprocessing.hashnode.dev · Feb 9, 2024
What is Text Tokenization?
Text tokenization, also known as word tokenization, is the process of breaking down a text into smaller units called tokens. Tokens are typically words, but they can also be phrases, sentences, or even individual characters, depending on the granular...
Tag: text processing

Mohamad Mahmood · textprocessing.hashnode.dev · Feb 9, 2024
What is text processing?
Text processing refers to the manipulation and analysis of textual data using various computational techniques. It involves performing operations on text to extract useful information, transform its structure, or derive insights from it. Text process...
Tag: text processing