Anix Lynch · anixblog.hashnode.dev · Oct 4, 2024
20 Huggingface Tokenizers concepts with Examples
1. Installing Hugging Face Tokenizers 📦 Boilerplate Code: pip install tokenizers Use Case: Install the Hugging Face Tokenizers library to tokenize text data efficiently. Goal: Set up the tokenizers library to quickly tokenize and process large text...
Tag: tokenizer

Edward Oboh · edwardoboh.hashnode.dev · Jun 29, 2024
Comparing fnmatch and regex
Pattern matching is a fundamental aspect of text processing, enabling powerful searches and manipulations in various applications. Two common methods for pattern matching are fnmatch and regex. Each has its strengths and limitations, and understandin...
Tag: fnmatch

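To make the fnmatch/regex contrast concrete, here is a minimal sketch using Python's standard `fnmatch` and `re` modules; the file names are made up for illustration. In shell-style patterns `*` accepts any run of characters, whereas the regex quantifier `\d{4}` requires exactly four digits.

```python
import fnmatch
import re

# Hypothetical file names used only for this illustration.
filenames = ["report.txt", "data_2024.csv", "notes.md", "data_old.csv"]

# fnmatch uses shell-style wildcards: * matches any run of characters, ? matches one.
csv_glob = fnmatch.filter(filenames, "data_*.csv")

# The stricter regex version: exactly four digits between "data_" and ".csv".
csv_regex = [name for name in filenames if re.fullmatch(r"data_\d{4}\.csv", name)]

# fnmatch.translate shows the regular expression fnmatch builds internally.
print(fnmatch.translate("data_*.csv"))
print(csv_glob)   # ['data_2024.csv', 'data_old.csv']
print(csv_regex)  # ['data_2024.csv']
```

Here `data_old.csv` slips through the wildcard but not the regex, which is the usual trade-off: fnmatch is simpler to read, regex is more precise.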
David Oshare · davidoshare.hashnode.dev · Apr 30, 2024
Regular Expressions for Text Processing: Mastering Patterns and Manipulation
In the realm of text processing, regular expressions, often abbreviated as regex, stand as a powerful tool for manipulating and extracting specific patterns from textual data. This article delves into the world of regex, equipping intermediate Python...
Tag: Python

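A small sketch of three everyday regex operations (extracting matches, capturing groups, and substitution) with Python's built-in `re` module. The sample sentence is an assumption, and the e-mail pattern is intentionally simple; it does not validate real-world addresses.

```python
import re

text = "Contact alice@example.com or bob@test.org before 2024-04-30."

# Extraction: findall returns every non-overlapping match as a list of strings.
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text)

# Capture groups: pull the year, month and day out of an ISO-style date.
match = re.search(r"(\d{4})-(\d{2})-(\d{2})", text)
year, month, day = match.groups()

# Substitution: rewrite the date as DD/MM/YYYY using back-references to the groups.
reformatted = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)

print(emails)       # ['alice@example.com', 'bob@test.org']
print(reformatted)  # Contact alice@example.com or bob@test.org before 30/04/2024.
```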
Uffa Modey · fafa.codes · Feb 22, 2024 · 10 likes · 47 reads
10 Python Techniques for Text Manipulation
1. Processing a string one character at a time. To process a string one character at a time, you can use the map built-in function in Python to make every string character be processed using a predefined function. For example, if we have a function ca...
Tag: manipulating text

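A minimal sketch of the `map` technique the excerpt describes. The excerpt's own example function is cut off, so `mask_digit` here is a hypothetical per-character function chosen for illustration.

```python
def mask_digit(ch: str) -> str:
    """Hypothetical per-character function: replace any digit with '#'."""
    return "#" if ch.isdigit() else ch

text = "Order 66 shipped 3 days ago"

# map applies mask_digit to each character lazily; join rebuilds the string.
masked = "".join(map(mask_digit, text))
print(masked)  # Order ## shipped # days ago
```

Because `map` returns a lazy iterator rather than a string, the `"".join(...)` step is what turns the processed characters back into text.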
Mohamad Mahmood · textprocessing.hashnode.dev · Feb 9, 2024
What is Part-of-Speech (POS) Tagging?
Part-of-speech (POS) tagging, also known as grammatical tagging or word category disambiguation, is the process of assigning a grammatical category or part-of-speech label to each word in a sentence or text. The goal of POS tagging is to determine th...
Tag: text processing

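Practical POS taggers are statistical or neural models trained on large annotated corpora. As a rough illustration only, here is a most-frequent-tag (unigram) baseline in plain Python: it learns, for each word, the tag it carried most often in a tiny hand-tagged sample (made up here, using Universal-POS-style labels) and falls back to `NOUN` for unseen words. It ignores context, so an ambiguous word always receives the same tag.

```python
from collections import Counter, defaultdict

# Tiny hypothetical training sample of (word, tag) pairs.
TAGGED = [
    ("the", "DET"), ("dog", "NOUN"), ("runs", "VERB"),
    ("they", "PRON"), ("run", "VERB"), ("fast", "ADV"),
    ("a", "DET"), ("run", "NOUN"), ("helps", "VERB"),
    ("we", "PRON"), ("run", "VERB"), ("daily", "ADV"),
]

def train_unigram(tagged):
    """Count tags per word and keep the most frequent one."""
    counts = defaultdict(Counter)
    for word, tag in tagged:
        counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def tag(sentence, model, default="NOUN"):
    """Tag each whitespace-separated, lowercased word; unknown words get the default."""
    return [(w, model.get(w, default)) for w in sentence.lower().split()]

model = train_unigram(TAGGED)
print(tag("The dog runs fast", model))
# [('the', 'DET'), ('dog', 'NOUN'), ('runs', 'VERB'), ('fast', 'ADV')]
```

The limitation shows up quickly: `tag("a run helps", model)` labels "run" as `VERB`, because that was its most frequent tag, even though it is a noun in this sentence. Resolving such cases is exactly the disambiguation that context-aware taggers handle.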
Mohamad Mahmood · textprocessing.hashnode.dev · Feb 9, 2024
What is Stemming and Lemmatization?
Stemming and lemmatization are techniques used in natural language processing (NLP) to reduce words to their base or root forms, thereby normalizing the text. Both stemming and lemmatization aim to handle variations of words and improve text analysis...
Tag: text processing

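A toy illustration of the difference using only the standard library. `naive_stem` chops a few common suffixes by rule (a crude stand-in, not the Porter or Snowball algorithm), while `lookup_lemma` maps words through a small hand-written lexicon, mimicking how a lemmatizer relies on vocabulary knowledge. Both the suffix list and the lexicon are hypothetical.

```python
# Hypothetical suffix list, checked in order; longer suffixes come first.
SUFFIXES = ("ational", "ization", "ness", "ing", "ies", "ed", "ly", "s")

# Hypothetical miniature lexicon mapping inflected forms to dictionary forms.
LEMMAS = {"running": "run", "ran": "run", "studies": "study", "better": "good"}

def naive_stem(word: str) -> str:
    """Toy rule-based stemmer: strip the first matching suffix."""
    w = word.lower()
    for suffix in SUFFIXES:
        # Only strip when at least three characters of stem remain.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w

def lookup_lemma(word: str) -> str:
    """Toy lemmatizer: return the dictionary form if known, else the lowercased word."""
    w = word.lower()
    return LEMMAS.get(w, w)

for word in ["running", "studies", "cats", "better"]:
    print(word, naive_stem(word), lookup_lemma(word))
```

The stemmer produces non-words such as `runn` and `stud` and leaves `better` untouched, whereas the lookup returns real base forms (including `good` for `better`) but only for words it already knows.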
Mohamad Mahmood · textprocessing.hashnode.dev · Feb 9, 2024
What is Stop Word Removal?
Stop word removal is a text preprocessing technique that involves eliminating common words, known as stop words, from a piece of text. Stop words are words that frequently occur in a language but typically do not carry significant meaning or contribu...
Tag: text processing

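A minimal sketch of stop word removal, assuming a small hand-picked stop list and plain whitespace tokenization. Libraries such as NLTK and spaCy ship much larger, language-specific lists.

```python
# Small hand-picked stop list (an assumption for illustration).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on", "it"}

def remove_stop_words(text: str) -> list[str]:
    """Lowercase, split on whitespace, and drop tokens found in STOP_WORDS."""
    tokens = text.lower().split()
    return [t for t in tokens if t not in STOP_WORDS]

print(remove_stop_words("The cat is on the mat"))  # ['cat', 'mat']
```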
Mohamad Mahmood · textprocessing.hashnode.dev · Feb 9, 2024
What is Text Tokenization?
Text tokenization, also known as word tokenization, is the process of breaking down a text into smaller units called tokens. Tokens are typically words, but they can also be phrases, sentences, or even individual characters, depending on the granular...
Tag: text processing

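The same text can be tokenized at several granularities. This sketch uses plain regular expressions as a stand-in for a real tokenizer; the splitting rules are simplified assumptions (for example, the sentence rule would also split after abbreviations like "Dr.").

```python
import re

text = "Tokenization splits text. It's a first step!"

# Sentence level: split on whitespace that follows ., ! or ? (naive rule).
sentences = re.split(r"(?<=[.!?])\s+", text)

# Word level: runs of word characters (keeping an inner apostrophe, as in "It's"),
# with each punctuation mark as its own token.
words = re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

# Character level: every character is a token.
chars = list("text")

print(sentences)  # ['Tokenization splits text.', "It's a first step!"]
print(words)      # ['Tokenization', 'splits', 'text', '.', "It's", 'a', 'first', 'step', '!']
print(chars)      # ['t', 'e', 'x', 't']
```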
Mohamad Mahmood · textprocessing.hashnode.dev · Feb 9, 2024
What is text processing?
Text processing refers to the manipulation and analysis of textual data using various computational techniques. It involves performing operations on text to extract useful information, transform its structure, or derive insights from it. Text process...
Tag: text processing

K Ahamed · datailm.hashnode.dev · Sep 25, 2023 · 34 reads
Text Preprocessing for Effective Natural Language Processing (NLP) Tasks
When working with text data for Natural Language Processing (NLP) tasks like sentiment analysis or text classification, it's essential to perform text preprocessing. It involves cleaning and transforming raw text data into a format that can be effect...
Tags: Natural Language Processing, text processing
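
A sketch of a few typical cleaning steps chained into one function. Which steps to apply, and in what order, depends on the task; the HTML and URL patterns below are simplified assumptions rather than robust parsers.

```python
import re
import string

def preprocess(text: str) -> list[str]:
    """Illustrative cleaning pipeline: case, markup, URLs, punctuation, tokens."""
    text = text.lower()                                   # normalize case
    text = re.sub(r"<[^>]+>", " ", text)                  # strip HTML tags first
    text = re.sub(r"https?://\S+", " ", text)             # then drop URLs
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    return text.split()                                   # whitespace tokenization

print(preprocess("<p>This movie is GREAT!!! See https://example.com</p>"))
# ['this', 'movie', 'is', 'great', 'see']
```

HTML tags are removed before URLs so that a closing tag such as `</p>` does not get swallowed into the URL match; stop word removal or stemming would typically follow as later steps.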