Kiran Reddy for Databricks - PySpark · databricks-pyspark-blogs.hashnode.dev · May 26, 2024
Understanding Spark Memory Architecture: Best Practices and Tips
Spark is an in-memory processing engine: all of the computation a task performs happens in memory, so it is important to understand Spark memory management. This helps us develop Spark applications and tune their performance. In Apache...
10 likes · spark optimizations

yash bhaskar · yash9439.hashnode.dev · Mar 13, 2024
Accelerating Document Embedding Generation with Ray, FastEmbed, and Qdrant
For medium and large businesses, extracting meaningful insights from large volumes of unstructured data, such as text documents, is crucial. However, the traditional approach of sequentially processing documents for embedding generation can take time...
10 likes · Document Embedding

Syed Sarfaraz Ahammed · syedsarfarazahammed.hashnode.dev · May 23, 2023
Apache Beam Introduction
Apache Beam (Batch + strEAM) is an open-source, unified programming model for processing and analyzing large-scale data sets. It provides a simple and expressive way to implement data processing pipelines that can run on various distributed processin...
1 like · Apache Beam, data processing