Mar 9 · 10 min read · GitHub Repo: gpu-parallel-patterns Colab: Colab Benchmark Histogram GPU/Env: Tesla T4 / Driver 580.82.07 / CUDA 12.8 How to reproduce: scripts/bootstrap_colab.sh → scripts/tests.sh → scripts/bench_
Mar 6 · 4 min read · If you're building infrastructure for Artificial Intelligence (AI), Machine Learning (ML), or High-Performance Computing (HPC), powerful hardware alone is not enough. The real performance advantage co
Mar 4 · 3 min read · If you've been trying to run standard vLLM Docker images on the new NVIDIA DGX Spark, you’ve probably hit a wall. Between the ARM64 (Grace) architecture, the Blackwell (GB10) GPU, and the requirements
Feb 7 · 7 min read · The digital gold rush of cryptocurrency mining has seen its peaks and valleys. If you’ve been in the crypto space, you’ve likely witnessed the enormous investments poured into powerful GPU farms, all dedicated to solving complex cryptographic puzzles...
Dec 9, 2025 · 7 min read · $$z_n = z_{n-1}^2 + c$$ This one simple equation is responsible for rendering some very interesting visuals. What is this equation really? One way to think about the equation is as a complex number generator. You pick an initial seed point c and then ...
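To make the "complex number generator" reading of $$z_n = z_{n-1}^2 + c$$ concrete, here is a minimal Python sketch. The function name `orbit`, the `steps` parameter, and the starting value z_0 = 0 are assumptions made for illustration (z_0 = 0 is the usual Mandelbrot convention); the excerpt is cut off before it says how the linked post sets this up.

```python
def orbit(c: complex, steps: int = 5) -> list[complex]:
    """Return z_1 .. z_steps for z_n = z_{n-1}^2 + c.

    Assumption: the sequence starts from z_0 = 0; the excerpt does not
    state the starting value.
    """
    z = 0j
    values = []
    for _ in range(steps):
        z = z * z + c  # one application of the recurrence
        values.append(z)
    return values


if __name__ == "__main__":
    # c = -1 produces a repeating, bounded sequence.
    print(orbit(-1 + 0j, 4))
    # c = 1 produces a sequence whose magnitude keeps growing.
    print(orbit(1 + 0j, 4))
```

Whether the sequence stays bounded or grows without limit depends only on the chosen c, which is presumably what gives rise to the visuals the excerpt mentions.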
Dec 3, 2025 · 18 min read · If you’re working on complex Machine Learning projects, you’ll need a good Graphics Processing Unit (or GPU) to power everything. And Nvidia is a popular option these days, as it has great compatibility and widespread support. If you’re new to Machin...
Nov 27, 2025 · 22 min read · Every time you ask ChatGPT a question, get a movie recommendation on Netflix, or watch your phone recognize faces in photos, billions of matrix multiplications are happening behind the scenes. This fundamental mathematical operation has become the co...
Nov 27, 2025 · 43 min read · In our previous blog, we explored GPU computing fundamentals: memory hierarchies, thread organization, warps, memory coalescing, and kernel classification (memory-bound vs. compute-bound). In this blog, we apply these concepts to optimize GEMM (Gener...