May 24 · 8 min read · AI compilers are evolving fast. In many cases, torch.compile plus JIT optimization in PyTorch can deliver striking speedups, to the point that people often say, "hand-written operators are no longer n
Apr 9 · 33 min read · Note: Text translated by AI. Code crafted by human. Matrix transpose is one of the most fundamental operations in deep learning and high-performance computing. The deceptively simple coordinate swap
Apr 6 · 4 min read · The Myth: "C# is too slow for AI" For years, the narrative has been the same: if you want high-performance AI, you must use C++ or Python wrappers (like PyTorch/ONNX) that call into native kernels. Th
Mar 26 · 5 min read · Hi, I am Duc Dao! Welcome to my blog :D Today, we’ll take a quick flash tour through High Performance Computing (HPC) and GPU programming. Sometimes, the most interesting things you learn don’t come fr
Jan 2 · 6 min read · C++ is renowned for its “zero-cost abstraction” philosophy, granting programmers near-direct control over hardware. However, this powerful control comes with significant responsibilities, chief among them resource management. Here, “resources” exte...