Mark Kim in m-kim.hashnode.dev · Jun 1 · 5 min read
Running an LLM at home
How I learned to stop worrying and love LLMs. About a month ago, I found myself with a weekend to myself and I thought "Hmmm, it's been a while since I tried to run an LLM at home." It's been so long t...

Mark Kim in m-kim.hashnode.dev · Jan 10, 2024 · 1 min read
atomics, AMD Radeon and SYCL (Part 4)
Looking closely at the atomics used in the backwards rasterizer, it appears that the computation only occurs across a block. The atomics could be replaced with a reduction! So, I tried writing it in SYCL and there's warp stalling and it crashes the c...

Mark Kim in m-kim.hashnode.dev · Dec 28, 2023 · 1 min read
atomics, AMD Radeon and SYCL (Part 3)
Well, I tried to use AdaptiveCpp, but there's some weirdness about memcpy and memset going on. And it looks like I'm not the only one to notice that atomics are terrible. Here's something that just came up five days ago that's exactly what I'm strug...

Mark Kim in m-kim.hashnode.dev · Dec 22, 2023 · 1 min read
atomics, AMD Radeon and SYCL (Part 2)
I couldn't get AdaptiveCpp to compile the project. It segfaulted. Here are some resources for inlining assembly into SYCL for HIP/ROCm. Atomic performance issues in AdaptiveCpp. Dumping (or at least trying to) IR from icpx. HIP Clang inline assemb...

Mark Kim in m-kim.hashnode.dev · Dec 21, 2023 · 1 min read
atomics, AMD Radeon and SYCL
Performance is atrocious for atomics on the 6600XT, specifically with SYCL. There's a thread from the GROMACS developers on why they use AdaptiveCpp. And a thread from the Intel LLVM team. How bad is it for Gaussian Splatting? 100 iterations of ~30k par...