Shaun Liew (@shaunliew20)

Shaun Liew · shaunliew.hashnode.dev · May 18 · 15 min read

Qwen3 Speculative Decoding on the DGX Spark: Two Models, Four Methods, One Surprising Lesson

The Problem: One Token at a Time. When you ask a large language model a question, it does not write the whole answer in a single step. It generates one small piece of text at a time. That piece is call...


Shaun Liew · shaunliew.hashnode.dev · May 16 · 14 min read

Running Qwen3-VL on DGX Spark: Transformers vs vLLM vs SGLang

I've been running Qwen3-VL locally for a while now, mostly with the standard from_pretrained() setup. It works, but it's slow. So, I kept wondering whether switching to vLLM or SGLang would actually m...


Shaun Liew · shaunliew.hashnode.dev · Dec 7, 2025 · 4 min read

Fixing CUDA PTX Error When Running Qwen3-VL with vLLM on H200

Running vision-language models like Qwen3-VL with vLLM on high-end GPUs should be straightforward. Except when it's not. The Problem: I was setting up Qwen3-VL-8B-Instruct on our H200 cluster (8x H200, 143GB VRAM each) when I hit this error: vllm ser...


Shaun Liew · shaunliew.hashnode.dev · Jul 29, 2025 · 9 min read

Understanding neural style transfer

Neural Style Transfer: Turning the Mona Lisa into a Picasso. Have you ever wondered what the Mona Lisa would look like if Picasso had painted it? Neural style transfer makes this artistic dream a reality by combining the content of one image with the ...


Shaun Liew · shaunliew.hashnode.dev · Jul 29, 2025 · 10 min read

Performing an adversarial attack on images

Exploring adversarial attacks, where tiny, imperceptible modifications can completely deceive even the smartest neural networks. From Autoencoders to Adversarial Attacks: A New Kind of Magic. After my fascinating journey with Variational Autoencoders,...


Shaun Liew

About

Available for

Shaun Liew's blogs

Recently published

Qwen3 Speculative Decoding on the DGX Spark: Two Models, Four Methods, One Surprising Lesson

Running Qwen3-VL on DGX Spark: Transformers vs vLLM vs SGLang

Fixing CUDA PTX Error When Running Qwen3-VL with vLLM on H200

Understanding neural style transfer

Performing an adversarial attack on images
