Adaptive KV-Cache Quantization: How 'Don't Waste Bits' Cuts On-Device LLM Latency by 17%
5d ago · 7 min read

Running LLMs on-device means fighting two constraints simultaneously: memory and latency. The KV-cache — the buffer that stores past token representations so the model does not recompute them — is often the bottleneck on both fronts. A paper publishe...
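To make the "don't waste bits" idea concrete, here is a minimal NumPy sketch of one way adaptive KV-cache quantization could work: cache tensors with a narrow dynamic range get 4-bit codes, wider ones get 8-bit, so precision is spent only where it matters. The bit-selection rule, threshold, and class names below are assumptions for illustration, not the method from the paper.

```python
# Hypothetical sketch of adaptive KV-cache quantization (illustrative, not the paper's code).
# Idea: spend fewer bits on low-range cache tensors, more on high-range ones.
import numpy as np

def quantize(x: np.ndarray, bits: int):
    """Symmetric per-tensor quantization to a signed integer grid."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.max(np.abs(x)))
    scale = (max_abs / qmax) if max_abs > 0 else 1.0   # avoid division by zero
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

def choose_bits(x: np.ndarray, threshold: float = 1.0) -> int:
    """Adaptive rule (assumed): wide-range tensors get 8 bits, narrow ones 4."""
    return 8 if float(np.max(np.abs(x))) > threshold else 4

class QuantizedKVCache:
    """Stores per-layer K/V tensors in low precision; dequantizes on read."""
    def __init__(self):
        self.store = {}  # layer index -> (qK, k_scale, k_bits, qV, v_scale, v_bits)

    def append(self, layer: int, k: np.ndarray, v: np.ndarray):
        k_bits, v_bits = choose_bits(k), choose_bits(v)
        self.store[layer] = (*quantize(k, k_bits), k_bits, *quantize(v, v_bits), v_bits)

    def read(self, layer: int):
        qk, k_scale, _, qv, v_scale, _ = self.store[layer]
        return dequantize(qk, k_scale), dequantize(qv, v_scale)
```

In this sketch the quantized values are always packed into int8 arrays for simplicity; a real implementation would bit-pack the 4-bit codes to actually halve memory, and would likely make the bit-width decision per channel or per layer rather than per tensor.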