Precious Obinna in precious-o.hashnode.dev · Jun 12 · 8 min read
How I Used Context Engineering to Improve Retrieval
In my previous article, I talked about how I cached users' intentions instead of their exact queries to drastically reduce token usage and latency. Context engineering played a HUGE role in doing that

Precious Obinna in precious-o.hashnode.dev · Jun 3 · 3 min read
How I cached intention, not queries
While I was building a project for a client recently, the system ran an LLM pipeline with multiple LLM calls. I ran into 2 obvious problems: latency and high token usage. I needed a way to kill both birds

Precious Obinna in precious-o.hashnode.dev · May 15 · 3 min read
Threads vs Processes in Task Queues: The Reliability Tradeoff
When I started building Tarsq, the execution model was thread-based. At first, threads felt like the obvious choice: lightweight, easy shared memory, simple ctx injection, lower overhead, straightfor

Precious Obinna in precious-o.hashnode.dev · Feb 10 · 4 min read
The Reflection Pattern in AI: Teaching Models to Think About Their Thinking
I was building an AI system, and I kept running into the same frustrating problem: the AI would generate outputs that looked great at first glance but had very subtle issues: wrong calculations, unrealistic assumptions, logic that made no sense. So ...

Precious Obinna in precious-o.hashnode.dev · Dec 31, 2025 · 6 min read
Retrieval Is Not Resolution: Building a Hallucination-Resistant RAG System with LLMs and SQL Server
The problem (it looked simple until I tried it)
I started building what I thought was a straightforward system: user types a gadget name → system returns the price. That’s it. Nothing fancy. Then reality hit. Users don’t type “apple iphone 12 pro 12...