@Tebza
Building with AWSomeness
Mar 31 · 9 min read · Our first LLM judge gave a 9/10 to a page where the hero text was completely invisible. Dark grey text on a dark background image. The CSS was syntactically valid. The HTML was well-structured. Every tag was correct. The page was unusable. And our ju...
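The failure this preview describes, valid markup with invisible text, is exactly what a plain contrast check catches. Here is a minimal sketch of such a check using the WCAG 2.1 contrast-ratio formula; the hex values are hypothetical stand-ins for dark grey text on a dark background, not colors from the article:

```python
# Minimal sketch: WCAG 2.1 contrast ratio between a foreground and
# background color. Hex values below are hypothetical illustrations.

def relative_luminance(hex_color: str) -> float:
    """Relative luminance of an sRGB hex color like '#333333' (WCAG 2.1)."""
    def channel(c: int) -> float:
        s = c / 255
        return s / 12.92 if s <= 0.03928 else ((s + 0.055) / 1.055) ** 2.4
    r, g, b = (int(hex_color.lstrip('#')[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * channel(r) + 0.7152 * channel(g) + 0.0722 * channel(b)

def contrast_ratio(fg: str, bg: str) -> float:
    """Contrast ratio in [1, 21]; higher is more readable."""
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

# Dark grey text on a dark background: ~1.4:1, far below the 4.5:1 AA floor.
print(contrast_ratio('#333333', '#1a1a1a'))
```

WCAG AA requires at least 4.5:1 for body text, so a ratio near 1.4 should cap any judge's score no matter how clean the HTML is.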
Mar 30 · 10 min read · We tested five AI models on the same task 467 times. Each run produced a complete deployable website — not a code snippet, not a function, not a patch. A real site with HTML, CSS, JavaScript, and assets. The question: can cheaper models match Claude ...
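Producing hundreds of complete sites calls for a harness rather than hand-runs. A minimal sketch of such a trial loop follows; `generate_site`, the model names, the run count, and the directory layout are all hypothetical placeholders, since the preview does not show the actual setup:

```python
# Sketch of a repeated-trial harness. `generate_site` is a hypothetical
# stand-in for whatever client call produces a full site; model names and
# per-model run counts are placeholders (the article's total was 467 runs).
import json
from pathlib import Path

MODELS = ["model-a", "model-b", "model-c", "model-d", "model-e"]  # placeholders
RUNS_PER_MODEL = 10  # placeholder

def generate_site(model: str, prompt: str) -> dict[str, str]:
    """Hypothetical: returns {filename: contents} for a complete site."""
    raise NotImplementedError

def run_trials(prompt: str, out_dir: Path) -> None:
    for model in MODELS:
        for run in range(RUNS_PER_MODEL):
            site = generate_site(model, prompt)
            run_dir = out_dir / model / f"run-{run:03d}"
            run_dir.mkdir(parents=True, exist_ok=True)
            # Write every generated file, e.g. index.html, style.css, app.js.
            for name, contents in site.items():
                (run_dir / name).write_text(contents)
            # Record metadata so judging can be replayed per run later.
            (run_dir / "meta.json").write_text(json.dumps({"model": model, "run": run}))
```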
Mar 30 · 9 min read · Most LLM benchmarks evaluate text. HumanEval checks if a function passes unit tests. SWE-bench measures whether a model can patch a repository. MBPP scores single-function completions. None of these work when your AI agent generates an entire website...