Three signals that actually tell me a skill has stopped pulling its weight:
- Trigger drift: the skill stops firing on inputs it used to handle
- Zero delta: I get the same result with or without the skill active
- My correction log: every override I make in a real session gets saved as a regression test
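Roughly, the three checks could be computed like this. It's a minimal sketch that assumes you already record, per test prompt, whether the skill fired and whether the output was accepted with and without it. `CaseResult`, `trigger_drift`, `skill_delta` and `correction_to_case` are hypothetical names for illustration, not part of Claude Code or SkillCompass.

```python
from dataclasses import dataclass


# Hypothetical record of one eval case; field names are illustrative only.
@dataclass
class CaseResult:
    prompt: str
    triggered: bool          # did the skill fire on this input?
    passed_with_skill: bool  # output accepted with the skill enabled
    passed_without: bool     # output accepted with the skill disabled


def trigger_drift(baseline: list[CaseResult], current: list[CaseResult]) -> list[str]:
    """Prompts that triggered the skill in the baseline run but not in the current one."""
    now = {c.prompt: c.triggered for c in current}
    return [b.prompt for b in baseline if b.triggered and not now.get(b.prompt, False)]


def skill_delta(results: list[CaseResult]) -> float:
    """Pass rate with the skill minus pass rate without it; near 0 means zero delta."""
    if not results:
        return 0.0
    with_skill = sum(r.passed_with_skill for r in results) / len(results)
    without = sum(r.passed_without for r in results) / len(results)
    return with_skill - without


def correction_to_case(prompt: str, rejected: str, accepted: str) -> dict:
    """Turn a manual override from a real session into a regression test case."""
    return {"prompt": prompt, "rejected_output": rejected, "expected_output": accepted}
```

Re-running the same prompt set after each model update is one way to catch the "dead weight" case: if `skill_delta` slides toward 0 while the skill still triggers, the baseline has probably caught up.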
The hardest part? Claude's baseline keeps improving. A skill that was genuinely useful six months ago might be dead weight today, and nothing tells you.
Been building something to tackle this systematically → https://github.com/Evol-ai/SkillCompass
But I'm curious what others are doing. How do you know when a Claude Code skill is actually working, and how are you improving it?