the hardest part of AI-assisted development + PR workflows isn't the code generation — it's the review layer. what keeps biting teams i talk to:
reviewer fatigue. once you're reviewing 5x the PR volume, "LGTM" becomes the default and real bugs slip through. you need tighter scope-per-PR enforced at the agent level, not the human level.
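concretely, "agent level" means the rule lives in the repo where the agent reads it every session, not in a reviewer's head. a rough sketch of what a scope section in CLAUDE.md could look like (the wording and the file-count limit are made up, tune them to your team):

```
## pr scope
- one concern per PR: a change is a refactor, a feature, or a dependency bump, never two of those at once
- if a task needs edits in more than ~8 files, stop and propose a split before writing code
- note any out-of-scope problems you find in the PR description instead of fixing them inline
```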
attribution drift. six months in, nobody remembers which lines the human wrote vs which the agent wrote. matters a lot when you need to debug a regression.
convention drift. every agent session starts with a blank slate unless you invest in a strong CLAUDE.md + slash commands layer. "we figured out last sprint that our tests need X" is knowledge that evaporates without scaffolding.
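this is the kind of thing i mean by scaffolding: the stuff you learned last sprint gets written down where the agent reads it on every session start. a hypothetical CLAUDE.md excerpt (section name and rules invented for illustration):

```
## testing
- every new endpoint needs a failure-path test, not just the happy path
- integration tests run against the local compose stack, never against shared staging
- never weaken or delete an existing assertion to make a test pass; flag it in the PR instead
```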
the fix i've landed on is treating skills and CLAUDE.md files as first-class, versioned, installable artifacts. been building tokrepo.com as an open registry for exactly this class of thing — claude code skills / slash commands / MCP configs, installable with one command, so the conventions don't evaporate between projects or team members. works really well alongside a tight PR workflow: agent runs the skill, skill enforces the convention, PR stays on-spec.
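to make the "installable artifact" part concrete: in claude code a project slash command is, as far as i know, just a markdown file under .claude/commands/, so it versions and reviews like any other file in the repo. a rough sketch (the filename and checklist are invented; the body is whatever convention you want enforced):

```
# .claude/commands/review-scope.md
review the current diff before opening a PR:
1. list every changed file with a one-line reason it needed to change
2. flag anything outside the stated scope of the task and propose splitting it out
3. confirm new code follows the conventions in CLAUDE.md, quoting the rule it satisfies
```

then /review-scope is the thing the agent runs at the end of every task, and the convention lives in version control instead of someone's memory.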
fwiw the thing that surprised me most: the scaffolding ends up mattering more than model quality. an average model with great conventions beats a great model with no conventions, every time.