Measuring the effectiveness of these agents #94
johnbillion
started this conversation in Ideas
I'd love to see some supporting evidence in this repo that documents the effectiveness of each of the agents compared to using a vanilla sub-agent for the same task.
Discussion #42 raised the concern that the prompts are very generic. I use Claude Code daily and recently started using vanilla sub-agents for tasks that don't need all the context of the main workflow, and I've found them effective based on my prompt alone. My gut feeling is that the light-touch agents in this repo won't make a material difference to the effectiveness of each sub-agent, but I would love to be proved wrong.
In #42 you requested quantifiable metrics to back up a claim that the sub-agents aren't effective. To flip this on its head: are there quantifiable metrics that demonstrate the sub-agents in this repo are more effective than a vanilla prompt? I presume they must be more effective, otherwise there would be no need for them to exist, but there's no mention of this in the repo description.
How can the effectiveness of the sub-agents in this repo be measured, documented, and tracked over time in order to demonstrate that they are actually more effective than a vanilla sub-agent using the same prompt?