Replies: 7 comments
-
Hi @cahaseler, Thanks for reaching out! That sounds frustrating and not typical of any experience on my end. Are you using the latest version of tdd-guard? Do you have a custom instructions.md file in .claude/tdd-guard/data? Try updating to the latest version of tdd-guard and delete the instructions.md file if you have one, to make sure you're running with the latest instructions. We'll soon be releasing version 1.0.0, which provides a new SDK client that makes it possible to configure the model used for validation.

In the meantime, can you help me better understand your situation? Which language and reporter are you using? Can you see the test run results in .claude/tdd-guard/data/test.json? Also, do you mind sharing an example of the test/code it is struggling with, so I can see whether I run into similar issues replicating it?

Thanks again for reaching out!
-
Could the agent be running a single test at a time instead of the full test file? I can see how that could skew the validation agent's perspective. Is the scope of the test too narrow? Does it test something too specific?
-
Latest version (new install), no custom instructions. Using TypeScript and the TDD reporter. Test results are being logged, and TDD-Guard is providing responses that indicate it's reading what coder-claude and I are producing.

I'm not sure my problem reflects an issue with TDD-Guard itself, other than that we can't find a way to "incrementally" add the complexity of an OpenAI call to the requirements by adding a test - TDD-Guard just looks at it, thinks "you don't need an AI for that test case", and wants us to find the minimal solution. I'm not sure if the answer is that I should write custom instructions telling the guard to be more tolerant in the case of external AI calls, or that coder-claude and I need better training on how to do TDD properly.

I don't have code handy since TDD won't let Claude write it, but basically we're trying to transform data with OpenAI: we put in a few rows and expect a few rows of results. So our tests are all things like convertData(example1) expect(result to equal example1).

As for multiple tests - TDD-Guard yelled at us for adding more than one test at a time, so that's what we've been doing. Are we expected to be able to create a full set and then write the implementation that satisfies them all?
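To give a rough idea of the shape of those tests (the module and fixture names below are placeholders standing in for our actual code):

```ts
import { describe, expect, it } from 'vitest';
// Hypothetical module and fixtures standing in for "a few rows in, a few rows out".
import { convertData } from './convertData';
import { example1Input, example1ExpectedRows } from './fixtures';

describe('convertData', () => {
  it('transforms example1 into the expected output rows', async () => {
    const result = await convertData(example1Input);
    expect(result).toEqual(example1ExpectedRows);
  });
});
```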
-
Also, this is a nightmare: if adding debug logs requires a test, there's really no way to get through this. If I knew what the debug logs would say, I wouldn't need them.
-
Hi @cahaseler, Thanks for the additional information! I can see how the validator sees "input -> output" and assumes that involving external services is an over-implementation violation.

Would it help to start with testing the integration through dependency injection?

```ts
interface AIProvider {
  transform(input: string): Promise<string>;
}

// Test with a test implementation
it('should transform data using AI provider', async () => {
  const provider: AIProvider = {
    transform: async (input) => `transformed: ${input}`
  };
  const converter = new DataConverter(provider);
  const result = await converter.convert('test');
  expect(result).toEqual('transformed: test');
});
```

This way, you can implement your OpenAI integration and then test-drive additional behavior from there.

As for debug logging, add this to your instructions.md:

```
IMPORTANT: Debug logs, console statements, and error logging do not require tests and should never be blocked.
```

You can also add a similar instruction for the OpenAI situation while we work on a permanent fix.

A current limitation of tdd-guard is that it can't see beyond the current modification. It doesn't know when the main agent gets stuck in a loop. I'm considering two approaches:

1. Two-way communication, so the main agent can explain its intent to the validator.
2. Giving the validator access to recent modification history so it can detect these loops.

I'm leaning toward exploring both paths but would love to hear your and other users' thoughts on those ideas!
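For completeness, a minimal sketch of what the production side of that dependency-injection setup could look like once the test above passes. The client interface here is a stand-in, not the real OpenAI SDK, and the class names are assumptions:

```ts
// Production implementation of the injected port. The client below is an
// illustrative abstraction over whatever OpenAI SDK call the project actually uses.
class OpenAIProvider implements AIProvider {
  constructor(
    private readonly client: { complete(prompt: string): Promise<string> }
  ) {}

  async transform(input: string): Promise<string> {
    // Delegate the actual transformation to the external model.
    return this.client.complete(`Transform the following data:\n${input}`);
  }
}

class DataConverter {
  constructor(private readonly provider: AIProvider) {}

  async convert(input: string): Promise<string> {
    return this.provider.transform(input);
  }
}
```

With this split, the unit tests keep exercising DataConverter against a fake AIProvider, while the OpenAI-specific code lives behind the interface.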
-
We got through it eventually with an approach similar to your suggestion, thanks! Some of this is definitely just me and Claude learning to build in a TDD-friendly way. And yeah, allowing debug logs was definitely a help too! It may be as simple as adjusting the prompt to be a bit more forgiving of generalized solutions that use AI, treating that as a less severe escalation in complexity than it currently does, or even just tweaking it to give better feedback.

As far as approaches go, the two-way communication might be interesting, but to some degree that feels like cheating, in that in an ideal world the test definition would contain all the explanation needed. I worry that giving Claude a way to pitch the change to tdd-guard will lead to him prompt-engineering/bullying his way through. The history is interesting too - not sure whether it would be better to have it considered every time, or to have some kind of meta-checker that keeps an eye on the overall cycle and intervenes as needed?

I've been using TDD with the RITEway library as described here and had a lot of luck even without TDD-Guard (though it does start to go off track, which is why I'm here!): https://medium.com/effortless-programming/better-ai-driven-development-with-test-driven-development-d4849f67e339

RITEway has been a great way to get Claude to write tests that are more comprehensibly structured, but I did notice that Claude writes stuff into the RITEway assert that isn't visible to TDD-Guard but would probably help it. So I'm going to take a look at the vitest reporter too and see if I can maybe adapt it to surface some of that info. I'll let you know if I have any luck.
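For reference, the RITEway structure being described looks roughly like this (the function under test and the descriptions are made-up examples). The given/should strings are exactly the intent that a generic reporter may not surface to TDD-Guard:

```ts
import { describe } from 'riteway';
// Hypothetical function under test, standing in for the real transformation code.
import { convertData } from './convertData';

describe('convertData()', async (assert) => {
  // RITEway forces each assertion to spell out intent via given/should strings.
  assert({
    given: 'a single input row',
    should: 'return exactly one transformed row',
    actual: (await convertData([{ id: 1 }])).length,
    expected: 1,
  });
});
```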
-
Thank you for your feedback! I also feel the same way about the two-way communication strategy. I wonder if instructing the agent to write tests at the right altitude is the simplest solution. That is what I usually do, which may explain why I haven't encountered such loops. I will move this issue to discussions to keep an eye on this case. Happy to hear if others also face similar situations and how they deal with them :)
-
Maybe this is just me and Claude missing something fundamental about TDD, but I can't figure out the answer.
I'm trying to implement a function that calls the OpenAI API. We write a test with an example input and output. It fails. Claude tries to implement with AI; TDD-Guard says no, you're over-engineering, just return minimal code. Claude hardcodes the exact response we expect. Tests pass. Claude tries to implement with AI. No, you can't edit until you've made another failing test. We add a second test case. It fails. Claude tries to implement with AI. No, over-engineering, just create a minimal response. Claude hardcodes a different response. Repeat forever?
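To make the loop concrete, the "minimal" implementations end up looking something like this (all names are placeholders for our actual fixtures):

```ts
type Row = Record<string, unknown>;

// Placeholder fixtures standing in for the example inputs/outputs described above.
declare const example1Input: Row[];
declare const example1Expected: Row[];
declare const example2Expected: Row[];

// Iteration 1: the minimal code that passes the first test is a hardcoded return.
// export const convertData = async (_input: Row[]): Promise<Row[]> => example1Expected;

// Iteration 2: the second test just earns another hardcoded branch - still no AI call.
export const convertData = async (input: Row[]): Promise<Row[]> =>
  JSON.stringify(input) === JSON.stringify(example1Input)
    ? example1Expected
    : example2Expected;
```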