Circular Validation: The Hidden Risk in AI-Generated Tests
Last month, a team I advise shipped a feature with 94% code coverage and every build green. Within a week, production support tickets appeared: a discount calculation overstated savings by up to five percentage points. The bug lived in the implementation, but the reason it shipped lived in the test suite. An AI assistant had generated the tests by reading the implementation, so they encoded the same arithmetic error. I call this circular validation.
The Circular Validation Problem
Point Claude, GitHub Copilot, or any code-generation tool at implementation code and ask for tests. You get a closed feedback loop, because the tests are derived from the very code they are supposed to check. Run them and they confirm that the code works as written, bugs and misinterpretations included.
It is like letting a student write the answer key for their own exam. Every mistake in their reasoning is already in the key, so they score 100%.
Business Intent as the Only Valid Reference Point
In my experience building financial systems at scale, the tests that found real production defects all started from requirements. The useful questions are specification-level:
- Does the feature do what the acceptance criteria say?
- Does it handle the edge cases the specification defines?
If you derive tests from code, you only confirm that the code does what it already does. You learn nothing about whether it meets the specification. In financial services, I’ve seen that gap cost teams real money: a rounding error in a discount formula passes through implementation-based tests undetected, then compounds across millions of transactions.
Real-World Example: Discount Calculation Bug
Consider this user story:
As an online shopper
I want to filter products by discount percentage
So that I can quickly find the best deals
Acceptance Criteria:
1. Filter products by minimum discount percentage (10%, 25%, 50%, 75%)
2. Show original price, discounted price, and discount percentage
3. Hide products below selected discount threshold
4. Calculate discount as ((original - current) / original) * 100
5. Work with existing category filters
6. Allow sorting by highest discount first
The Implementation Bug
```csharp
using System.Collections.Generic;
using System.Linq;

// Product (not shown in this post) exposes Name, Category, OriginalPrice,
// CurrentPrice, and a settable DiscountPercentage.
public class ProductService
{
    public IReadOnlyList<Product> GetDiscountedProducts(
        IEnumerable<Product> products,
        int? minDiscountPercentage = null,
        string category = null,
        bool sortByHighestDiscount = false)
    {
        // The BUG: Incorrect discount calculation formula
        // Correct: ((originalPrice - currentPrice) / originalPrice) * 100
        // Buggy:   ((originalPrice - currentPrice) / currentPrice) * 100
        var filtered = products.Where(p =>
        {
            decimal discountPercentage = ((p.OriginalPrice - p.CurrentPrice) / p.CurrentPrice) * 100;

            // Side effect: mutates the caller's Product inside a LINQ predicate
            p.DiscountPercentage = discountPercentage;

            bool meetsDiscountFilter = minDiscountPercentage == null || discountPercentage >= minDiscountPercentage;
            bool meetsCategoryFilter = string.IsNullOrEmpty(category) || p.Category == category;
            return meetsDiscountFilter && meetsCategoryFilter;
        });

        if (sortByHighestDiscount)
        {
            return filtered.OrderByDescending(p => p.DiscountPercentage)
                           .ToList();
        }

        return filtered.OrderBy(p => p.Name)
                       .ToList();
    }
}
```
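The formula divides by the current price instead of the original price, so every discount is overstated. That produces incorrect discount percentages and wrong filtering: products that sit below the selected threshold slip through. There is also a second, smaller problem. Assigning p.DiscountPercentage inside the LINQ .Where() clause is a side effect that mutates the caller's Product objects while the query runs, which makes the code harder to reason about and test. LINQ queries are meant to be side-effect-free.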
For a $100 product now $80:
- Correct: ((100 - 80) / 100) * 100 = 20% discount
- Buggy: ((100 - 80) / 80) * 100 = 25% discount
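The overstatement is not a fixed offset. Let o be the original price and c the current price. Write the correct discount from acceptance criterion #4 as a fraction d, and the buggy value as b. Substituting c = o(1 - d) shows how the two relate:

```latex
d = \frac{o - c}{o}, \qquad b = \frac{o - c}{c}, \qquad c = o\,(1 - d)

b = \frac{o\,d}{o\,(1 - d)} = \frac{d}{1 - d},
\qquad
b - d = \frac{d^{2}}{1 - d},
\qquad
\frac{db}{dd} = \frac{1}{(1 - d)^{2}} > 0
```

At d = 0.20 the gap is 0.04 / 0.80 = 0.05, the five points in the $100-to-$80 example. At d = 0.50 it is 0.25 / 0.50 = 0.50, so a half-price item is reported as 100% off. Because b rises strictly with d, sorting by the buggy value happens to give the same order as sorting by the correct one. The damage shows up in the displayed percentages and the threshold filter.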
AI-Generated Tests (Circular Validation)
```csharp
[Test]
public void GetDiscountedProducts_WithMinimumDiscount_ReturnsMatchingProducts()
{
    // Arrange: _service and _testProducts come from a [SetUp] fixture (not shown)

    // Act
    var result = _service.GetDiscountedProducts(_testProducts, 30);

    // Assert - This passes with the buggy implementation!
    Assert.AreEqual(2, result.Count);
    CollectionAssert.Contains(result, _testProducts[1]); // Premium Phone: ((800-600)/600)*100 = 33.33%
    CollectionAssert.Contains(result, _testProducts[2]); // Designer Bag: ((300-150)/150)*100 = 100%
}
```
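These tests pass despite the bug because their expected values come from the implementation, not the acceptance criteria. The comments compute both discounts with the same divide-by-current-price formula. Under acceptance criterion #4, Premium Phone is a 25% discount and falls below the 30% threshold. The correct result therefore contains one product, Designer Bag (50%), not two.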
Intent-Based Tests
Derive expected values from the acceptance criteria. Acceptance criterion #4 specifies the formula as ((original - current) / original) * 100. A $100 product at $75 is a 25% discount. Write that number directly into the assertion:
```csharp
[Test]
public void DiscountPercentage_MatchesAcceptanceCriteria()
{
    // Arrange - values chosen to make the expected discount obvious
    var product = new Product { OriginalPrice = 100, CurrentPrice = 75 };

    // Act
    var result = _service.GetDiscountedProducts(new[] { product });

    // Assert - expected value derived from acceptance criteria, not implementation
    // AC #4: ((100 - 75) / 100) * 100 = 25%
    Assert.AreEqual(25m, result[0].DiscountPercentage);
}

[Test]
public void Filter_ExcludesProductsBelowMinimumDiscount()
{
    // Arrange - a 20% discount product should be excluded at a 25% threshold
    var product = new Product { OriginalPrice = 100, CurrentPrice = 80 };

    // Act
    var result = _service.GetDiscountedProducts(new[] { product }, minDiscountPercentage: 25);

    // Assert - AC #3: hide products below selected discount threshold
    Assert.IsEmpty(result);
}
```
Both tests fail against the buggy implementation. The first reports about 33.33% instead of 25%. In the second, the buggy formula computes 25% instead of 20%. The product therefore meets the 25% threshold and passes the filter when it should be hidden.
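For contrast, here is a minimal sketch of a fix that should pass both intent-based tests. The post does not show Product, so this assumes it is a plain mutable class with the five properties used above. The Discount helper is a hypothetical name, and its guard against a non-positive original price is an assumption the acceptance criteria don't address. Instead of mutating the caller's objects inside Where(), the sketch computes each discount once into a copy, so the predicates stay pure.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of Product; the post does not show its definition.
public class Product
{
    public string? Name { get; set; }
    public string? Category { get; set; }
    public decimal OriginalPrice { get; set; }
    public decimal CurrentPrice { get; set; }
    public decimal DiscountPercentage { get; set; }
}

// Hypothetical helper implementing AC #4: ((original - current) / original) * 100.
public static class Discount
{
    public static decimal Percentage(decimal originalPrice, decimal currentPrice)
    {
        // Assumption: a non-positive original price is invalid input, not a 0% discount.
        if (originalPrice <= 0m)
            throw new ArgumentOutOfRangeException(nameof(originalPrice), "Original price must be positive.");
        return (originalPrice - currentPrice) / originalPrice * 100m;
    }
}

public class ProductService
{
    public IReadOnlyList<Product> GetDiscountedProducts(
        IEnumerable<Product> products,
        int? minDiscountPercentage = null,
        string? category = null,
        bool sortByHighestDiscount = false)
    {
        // Project each input into a priced copy; the caller's objects are never mutated,
        // so the Where() predicates below are pure.
        var matching = products
            .Select(p => new Product
            {
                Name = p.Name,
                Category = p.Category,
                OriginalPrice = p.OriginalPrice,
                CurrentPrice = p.CurrentPrice,
                DiscountPercentage = Discount.Percentage(p.OriginalPrice, p.CurrentPrice),
            })
            // AC #3: hide products below the threshold; a product exactly at it stays visible
            .Where(p => minDiscountPercentage is null || p.DiscountPercentage >= minDiscountPercentage)
            // AC #5: combine with the existing category filter
            .Where(p => string.IsNullOrEmpty(category) || p.Category == category);

        // AC #6: optional sort by highest discount; otherwise alphabetical by name
        var ordered = sortByHighestDiscount
            ? matching.OrderByDescending(p => p.DiscountPercentage)
            : matching.OrderBy(p => p.Name);

        return ordered.ToList();
    }
}
```

Because this version returns copies, reference-based assertions such as CollectionAssert.Contains(result, _testProducts[1]) would no longer match. Compare by Name instead, or return a dedicated result type if callers need the original objects.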
Breaking the Circular Validation Cycle
Keep what you know about the code separate from what you know about the requirements. Build tests from the second.
1. Start with Requirements, Not Implementation
Derive test cases from acceptance criteria and domain specifications before opening the implementation. Document why each test exists. Test-driven development enforces this by design; AI-assisted generation does not. You have to enforce it yourself.
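A minimal sketch of requirement-first cases, assuming a plain table kept in the repo rather than any particular test framework. In NUnit the same rows could feed a [TestCaseSource]. Each row's expected outcome was worked out by hand from acceptance criteria #3 and #4, and the Why column records the reason the case exists. SpecCase and RequirementCases are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// One row per requirement-derived example. Expected outcomes are computed by hand
// from the acceptance criteria, never by running or reading the implementation.
public record SpecCase(decimal Original, decimal Current, int MinDiscount, bool ShouldBeShown, string Why);

public static class RequirementCases
{
    public static readonly IReadOnlyList<SpecCase> All = new[]
    {
        new SpecCase(100m, 80m, 25, false, "AC #3/#4: a 20% discount is below a 25% threshold"),
        new SpecCase(100m, 75m, 25, true,  "AC #3/#4: exactly 25% is not below a 25% threshold"),
        new SpecCase(800m, 600m, 30, false, "AC #4: $800 -> $600 is 25%, below a 30% threshold"),
        new SpecCase(300m, 150m, 50, true,  "AC #4: $300 -> $150 is exactly 50%, meets a 50% threshold"),
    };

    // Checks a discount formula against every row and returns the reasons of the rows it gets wrong.
    public static List<string> Failures(Func<decimal, decimal, decimal> discountPercentage)
    {
        var failures = new List<string>();
        foreach (var row in All)
        {
            decimal pct = discountPercentage(row.Original, row.Current);
            bool shown = pct >= row.MinDiscount; // AC #3: hide only products below the threshold
            if (shown != row.ShouldBeShown)
                failures.Add($"{row.Why} (computed {pct:0.##}%)");
        }
        return failures;
    }
}
```

Run against the buggy formula, ((o - c) / c) * 100, the first and third rows fail. The AC #4 formula passes all four.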
2. Challenge Implementation Assumptions
Ask the AI for tests that try to break the implementation rather than confirm it: edge cases, boundary values, and negative scenarios. If every generated test passes on the first run, treat that as a warning sign of circular validation.
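Probing does not have to mean reading the code. Here is a minimal sketch in plain C# with no property-testing library; a library such as FsCheck would do this more thoroughly. It generates many whole-cent price pairs and checks a rule that follows from acceptance criterion #4: for any markdown where the item still costs something, the discount must be at least 0% and below 100%. Formulas and InvariantCheck are hypothetical stand-ins for the code under test.

```csharp
using System;

// Hypothetical stand-ins for the code under test.
public static class Formulas
{
    // AC #4: ((original - current) / original) * 100
    public static decimal Spec(decimal original, decimal current) =>
        (original - current) / original * 100m;

    // The formula from the buggy ProductService: divides by the current price.
    public static decimal Buggy(decimal original, decimal current) =>
        (original - current) / current * 100m;
}

public static class InvariantCheck
{
    // Samples whole-cent price pairs with 0 < current <= original and returns the
    // first pair whose discount falls outside [0, 100), or null if none does.
    public static (decimal Original, decimal Current)? FindViolation(
        Func<decimal, decimal, decimal> discountPercentage, int samples = 1_000, int seed = 42)
    {
        var rng = new Random(seed); // fixed seed so a failure is reproducible
        for (int i = 0; i < samples; i++)
        {
            int originalCents = rng.Next(100, 100_001);        // $1.00 to $1,000.00
            int currentCents = rng.Next(1, originalCents + 1); // $0.01 up to the original price
            decimal original = originalCents / 100m;
            decimal current = currentCents / 100m;

            decimal pct = discountPercentage(original, current);
            if (pct < 0m || pct >= 100m)
                return (original, current);
        }
        return null;
    }
}
```

The spec formula passes every sample. The buggy formula fails as soon as it draws a price cut of half or more (current ≤ original / 2), which with these ranges happens almost immediately.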
3. Adopt a Stakeholder Perspective
Collaborate with product owners. Feed your tests real production data shapes. In financial services, I test with realistic price distributions and currency rounding rules. A code-reading AI has no access to those constraints.
4. Use AI as a Complement, Not a Replacement
Use Claude Code and GitHub Copilot for boilerplate scaffolding and edge case suggestions. Then review each generated assertion against your acceptance criteria. Ask yourself whether the test validates the requirement or confirms the code path.
The Strategic Implications
Better models will not fix this. Circular validation is an input problem: give the model implementation code and you get implementation-shaped tests. Write the prompt against requirements and you get requirement-shaped tests.
Write acceptance criteria before you write code. Keep them in the repo, versioned alongside the code they describe. A junior developer with clear acceptance criteria and an AI assistant will often produce better tests than a senior developer working from implementation alone.
In the AI-assisted era, test quality depends on requirements quality. That was always true. Now you can see it in the output.