## Problem Statement

AgentReady's current testing approach has too many tests with insufficient signal:

- **Test failures:** Unclear what broke and why
- **Flaky tests:** Tests fail intermittently without code changes
- **Slow CI:** Tests take too long, slowing development velocity
- **Low coverage:** ~37% coverage despite many tests
- **GHA complexity:** Multiple workflows with overlapping responsibilities

**Root cause:** Focus on quantity over quality. More tests ≠ better testing.
## Proposed Solution: Signal-Focused Testing Strategy

### Phase 1: Categorize and Audit Existing Tests (Week 1)

**Goal:** Understand what we have and what provides value.
- **Inventory all tests:**
  - Count tests by category (unit, integration, e2e)
  - Identify duplicate/overlapping tests
  - Find tests with unclear assertions
  - Flag flaky tests (fail >5% of runs)
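The counting step can be scripted from `pytest --collect-only -q` output. A minimal sketch, assuming a `tests/unit` / `tests/integration` / `tests/e2e` layout (the directory names are an assumption, not AgentReady's confirmed structure):

```python
from collections import Counter


def categorize_tests(node_ids):
    """Bucket pytest node IDs (lines from `pytest --collect-only -q`) by suite.

    Assumes tests live under tests/unit, tests/integration, and tests/e2e;
    anything outside those directories is counted as "other".
    """
    counts = Counter()
    for node_id in node_ids:
        path = node_id.split("::", 1)[0]  # drop the class/function part
        if path.startswith("tests/unit/"):
            counts["unit"] += 1
        elif path.startswith("tests/integration/"):
            counts["integration"] += 1
        elif path.startswith("tests/e2e/"):
            counts["e2e"] += 1
        else:
            counts["other"] += 1
    return dict(counts)
```

Feeding it the stdout lines of `subprocess.run(["pytest", "--collect-only", "-q"], capture_output=True, text=True)` gives the per-category counts for the audit report.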
- **Measure signal quality:**
  - Which tests catch real bugs?
  - Which tests provide clear failure messages?
  - Which tests are too brittle (fail on safe refactors)?
- **Deliverable: Testing audit report**
  - List of tests to keep/delete/refactor
  - Signal-to-noise ratio analysis
  - Recommended testing philosophy document
### Phase 2: Simplify GitHub Actions (Week 1-2)

**Goal:** Reduce GHA complexity and improve CI speed.

**Current state:**

- Multiple workflows with overlapping responsibilities
- Tests run multiple times (wasted compute)
- Hard to understand what failed and why

**Proposed changes:**
- **Consolidate workflows:**
  - Single PR workflow for all quality checks
  - Separate release workflow (keep existing)
  - Remove redundant/duplicate checks
- **Optimize test execution:**
  - Run E2E tests first (fast, high signal)
  - Run unit tests in parallel by module
  - Skip slow tests for draft PRs
  - Cache dependencies aggressively
- **Improve failure reporting:**
  - Clear job names that explain what they test
  - Fail fast on E2E failures
  - Annotate PRs with specific failure context
- **Deliverable: Simplified GHA configuration**
  - Single `.github/workflows/pr.yml` for all checks
  - Clear job structure with descriptive names
  - <5 minute CI time for typical PRs
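A minimal sketch of what the consolidated `pr.yml` could look like. Job names, the Python version, test paths, and the use of pytest-xdist (`-n auto`) are assumptions for illustration, not the project's confirmed setup:

```yaml
name: PR checks
on:
  pull_request:

jobs:
  e2e:
    name: "E2E: critical user journeys"
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: pip          # cache dependencies aggressively
      - run: pip install -e ".[dev]"
      - run: pytest tests/e2e -x   # -x = fail fast on first E2E failure

  unit:
    name: "Unit + integration: core logic and module boundaries"
    needs: e2e               # E2E runs first; larger suites only after it passes
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: pip
      - run: pip install -e ".[dev]"
      - run: pytest tests/unit tests/integration -n auto  # requires pytest-xdist
```

Descriptive `name:` fields give the clear job names called for above, and `needs: e2e` encodes the "run E2E first" ordering.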
### Phase 3: Refactor Test Suite (Week 2-3)

**Goal:** High-signal tests that catch real issues quickly.
**Testing pyramid target:**

```
E2E Tests (5-10 tests)           ← Critical user journeys only
├─ Happy path: assess current repo
├─ Error handling: invalid config
├─ Security: sensitive directory blocking
└─ Performance: large repo (<5min timeout)

Integration Tests (20-30 tests)  ← Module boundaries
├─ Scanner + Assessors
├─ Reporter + Templates
└─ CLI + Services

Unit Tests (100-150 tests)       ← Core logic only
├─ Assessment scoring algorithm
├─ Pattern extraction
├─ Research report validation
└─ Edge cases and error handling
```
**Principles:**

- **Each test has a clear purpose:**
  - What does it test? (one thing)
  - What could break? (specific failure mode)
  - How do you fix it? (actionable error message)
- **Avoid testing implementation details:**
  - Test behavior, not internal structure
  - Refactors shouldn't break tests
  - Mock only external dependencies
- **Fast feedback:**
  - E2E tests: <10s each (total <2min)
  - Integration tests: <1s each
  - Unit tests: <100ms each
  - Full suite: <5min
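The first two principles can be sketched in a single unit test. `score_repository` and its penalty rule are hypothetical stand-ins for AgentReady's real scoring code, used only to show the shape of a high-signal test: one behavior, one assertion, an error message that says what broke and where to look.

```python
def score_repository(findings):
    """Toy scoring rule for illustration: start at 100, lose 10 per finding."""
    return max(0, 100 - 10 * len(findings))


def test_score_drops_ten_points_per_finding():
    # Behavior under test: each finding costs exactly 10 points.
    # No mocks, no internal structure inspected; a refactor of the
    # scoring internals keeps this test green as long as behavior holds.
    score = score_repository(findings=["missing README", "no CI config"])
    assert score == 80, (
        f"Expected 2 findings to cost 20 points (100 -> 80), got {score}. "
        "Check the per-finding penalty in the scoring algorithm."
    )
```

The assertion message answers all three questions up front: what was tested, what failure mode occurred, and where to start fixing it.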
- **Deliverable: Refactored test suite**
  - Delete 50%+ of existing tests (low signal)
  - Rewrite 30% with clearer assertions
  - Keep 20% as-is (already good)
  - Target 70% coverage of critical paths
### Phase 4: Documentation & Process (Week 3-4)

**Goal:** Prevent the test suite from degrading again.
- **Testing guidelines (`TESTING.md`):**
  - When to write unit vs. integration vs. e2e tests
  - How to write high-signal tests
  - Common anti-patterns to avoid
- **PR checklist template:**
  - New feature = new test (which category?)
  - Bug fix = regression test first
  - Refactor = tests stay green
- **Test review process:**
  - Code reviews include a test-quality check
  - PRs with low-signal tests get feedback
  - Flaky-test reports trigger investigation
- **Deliverable: Testing culture documentation**
  - `TESTING.md` guide
  - Updated `CONTRIBUTING.md` with test requirements
  - PR template with test checklist
## Success Metrics

| Metric | Current | Target | Measure |
|---|---|---|---|
| CI time | ~15min | <5min | GitHub Actions duration |
| Test count | ~800 tests | 150-200 tests | pytest count |
| Coverage | ~37% | 70% (critical paths) | pytest-cov |
| Flakiness | Unknown | <1% failure rate | Track over 100 runs |
| Signal quality | Low | High | Failure investigation time |
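Tracking the flakiness metric over 100 runs can be automated. A minimal sketch; running the suite via `subprocess` against unchanged code is an assumption about local measurement, not an existing AgentReady tool:

```python
import subprocess


def failure_rate(exit_codes):
    """Fraction of runs that failed; any nonzero exit code counts as a failure."""
    if not exit_codes:
        return 0.0
    return sum(1 for code in exit_codes if code != 0) / len(exit_codes)


def measure_flakiness(runs=100, pytest_args=("tests/",)):
    """Run the suite `runs` times against unchanged code and report the
    flaky-failure rate (target from the table above: < 0.01)."""
    codes = [
        subprocess.run(["pytest", "-q", *pytest_args]).returncode
        for _ in range(runs)
    ]
    return failure_rate(codes)
```

Because the code does not change between runs, any failure the loop records is flakiness by definition.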
**Definition of "high signal":**

- When a test fails, the developer immediately knows what broke
- Fix time: <30 minutes from failure to root cause identified
- False-positive rate: <1% (tests fail only when code is broken)
## Out of Scope (Not Changing)

- E2E test framework (pytest is fine)
- Assertion library (plain `assert` statements are fine)
- Test discovery mechanism (pytest auto-discovery works)
## Related Issues
## Acceptance Criteria

- [ ] `TESTING.md` guide created and reviewed
**Priority:** P0

**Why P0:** Testing is infrastructure. Bad tests slow down all development.

**Timeline:** 3-4 weeks (can be done incrementally in PRs)

**Assignee:** TBD (could be broken into multiple assignees for phases)
🤖 Generated with Claude Code