Test Generation Automation: From 45% to 89% Coverage in 30 Days

The Challenge: A Codebase Without a Safety Net

NovaPay, a fintech startup processing $2.3 million in daily transactions, had a testing problem. Their Laravel application had grown to 180,000 lines of code across 420 PHP files, but test coverage sat at just 45%. Critical payment flows, webhook handlers, and reconciliation logic had little to no test coverage.

The consequences were real. In Q4 2025, the team shipped three bugs to production that affected payment processing — each requiring emergency hotfixes and on-call engineers scrambling at midnight. The root cause in all three cases: untested edge cases in code that had been modified during feature development.

"We knew our test coverage was a liability. Every deploy felt like playing Russian roulette. But with a 12-person team shipping features on a two-week sprint cycle, nobody had time to write tests for existing code. We were stuck in a cycle of tech debt." — Marcus Chen, Engineering Lead, NovaPay

45%

Starting Coverage

180K

Lines of Code

12

Engineers on Team

The Approach: AI-Assisted Test Generation

NovaPay adopted TailwindPHP's test generation feature with a structured 30-day plan. The goal wasn't just to hit a coverage number — it was to build a test suite that actually caught real bugs and gave the team confidence to ship.

The 30-Day Plan

Week 1

Critical Path First: Generated tests for payment processing, webhook handlers, and authentication flows. These were the highest-risk areas, with little or no existing test coverage. Coverage: 45% → 58%

Week 2

API Endpoints: Generated feature tests for all 87 API endpoints, including validation, authorization, and error handling. Coverage: 58% → 72%

Week 3

Business Logic: Generated unit tests for services, actions, and domain logic classes. Focused on edge cases and boundary conditions. Coverage: 72% → 83%

Week 4

Edge Cases & Cleanup: Generated tests for error handling, queue jobs, scheduled tasks, and mail notifications. Manual review and cleanup of all generated tests. Coverage: 83% → 89%

How TailwindPHP Generates Tests

TailwindPHP's test generation isn't a generic "create a test file" feature. It analyzes your actual code — the implementation, the types, the relationships, the validation rules — and generates tests that exercise real behavior.

Here's a real example from NovaPay's codebase. Given this service class:

php — app/Services/PaymentService.php
<?php

namespace App\Services;

use App\Models\Payment;
use App\Models\User;
use App\Exceptions\InsufficientFundsException;
use App\Events\PaymentProcessed;

class PaymentService
{
    public function processPayment(
        User $user,
        float $amount,
        string $currency = 'USD'
    ): Payment {
        if ($amount <= 0) {
            throw new \InvalidArgumentException(
                'Amount must be positive'
            );
        }

        if ($user->balance < $amount) {
            throw new InsufficientFundsException($user, $amount);
        }

        $payment = Payment::create([
            'user_id' => $user->id,
            'amount' => $amount,
            'currency' => $currency,
            'status' => 'completed',
        ]);

        $user->decrement('balance', $amount);

        event(new PaymentProcessed($payment));

        return $payment;
    }
}

TailwindPHP generated the following comprehensive test suite:

php — tests/Unit/PaymentServiceTest.php (AI-generated)
<?php

use App\Services\PaymentService;
use App\Models\{User, Payment};
use App\Events\PaymentProcessed;
use App\Exceptions\InsufficientFundsException;
use Illuminate\Support\Facades\Event;

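// Assumes tests/Pest.php applies Tests\TestCase and RefreshDatabase to this
// directory; the factory call below needs a booted app and a test database.
beforeEach(function () {
    $this->service = new PaymentService();
    $this->user = User::factory()->create([
        'balance' => 1000.00
    ]);
});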

test('processes payment successfully', function () {
    Event::fake();

    $payment = $this->service->processPayment(
        $this->user, 250.00
    );

    expect($payment)->toBeInstanceOf(Payment::class)
        ->amount->toBe(250.00)
        ->currency->toBe('USD')
        ->status->toBe('completed');

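    // toEqual() rather than toBe(): a decimal column may be read back as the
    // string "750.00" unless the model casts balance to float.
    expect($this->user->fresh()->balance)->toEqual(750.00);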

    Event::assertDispatched(PaymentProcessed::class);
});

test('throws exception for negative amount', function () {
    $this->service->processPayment($this->user, -50.00);
})->throws(\InvalidArgumentException::class);

test('throws exception for zero amount', function () {
    $this->service->processPayment($this->user, 0);
})->throws(\InvalidArgumentException::class);

test('throws exception for insufficient funds', function () {
    $this->service->processPayment($this->user, 1500.00);
})->throws(InsufficientFundsException::class);

test('supports custom currency', function () {
    $payment = $this->service->processPayment(
        $this->user, 100.00, 'EUR'
    );

    expect($payment->currency)->toBe('EUR');
});

test('does not deduct balance on failure', function () {
    try {
        $this->service->processPayment($this->user, 5000.00);
    } catch (InsufficientFundsException) {}

    // Loose comparison for the same reason: the value may come back as a string.
    expect($this->user->fresh()->balance)->toEqual(1000.00);
});

Notice how the AI generated tests that cover the happy path, error conditions, boundary values, side effects (event dispatching), and state verification (balance not deducted on failure). It understood the factory pattern, Pest syntax, and Eloquent's fresh() method for re-fetching from the database.
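One boundary the generated suite does not exercise is the upper edge of the balance check. Because the guard is `$user->balance < $amount`, a payment for exactly the full balance should be accepted. The sketch below is not from NovaPay's codebase: `paymentRejectionReason()` is a hypothetical pure function that mirrors the two guard clauses of `processPayment()`, and the "Insufficient funds" string is a stand-in for `InsufficientFundsException`. It only illustrates which boundary values are worth probing.

```php
<?php

// Hypothetical helper for illustration: mirrors the guard clauses of
// processPayment() without touching the database or dispatching events.
function paymentRejectionReason(float $balance, float $amount): ?string
{
    if ($amount <= 0) {
        return 'Amount must be positive';
    }
    if ($balance < $amount) {
        return 'Insufficient funds'; // stand-in for InsufficientFundsException
    }
    return null; // null means the payment would be accepted
}

// Boundary table: [balance, amount]
$cases = [
    [1000.00, 0.00],     // lower boundary: rejected
    [1000.00, 0.01],     // just above the lower boundary: accepted
    [1000.00, 1000.00],  // exactly the balance: accepted
    [1000.00, 1000.01],  // just above the balance: rejected
];

foreach ($cases as [$balance, $amount]) {
    $reason = paymentRejectionReason($balance, $amount);
    echo sprintf("%.2f / %.2f => %s\n", $balance, $amount, $reason ?? 'accepted');
}
```

In a real suite these rows would likely become a Pest dataset against the service itself. The pure function is used here only so the sketch runs without Laravel.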

The Results: By the Numbers

45% → 89%

Code Coverage

1,847

Tests Generated

3

Regressions Caught

Over 30 days, TailwindPHP generated 1,847 tests across the NovaPay codebase. The team reviewed, refined, and kept 1,623 tests (88% acceptance rate). The remaining 224 tests (12%) were either redundant, tested implementation details rather than behavior, or needed manual adjustment for complex business logic.

Regressions Caught in Week 1

The most dramatic moment came during the first week. After generating tests for the payment processing module, the team ran the new test suite and discovered 3 existing bugs that had been lurking in production:

Currency rounding error: A floating-point precision issue in the currency conversion service that occasionally caused 1-cent discrepancies in multi-currency payments
Race condition in balance checks: Two concurrent payment requests for the same user could both pass the balance check, leading to a negative balance
Missing webhook retry logic: Failed webhook deliveries weren't being retried, causing payment status desynchronization with external providers
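The article does not show NovaPay's conversion code or its fix, so the following is only a sketch of one common way to avoid this class of rounding error: keep money in integer minor units (cents) and store the exchange rate as an integer number of millionths. `convertMinorUnits()` and the rate scale are assumptions made for illustration.

```php
<?php

// Binary floats cannot represent most decimal fractions exactly,
// so this comparison is false.
$floatSumIsExact = (0.1 + 0.2 === 0.3);

/**
 * Converts an amount in minor units (e.g. cents) with a rate stored as an
 * integer number of millionths (0.92 => 920_000). Rounds half up using
 * integer arithmetic only. Assumes non-negative inputs that fit in an int.
 */
function convertMinorUnits(int $amountMinor, int $rateMicros): int
{
    return intdiv($amountMinor * $rateMicros + 500_000, 1_000_000);
}

// 19.99 converted at a rate of 0.92 is 18.3908, which rounds to 18.39.
echo convertMinorUnits(1999, 920_000), "\n";
```

A decimal library such as PHP's BCMath extension is another option. The point is only that the rounding rule becomes explicit instead of being a side effect of float representation.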

The second bug, the race condition in balance checks, had been responsible for two of the three production incidents in Q4 2025. The AI-generated test caught it by testing concurrent payment scenarios, an edge case the team hadn't manually tested.
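How NovaPay fixed the race is not described, so what follows is a hedged sketch of one standard remedy: make the balance check and the decrement a single conditional UPDATE, so a request that read a stale balance cannot overdraw the account. The in-memory SQLite database, the `balance_cents` column, and the `readBalance()` and `tryDebit()` helpers are all assumptions for illustration. The two "requests" are interleaved sequentially rather than run in parallel.

```php
<?php

// In-memory SQLite stands in for the production database (assumption).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, balance_cents INTEGER NOT NULL)');
$pdo->exec('INSERT INTO users (id, balance_cents) VALUES (1, 100000)');

function readBalance(PDO $pdo, int $userId): int
{
    $stmt = $pdo->prepare('SELECT balance_cents FROM users WHERE id = :id');
    $stmt->bindValue(':id', $userId, PDO::PARAM_INT);
    $stmt->execute();
    return (int) $stmt->fetchColumn();
}

/**
 * Debits only if the balance is still sufficient at write time.
 * Returns false when no row was updated, i.e. the funds were gone.
 */
function tryDebit(PDO $pdo, int $userId, int $amountCents): bool
{
    $stmt = $pdo->prepare(
        'UPDATE users SET balance_cents = balance_cents - :amount
         WHERE id = :id AND balance_cents >= :minimum'
    );
    $stmt->bindValue(':amount', $amountCents, PDO::PARAM_INT);
    $stmt->bindValue(':id', $userId, PDO::PARAM_INT);
    $stmt->bindValue(':minimum', $amountCents, PDO::PARAM_INT);
    $stmt->execute();
    return $stmt->rowCount() === 1;
}

// Both requests read the balance before either one writes.
$seenByA = readBalance($pdo, 1);
$seenByB = readBalance($pdo, 1);
$bothPassNaiveCheck = $seenByA >= 60000 && $seenByB >= 60000;

// With the conditional UPDATE, only the first debit can succeed.
$debitA = tryDebit($pdo, 1, 60000);
$debitB = tryDebit($pdo, 1, 60000);
$finalBalance = readBalance($pdo, 1);

echo $debitA ? "A debited\n" : "A rejected\n";
echo $debitB ? "B debited\n" : "B rejected\n";
```

In a Laravel codebase, a comparable guarantee could come from wrapping the check and the decrement in a database transaction with `lockForUpdate()` on the user row. Which approach fits depends on the schema and the database engine.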

What Made It Work: The Human + AI Partnership

NovaPay's success wasn't about blindly accepting AI-generated tests. It was about building a systematic workflow where AI handled the tedious parts and humans focused on the important parts.

The Review Process

Every batch of generated tests went through a three-step review:

Run the tests: If they fail, investigate. Is it a bug in the test or a bug in the code? In NovaPay's experience, AI-generated tests that failed on the first run often exposed real issues.
Check the assertions: AI sometimes tests implementation details (e.g., checking the exact SQL query) rather than behavior (e.g., checking the result). Replace implementation-specific assertions with behavioral ones.
Add business context: AI can't know your business rules. If a test is technically correct but doesn't align with your business logic, adjust it. For example, NovaPay's daily transfer limit of $10,000 wasn't in the code — it was enforced by a third-party API.
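A rule like the daily transfer limit can be written down explicitly so that tests have something to assert against. The sketch below is an assumption-heavy illustration: the article gives only the $10,000 figure and says a third-party API enforces it. The constant name, the `exceedsDailyLimit()` helper, the use of cents, and the choice to treat exactly $10,000 as allowed are all hypothetical.

```php
<?php

// $10,000.00 in cents. Whether the real limit is inclusive is an assumption.
const DAILY_TRANSFER_LIMIT_CENTS = 1_000_000;

/**
 * Returns true when adding this transfer to what was already sent today
 * would go over the daily limit.
 */
function exceedsDailyLimit(int $sentTodayCents, int $amountCents): bool
{
    return $sentTodayCents + $amountCents > DAILY_TRANSFER_LIMIT_CENTS;
}

// $9,000.00 already sent today, then a $1,000.01 transfer is requested.
echo exceedsDailyLimit(900_000, 100_001) ? "over limit\n" : "within limit\n";
```

Encoding the rule this way gives a reviewer a concrete place to correct a generated test whose expectations contradict the business rule.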

What AI Tests Best

NovaPay found that TailwindPHP excelled at generating tests for:

API endpoint validation: Testing all validation rules, authorization checks, and response formats
CRUD operations: Creating, reading, updating, and deleting with proper assertions
Error handling: Testing exception paths, error messages, and HTTP status codes
Data relationships: Testing Eloquent relationships, cascading deletes, and eager loading

What Humans Test Best

The team still wrote manual tests for:

Complex business workflows: Multi-step payment processing with external API interactions
Integration scenarios: End-to-end tests involving multiple services, queues, and external providers
Performance tests: Load testing and response time assertions
Visual regression tests: Frontend rendering and email template output

Long-Term Impact: 90 Days Later

Three months after the initial test generation sprint, NovaPay's metrics showed sustained improvement:

Coverage maintained at 87%+ — the team configured TailwindPHP to generate tests for every new feature, preventing coverage regression
Zero production incidents in Q1 2026 related to code regressions
Deploy frequency increased 40% — from 3 deploys/week to over 4, because the team trusted the test suite
Code review time decreased 25% — reviewers spent less time manually checking edge cases because the test suite already covered them
New developer onboarding improved — tests served as living documentation of expected behavior

"The test suite TailwindPHP helped us build didn't just catch bugs — it changed how we think about shipping code. We went from 'hope it works' to 'we know it works.' That confidence is worth more than any metric." — Marcus Chen, Engineering Lead, NovaPay

Getting Started: Your 30-Day Test Plan

Based on NovaPay's experience, here's a replicable plan for any team looking to dramatically increase test coverage with AI-generated tests:

Week 1: Identify your highest-risk, lowest-coverage code. Generate tests for these areas first. Run the tests and investigate every failure: it may be a bug in the test, but it is often a real bug in the code.
Week 2: Generate tests for all API endpoints and public-facing interfaces. Focus on validation, authorization, and error handling.
Week 3: Generate tests for internal business logic — services, actions, and domain models. Pay attention to edge cases and boundary conditions.
Week 4: Clean up, refine, and fill gaps. Remove duplicate tests, fix flaky tests, and add manual tests for complex business workflows.
Ongoing: Configure TailwindPHP to auto-generate tests for new features. Set a minimum coverage threshold in CI/CD (80% is a reasonable starting point).
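The coverage threshold in the last step can be enforced with a small gate script. This is a minimal sketch, assuming the CI job already produces a Clover XML coverage report (PHPUnit can write one). The function names and the sample report are made up for illustration, and only the project-level `statements` and `coveredstatements` attributes are read.

```php
<?php

/**
 * Returns project-wide statement coverage (0 to 100) from Clover XML.
 */
function cloverStatementCoverage(string $cloverXml): float
{
    $xml = simplexml_load_string($cloverXml);
    if ($xml === false || !isset($xml->project->metrics)) {
        throw new InvalidArgumentException('Not a Clover coverage report');
    }

    $metrics = $xml->project->metrics;
    $total = (int) $metrics['statements'];
    $covered = (int) $metrics['coveredstatements'];

    // An empty project has nothing left uncovered.
    return $total === 0 ? 100.0 : 100.0 * $covered / $total;
}

function meetsThreshold(string $cloverXml, float $minimum = 80.0): bool
{
    return cloverStatementCoverage($cloverXml) >= $minimum;
}

// Hand-written sample report: 178 of 200 statements covered.
$sample = <<<XML
<coverage generated="0">
  <project timestamp="0">
    <metrics statements="200" coveredstatements="178"/>
  </project>
</coverage>
XML;

echo meetsThreshold($sample) ? "coverage ok\n" : "coverage below threshold\n";
```

In CI, the script would read the real report file and exit with a non-zero status when `meetsThreshold()` returns false. Teams using Pest may not need a script at all, since its coverage runner accepts a minimum via `--coverage --min=80`.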

Conclusion

NovaPay's journey from 45% to 89% coverage in 30 days shows that AI-powered test generation isn't just about hitting a coverage number: it's about building a safety net that catches real bugs, gives developers confidence, and changes the culture around shipping code.

The key insight: AI generates the tests, but humans provide the judgment. The best results come from treating AI-generated tests as a starting point, not a finished product. Review them, refine them, and let them become the foundation of a testing culture that makes every deploy boring — in the best possible way.