Using AI to Maintain Code You Didn't Write

Every engineer has inherited a codebase they didn’t write. The documentation is incomplete, the original developers are gone, the architecture reflects decisions made under constraints you don’t understand, and things break in ways that don’t make sense.

I’m currently maintaining a complex .NET web application that manages grants, speaker requests, and operational workflows for an international organization. I didn’t build it. The codebase has 15+ projects with deep dependency chains, third-party templates, and legacy patterns. Workflows get stuck in inconsistent states, emails don’t send, and navigation properties sometimes load as null despite correct Include() statements.

The traditional approach to a codebase like this is: find the root cause, fix it, move on. But when root causes are unclear — when the system fails intermittently and no single bug explains the behavior — that approach turns into an expensive guessing game.

AI changes the equation.

Defensive engineering instead of root cause hunting

When I brought AI into this maintenance workflow, the approach shifted from “find the one bug” to “systematically eliminate every failure mode.” The distinction matters.

Root cause analysis assumes a single point of failure. Complex legacy systems often don’t work that way. They have multiple interacting weaknesses that combine under specific conditions to produce failures. Fixing one doesn’t fix the problem — it changes which combination of conditions triggers the next failure.

Defensive engineering assumes the system will fail and builds layers that prevent failures from propagating. Because the AI can read and cross-reference a large codebase quickly, this approach becomes practical where it previously wasn’t.

Here’s what that looks like in practice:

Transactional consistency. The AI analyzed the workflow state management code and identified several operations that modified multiple database records without a transactional boundary. If the process failed between two of those writes, the data was left in an inconsistent state. The fix wasn’t to find which specific failure caused the inconsistency; it was to wrap every multi-step operation in an atomic transaction and invalidate any cached copies of the affected records.
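To make the transactional pattern concrete, here is a minimal sketch. It is written in Python with sqlite3 for brevity, not in the application's actual .NET stack, and the table names, the `approve_workflow` function, and the `workflow_cache` dictionary are all assumptions made up for illustration.

```python
import sqlite3

# Hypothetical in-process cache of workflow rows, keyed by workflow id.
workflow_cache = {1: {"state": "pending"}}


def approve_workflow(conn, workflow_id):
    """Update the workflow and record the step as one atomic unit."""
    # Used as a context manager, a sqlite3 connection commits when the block
    # succeeds and rolls back when it raises, so both writes land or neither.
    with conn:
        conn.execute(
            "UPDATE workflows SET state = 'approved' WHERE id = ?",
            (workflow_id,),
        )
        conn.execute(
            "INSERT INTO workflow_steps (workflow_id, step) VALUES (?, 'approved')",
            (workflow_id,),
        )
    # Reached only after a successful commit: drop the stale cached copy.
    workflow_cache.pop(workflow_id, None)


# Demo against an in-memory database with an assumed two-table schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE workflows (id INTEGER PRIMARY KEY, state TEXT NOT NULL)")
conn.execute("CREATE TABLE workflow_steps (workflow_id INTEGER NOT NULL, step TEXT NOT NULL)")
conn.execute("INSERT INTO workflows VALUES (1, 'pending')")
conn.commit()

approve_workflow(conn, 1)
```

If the second write raises, the first is rolled back with it and the cache entry is left alone, so the cache never gets ahead of the database.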

Two-tier workflow recovery. Instead of preventing every possible failure (impossible in a system this complex), we built a recovery system. Tier 1: automatic retry for operations that fail with transient errors (database timeouts, connection drops). Tier 2: a scheduled job that detects workflows stuck in intermediate states and either completes them or rolls them back with detailed logging.
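The two tiers can be sketched roughly as follows. This is an illustrative Python outline, not the production .NET code: `TransientError`, the state names, the `last_stable_state` field, and the 30-minute threshold are all assumptions. For brevity the sweep only rolls back; it does not attempt the "complete it" branch.

```python
import time
from datetime import datetime, timedelta


class TransientError(Exception):
    """Stand-in for a database timeout or dropped connection."""


def run_with_retry(operation, attempts=3, delay=0.0):
    """Tier 1: retry an operation that fails with a transient error."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: leave the workflow for the tier 2 sweep
            time.sleep(delay)


# Hypothetical names for states a workflow should only pass through briefly.
INTERMEDIATE_STATES = {"submitting", "approving"}


def sweep_stuck_workflows(workflows, now, max_age=timedelta(minutes=30), log=print):
    """Tier 2: roll back workflows that sat in an intermediate state too long."""
    recovered = []
    for wf in workflows:
        too_old = now - wf["updated_at"] > max_age
        if wf["state"] in INTERMEDIATE_STATES and too_old:
            log(
                f"workflow {wf['id']}: stuck in {wf['state']!r} since "
                f"{wf['updated_at']}, rolling back to {wf['last_stable_state']!r}"
            )
            wf["state"] = wf["last_stable_state"]
            wf["updated_at"] = now
            recovered.append(wf["id"])
    return recovered
```

In a real deployment the sweep would run from a scheduler against the database and not against an in-memory list, but the division of labor is the same: tier 1 absorbs the short blips, tier 2 cleans up whatever tier 1 could not.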

Defensive data loading. The null navigation property issue was maddening because the Include() statements were correct. The AI suggested the problem might be intermittent connection issues causing partial loads. Instead of finding the specific failure, we implemented exponential backoff retries on data loading operations and added null checks with explicit reload logic.
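As a rough illustration of the reload logic, the sketch below retries a load with exponential backoff until a completeness check passes. It is Python, not the Entity Framework code from the real application, and `load_with_backoff`, `load_grant`, and the `applicant` field are hypothetical names invented for the example.

```python
import time


def load_with_backoff(load, is_complete, attempts=4, base_delay=0.1, sleep=time.sleep):
    """Call load() until is_complete(result) holds, doubling the wait each time."""
    for attempt in range(attempts):
        result = load()
        if result is not None and is_complete(result):
            return result
        if attempt < attempts - 1:
            # Waits base_delay, then 2x, then 4x, ... before reloading.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"entity still incomplete after {attempts} load attempts")


# Hypothetical loader that returns a grant whose related applicant is missing
# on the first call, imitating a partial load.
_responses = [
    {"id": 7, "applicant": None},
    {"id": 7, "applicant": {"name": "A. Example"}},
]


def load_grant():
    return _responses.pop(0) if len(_responses) > 1 else _responses[0]


grant = load_with_backoff(
    load_grant,
    is_complete=lambda g: g["applicant"] is not None,
    base_delay=0.01,
)
```

Raising once the attempts are exhausted is a deliberate choice in this sketch: a loud failure with a clear message is easier to diagnose than a null reference several calls later.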

SQL detection scripts. For every class of bug we fixed, we wrote SQL scripts that detect the bad state — orphaned workflows, incomplete state transitions, missing relationships. These run on a schedule and alert before users encounter the problem.
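A detection script of this kind might look like the sketch below. The schema (a `workflows` table with `grant_id`, `state`, and `updated_at` columns), the state names, and the 30-minute threshold are assumptions for illustration, and the example uses Python with SQLite so that it runs on its own; the real scripts target the application's own database.

```python
import sqlite3

# Each query returns the ids of rows in one known bad state. Table and
# column names are assumptions for this sketch.
DETECTION_QUERIES = {
    "orphaned_workflows": """
        SELECT w.id
        FROM workflows AS w
        LEFT JOIN grants AS g ON g.id = w.grant_id
        WHERE g.id IS NULL
        ORDER BY w.id
    """,
    "stuck_transitions": """
        SELECT id
        FROM workflows
        WHERE state IN ('submitting', 'approving')
          AND updated_at < datetime('now', '-30 minutes')
        ORDER BY id
    """,
}


def detect_bad_states(conn):
    """Run every detection query; return only the checks that found rows."""
    findings = {}
    for name, query in DETECTION_QUERIES.items():
        ids = [row[0] for row in conn.execute(query)]
        if ids:
            findings[name] = ids
    return findings


# Demo data: workflow 11 points at a grant that does not exist, and
# workflow 12 has been in an intermediate state for two hours.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE grants (id INTEGER PRIMARY KEY);
    CREATE TABLE workflows (
        id INTEGER PRIMARY KEY,
        grant_id INTEGER,
        state TEXT NOT NULL,
        updated_at TEXT NOT NULL
    );
    INSERT INTO grants VALUES (1);
    INSERT INTO workflows VALUES (10, 1, 'approved', datetime('now'));
    INSERT INTO workflows VALUES (11, 99, 'approved', datetime('now'));
    INSERT INTO workflows VALUES (12, 1, 'approving', datetime('now', '-2 hours'));
""")

findings = detect_bad_states(conn)
```

An empty result means nothing to report; anything else can be handed to whatever alerting channel the team already uses.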

What AI actually does well in legacy maintenance

Pattern recognition across large codebases. I can give the AI a stack trace and the relevant source files, and it will identify patterns across the codebase that could produce that failure. Not just the immediate call stack, but similar patterns in other modules that might exhibit the same behavior. This is the kind of analysis that takes a human developer hours of reading code, and the AI does it in seconds.

Generating defensive code. Once the AI understands the failure mode, it generates comprehensive defensive code: try/catch blocks with specific error handling, retry logic with appropriate backoff, state validation before and after operations, and logging that captures enough context for debugging. The code isn’t clever — it’s thorough. That’s exactly what legacy maintenance needs.

Test generation for undocumented behavior. The existing codebase had minimal test coverage. The AI reads the implementation and generates tests that verify the actual behavior — not what the code was supposed to do, but what it actually does. This creates a safety net for future changes without requiring full understanding of the original intent.
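Tests like these are often called characterization tests. The sketch below shows the idea in Python with the standard `unittest` module; `legacy_status_label` is a made-up stand-in for an undocumented legacy function, and its quirks are invented for the example.

```python
import unittest


def legacy_status_label(code):
    """Hypothetical stand-in for an undocumented legacy function."""
    labels = {1: "Pending", 2: "Approved", 3: "Rejected"}
    if code is None:
        return "Pending"  # quirk: missing codes are treated as pending
    return labels.get(code, "")  # quirk: unknown codes give an empty string


class LegacyStatusLabelCharacterization(unittest.TestCase):
    """Pins down what the function does today, quirks included."""

    def test_known_code(self):
        self.assertEqual(legacy_status_label(2), "Approved")

    def test_missing_code_is_reported_as_pending(self):
        self.assertEqual(legacy_status_label(None), "Pending")

    def test_unknown_code_gives_empty_string_not_an_error(self):
        self.assertEqual(legacy_status_label(42), "")
```

Saved to a file, this runs with `python -m unittest`. The test names describe observed behavior, not intended behavior, so a later failure signals "this changed" and leaves the question of whether the change is acceptable to a human.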

Identifying what NOT to change. This might be the most valuable capability. When I’m tempted to refactor a confusing piece of code, the AI can trace every caller, every dependency, and every side effect. More than once it’s identified that a “cleanup” I was considering would break a subtle behavior that other parts of the system depend on.

The workflow

My daily process for maintaining this codebase with AI looks like this:

1. Triage. A bug comes in. I read the error report and pull up the relevant code.
2. Context gathering. I give the AI the error, the stack trace, and the source files. I ask it to identify every code path that could produce this failure.
3. Impact analysis. Before fixing anything, I ask: “What else in the codebase uses these same patterns? If this is broken here, where else might it be broken?”
4. Defensive implementation. Rather than a minimal fix, I implement comprehensive protection: transactions, retries, validation, logging. The AI generates the code; I review it for correctness and side effects.
5. Detection scripts. I write SQL or code that detects the bad state, so we catch it proactively next time.
6. Test coverage. I generate tests that verify both the fix and the pre-existing behavior we need to preserve.

The result is that each bug fix leaves the system more resilient, not just patched. Over time, the defensive layers accumulate and the system stabilizes — even though we never found a single root cause for most of the failures.

When to use this approach

This approach works best when:

You inherited the codebase and don’t have full context on design decisions
Failures are intermittent and don’t reproduce reliably
The system is in production and can’t be rewritten
Multiple interacting issues make root cause analysis impractical
Test coverage is low and the cost of regression is high

It’s not the right approach for greenfield development, where you should build it right the first time. It’s also not a substitute for understanding the system — the AI helps you build understanding while simultaneously making the system more resilient.

The broader lesson

AI doesn’t make legacy code good. It makes legacy code manageable. The shift from “find and fix the bug” to “systematically eliminate failure modes” is a strategic change in how we approach maintenance, and AI makes it practical by handling the volume of analysis and code generation that this approach requires.

If you’re maintaining a codebase you didn’t write and feeling overwhelmed by intermittent, unexplainable failures — try this approach. Stop hunting for the one root cause. Start building layers of defense. Let AI help you be thorough where you can’t afford to be slow.