Building MCP Servers: What I Learned Creating AI-Powered DevOps Tools

By Carl Tierney

The Model Context Protocol is one of the most underappreciated developments in the AI tooling ecosystem. MCP lets you extend what AI models can do by giving them structured access to external systems — databases, APIs, file systems, development tools — through a standardized protocol. Instead of copying and pasting context into prompts, you build a server that the AI can query directly.

I’ve built two MCP servers that are now part of my daily development workflow. One reviews pull requests in Azure DevOps. The other maintains a knowledge graph of an entire product domain. Here’s what I learned building them.

The PR Reviewer: AI that understands your codebase

The first server I built connects Claude to Azure DevOps for automated pull request review. It’s not a generic “check this code” tool — it’s a specialized reviewer that understands file types, detects security vulnerabilities in package dependencies, and posts review comments directly to the PR.

The architecture is straightforward: a FastMCP server in Python that exposes tools for listing PRs, fetching diffs, analyzing code changes, and posting review comments. Claude calls these tools over MCP, and the server translates each call into Azure DevOps REST API operations.
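To make the translation step concrete, here is a minimal sketch of what a PR-listing tool might do before it touches the network: build the REST URL and the PAT header. The function names, the API version, and the example organization are assumptions for illustration, not the server's actual code, and registration with FastMCP is left out so the sketch runs on the standard library alone.

```python
import base64
from urllib.parse import quote, urlencode

API_VERSION = "7.1"  # assumed; use whichever version your organization supports


def pat_auth_header(pat: str) -> dict:
    """Build the Basic auth header for a Personal Access Token.

    The username is left empty and the token is sent as the password.
    """
    token = base64.b64encode(f":{pat}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def list_pull_requests_url(organization: str, project: str, repository: str,
                           status: str = "active") -> str:
    """Return the URL a hypothetical list-PRs tool would request."""
    base = (
        f"https://dev.azure.com/{quote(organization)}/{quote(project)}"
        f"/_apis/git/repositories/{quote(repository)}/pullrequests"
    )
    query = urlencode({"searchCriteria.status": status, "api-version": API_VERSION})
    return f"{base}?{query}"


if __name__ == "__main__":
    # "contoso", "Grants", and "web" are placeholder names.
    print(list_pull_requests_url("contoso", "Grants", "web"))
```

In a real server each of these helpers would sit behind a registered tool, so the model only ever sees the tool name and its typed arguments.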

What makes it useful isn’t the plumbing — it’s the file-type awareness. The server detects 20+ file types (C#, TypeScript, Python, SQL, React, YAML, Dockerfiles, and more) and applies specialized review prompts for each. A SQL migration gets reviewed differently than a React component. A Dockerfile change triggers different concerns than a C# service class.
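The file-type routing described above can be sketched as a lookup from path to type, then from type to prompt. The mapping, the type names, and the prompt wording below are hypothetical stand-ins; the real server covers far more types.

```python
from pathlib import PurePosixPath

# Hypothetical subset of the extension table.
EXTENSION_TYPES = {
    ".cs": "csharp",
    ".ts": "typescript",
    ".tsx": "react",
    ".py": "python",
    ".sql": "sql",
    ".yml": "yaml",
    ".yaml": "yaml",
}

# Hypothetical prompts; each file type gets its own review focus.
REVIEW_PROMPTS = {
    "sql": "Review this SQL change for destructive statements and irreversible migrations.",
    "react": "Review this component for state handling, effects, and accessibility.",
    "dockerfile": "Review this Dockerfile for base image choice, layer order, and secrets.",
}
DEFAULT_PROMPT = "Review this change for correctness and readability."


def detect_file_type(path: str) -> str:
    """Classify a changed file by name first, then by extension."""
    p = PurePosixPath(path)
    name = p.name.lower()
    # Dockerfiles usually have no extension, so match on the file name.
    if name == "dockerfile" or name.startswith("dockerfile."):
        return "dockerfile"
    return EXTENSION_TYPES.get(p.suffix.lower(), "unknown")


def review_prompt_for(path: str) -> str:
    """Pick the specialized prompt, falling back to a generic one."""
    return REVIEW_PROMPTS.get(detect_file_type(path), DEFAULT_PROMPT)
```

Checking the file name before the extension matters because several of the interesting types, Dockerfiles among them, carry no extension at all.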

The security scanner examines package files (NuGet, npm, pip, Maven) and flags outdated dependencies, known CVEs, and suspicious version pinning. This catches the kind of supply-chain risk that manual reviewers routinely miss.
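As one small piece of that scanner, here is a sketch of a pinning check for pip requirements files. It only looks at version constraints; outdated-version and CVE checks need an external data source and are not shown. The function name and the rule that anything other than an exact pin is worth flagging are my assumptions.

```python
import re

# Captures the package name, an optional comparison operator, and the version.
REQUIREMENT = re.compile(r"^([A-Za-z0-9_.\-]+)\s*(==|>=|<=|~=|!=|>|<)?\s*([^\s;]*)")


def flag_loose_pins(requirements_text: str) -> list:
    """Return findings for requirements that are not pinned with '=='."""
    findings = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        line = re.sub(r"\[.*?\]", "", line)   # drop extras such as [security]
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = REQUIREMENT.match(line)
        if match is None:
            continue
        name, operator, version = match.groups()
        if operator is None:
            findings.append(f"{name}: no version constraint")
        elif operator != "==":
            findings.append(f"{name}: range constraint {operator}{version}")
    return findings
```

The same shape works for the other ecosystems: parse the manifest, normalize each entry to a name and a constraint, then apply the rules.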

Key lessons from the PR reviewer

File-type-specific prompting matters enormously. A generic “review this code” prompt produces generic feedback. When you tell the model “this is a SQL migration in a healthcare system with HIPAA requirements,” the review quality jumps dramatically. The specificity comes from the MCP server, not from manual prompt engineering.

Post comments, don’t generate reports. The first version generated markdown reports. Nobody read them. The second version posts inline comments directly on the PR, exactly where developers already work. Adoption went from “neat tool” to “always on.”
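For reference, an inline comment in Azure DevOps is a comment thread anchored to a file and line. The sketch below builds the request body as I understand the Pull Request Threads API. Check the field names and the numeric status and comment-type values against the current REST reference before relying on them. The helper name and the whole-line anchoring are assumptions.

```python
import json


def inline_comment_payload(file_path: str, line: int, content: str) -> dict:
    """Build the body for a new comment thread anchored to one line."""
    if line < 1:
        raise ValueError("line numbers are 1-based")
    if not file_path.startswith("/"):
        file_path = "/" + file_path  # paths are rooted at the repository
    return {
        "comments": [
            {"parentCommentId": 0, "content": content, "commentType": 1}
        ],
        "status": 1,  # assumed to mean "active"
        "threadContext": {
            "filePath": file_path,
            # Anchor on the right-hand (new) side of the diff.
            # The offsets are a guess at anchoring the start of the line.
            "rightFileStart": {"line": line, "offset": 1},
            "rightFileEnd": {"line": line, "offset": 1},
        },
    }


if __name__ == "__main__":
    body = inline_comment_payload("src/app.py", 12, "Possible SQL injection here.")
    print(json.dumps(body, indent=2))
```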

Authentication and API limits are harder than they look. Authenticating to Azure DevOps with a Personal Access Token is simple in theory, but managing token scopes, handling rate limits, and dealing with large diffs (some PRs touch 5,000+ files) required more defensive engineering than the core review logic.
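Two of those defensive pieces are easy to sketch: splitting a large PR into batches, and deciding how long to wait after a throttled request. The batch size, the backoff constants, and the assumption that the server sends a Retry-After value in seconds are illustrative choices, not the values the real server uses.

```python
from typing import Iterator, Optional, Sequence


def batched(paths: Sequence[str], size: int) -> Iterator[list]:
    """Yield changed-file paths in fixed-size batches."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])


def retry_delay(attempt: int, retry_after: Optional[str] = None,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before the next attempt.

    Uses the server's Retry-After value when it parses as a number,
    otherwise exponential backoff. Both are limited by the cap.
    """
    if retry_after is not None:
        try:
            return min(float(retry_after), cap)
        except ValueError:
            pass  # unparseable header; fall back to backoff
    return min(base * (2 ** attempt), cap)
```

Batching also gives a natural place to stop early: a review that covers the first few hundred files and says so is more useful than one that times out.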

The Knowledge Graph: giving AI a memory of your product

The second server is more ambitious. It maintains a Neo4j knowledge graph that captures the entire domain model, business capabilities, workflows, and requirements for a complex product — a nonprofit granting and reporting platform with three integrated products.

The graph uses a three-label node schema (Category:Domain:Type) that lets Claude reason about entities at different levels of abstraction. A “Theme” in the granting system is both an Entity (it has properties and relationships) and part of the Financial capability domain. This multi-dimensional classification enables queries that traditional documentation can’t answer: “What entities are affected if we change how allocations work?” or “Which workflows cross product boundaries?”
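A sketch of how the three labels can scope a query: build the Cypher MATCH clause from whichever labels the caller supplies. The function name and the n.name property are assumptions, and treating Theme as the Type label is my reading of the example rather than something the schema description states. Cypher does not accept labels as query parameters, so the sketch validates them before interpolating.

```python
import re
from typing import Optional

# Labels are interpolated into the query text, so restrict them to
# plain identifiers to rule out injection.
LABEL = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def scoped_match(category: Optional[str] = None,
                 domain: Optional[str] = None,
                 node_type: Optional[str] = None) -> str:
    """Build a MATCH clause scoped by any subset of the three labels."""
    labels = [label for label in (category, domain, node_type) if label]
    for label in labels:
        if not LABEL.match(label):
            raise ValueError(f"invalid label: {label!r}")
    label_clause = "".join(f":{label}" for label in labels)
    return f"MATCH (n{label_clause}) RETURN n.name AS name"


if __name__ == "__main__":
    print(scoped_match("Entity", "Financial", "Theme"))
    print(scoped_match(domain="Financial"))
```

Dropping a label widens the scope: all three pin down one kind of node, while the domain alone returns everything in that capability area.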

Twenty MCP tools expose the graph to Claude: entity discovery, impact analysis, dependency chain tracing, cross-product coverage checks, full-text search, and maintenance operations. When Claude needs to understand the codebase, it doesn’t read files sequentially — it queries the graph for exactly the context it needs.
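Impact analysis is, at its core, a bounded graph traversal. The sketch below uses an in-memory dictionary as a stand-in for Neo4j so that it runs anywhere. The entity names and the depth limit are hypothetical.

```python
from collections import deque


def impacted(dependents: dict, start: str, max_depth: int = 3) -> list:
    """Breadth-first walk over 'depends on me' edges.

    Returns (entity, distance) pairs in discovery order, excluding start.
    """
    seen = {start}
    queue = deque([(start, 0)])
    result = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in dependents.get(node, []):
            if nxt not in seen:  # guards against cycles
                seen.add(nxt)
                result.append((nxt, depth + 1))
                queue.append((nxt, depth + 1))
    return result


if __name__ == "__main__":
    # Hypothetical edges: each key is depended on by the listed entities.
    graph = {
        "Allocation": ["Theme", "Budget"],
        "Theme": ["FundingReport"],
        "Budget": ["FundingReport"],
        "FundingReport": ["Allocation"],
    }
    print(impacted(graph, "Allocation"))
```

Returning the distance alongside each entity lets the model separate direct effects from second-order ones.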

Key lessons from the knowledge graph

Ingestion is the hard part. Building the graph database and the query tools took a week. Building the ingestion pipeline that extracts entities, relationships, and traceability from PRD markdown files, glossary definitions, and workflow descriptions took three weeks. The quality of the graph is entirely determined by the quality of the ingestion.
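To give a feel for the ingestion work, here is a deliberately small sketch that pulls glossary entries out of a markdown file. It assumes entries are written as a bold term followed by a colon and a definition, which is a format I made up for the example; the real PRD files may be laid out differently, and real ingestion also has to handle relationships and traceability.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
# Assumed entry format: **Term**: definition
GLOSSARY_ENTRY = re.compile(r"^\*\*(.+?)\*\*\s*:\s*(.+)$")


def extract_glossary(markdown: str) -> list:
    """Return glossary entries with the heading they appeared under."""
    section = None
    entries = []
    for line in markdown.splitlines():
        heading = HEADING.match(line)
        if heading:
            section = heading.group(2)  # remember the current section
            continue
        entry = GLOSSARY_ENTRY.match(line.strip())
        if entry:
            entries.append({
                "term": entry.group(1),
                "definition": entry.group(2).strip(),
                "section": section,
            })
    return entries
```

Even this toy version shows where the time goes: every formatting inconsistency in the source documents becomes a special case in the parser.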

Three-label schemas prevent explosion. Early versions used single labels (just “Entity” or “Workflow”). Queries became unmanageable as the graph grew, because a single broad label gave them nothing to narrow on. The Category:Domain:Type schema means every query can scope to the right level without scanning the entire graph.

Auto-linking is essential but dangerous. The server auto-links entities to glossary terms and stories to workflows. This saves enormous manual effort but can create false relationships if the matching is too aggressive. I settled on requiring exact term matches plus domain proximity — a “Theme” in Granting and a “Theme” in Reporting are different nodes even though they share a name.
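The matching rule can be sketched as two filters: the term must appear as a whole word with exact casing, and it must belong to the same domain as the text being linked. Reading "domain proximity" as "same domain" is my simplification, and the class and function names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    name: str
    domain: str


def link_terms(text: str, text_domain: str, glossary: list) -> list:
    """Return glossary terms that may be linked to the given text."""
    links = []
    for term in glossary:
        if term.domain != text_domain:
            continue  # same name in another product is a different node
        # Whole-word, case-sensitive match; "Themes" does not match "Theme".
        if re.search(rf"\b{re.escape(term.name)}\b", text):
            links.append(term)
    return links


if __name__ == "__main__":
    glossary = [
        Term("Theme", "Granting"),
        Term("Theme", "Reporting"),
        Term("Allocation", "Granting"),
    ]
    print(link_terms("Each Theme receives funds.", "Granting", glossary))
```

Strict matching misses some true links, but a missing edge is easier to add later than a false one is to find and remove.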

What MCP servers actually change

The standard AI development workflow is: read files, understand context, make changes, test. MCP servers change the “understand context” step from sequential file reading to structured querying. Instead of Claude reading 50 files to understand a domain, it queries the knowledge graph for exactly the entities and relationships it needs.

For the PR reviewer, the change is different: it moves code review from a human bottleneck to an always-available, specialized reviewer that catches security issues, type-specific concerns, and dependency risks before a human ever looks at the PR.

Both servers share a principle: give the AI structured access to information it would otherwise have to infer. File types, dependency trees, domain relationships, business rules — these are all things that exist in your codebase and documentation but require significant effort for an AI to extract from raw text. An MCP server makes that information directly queryable.

Getting started with MCP

If you’re considering building an MCP server, start with a pain point in your existing workflow. What information do you repeatedly copy-paste into prompts? What external system does your AI assistant need to query? That’s your first server.

The FastMCP Python framework makes the basic structure trivial — you can have a working server in under an hour. The real investment is in the domain logic: what tools to expose, what information to include, and how to structure the responses so the AI can use them effectively.

The protocol is standardized. The value is in what you build on top of it.
