2026-05-04
Open Source

Automating Documentation Testing: How Drasi and GitHub Copilot Catch Silent Bugs

Drasi uses GitHub Copilot to automate documentation testing with AI agents, catching silent bugs from dependency drift and missing context, improving onboarding and developer trust.

For early-stage open-source projects, the "Getting Started" guide is often the first real interaction a developer has with the project. If a command fails, an output doesn’t match, or a step is unclear, most users won’t file a bug report—they will just move on. Drasi, a CNCF sandbox project that detects changes in your data and triggers immediate reactions, is supported by a small team of four engineers in Microsoft Azure’s Office of the Chief Technology Officer. The team ships code faster than it can manually re-test its comprehensive tutorials. That gap remained hidden until late 2025, when a GitHub Dev Container infrastructure update broke every single tutorial. The incident forced a realization: with advanced AI coding assistants, documentation testing can be turned into a monitoring problem.

The Problem: Why Documentation Breaks

Documentation usually breaks for two main reasons, each silently eroding user trust.


The Curse of Knowledge

Experienced developers write documentation with implicit context. When they write "wait for the query to bootstrap," they know to run drasi list query and watch for the Running status, or even better—run the drasi wait command. A new user has no such context. Neither does an AI agent. They read the instructions literally and don’t know what to do. They get stuck on the "how," while the documentation only covers the "what."
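
To make that implicit step concrete, here is a minimal Python sketch of what "wait for the query to bootstrap" actually means once it is spelled out: poll `drasi list query` until the query shows Running. The query name, timeout, and output columns are assumptions for illustration; the built-in `drasi wait` command is the shorter alternative the team mentions.

```python
import subprocess
import time

def wait_for_query(name: str, timeout_s: int = 120) -> None:
    """Poll `drasi list query` until the named query reports Running."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        # This is the step the tutorial assumes you already know; the exact
        # output columns may differ between Drasi versions.
        out = subprocess.run(
            ["drasi", "list", "query"], capture_output=True, text=True, check=True
        ).stdout
        for line in out.splitlines():
            if name in line and "Running" in line:
                return
        time.sleep(5)
    raise TimeoutError(f"query {name!r} did not reach Running within {timeout_s}s")

wait_for_query("my-query")  # hypothetical query name from a tutorial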

Silent Drift

Documentation doesn’t fail loudly like code does. When you rename a configuration file in your codebase, the build fails immediately. But when your documentation still references the old filename, nothing happens. The drift accumulates silently until a user reports confusion. This is compounded for tutorials like Drasi's, which spin up sandbox environments with Docker, k3d, and sample databases. When any upstream dependency changes—a deprecated flag, a bumped version, or a new default—tutorials can break without any alert.
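
To show how this kind of drift can be made loud, here is a small illustrative check, not the approach described in this article, that fails a build whenever a tutorial references a file that no longer exists in the repository. The docs layout and path pattern are assumptions.

```python
import pathlib
import re
import sys

repo = pathlib.Path(".")
# Hypothetical pattern: relative paths to YAML/JSON files mentioned in docs.
path_re = re.compile(r"[\w./-]+\.(?:ya?ml|json)")

missing = []
for doc in repo.glob("docs/**/*.md"):
    for ref in path_re.findall(doc.read_text(encoding="utf-8")):
        if not (repo / ref).exists():
            missing.append((doc, ref))

for doc, ref in missing:
    print(f"{doc}: references missing file {ref}")
if missing:
    sys.exit(1)  # turn silent drift into a loud CI failure
```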

The Turning Point: A Real-World Failure

The team didn’t realize how big this gap was until late 2025, when GitHub updated its Dev Container infrastructure, bumping the minimum Docker version. The update broke the Docker daemon connection—and every single tutorial stopped working. Because the team relied on manual testing, they didn’t immediately know the extent of the damage. Any developer trying Drasi during that window would have hit a wall. This incident lit a fire under the team to find a scalable, automated solution.

The Solution: AI Agents as Synthetic Users

To solve this, the team treated tutorial testing as a simulation problem. They built an AI agent that acts as a "synthetic new user." This agent has three critical characteristics, illustrated in the sketch after this list:

  • It is naïve: It has no prior knowledge of Drasi—it knows only what is explicitly written in the tutorial.
  • It is literal: It executes every command exactly as written. If a step is missing, it fails.
  • It is unforgiving: It verifies every expected output. If the doc says "You should see 'Success'," and the CLI just returns silently—the agent flags it and fails fast.
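
One way to picture these traits is as the agent's standing instructions. The prompt below is an illustrative sketch, not the team's actual configuration.

```python
# Illustrative only: one way the three traits could be written down as the
# synthetic user's standing instructions.
SYNTHETIC_USER_PROMPT = """\
You are a brand-new user following this tutorial for the first time.
- Naive: assume nothing about Drasi beyond what the tutorial states.
- Literal: run each command exactly as written, in order; if a required step
  or value is missing, stop and report it instead of guessing.
- Unforgiving: after every step, compare the actual output with the expected
  output in the tutorial; on any mismatch, stop and report the difference.
"""
```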

By emulating a brand-new user, the agent catches both missing context and silent drift before real users encounter them.


Building the Testing Stack with GitHub Copilot CLI and Dev Containers

The implementation leverages GitHub Copilot CLI and Dev Containers to create an automated pipeline. The agent runs inside a Dev Container that mirrors the exact environment described in the tutorial. Using GitHub Copilot CLI, it can parse natural language instructions and translate them into precise shell commands. The agent then executes these commands step by step, checking outputs against expected results documented in the tutorial. If any step fails, the agent stops, logs the issue, and alerts the team immediately.
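
The article does not include the pipeline code, but the loop it describes could look roughly like the Python sketch below. Here `suggest_command` is a placeholder standing in for the GitHub Copilot CLI translation step, and the tutorial steps and expected outputs are hypothetical; in the real setup this runs inside the tutorial's Dev Container.

```python
import subprocess
import sys

def suggest_command(instruction: str) -> str:
    # Placeholder for the Copilot CLI translation step; echoing the
    # instruction lets the sketch run end to end without Copilot installed.
    return f"echo {instruction!r}"

def run_tutorial(steps: list[tuple[str, str | None]]) -> None:
    """Execute each documented step and fail fast on the first mismatch."""
    for i, (instruction, expected) in enumerate(steps, start=1):
        command = suggest_command(instruction)
        result = subprocess.run(command, shell=True, capture_output=True, text=True)
        if result.returncode != 0:
            sys.exit(f"step {i} failed: {command!r}\n{result.stderr}")
        if expected and expected not in result.stdout:
            sys.exit(f"step {i}: expected {expected!r} in output, got:\n{result.stdout}")
        print(f"step {i} ok")

# Hypothetical two-step tutorial with one expected-output check.
run_tutorial([
    ("Install the Drasi CLI", None),
    ("List the continuous queries", "queries"),
])
```

The key design choice is the fail-fast check after every step, so a broken command or a changed output surfaces at the exact step where a real user would get stuck.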

The team configured this agent to run on every commit to the documentation repository. This turns testing into a continuous monitoring process—similar to how CI/CD pipelines catch code bugs. The agent doesn't just run the commands; it also checks for edge cases like version mismatches and deprecation warnings, which are common sources of silent drift.

Results and Impact

Since deploying this system, the team has caught dozens of documentation bugs that would have gone unnoticed until a user complained. Tutorials now stay reliable even as upstream dependencies evolve. The agent has also highlighted areas where implicit knowledge was assumed: for instance, missing explanations of common CLI flags or expected wait times. The result is a smoother onboarding experience for new developers, and less time spent manually re-testing guides after each release.

The team estimates that the automated agent has reduced the time spent on documentation QA by over 70%, allowing them to focus on building features rather than babysitting tutorials. Moreover, it has shifted their mindset: documentation is no longer a static artifact but a live component that must pass automated tests.

Future Directions

The team plans to extend this approach to cover more complex scenarios, such as integration with multiple data sources and error-handling paths. They are also exploring how to share this methodology with other open-source projects in the CNCF ecosystem. By treating documentation testing as a monitoring problem, Drasi has turned a pain point into a competitive advantage—ensuring that every "Getting Started" moment is a success.

Interested readers can dive deeper into the AI agent design or the technical stack used to implement it.