MCP Servers for Claude: What We Learned Testing Them
What We Set Out to Build
In early 2026, we started wiring Model Context Protocol extensions into our automation pipelines. The premise was straightforward: Claude, by default, has no memory of the web, no access to your filesystem, and no way to trigger external systems. MCP changes that. It is a protocol that lets you attach capability modules to a Claude session, turning a chat interface into something closer to an orchestration layer with live data access.
According to McKinsey's 2024 State of AI report, 72% of organizations now use AI in at least one business function, up from roughly 50% in prior years. Most of that adoption is still shallow: a chat window here, a summarization step there. What MCP offers is a path from shallow usage to genuine integration, and we wanted to understand exactly where that path holds and where it breaks.
We tested three categories of extensions: file and filesystem tools, live web browsing and scraping modules, and database connectors. The goal was not to document every option exhaustively. The goal was to find the fastest path to real utility and map the failure modes honestly.
What Happened, Including What Went Wrong
Setup for most MCP extensions is genuinely fast. The filesystem module, for instance, requires a JSON configuration block pointing at a local directory and a restart of the Claude desktop client. We had it reading and writing files in under ten minutes. The web browsing extension took slightly longer because it depends on a local browser instance, but nothing about the process required deep technical knowledge.
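For reference, here is roughly what that configuration block looks like. This is a minimal sketch assuming the Claude desktop client's claude_desktop_config.json format and the community filesystem server package; the directory path is a placeholder, and you should confirm the exact package name and config keys against the module's own documentation:

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "/Users/you/mcp-workspace"
      ]
    }
  }
}
```

Note that the directory argument is the entire grant: the server can only touch what you point it at, which is why we recommend a narrow working folder rather than a home directory.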
The first thing that surprised us: the extensions do not behave identically across sessions. We ran the same web scraping task three times against the same target page and got structurally different outputs each time. The reasoning layer inside Claude interprets the retrieved HTML differently depending on how the prompt is framed. This is not a bug in the protocol itself. It is a reminder that you are attaching a non-deterministic language model to a deterministic data source. The combination is not deterministic.
Database connectors exposed a sharper problem. We connected a PostgreSQL instance using a community-built MCP module. The module worked. Claude could query the database, describe the schema, and return rows. What it could not do reliably was generate safe write operations without explicit guardrails in the prompt. On two occasions during testing, it produced UPDATE statements without WHERE clauses. Neither ran, because we were operating in a read-only test environment. But if you wire a database connector into a live system and hand a junior developer a prompt template without reviewing it, you will eventually have a bad day.
The multi-provider trap is worth naming here. Early in our automation work, we built a pipeline that used three separate API providers: one for research, one for scoring, one for writing. The per-lead cost came out $0.016 cheaper than running everything through a single provider. We scrapped it anyway. Three API keys, three billing accounts, three status pages, three sets of rate limits. The operational friction was not worth 1.6 cents per lead. We now run every pipeline on a single provider's model lineup. One credential to manage, one bill to track, one place to look when something breaks. The same logic applies to MCP configurations: every additional extension you attach is another dependency that can fail, update, or behave unexpectedly.
The web browsing extension was the most impressive and the most fragile. It handled clean, well-structured pages well. It struggled with JavaScript-heavy single-page applications where content loads asynchronously. It failed entirely on pages behind authentication walls, which is obvious in retrospect but worth stating clearly: MCP browsing is not a substitute for authenticated API access. If the data you need lives behind a login, you need a different approach.
We also hit a rate-limiting issue that took longer to diagnose than it should have. The browsing module was firing requests faster than the target site's CDN allowed. Claude had no awareness of this. It kept retrying, the CDN kept blocking, and the session eventually timed out. Adding explicit delay instructions to the prompt fixed it, but the fix required knowing the problem existed. If you are building pipelines that other people will use, you need to document these constraints or bake them into the configuration.
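Prompt-level delay instructions worked for us, but if you control the pipeline code around the browsing step, enforcing the delay mechanically is more reliable than asking the model to pace itself. A minimal sketch of that idea, with the clock and sleep functions injectable so the throttle can be tested without real waiting (the class and its parameters are ours, not part of any MCP module):

```python
import time


class Throttle:
    """Enforce a minimum interval between outbound requests so a
    scraping step cannot fire faster than the target CDN allows."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # timestamp of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
```

Call `throttle.wait()` immediately before each fetch. The same structure works for per-domain throttles if you keep one instance per hostname.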
This is the honest tradeoff with MCP extensions: they lower the barrier to capability, but they raise the surface area for failure. A standalone Claude session has one thing that can go wrong. A Claude session with five extensions attached has six. That is not an argument against using them. It is an argument for adding them one at a time, testing each in isolation, and not treating the protocol as a magic layer that handles complexity for you. If fragmented tooling is already a problem in your stack, adding more integrations without a clear ownership model will make it worse, not better. We wrote about this pattern in more depth in our piece on how fragmented tech stacks kill growth.
Lessons Learned with Specific Takeaways
Three things changed how we think about this protocol after running these tests.
Scope your filesystem access tightly. The default configuration for most filesystem extensions grants access to a broad directory. We narrowed ours to a single working folder. Claude does not need access to your entire home directory to do useful work. Giving it that access creates a larger blast radius if a prompt goes sideways. Point the module at the smallest directory that contains what you actually need.
Treat database extensions as read-only by default. If you need write access, add it explicitly and document why. The reasoning layer will attempt write operations if the prompt implies they are appropriate. It will not ask for confirmation unless you tell it to. Build that confirmation step into your prompt template, not as an afterthought but as a required gate before any mutation runs.
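One way to make that gate concrete is to validate every model-generated statement before it reaches the database. The sketch below is a naive lexical check, not a SQL parser — it assumes simple single-statement queries and will not catch every dangerous pattern — but it would have flagged both of the WHERE-less UPDATE statements we saw in testing. The function name and behavior are our own invention, not part of any MCP connector:

```python
import re

# Statements that mutate data or schema. Anything else is treated as a read.
WRITE_VERBS = re.compile(
    r"^\s*(UPDATE|DELETE|INSERT|DROP|ALTER|TRUNCATE)\b", re.IGNORECASE
)


def check_statement(sql, allow_writes=False):
    """Raise unless the statement is safe to run.

    Reads always pass. Writes pass only when explicitly allowed, and
    UPDATE/DELETE are refused outright if they lack a WHERE clause.
    """
    match = WRITE_VERBS.match(sql)
    if match is None:
        return True  # read-only statement, let it through
    if not allow_writes:
        raise PermissionError("write statement blocked: connector is read-only")
    verb = match.group(1).upper()
    if verb in ("UPDATE", "DELETE") and not re.search(r"\bWHERE\b", sql, re.IGNORECASE):
        raise ValueError(f"{verb} without WHERE clause refused")
    return True
```

Run this between the model's output and the connector's execute call, with `allow_writes=False` as the default everywhere.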
Test extensions against your actual data, not toy examples. We tested the scraping module against a clean static page first and it worked perfectly. When we pointed it at the five pages we actually needed in production, two of them failed because of dynamic content loading. The gap between a demo and a real target is almost always larger than it looks. Budget time for that gap before you commit to a build.
One thing we did not expect: the extensions that provided the most durable value were not the flashiest ones. Live web browsing is impressive in a demo. File management is boring. But the filesystem module, once configured, ran without issues across every session we tested. It did exactly what it said it would do. The browsing module required ongoing prompt tuning to stay reliable. Boring and reliable beats impressive and fragile in any production context.
The developer community on Reddit and in various Discord channels has been moving fast on custom MCP builds. Several teams have published extensions that connect Claude to internal tools: project management systems, CRM records, custom APIs. What ForgeWorkflows calls agentic logic, where a reasoning model decides which tool to call and in what sequence, becomes genuinely useful at this layer. The protocol gives the model a menu of capabilities; the model decides how to combine them. That combination is where the real productivity gains live, not in any single extension in isolation.
The n8n community has been particularly active here. Several workflow builders have published MCP-compatible nodes that let you trigger n8n automations directly from a Claude session. We tested one of these and found it worked reliably for simple trigger-and-forget tasks. For anything requiring conditional logic or error handling, you still want that logic to live in the n8n pipeline itself, not in the Claude prompt. The model is good at deciding what to do. It is less reliable as the sole error handler for a multi-step process.
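For the trigger-and-forget case, the mechanics are just an HTTP POST to the workflow's webhook URL. A minimal sketch, with the webhook URL and payload entirely hypothetical and the `post` argument injectable so the trigger can be exercised without a live n8n instance:

```python
import json
import urllib.request


def trigger_n8n(webhook_url, payload, post=None):
    """Fire-and-forget trigger for an n8n workflow via its webhook node.

    `post` can be swapped out in tests; by default this performs a
    real HTTP POST with a JSON body.
    """
    body = json.dumps(payload).encode("utf-8")
    if post is not None:
        return post(webhook_url, body)
    req = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Deliberately, there is no retry or branching here: per the point above, conditional logic and error handling belong inside the n8n workflow, not in the trigger.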
If you are building automation pipelines and want to see how this kind of modular thinking applies to production-grade builds, our full blueprint catalog shows the patterns we use across different workflow types.
What We'd Do Differently
Start with one extension and run it for a week before adding another. We attached three extensions in the first session because we wanted to test them together. That made it harder to isolate which one was causing the behavior we observed. One at a time, with a real task, over real time, gives you a much cleaner signal about what is actually working.
Build a prompt template library before you build anything else. The extensions are only as reliable as the prompts driving them. We spent more time tuning prompts than configuring the protocol itself. If we had started by writing and versioning prompt templates for each capability, we would have caught the database write problem earlier and the scraping fragility faster. The protocol is infrastructure. The prompts are the application layer. Treat them with the same rigor.
Plan for the extension to break before you need it. Every external dependency has a maintenance cycle. MCP modules are community-built in many cases, which means they update on someone else's schedule and break on yours. Before you wire an extension into anything a client or teammate depends on, decide what the fallback is. If the browsing module goes down, does your pipeline fail gracefully or does it silently return empty results? That question is worth answering before the outage, not during it.
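One cheap way to force that decision is to wrap each extension-backed step so a failure produces an explicit, logged fallback value instead of a silent empty result. A minimal sketch, with the wrapper name and fallback shape our own convention:

```python
import logging

logger = logging.getLogger("mcp.pipeline")


def with_fallback(step, fallback, *args, **kwargs):
    """Run one pipeline step; if the underlying extension fails,
    log loudly and return an explicit fallback instead of letting
    empty results flow silently downstream."""
    try:
        return step(*args, **kwargs)
    except Exception:
        logger.exception(
            "step %s failed; using fallback", getattr(step, "__name__", step)
        )
        return fallback
```

The point is not the wrapper itself but the discipline: every call site has to name its fallback, which means someone answered the "what happens during an outage" question before the outage.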