Methodology · Apr 24, 2026 · 7 min read

Activepieces and MCP: What We Learned Building AI Workflows

By Jonathan Stocco, Founder

What We Set Out to Build

In early 2026, we started evaluating open-source alternatives to proprietary automation platforms after watching per-integration pricing compound across a mid-market client's stack. The question wasn't philosophical — it was arithmetic. Zapier and Make charge per task or per operation. When you're running AI agents that trigger dozens of downstream actions per invocation, those fees accumulate in ways that aren't obvious until the invoice arrives.

Activepieces caught our attention for a specific reason: its native support for MCP (Model Context Protocol) servers. At the time we started this evaluation, the platform listed 400+ MCP servers, giving AI agents a standardized way to talk to external tools without custom connector code. That's a meaningful architectural difference from platforms where every integration is a proprietary abstraction you can't inspect or modify.

We wanted to answer one practical question: can a developer team with no prior Activepieces experience go from zero to a functioning AI agent workflow — one that reads from a CRM, runs a reasoning step, and writes results back — in a single sprint? The answer turned out to be "yes, but with three non-obvious failure modes you'll hit before you finish."

What Happened — Including What Went Wrong

The first workflow we attempted was a job-change detection pipeline. An AI agent would receive a webhook payload containing a contact record, run a web lookup to check for employment changes, score confidence, and push the result back to HubSpot. Straightforward on paper.

The MCP server configuration was the easy part. Activepieces' interface for connecting MCP servers is closer to n8n's credential management than to Zapier's guided setup — you're configuring JSON, not clicking through a wizard. That's fine if your team is comfortable with that level of abstraction, but it's a real barrier for non-developers. We'll come back to that tradeoff.

The first failure was prompt-related, and it's one I've now seen repeat across multiple pipelines. Our workflow accepted an optional hint field — new_company_hint — from the webhook payload. The system prompt mentioned the field existed but didn't specify how it should affect confidence scoring. The LLM treated it as weak background context instead of strong corroborating evidence. A confirmed company match from a web lookup plus a matching hint from the CRM should have pushed confidence above 0.5. Instead, scores stayed at 0.2–0.3. We added four lines to the system prompt: what the hint represents, how to cross-reference it against web evidence, how confirmation affects the threshold, and what to do when no hint exists. Scores corrected immediately. LLMs don't infer scoring intent from field names — you have to spell out every rule explicitly.
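
To make the fix concrete, here is a paraphrased sketch of the four rules we appended to the system prompt. The wording and the `build_system_prompt` helper are illustrative, not our production prompt; only the field name `new_company_hint` and the 0.5 threshold come from the actual pipeline.

```python
# Hypothetical reconstruction of the four scoring rules added to the system
# prompt. new_company_hint is the optional field from the webhook payload.
SCORING_RULES = """
Scoring rules for new_company_hint:
1. new_company_hint is the company our CRM believes the contact has moved to.
2. Cross-reference the hint against the employer found by the web lookup.
3. If the web lookup confirms the hinted company, score confidence above 0.5.
4. If new_company_hint is absent, score on web evidence alone; do not penalize.
""".strip()

def build_system_prompt(base_prompt: str) -> str:
    """Append the explicit scoring rules to the base agent prompt."""
    return f"{base_prompt}\n\n{SCORING_RULES}"
```

The point is not the exact wording but that each rule is stated as configuration: what the field means, how it interacts with other evidence, and the absent case.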

The second failure was architectural. We tried to combine a scheduled trigger and a webhook trigger inside a single Activepieces workflow. The platform doesn't support that — a schedule and a webhook cannot coexist in one workflow definition. You need two separate workflows that share state through an intermediary (a database table, a queue, or a shared variable store). We lost half a day to this before finding a single line in the documentation that confirmed it was by design.
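
The split-workflow pattern can be sketched as two functions sharing one table. This is a minimal illustration using SQLite as the intermediary; in production the store might be Postgres, a queue, or the platform's shared variables, and the table and column names here are our own, not an Activepieces API.

```python
import sqlite3

def init_store(conn: sqlite3.Connection) -> None:
    """Create the shared table both workflows write to and read from."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pending_contacts ("
        "  contact_id TEXT PRIMARY KEY,"
        "  payload TEXT NOT NULL,"
        "  processed INTEGER NOT NULL DEFAULT 0)"
    )

def enqueue(conn: sqlite3.Connection, contact_id: str, payload: str) -> None:
    """Called by the webhook-triggered workflow: stash the event."""
    conn.execute(
        "INSERT OR REPLACE INTO pending_contacts (contact_id, payload) "
        "VALUES (?, ?)",
        (contact_id, payload),
    )

def drain(conn: sqlite3.Connection) -> list:
    """Called by the scheduled workflow: collect and mark unprocessed rows."""
    rows = conn.execute(
        "SELECT contact_id, payload FROM pending_contacts WHERE processed = 0"
    ).fetchall()
    conn.execute("UPDATE pending_contacts SET processed = 1 WHERE processed = 0")
    return rows
```

One workflow only ever calls `enqueue`; the other only ever calls `drain`. Neither needs to know the other's trigger exists.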

The third failure was data accumulation. We used a batch processing node to collect scored results before writing them back to HubSpot. Without enabling static data persistence on that node, it dropped all but the last result in the batch. The fix is two checkboxes in the node configuration, but the default behavior is silent data loss — no error, no warning, just a batch of one where you expected twenty.
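
A cheap defense against this class of failure is to assert the batch size before the write instead of trusting the node. A sketch of the guard we now add (the function and message are our own convention, not a platform feature):

```python
def guard_batch(results: list, expected: int) -> list:
    """Fail loudly if a batch node dropped results instead of accumulating.

    Without static data persistence enabled, the node returned a batch of
    one; this guard turns that silent loss into an explicit error before
    anything is written downstream.
    """
    if len(results) != expected:
        raise RuntimeError(
            f"Batch size mismatch: got {len(results)}, expected {expected}. "
            "Check that static data persistence is enabled on the batch node."
        )
    return results
```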

The Open-Source Tradeoff Nobody Mentions

Self-hosting Activepieces gives you genuine control: no vendor lock-in, no per-task pricing, full access to the source. Those are real advantages, particularly for teams with compliance requirements around data residency or audit logging.

The cost is operational. You own the infrastructure. You own the upgrades. When a new MCP server version ships and breaks a connector, you're the one debugging it at 11pm. Zapier and Make absorb that maintenance burden in exchange for their pricing premium. That's not a knock on Activepieces — it's the honest tradeoff of every self-hosted system. Teams that have never run their own automation infrastructure consistently underestimate this ongoing cost.

There's also a maturity gap worth naming. As of Q1 2026, the MCP ecosystem is still early. Some of the 400+ servers are well-maintained; others are community contributions with minimal test coverage. Before you build a production pipeline on a specific MCP server, check the repository's commit history and open issues. We found two servers with known authentication bugs that hadn't been patched in four months.

According to McKinsey's 2024 State of AI report, 72% of organizations now use AI in at least one business function, up from around 50% in previous years. That adoption rate means the tooling ecosystem is under pressure to mature fast — and open-source platforms like Activepieces are absorbing a lot of that demand before the infrastructure is fully stable.

What Actually Works Well

Three things genuinely impressed us.

First, the branching logic. Activepieces handles conditional routing cleanly. We built a pipeline with two conditional phases — one that routes contacts based on employment status, another that adjusts follow-up cadence based on confidence score — and the workflow editor handled it without the visual spaghetti you get in Make when branches multiply.

Second, error handling is first-class. Dead letter queue support is built in, not bolted on. Every node can be configured with a failure path, and failed executions land in an inspectable queue rather than disappearing silently. This matters more than it sounds — silent failures in automation pipelines are how bad data propagates into your CRM for weeks before anyone notices. If you're evaluating any automation platform, treat dead letter queue support as a non-negotiable requirement, not a nice-to-have.

Third, the webhook handling is honest about its quirks. Activepieces documents the body nesting behavior that catches most developers on their first integration. Two lines of defensive code — checking for nested payload structure before accessing fields — prevents the class of errors that would otherwise surface as mysterious null values three steps into a workflow. We've written about similar webhook gotchas in automation pipelines before; Activepieces at least makes the behavior predictable.
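
The defensive pattern looks roughly like this. It is a sketch: the `body` key reflects the nesting behavior described above, while the `contact` key is our own payload convention, not a platform field.

```python
def extract_payload(webhook_body: dict) -> dict:
    """Return the contact record whether or not the body arrives nested.

    Depending on how the webhook is invoked, the payload may sit under a
    'body' key; this normalizes both shapes so downstream steps never see
    mysterious nulls.
    """
    inner = webhook_body.get("body", webhook_body)
    return inner.get("contact", {})
```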

Lessons With Specific Takeaways

After running this evaluation across three separate pipeline builds, here's what we'd tell a developer starting their first Activepieces project:

Treat every LLM scoring rule as explicit configuration, not implied context. If your agent uses a confidence threshold, write the threshold into the system prompt. Write what each input field means. Write what happens when a field is absent. A reasoning model will not infer your intent from a field name like new_company_hint — it needs the rule stated plainly. We spent two days debugging score suppression that four lines of prompt text resolved.

Separate your triggers from the start. If a workflow might ever need both scheduled and event-driven execution, design two workflows from day one. Retrofitting this split after you've built branching logic is painful. The split-workflow pattern — one workflow owns the schedule, one owns the webhook, both write to shared state — is the right architecture even when it feels like over-engineering early on.

Measure your actual token consumption before committing to a cost model. We've seen web lookup steps consume tokens at roughly double the theoretical estimate when you account for context window padding and response parsing. Estimates made from published rates, without measurement, diverged from actual costs by 30–50% in our testing. Budget conservatively and instrument your token usage from the first run, not after you've committed to a pricing tier.
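
Instrumentation does not need to be elaborate. A minimal per-step meter, assuming the common `prompt_tokens` / `completion_tokens` usage shape that most LLM APIs return (adapt the field names to your provider):

```python
class TokenMeter:
    """Accumulate measured token usage per pipeline step."""

    def __init__(self) -> None:
        self.steps = {}

    def record(self, step: str, usage: dict) -> None:
        """Add one API call's usage dict to the running totals for a step."""
        totals = self.steps.setdefault(step, {"prompt": 0, "completion": 0})
        totals["prompt"] += usage.get("prompt_tokens", 0)
        totals["completion"] += usage.get("completion_tokens", 0)

    def total(self) -> int:
        """Total tokens across all steps, for comparison against estimates."""
        return sum(t["prompt"] + t["completion"] for t in self.steps.values())
```

Call `record` after every model invocation and compare `total()` against your estimate at the end of each run; the divergence shows up within the first few executions.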

Verify MCP server health before building on it. Check commit recency, open issues, and whether the authentication flow has been tested against the current version of the target API. Two of the servers we evaluated had known bugs that would have broken our pipeline in production. The 400+ server count is a ceiling on what's possible, not a guarantee of what's ready.
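
Commit recency, at least, is scriptable. A sketch of a staleness check, assuming you feed it an ISO-8601 timestamp such as the `pushed_at` field from the GitHub repositories API; the 120-day cutoff reflects the four-month patch gap we hit and is a judgment call, not a standard:

```python
from datetime import datetime, timezone

STALE_DAYS = 120  # roughly the unpatched gap we found; tune to your risk tolerance

def is_stale(last_commit_iso: str, now: datetime = None) -> bool:
    """Flag a repo whose last commit is older than STALE_DAYS.

    This checks recency only. Open issues and whether the auth flow works
    against the current target API still need a manual review.
    """
    now = now or datetime.now(timezone.utc)
    last = datetime.fromisoformat(last_commit_iso.replace("Z", "+00:00"))
    return (now - last).days > STALE_DAYS
```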

For teams comparing Activepieces against other automation platforms, our honest review of AI employee platforms covers the broader landscape of what's available in 2026 and where each approach breaks down.

What We'd Do Differently

Start with a single MCP server and one complete round-trip before adding agents. We made the mistake of configuring three MCP servers simultaneously in our first build. When authentication failed, we spent time isolating which server was the problem. One server, one agent, one verified round-trip — then expand. The debugging surface stays manageable.

Write a scoring rubric before writing a single line of prompt. Every time we've built a pipeline where an LLM assigns a score or makes a classification, the failures trace back to ambiguous scoring criteria. The rubric — what inputs mean, how they interact, what the output range represents — should exist as a document before it becomes a system prompt. This forces clarity that the prompt alone doesn't.

Make all external writes non-blocking from the beginning. We learned this the hard way on a different pipeline: a HubSpot 403 error threw away completed intelligence because the write was blocking the return path. In Activepieces, as in any automation system, external writes should be fire-and-forget with their own error handling branch. The core pipeline logic should complete regardless of whether the downstream write succeeds.
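
The shape of that pattern, sketched in Python with illustrative names (`write_fn` stands in for any downstream call, such as a HubSpot update):

```python
import logging

logger = logging.getLogger("pipeline")

def safe_write(write_fn, record: dict, dead_letter: list) -> bool:
    """Attempt a downstream write without letting failure abort the pipeline.

    A failed record goes to a dead-letter list for later retry instead of
    blocking the return path, so a 403 from the CRM can never throw away
    completed intelligence.
    """
    try:
        write_fn(record)
        return True
    except Exception as exc:
        logger.warning("Downstream write failed, dead-lettering: %s", exc)
        dead_letter.append(record)
        return False
```

The pipeline inspects the dead-letter queue on its own schedule; the core logic returns its results either way.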
