Methodology · May 23, 2026 · 7 min read

How AI Permit Tracking Wins Construction Leads First

By Jonathan Stocco, Founder

What We Set Out to Solve

According to McKinsey's State of AI 2024 report, 72% of organizations use AI in at least one business function, up from roughly 50% in previous years. Construction is not one of the early adopters. Most contractors still rely on word-of-mouth, cold calls, and manual database scraping to find new work. The gap between what's technically possible and what the industry actually does is wide enough to drive a concrete truck through.

The problem we wanted to solve was specific: Florida's county building departments publish permit filings as public records. Every new commercial build, renovation, or structural addition appears in those records before a single shovel breaks ground. That filing is a signal. It tells you who owns the property, what kind of work is planned, the estimated project value, and the timeline. A contractor who sees that filing on day one has a meaningful head start over one who finds out about the project three weeks later through a subcontractor referral.

We set out to build an orchestration system that monitors those filings continuously, qualifies the resulting contacts automatically, and triggers outreach without a human touching the queue. The goal was to compress the window between "permit filed" and "first contact made" from days to minutes.

How the Pipeline Actually Works

The architecture has four distinct stages, and understanding each one matters if you want to replicate or adapt it.

First, a polling node checks county permit portals on a defined schedule. Florida's larger counties, including Miami-Dade, Broward, and Palm Beach, expose structured data through public-facing search interfaces. The automation chain scrapes or queries those interfaces, normalizes the returned data into a consistent schema, and writes new filings to a staging table. Duplicates get filtered by permit number before anything else runs.
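To make the normalize-and-deduplicate step concrete, here is a minimal sketch in Python. It is not the production code: the raw field names (`permit_no`, `desc`, `value`), the `Filing` schema, and the in-memory `seen` set standing in for the staging table are all assumptions for illustration. It also keys duplicates on county plus permit number, on the assumption that permit numbers are only unique within a county.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filing:
    """Shared schema for one permit filing (hypothetical fields)."""
    permit_number: str
    county: str
    description: str
    estimated_value: float


def normalize(raw: dict, county: str) -> Filing:
    """Map one county's raw record onto the shared schema."""
    # Strip currency formatting such as "$1,250,000" before converting.
    value_text = str(raw.get("value") or "0").replace("$", "").replace(",", "").strip()
    return Filing(
        permit_number=str(raw["permit_no"]).strip().upper(),
        county=county,
        # Collapse runs of whitespace in the free-text description.
        description=" ".join(str(raw.get("desc") or "").split()),
        estimated_value=float(value_text or 0),
    )


def stage_new_filings(records: list, seen: set) -> list:
    """Return only filings not seen before, recording each new key in `seen`."""
    fresh = []
    for filing in records:
        key = (filing.county, filing.permit_number)
        if key in seen:
            continue  # duplicate: skip before any downstream stage runs
        seen.add(key)
        fresh.append(filing)
    return fresh
```

In a real build the `seen` set would be a unique constraint or lookup on the staging table, so that deduplication survives restarts.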

Second, a qualification layer runs each new filing through a set of rules: project type, estimated value, jurisdiction, and permit category. A reasoning model handles the cases that don't fit clean rule buckets. If a filing is categorized as "interior alteration" but the description mentions structural steel, the LLM flags it for human review rather than auto-qualifying it. This matters because misclassified leads waste outreach budget and erode sender reputation.
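The rule layer can be sketched as a small function with three outcomes. Everything specific here is hypothetical: the category names, the value floor, and the keyword list. The keyword check is only a stand-in for the reasoning model described above, which is what actually catches mismatches between category and description.

```python
from enum import Enum


class Decision(Enum):
    QUALIFIED = "qualified"
    REJECTED = "rejected"
    NEEDS_REVIEW = "needs_review"


# Hypothetical rule values; tune these per market.
TARGET_CATEGORIES = {"new commercial", "structural renovation", "multi-unit residential"}
MIN_VALUE = 250_000
STRUCTURAL_HINTS = ("structural steel", "load-bearing", "foundation")


def qualify(category: str, description: str, estimated_value: float) -> Decision:
    """Apply clean rules first; flag category/description mismatches for a human."""
    category = category.strip().lower()
    text = description.lower()

    # The category says "minor work" but the description suggests structural
    # scope: do not auto-qualify, send it to review instead.
    if category not in TARGET_CATEGORIES and any(hint in text for hint in STRUCTURAL_HINTS):
        return Decision.NEEDS_REVIEW

    if category in TARGET_CATEGORIES and estimated_value >= MIN_VALUE:
        return Decision.QUALIFIED
    return Decision.REJECTED
```

Keeping `NEEDS_REVIEW` as a first-class outcome, rather than forcing every filing into qualified or rejected, is what lets ambiguous filings reach a person before any outreach budget is spent.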

Third, qualified filings trigger contact enrichment. The system pulls ownership records, cross-references business registration data, and attempts to match the property owner to a LinkedIn profile or company website. This step is where most DIY builds fall apart. Public records are inconsistent. Owner names appear in different formats across data sources, addresses use non-standard abbreviations, and LLC structures obscure the actual decision-maker. We spent more time on entity resolution than on any other part of the build.
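As an illustration of the name-format problem, the sketch below normalizes owner names and scores two of them for similarity using Python's standard-library `difflib`. This is an assumed, simplified approach and not the entity resolution logic we built: the suffix list is hypothetical and incomplete, and real matching would also weigh addresses and registration data.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes to ignore when comparing owner names.
LEGAL_SUFFIXES = {
    "llc", "inc", "corp", "co", "ltd", "lp", "llp",
    "company", "corporation", "incorporated",
}


def normalize_owner(name: str) -> str:
    """Lowercase, drop punctuation and legal suffixes, collapse whitespace."""
    # Remove periods first so "L.L.C." becomes "llc" rather than "l l c".
    cleaned = name.lower().replace(".", "")
    tokens = re.sub(r"[^a-z0-9 ]", " ", cleaned).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def match_score(a: str, b: str) -> float:
    """Similarity between two owner names on a 0.0 to 1.0 scale."""
    return SequenceMatcher(None, normalize_owner(a), normalize_owner(b)).ratio()
```

With this normalization, "Sunrise Holdings, L.L.C." and "SUNRISE HOLDINGS LLC" reduce to the same string. Note the weakness: two unrelated holding companies with similar generic names will also score high on name alone, so a name score should not be the only signal.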

Fourth, enriched contacts enter an outreach sequence through Catha's funnel integration. The first message goes out within minutes of qualification. The sequence is personalized to the project type and estimated scope, not a generic "we saw your permit" template. Timing, channel selection, and follow-up cadence are all configurable at the workflow level.

What Went Wrong

Three things broke in ways we didn't anticipate.

County portal structure is not stable. Two of the six Florida counties we targeted changed their public search interfaces during the build. One switched from a table-based layout to a JavaScript-rendered single-page application, which broke the scraping node entirely. The fix required switching to a headless browser approach for that county, which added latency and infrastructure cost. If you build this for your own market, plan for portal changes. They happen without notice.

The enrichment step produced false positives at a rate that surprised us. When a property is owned by a holding company with a generic name, the contact resolution logic sometimes matched it to the wrong business. We caught this during testing when a roofing outreach sequence went to a property management firm that had no construction relationship with the project. The fix was adding a confidence threshold: contacts below a certain match score get routed to a human review queue instead of auto-enrolling in the sequence.
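The routing rule itself is small. A sketch, assuming each contact arrives with a match score between 0 and 1; the 0.85 default is a placeholder, not the threshold we use:

```python
def route_contacts(scored, threshold=0.85):
    """Split (contact_id, match_score) pairs into auto-enroll and human-review lists.

    The threshold default is a hypothetical value; calibrate it against
    reviewed matches for your own data sources.
    """
    auto_enroll, human_review = [], []
    for contact_id, score in scored:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"match score out of range for {contact_id}: {score}")
        if score >= threshold:
            auto_enroll.append(contact_id)
        else:
            human_review.append(contact_id)
    return auto_enroll, human_review
```

Rejecting out-of-range scores loudly is deliberate in this sketch: a scoring bug that silently sends everything to one queue is harder to notice than an exception.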

The third failure was self-inflicted and worth describing in detail because it's a category of mistake that shows up in any automated build. I ran a workflow update script that was supposed to modify 4 nodes in the pipeline. Instead, it added 12 duplicate nodes. The script searched for node names that had already been renamed by a previous run, found nothing matching the old names, and appended fresh copies without checking whether equivalent nodes already existed. The automation chain went from 32 nodes to 44. Every build script we write now is idempotent: it removes existing nodes by name before adding new ones, handles both pre- and post-rename node names, and verifies the final node count matches the expected total before committing. One extra verification step prevents a class of bugs that are genuinely hard to debug in a live pipeline.
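The idempotency pattern looks roughly like this. The sketch models a workflow as a plain list of dictionaries with a `name` key, which is an assumption for illustration and not the format of any particular automation tool; `renames` maps old node names to their current names.

```python
def apply_nodes(workflow, new_nodes, renames, expected_total):
    """Replace nodes by name, covering pre- and post-rename names, then verify the count.

    workflow, new_nodes: lists of dicts with a "name" key (hypothetical model).
    renames: {old_name: new_name} for nodes renamed by earlier runs.
    """
    names_to_replace = {node["name"] for node in new_nodes}
    # Also remove any node still carrying the old name of a node we are adding.
    names_to_replace |= {old for old, new in renames.items() if new in names_to_replace}

    kept = [node for node in workflow if node["name"] not in names_to_replace]
    result = kept + list(new_nodes)

    # Refuse to commit if the total is not what we expect.
    if len(result) != expected_total:
        raise RuntimeError(
            f"expected {expected_total} nodes, got {len(result)}; not committing"
        )
    return result
```

Running this twice with the same inputs yields the same workflow, which is the property the original script lacked. The count check would have stopped the 32-to-44 jump before it reached the live pipeline.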

If you want to understand why agentic logic fails in production environments more broadly, this breakdown of common failure modes covers the patterns we see repeatedly across builds.

The Competitive Timing Advantage

The core value proposition here is not the technology. It's the timing.

A building permit filing is public information. Every contractor in your market has access to the same records. The difference is that most of them check those records manually, infrequently, or not at all. A contractor who monitors filings continuously and contacts the property owner the same day the permit is filed is operating in a different competitive environment than one who finds out about the project through a subcontractor three weeks later.

By the time a project appears in a commercial real estate database or gets picked up by a lead aggregator, the owner has often already had conversations with two or three contractors. The window for a first-mover advantage has closed. Monitoring the source directly, before aggregators index it, is the only way to consistently get there first.

This is not a new insight. Title companies, lenders, and equipment suppliers have used permit data for years. What's changed in 2026 is that the enrichment and outreach steps, which previously required a dedicated data team, can now be orchestrated automatically. The barrier to entry has dropped, which also means the window for competitive advantage will narrow as more contractors adopt similar systems. The time to build this is before it becomes standard practice, not after.

Where This Approach Breaks Down

This system works well for high-volume, high-value permit categories: commercial builds, structural renovations, and multi-unit residential projects. It works less well for small residential jobs where the permit value is low, the decision-maker is a homeowner rather than a business, and the outreach economics don't justify the enrichment cost per contact.

It also requires ongoing maintenance that most contractors underestimate. Portal structures change. Data quality varies by county. The qualification rules need tuning as you learn which project types actually convert. Treating this as a "set it and forget it" build is a mistake. Plan for a monthly review of qualification accuracy and enrichment match rates.

The outreach component carries its own risk. Automated sequences sent to poorly qualified contacts damage your domain reputation and can trigger spam filters that affect your entire email program. The human review queue for low-confidence matches is not optional overhead. It's the mechanism that keeps the system from degrading over time.

For a direct comparison of what manual lead generation actually costs versus what an automated pipeline delivers, this analysis of manual versus automated lead generation walks through the tradeoffs in detail.

What We'd Do Differently

Build the entity resolution layer before anything else. We treated contact enrichment as a downstream problem and paid for it with rework. The quality of your outreach is entirely dependent on the quality of your contact matching. If you're building this for a new market, spend the first two weeks mapping the data sources for that jurisdiction and testing match accuracy before you wire up the outreach sequence.

Start with one county, not six. We tried to cover six Florida counties simultaneously in the first build. The variation in portal structure, data format, and update frequency across those counties created more debugging work than the coverage was worth at that stage. One county, fully working, with clean data and a validated outreach sequence, is more valuable than six counties with inconsistent data quality. Expand after you've proven the model in a single market.

Wire in a dead-man's switch from day one. If the polling node fails silently, you don't know you're missing filings until you notice the lead volume drop. We added an alert that fires if no new records have been ingested within a configurable window. It's a simple check, but it's the difference between catching a broken scraper in an hour versus discovering it a week later when a competitor has already worked the leads you missed.
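The check can be as small as one function. A sketch, assuming the pipeline records the timestamp of the most recent ingested filing; the six-hour default window is a placeholder, and how the alert is delivered is left out:

```python
from datetime import datetime, timedelta
from typing import Optional


def ingestion_is_stale(
    last_ingested_at: Optional[datetime],
    now: datetime,
    max_silence: timedelta = timedelta(hours=6),  # hypothetical window
) -> bool:
    """Return True when no filing has been ingested within the allowed window."""
    if last_ingested_at is None:
        # Nothing ingested yet: treat it as a failure worth investigating.
        return True
    return now - last_ingested_at > max_silence
```

Run it on a schedule that is independent of the polling node, so the alert still fires when the poller itself has stopped. Set the window per county, since low-volume counties can legitimately go quiet over a weekend.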
