Test Methodology
Independent Test Protocol
The Independent Test Protocol (ITP) is the testing framework applied to every ForgeWorkflows Logic Blueprint before release. Unlike unit tests that check isolated functions, the ITP runs the complete workflow end-to-end against real APIs with real data. The goal is simple: verify that what we sell actually works.
Every ITP run produces a structured test report with pass/fail results per milestone, measured costs, processing times, and edge case documentation. These results are published directly on the product page in the Technical Reference section. Nothing is estimated, nothing is projected — every number comes from a real test run.
The ITP is a required component of BQS-06 (Test Protocol with Evidence). A blueprint cannot pass BQS v2 without a completed ITP. See the glossary for term definitions.
Milestone Categories
Live Smoke Tests
End-to-end runs against real APIs with real data. No mocks, no stubs. The workflow executes the full pipeline from trigger to output and the result is verified against expected behavior.
Example milestones
- Trigger the workflow with a real lead record and verify output reaches the destination
- Run against multiple data verticals (SaaS, manufacturing, professional services)
- Verify all agent steps execute in sequence without errors
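A smoke-test run of this kind can be driven by a small harness. The sketch below is illustrative only: `run_workflow` is a hypothetical stand-in for the real pipeline trigger, and the pass criteria are simplified to status and record-identity checks.

```python
def run_workflow(lead_record):
    # Hypothetical placeholder: a real ITP run triggers the live pipeline
    # against real APIs. Here we simulate a completed execution.
    return {
        "status": "completed",
        "output": {"lead_id": lead_record["lead_id"], "score": 72},
    }

def smoke_test(lead_record, verticals=("SaaS", "manufacturing", "professional services")):
    """Run the workflow end-to-end once per data vertical and verify the output."""
    results = []
    for vertical in verticals:
        record = dict(lead_record, industry=vertical)
        result = run_workflow(record)
        passed = (
            result["status"] == "completed"
            and result["output"].get("lead_id") == record["lead_id"]
        )
        results.append({"vertical": vertical, "pass": passed})
    return results

report = smoke_test({"lead_id": "L-001", "company_name": "Acme Corp"})
```

Each entry in `report` maps one vertical to a pass/fail result, mirroring the per-milestone structure of an ITP test report.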
Edge Case Handling
Deliberate stress tests with incomplete, malformed, or adversarial input data. These milestones verify the blueprint degrades gracefully rather than failing silently or producing garbage output.
Example milestones
- Missing required fields (no company name, no email, empty input)
- Malformed data (invalid JSON, truncated payloads, encoding issues)
- API failures (simulated timeouts, rate limits, auth errors)
- Boundary values (extremely long text, special characters, empty strings)
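Graceful degradation in these milestones means the workflow returns a structured error rather than crashing or emitting garbage. A minimal sketch of that pattern, assuming hypothetical required fields `company_name` and `contact_email`:

```python
import json

def safe_parse(payload):
    """Parse an inbound payload, degrading gracefully instead of raising."""
    try:
        data = json.loads(payload)
    except (json.JSONDecodeError, TypeError) as exc:
        return {"ok": False, "error": f"malformed payload: {exc}"}
    missing = [f for f in ("company_name", "contact_email") if not data.get(f)]
    if missing:
        return {"ok": False, "error": f"missing required fields: {missing}"}
    return {"ok": True, "data": data}

edge_cases = [
    '{"company_name": "", "contact_email": "a@b.co"}',     # empty required field
    '{"company_name": "Acme"',                             # truncated JSON
    '{"company_name": "Acme", "contact_email": "a@b.co"}', # valid record
]
results = [safe_parse(p) for p in edge_cases]
```

The key property under test is that every edge case yields a usable `{"ok": False, "error": ...}` result the pipeline can route, never an unhandled exception.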
Consistency Scoring
The same input record is run through the workflow multiple times to measure output stability. LLM-powered agents can produce variable results — consistency scoring quantifies that variance and ensures it stays within acceptable bounds.
Example milestones
- Run the same 5 records 3 times each and compare outputs
- Measure scoring variance (e.g., lead score deviation across runs)
- Verify structural consistency (same JSON shape, same field presence)
- Flag any run where output diverges beyond the acceptable threshold
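The scoring logic can be sketched as a comparison over repeated outputs for one input record. Field names (`lead_score`) and the deviation threshold are illustrative assumptions, not fixed ITP values:

```python
from statistics import mean

def consistency_report(runs, score_key="lead_score", max_deviation=5.0):
    """Compare repeated outputs for one input record.

    runs: list of output dicts from running the same record multiple times.
    Flags the record if the score spread exceeds max_deviation or if the
    JSON shape (set of fields) differs between runs.
    """
    scores = [r[score_key] for r in runs]
    shapes = {frozenset(r.keys()) for r in runs}
    spread = max(scores) - min(scores)
    return {
        "mean_score": mean(scores),
        "score_spread": spread,
        "structurally_consistent": len(shapes) == 1,
        "flagged": spread > max_deviation or len(shapes) != 1,
    }

# Three runs of the same record: scores vary slightly, shape is stable.
runs = [
    {"lead_score": 71, "tier": "B"},
    {"lead_score": 73, "tier": "B"},
    {"lead_score": 72, "tier": "B"},
]
report = consistency_report(runs)
```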
Cost Verification
Actual API costs are measured per record processed — not estimated, not projected. The published cost-per-record on the product page comes directly from these measurements.
Example milestones
- Track token usage per agent step across all test records
- Calculate actual cost-per-record from API billing data
- Verify published cost estimate is within 20% of measured average
- Document cost range (min/max/avg) across different record types
Error Handling Verification
Validates that the Error Handling Matrix documented in the blueprint matches actual behavior. Every documented failure path is triggered and the recovery behavior is verified.
Example milestones
- Trigger each failure mode listed in the Error Handling Matrix
- Verify retry logic fires the correct number of times with backoff
- Confirm dead letter handling captures failed records with full context
- Verify fallback chains activate when primary sources fail
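The retry-with-backoff and dead-letter behaviors under test follow a common pattern, sketched below. Attempt counts, delays, and the in-memory dead-letter list are illustrative assumptions, not the blueprint's actual implementation:

```python
import time

DEAD_LETTERS = []

def run_with_retries(step, record, max_attempts=3, base_delay=0.01):
    """Run a step with exponential backoff; dead-letter on final failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step(record)
        except Exception as exc:
            if attempt == max_attempts:
                # Dead-letter handling: keep the record plus full failure context.
                DEAD_LETTERS.append(
                    {"record": record, "error": str(exc), "attempts": attempt}
                )
                return None
            time.sleep(base_delay * (2 ** (attempt - 1)))  # 0.01s, 0.02s, ...

def always_fails(record):
    # Simulated failure mode, as in the "API failures" edge-case milestones.
    raise TimeoutError("simulated API timeout")

result = run_with_retries(always_fails, {"lead_id": "L-002"})
```

An ITP milestone of this category would trigger the failure deliberately and then assert both the retry count and the presence of the failed record in the dead-letter store.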
Variable Milestone Counts
ITP milestone counts are not fixed across all products. Each blueprint has a product-specific milestone suite calibrated to its complexity, number of agent steps, external dependencies, and failure modes.
A blueprint with 3 agent steps and 1 external API might have 12 milestones. A blueprint with 8 agent steps, multiple API integrations, and conditional routing paths might have 20 or more. The milestone count is determined during test planning — not after — and is documented before the first test run begins.
The actual milestone count and pass/fail status for every product is published in the Technical Reference section on each product page.
Sample Fixture Structure
Each ITP test run uses structured fixture records. Below is the general shape of a test fixture — actual field names and values vary by product.
{
  "fixture_id": "ITP-SDR-001",
  "test_category": "smoke_test",
  "input": {
    "company_name": "Acme Corp",
    "industry": "Manufacturing",
    "employee_count": 250,
    "contact_email": "test@example.com"
  },
  "expected_behavior": "Full pipeline execution with scored output",
  "actual_result": null,
  "pass_fail": null,
  "cost_usd": null,
  "processing_time_ms": null,
  "timestamp": null,
  "notes": null
}

Fields marked null are populated during the test run. The completed fixture becomes the test evidence referenced in the product's Technical Reference.
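Completing a fixture after a run can be sketched as filling the null evidence fields and confirming nothing required is left empty. The helper and result-field names below are assumptions for illustration, not part of the published fixture schema:

```python
from datetime import datetime, timezone

def complete_fixture(fixture, run_result):
    """Fill the null evidence fields of a fixture after a test run."""
    completed = dict(fixture)
    completed.update(
        actual_result=run_result["output"],
        pass_fail="pass" if run_result["matches_expected"] else "fail",
        cost_usd=run_result["cost_usd"],
        processing_time_ms=run_result["elapsed_ms"],
        timestamp=datetime.now(timezone.utc).isoformat(),
        notes=run_result.get("notes"),
    )
    # Evidence is only valid once every required field is populated
    # (notes may legitimately stay empty).
    unfilled = [k for k, v in completed.items() if v is None and k != "notes"]
    return completed, unfilled

fixture = {
    "fixture_id": "ITP-SDR-001",
    "actual_result": None,
    "pass_fail": None,
    "cost_usd": None,
    "processing_time_ms": None,
    "timestamp": None,
    "notes": None,
}
completed, unfilled = complete_fixture(fixture, {
    "output": "Full pipeline execution with scored output",
    "matches_expected": True,
    "cost_usd": 0.013,
    "elapsed_ms": 4200,
})
```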