AgentForge — Multi-Step AI Agent Workflow Platform
Testing an AI agent orchestration platform: multi-step workflows, tool-call sequences, state consistency, and fallback logic for autonomous agent systems.
Manual and Automation QA Engineer
OVERVIEW
An AI agent orchestration platform that enables autonomous agents to execute complex multi-step workflows using external tools and APIs. My focus was on validating tool-call sequences, agent state persistence across steps, retry/fallback behaviors, and loop detection to prevent infinite execution cycles.
TECH STACK
THE CHALLENGE
AI agents using external tools (web search, code execution, API calls, database queries) produced inconsistent multi-step results. Teams had no automated framework to validate tool-call sequences, detect infinite loops, or verify agent state persistence across 10+ step workflows. Failed agent runs went undetected until customers experienced errors.
METHODOLOGY
Designed and executed comprehensive test suites for agent workflow execution, including tool-call chain sequence validation, agent state consistency checks across steps, LLM output schema enforcement, retry/fallback behavior testing, and loop detection. Performed end-to-end workflow testing for agents performing research, data analysis, and code generation tasks.
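LLM output schema enforcement, one of the checks listed above, can be illustrated with a minimal, standard-library-only sketch. It is not the project's actual implementation: `ANALYSIS_SCHEMA`, `SchemaViolation`, and `enforce_schema` are hypothetical names, and a production suite would more likely use Pydantic or the `jsonschema` package. The check rejects agent output that is not valid JSON, lacks a required field, carries an unexpected field, or has a field of the wrong type.

```python
import json
from typing import Any

# Hypothetical schema for a code-analysis agent's final answer: field name -> required type
ANALYSIS_SCHEMA: dict[str, type] = {
    "summary": str,
    "bugs_found": int,
    "analysis_complete": bool,
}


class SchemaViolation(Exception):
    """Raised when LLM output does not match the expected schema."""


def enforce_schema(raw_output: str, schema: dict[str, type]) -> dict[str, Any]:
    """Parse raw LLM text as JSON and validate it strictly against `schema`."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise SchemaViolation(f"Output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise SchemaViolation(f"Expected a JSON object, got {type(data).__name__}")

    # Strict mode: no missing fields and no extra fields
    missing = schema.keys() - data.keys()
    extra = data.keys() - schema.keys()
    if missing:
        raise SchemaViolation(f"Missing fields: {sorted(missing)}")
    if extra:
        raise SchemaViolation(f"Unexpected fields: {sorted(extra)}")

    for name, expected_type in schema.items():
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly for int fields
        if expected_type is int and isinstance(value, bool):
            raise SchemaViolation(f"{name}: expected int, got bool")
        if not isinstance(value, expected_type):
            raise SchemaViolation(
                f"{name}: expected {expected_type.__name__}, got {type(value).__name__}"
            )
    return data


if __name__ == "__main__":
    good = '{"summary": "Off-by-one in loop", "bugs_found": 1, "analysis_complete": true}'
    print(enforce_schema(good, ANALYSIS_SCHEMA)["bugs_found"])  # 1
    try:
        enforce_schema('{"summary": "ok", "bugs_found": "three"}', ANALYSIS_SCHEMA)
    except SchemaViolation as err:
        print(err)
```

Failing loudly on extra fields, not just missing ones, catches the common case where a model drifts into adding commentary keys that downstream code silently ignores.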
TEST STRATEGY
Collaborated with AI engineers to define an agent execution contract: expected tool calls, parameters, and state transitions. Implemented an assertion library to verify that tool-call sequences match the expected workflow. Created state validation tests at each step to ensure agent memory consistency. Performed adversarial testing to trigger failure modes and validate fallback paths. Integrated with LangChain debugging tools for observability.
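The adversarial fallback tests can be sketched without a live model: inject a tool that always fails, then assert that the runner retries a bounded number of times before switching to the fallback. This is an illustration under assumptions, not the project's code; `call_with_fallback`, `ToolError`, the `flaky_search` and `cached_search` tools, and the retry budget are all hypothetical.

```python
from typing import Callable, List, Optional


class ToolError(Exception):
    """Raised by a tool when an external call fails."""


def call_with_fallback(
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    query: str,
    max_retries: int = 2,
    log: Optional[List[str]] = None,
) -> str:
    """Try `primary` up to 1 + max_retries times, then fall back to `fallback`."""
    log = log if log is not None else []
    for attempt in range(1 + max_retries):
        try:
            result = primary(query)
            log.append(f"primary ok (attempt {attempt + 1})")
            return result
        except ToolError:
            log.append(f"primary failed (attempt {attempt + 1})")
    log.append("fallback")
    return fallback(query)


def test_fallback_after_retries_exhausted():
    calls = {"primary": 0}

    def flaky_search(query: str) -> str:
        calls["primary"] += 1
        raise ToolError("simulated timeout")  # adversarial: always fails

    def cached_search(query: str) -> str:
        return f"cached results for {query}"

    log: List[str] = []
    result = call_with_fallback(flaky_search, cached_search, "AI safety", max_retries=2, log=log)

    assert result == "cached results for AI safety"
    assert calls["primary"] == 3  # 1 initial attempt + 2 retries
    assert log[-1] == "fallback"


if __name__ == "__main__":
    test_fallback_after_retries_exhausted()
    print("fallback path OK")
```

Counting primary calls pins down the retry budget exactly, so the test fails both when the runner gives up too early and when it keeps hammering a dead tool.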
AUTOMATION PIPELINE
Integrated agent workflow tests with GitHub Actions, running them on every agent prompt or tool-definition update. Created a regression suite to confirm that agent behavior does not degrade when new tools or model versions are introduced. Set up LangChain callbacks to trace every tool call and state transition for debugging. Created automated alerts for unexpected tool-call patterns or infinite loops.
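One way an alert for unexpected tool-call patterns could work is to look for a short cycle repeating at the end of the trace, such as `web_search, fetch_webpage` three times in a row, rather than only counting duplicate tool names. This sketch assumes the trace is a list of tool names collected by a callback; `detect_cycle`, `max_cycle_len`, and `min_repeats` are hypothetical names, not part of LangChain.

```python
from typing import List, Optional, Sequence


def detect_cycle(
    tool_calls: Sequence[str], max_cycle_len: int = 4, min_repeats: int = 3
) -> Optional[List[str]]:
    """Return the repeating cycle if the trace ends with a cycle repeated
    at least `min_repeats` times in a row, otherwise None."""
    n = len(tool_calls)
    for cycle_len in range(1, max_cycle_len + 1):
        window = cycle_len * min_repeats
        if window > n:
            break  # longer cycles need even more history
        tail = list(tool_calls[n - window:])
        cycle = tail[:cycle_len]
        # The tail is a loop if it is exactly `cycle` repeated `min_repeats` times
        if tail == cycle * min_repeats:
            return cycle
    return None


if __name__ == "__main__":
    trace = ["web_search", "fetch_webpage", "web_search", "fetch_webpage", "web_search", "fetch_webpage"]
    print(detect_cycle(trace))  # ['web_search', 'fetch_webpage']
```

Because only the tail of the trace is inspected, the check is cheap enough to run after every step, so a runaway run can be flagged as soon as it starts cycling instead of waiting for a timeout.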
IMPACT METRICS
Agent Workflow Execution Reliability
Before: Agent workflows tested manually; edge cases discovered in production
After: Comprehensive automated tests for all tool-call sequences and edge cases
First-Attempt Success Rate: 37%
Undetected Failures: 100%
Avg Debug Time: 82%
Infinite Loop Incidents: 100%
Multi-Step Workflow Coverage
Before: Only happy-path workflows tested; edge cases and error scenarios not covered
After: All workflow paths, edge cases, and failure scenarios validated
Workflow Scenarios Tested: 900%
Tool-Call Sequence Coverage: 150%
Fallback Path Testing
State Mutation Tests: 800%
Production Incidents & Loop Prevention
Before: Agent runs had no step limits; infinite loops discovered in production
After: Automated loop detection + step limits prevent runaway executions
Infinite Loop Incidents/Month: 100%
Avg Cost per Incident: 100%
Customer SLA Violations: 100%
Loop Detection Automation
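The step limits in the loop-prevention metrics can be illustrated as a hard cap inside the agent loop. In LangChain, `AgentExecutor` exposes `max_iterations` and `max_execution_time` for this purpose; the standard-library sketch here only shows the idea. `run_with_step_limit`, `AgentRun`, and `StepLimitExceeded` are hypothetical, and the step function stands in for a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


class StepLimitExceeded(RuntimeError):
    """Raised when an agent run hits its step budget without finishing."""


@dataclass
class AgentRun:
    steps: List[str] = field(default_factory=list)
    answer: Optional[str] = None


def run_with_step_limit(
    next_step: Callable[[List[str]], Tuple[str, str]], max_steps: int = 15
) -> AgentRun:
    """Drive a hypothetical agent step function until it finishes or the budget runs out.

    `next_step` receives the tool-call history so far and returns either
    ("tool", tool_name) to request a tool call or ("finish", answer) to stop.
    """
    run = AgentRun()
    for _ in range(max_steps):
        kind, value = next_step(run.steps)
        if kind == "finish":
            run.answer = value
            return run
        run.steps.append(value)  # record the tool call; real code would execute it here
    raise StepLimitExceeded(f"No final answer after {max_steps} steps: {run.steps[-3:]}")


if __name__ == "__main__":
    # A well-behaved agent: two tool calls, then an answer
    def planner(history):
        return ("tool", "web_search") if len(history) < 2 else ("finish", "done")

    print(run_with_step_limit(planner).steps)  # ['web_search', 'web_search']

    # A runaway agent that never finishes is stopped at the budget
    try:
        run_with_step_limit(lambda history: ("tool", "web_search"), max_steps=5)
    except StepLimitExceeded as err:
        print(err)
```

Raising instead of silently returning a partial result makes a runaway run visible to the test suite and to alerting, rather than surfacing later as a vague answer.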
CODE SAMPLES
Agent Tool-Call Sequence Validation
Validate that the agent executes tools in the expected sequence with the correct parameters
from unittest.mock import ANY

import pytest
from langchain.agents import AgentType, initialize_agent
from langchain.callbacks.base import BaseCallbackHandler
from pydantic import BaseModel


class ToolCall(BaseModel):
    """Expected tool call in agent workflow."""
    tool_name: str
    expected_params: dict  # use unittest.mock.ANY for values that vary per run


class ToolCallValidator:
    def __init__(self):
        self.actual_calls = []

    def record_tool_call(self, tool_name: str, params: dict):
        """Record each tool call made by the agent."""
        self.actual_calls.append({"tool": tool_name, "params": params})

    def validate_sequence(self, expected_calls: list[ToolCall]):
        """Validate that tool calls match the expected sequence."""
        assert len(self.actual_calls) == len(expected_calls), \
            f"Call count mismatch: expected {len(expected_calls)}, got {len(self.actual_calls)}"
        for i, (actual, expected) in enumerate(zip(self.actual_calls, expected_calls)):
            assert actual["tool"] == expected.tool_name, \
                f"Step {i}: expected tool {expected.tool_name}, got {actual['tool']}"
            # Validate key parameters match (ANY matches any value)
            for param, value in expected.expected_params.items():
                assert param in actual["params"], \
                    f"Step {i}: missing parameter {param}"
                assert actual["params"][param] == value, \
                    f"Step {i}: param {param} mismatch: got {actual['params'][param]!r}"


class ToolCallRecorder(BaseCallbackHandler):
    """LangChain callback that forwards every agent tool call to a ToolCallValidator."""

    def __init__(self, validator: ToolCallValidator):
        self.validator = validator

    def on_agent_action(self, action, **kwargs):
        # AgentAction.tool_input is a dict for structured tools and a plain string
        # for single-input (ReAct) tools; strings are recorded under the key "input"
        params = action.tool_input if isinstance(action.tool_input, dict) else {"input": action.tool_input}
        self.validator.record_tool_call(action.tool, params)


@pytest.mark.asyncio
async def test_agent_research_workflow(research_tools, gpt4):
    """Test multi-step agent workflow for research task."""
    # research_tools and gpt4 are assumed to be fixtures defined in conftest.py
    query = "Find recent AI safety regulations and summarize them."
    validator = ToolCallValidator()

    agent = initialize_agent(
        tools=research_tools,
        llm=gpt4,
        agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    )
    # Pass the recorder at run time so it receives events for the whole run
    result = await agent.arun(query, callbacks=[ToolCallRecorder(validator)])

    # Validate tool-call sequence
    expected_sequence = [
        ToolCall(tool_name="web_search", expected_params={"input": "AI safety regulations 2025"}),
        ToolCall(tool_name="fetch_webpage", expected_params={"input": ANY}),  # Dynamic URLs
    ]
    validator.validate_sequence(expected_sequence)
    assert "regulation" in result.lower()
Agent State Consistency & Loop Detection
Verify that agent memory state remains consistent across a multi-step workflow, and detect infinite loops
import asyncio
from typing import Any, Dict

import pytest
from langchain.callbacks.base import BaseCallbackHandler


class StepTracker(BaseCallbackHandler):
    """Counts agent steps and records each tool call during a run."""

    def __init__(self, validator: "AgentStateValidator"):
        self.validator = validator

    def on_agent_action(self, action, **kwargs):
        # Each AgentAction is one reasoning step that invokes a tool
        self.validator.step_count += 1
        self.validator.tool_call_history.append({"tool": action.tool, "input": action.tool_input})


class AgentStateValidator:
    def __init__(self, max_steps: int = 20):
        self.max_steps = max_steps
        self.step_count = 0
        self.tool_call_history = []

    async def validate_workflow_state(self, agent, query: str, expected_final_state: Dict[str, Any]):
        """Execute agent and validate state consistency throughout workflow."""
        self.step_count = 0
        self.tool_call_history = []
        try:
            # Wall-clock timeout as a backstop against infinite loops
            result = await asyncio.wait_for(
                agent.arun(query, callbacks=[StepTracker(self)]),
                timeout=60,
            )
        except asyncio.TimeoutError:
            pytest.fail(f"Agent exceeded timeout - possible infinite loop after {self.step_count} steps")

        # Verify step count is reasonable
        assert self.step_count <= self.max_steps, \
            f"Agent took {self.step_count} steps (max {self.max_steps}) - possible loop"

        # Check for repeated tool calls (indicator of a loop)
        tool_calls = [call["tool"] for call in self.tool_call_history]
        repeat_count = len(tool_calls) - len(set(tool_calls))
        assert repeat_count < 3, \
            f"Excessive tool call repetition detected: {repeat_count} duplicates"

        # Validate final state; assumes the agent uses a custom memory class
        # whose load_memory_variables() exposes these keys
        state = agent.memory.load_memory_variables({})
        for key, expected_value in expected_final_state.items():
            assert key in state, f"Missing state key: {key}"
            assert state[key] == expected_value, \
                f"State mismatch for {key}: expected {expected_value}, got {state[key]}"
        return result


@pytest.mark.asyncio
async def test_agent_loop_detection(code_analysis_agent):
    """Test that the agent finishes within its step budget without looping."""
    # code_analysis_agent is assumed to be a fixture defined in conftest.py
    validator = AgentStateValidator(max_steps=15)
    result = await validator.validate_workflow_state(
        agent=code_analysis_agent,
        query="Analyze this code and find bugs",
        expected_final_state={"analysis_complete": True, "bugs_found": 3},
    )
    assert "bug" in result.lower()
    assert validator.step_count <= 15
MISSION ACCOMPLISHED
Validated 150+ agent workflows covering research, analysis, coding, and planning tasks with zero undetected state inconsistencies. Detected and prevented 12 infinite-loop scenarios before they reached production. Achieved 100% tool-call sequence validation with strict JSON schema enforcement. Reduced agent debugging time by 82% through comprehensive execution tracing. Improved agent reliability, raising the first-attempt success rate from 72% to 98.5%.
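As a quick arithmetic check, and assuming the percentages in the impact metrics are relative changes, a first-attempt success rate rising from 72% to 98.5% corresponds to a relative gain of about 36.8%, which would round to the 37% listed for that metric. The helper name `relative_change` is illustrative only.

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`, relative to `before`."""
    return (after - before) / before * 100


print(round(relative_change(72.0, 98.5), 1))  # 36.8
```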
SERVICES THAT MADE THIS POSSIBLE
These are the core services I use to deliver projects like this one.
Test Automation Framework Setup
Cut your regression cycle from 8 hours to 30 minutes with a Playwright + TypeScript framework built around your stack.
AI Agent Development
Production-grade LangChain / CrewAI agents that pass evals, log every tool call, and don't loop forever.
Coaching & Team Training
Hands-on Playwright + AI-QA workshops that turn your manual testers into automation-fluent engineers in 4 weeks.
READY TO BUILD SOMETHING SIMILAR?
Let's discuss how I can implement test automation for your project.