Agent Simulation Testing
Test Your Site With a Real LLM
Static checks tell you if your site can be read by agents. Simulation testing tells you if agents can actually do things on your site. This is the difference between passing a compliance checklist and passing a real-world test.
What is Agent Simulation Testing?
Agent simulation testing is the practice of sending a real LLM (like Claude or GPT-4o) to interact with your website and measuring whether it can complete specific tasks. Instead of checking for the presence of files or the validity of schemas, it answers a more fundamental question: can an AI agent actually use your site?
The process works like human QA testing, but the tester is an AI. You define tasks such as "Find the pricing for the Pro plan," "Start a free trial," and "Locate the API documentation," and the agent attempts each one. The output is a pass/fail result with a detailed trace of what the agent tried, where it got stuck, and why it succeeded or failed.
This approach catches problems that static analysis misses. Your llms.txt might be perfect, your schemas valid, your robots.txt open — but if your pricing page is a client-rendered React component with no server-side content, the agent sees an empty page and cannot answer the most basic question about your product.
How Agent Simulation Works
The simulation follows a structured pipeline:
Context Gathering
The agent reads your llms.txt, homepage, and sitemap to build an understanding of your site structure — just like a real agent would in the wild.
Task Assignment
The system assigns tasks representing common agent use cases: finding pricing, locating documentation, identifying contact methods, starting a trial, and comparing features.
Autonomous Navigation
The agent navigates your site by following links, reading content, and attempting to extract the information needed to complete each task. It does not render pages in a browser; it reads the raw content agents actually see.
Completion Assessment
Each task is scored as completed, partially completed, or failed. The agent provides evidence (extracted text, URLs visited) to justify its assessment.
Trace Report
You receive a full trace showing every page the agent visited, what it extracted, where it got stuck, and recommendations for improvement.
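The pipeline above can be sketched in a few lines of Python. This is a minimal illustration, not the AX Audit implementation: the names `PageParser`, `TaskResult`, and `run_task` are hypothetical, the `fetch` callable is whatever you use to retrieve raw HTML, and simple keyword matching stands in for the judgment a real LLM would apply when deciding whether a task is complete.

```python
from collections import deque
from dataclasses import dataclass, field
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects visible text and raw href values, ignoring scripts and styles."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


@dataclass
class TaskResult:
    task: str
    status: str  # "completed", "partial", or "failed"
    evidence: str = ""
    visited: list = field(default_factory=list)


def run_task(fetch, start_url, task, keywords, max_pages=10):
    """Breadth-first navigation from start_url; keyword hits stand in for LLM judgment."""
    queue, seen = deque([start_url]), {start_url}
    result = TaskResult(task=task, status="failed")
    best = 0
    while queue and len(result.visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # the link did not resolve
            continue
        result.visited.append(url)
        parser = PageParser()
        parser.feed(html)
        parser.close()
        text = " ".join(parser.text_parts)
        hits = sum(1 for k in keywords if k.lower() in text.lower())
        if hits > best:  # keep the strongest evidence seen so far
            best, result.evidence = hits, f"{url}: {text[:80]}"
        if hits == len(keywords):
            result.status = "completed"
            return result
        for href in parser.links:
            nxt = urljoin(url, href)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    if best:
        result.status = "partial"
    return result


# Hypothetical in-memory site so the sketch runs without network access.
SITE = {
    "https://example.com/": '<a href="/pricing">Pricing</a><a href="/docs">Docs</a>',
    "https://example.com/pricing": "<h1>Pricing</h1><p>Pro plan: $49 per month</p>",
    "https://example.com/docs": '<div id="root"></div><script>render()</script>',
}
result = run_task(SITE.get, "https://example.com/", "Find Pro plan pricing", ["Pro plan", "$49"])
print(result.status, result.visited)
```

The `visited` list and `evidence` string play the role of the trace: they record where the agent went and what it found, which is the part you act on when a task fails.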
What Simulation Testing Catches
Simulation testing reveals problems that are invisible to static analysis:
Client-side rendering gaps
Pages that look fine in a browser but return empty HTML to agents. Common with SPAs that defer all rendering to JavaScript.
Broken navigation paths
Links that work for humans (via client-side routing) but do not resolve when an agent follows the raw href attribute.
Missing or ambiguous pricing
Pricing pages that use interactive sliders, toggles, or custom calculators that agents cannot operate. The agent sees the page but cannot extract a price.
Gated content without signals
Content behind login walls without any indication in the HTML that authentication is required. Agents see a blank page or a redirect loop.
Conflicting information
Pages that say one thing in visible text and another in JSON-LD. Agents that find contradictions lose confidence in your site.
Dead-end pages
Pages with no internal links, no schema, and no clear next step. Agents arrive and cannot figure out what to do next.
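Several of the problems above can be approximated with a check on the raw HTML alone. The sketch below is an assumption-laden illustration rather than how any particular audit tool works: `RawPageAudit`, `audit_page`, the finding labels, and the 40-character threshold are all hypothetical, and the price comparison is a naive substring match against a JSON-LD `offers.price` value.

```python
import json
from html.parser import HTMLParser


class RawPageAudit(HTMLParser):
    """Separates visible text, link hrefs, and JSON-LD blocks in raw HTML."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []
        self.json_ld = []
        self._mode = None  # None (visible text), "skip", or "jsonld"
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script":
            is_ld = attrs.get("type") == "application/ld+json"
            self._mode = "jsonld" if is_ld else "skip"
            self._buf = []
        elif tag == "style":
            self._mode = "skip"
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            if self._mode == "jsonld":
                self.json_ld.append("".join(self._buf))
            self._mode = None

    def handle_data(self, data):
        if self._mode == "jsonld":
            self._buf.append(data)
        elif self._mode is None and data.strip():
            self.text.append(data.strip())


def audit_page(html, min_chars=40):
    """Returns a list of hypothetical finding labels for one page of raw HTML."""
    page = RawPageAudit()
    page.feed(html)
    page.close()
    text = " ".join(page.text)
    findings = []
    if len(text) < min_chars:  # almost nothing readable without JavaScript
        findings.append("rendering-gap")
    if not page.links and not page.json_ld:  # no links and no schema
        findings.append("dead-end")
    for block in page.json_ld:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue
        offers = data.get("offers") if isinstance(data, dict) else None
        price = str(offers.get("price", "")) if isinstance(offers, dict) else ""
        if price and price not in text:  # schema price never appears in visible text
            findings.append("price-conflict")
    return findings


spa_shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
print(audit_page(spa_shell))
```

A check like this only flags candidates. Whether an agent can actually complete a task on the page still has to be confirmed by running the task.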
How to Test Your Site Manually
You can run a basic agent simulation yourself using any LLM with web browsing capabilities. Open ChatGPT, Claude, or Perplexity and ask it to complete tasks on your site:
- "Go to [your-site.com] and tell me how much the Pro plan costs."
- "Find the API documentation for [your-site.com] and show me how to authenticate."
- "What does [your-company] do? Find the answer from their website, not your training data."
- "Walk me through starting a free trial on [your-site.com]."
If the agent struggles, hallucinates, or gives wrong answers, those are the same failures real users will experience when they ask AI assistants about your product. The difference is that you will never see those failures in your analytics, because the agent never clicks through to your site. The user just gets a wrong answer and moves on.
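If you repeat this manual test after each site change, it helps to keep the prompts and outcomes consistent. The following is a small sketch of one way to do that: the templates are the four prompts above, while `build_prompts`, `ManualResult`, `pass_rate`, and the outcome labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# The four manual prompts from the list above, with placeholders for your site.
TASK_TEMPLATES = [
    "Go to {site} and tell me how much the Pro plan costs.",
    "Find the API documentation for {site} and show me how to authenticate.",
    "What does {company} do? Find the answer from their website, not your training data.",
    "Walk me through starting a free trial on {site}.",
]


def build_prompts(site, company):
    """Fills in the templates so every test run uses identical wording."""
    return [t.format(site=site, company=company) for t in TASK_TEMPLATES]


@dataclass
class ManualResult:
    prompt: str
    outcome: str  # hypothetical labels: "pass", "wrong", or "stuck"
    notes: str = ""


def pass_rate(results):
    """Fraction of prompts the assistant answered correctly."""
    if not results:
        return 0.0
    return sum(r.outcome == "pass" for r in results) / len(results)


for prompt in build_prompts("example.com", "Example Inc"):
    print(prompt)
```

Paste each prompt into the assistant, record what happened as a `ManualResult`, and compare `pass_rate` between runs to see whether a change helped.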
Automated Simulation in AX Audit
The AX Audit includes automated agent simulation as part of the Agent Task Completion dimension (20% of your total AX score). When you run an audit, we send a real LLM to attempt standardized tasks on your site and measure success.
The automated simulation is faster and more consistent than manual testing. It uses the same task set across all sites, producing comparable scores. It also runs on every re-audit, so you can track improvement over time.
Sites that score well on static checks (Discoverability, Parsability, Schema) but poorly on Task Completion have a clear signal: the content is there, but the user experience for agents is broken. This usually points to rendering issues, navigation problems, or content that is technically present but practically unusable.
Task Completion and Your AX Score
Agent Task Completion carries the highest weight (tied with Discoverability at 20%) in the AX scoring model. This is intentional — all the technical optimization in the world is meaningless if agents cannot actually accomplish their goals on your site. A perfect score on the other five dimensions with a zero on Task Completion still means your site is not agent-ready.
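To see what the 20% weight means in numbers, here is a rough sketch that assumes the total is a simple weighted average of six dimension scores on a 0 to 100 scale. Only the two 20% weights come from the text above. The remaining four weights (15% each), the two placeholder dimension names, and the `ax_score` function itself are assumptions made purely so the arithmetic can run.

```python
# Weights in percent. Only the two 20% entries are stated in the text;
# the four 15% entries are hypothetical and chosen so the total is 100.
WEIGHTS = {
    "task_completion": 20,  # stated
    "discoverability": 20,  # stated
    "parsability": 15,      # hypothetical weight
    "schema": 15,           # hypothetical weight
    "dimension_5": 15,      # hypothetical placeholder dimension
    "dimension_6": 15,      # hypothetical placeholder dimension
}


def ax_score(scores):
    """Weighted average of per-dimension scores (each 0 to 100). Assumed formula."""
    if sum(WEIGHTS.values()) != 100:
        raise ValueError("weights must sum to 100")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Perfect on the other five dimensions, zero on Task Completion.
scores = {name: 100 for name in WEIGHTS}
scores["task_completion"] = 0
print(ax_score(scores))  # 80.0
```

Under this assumed formula the ceiling without Task Completion is 80 regardless of how the other four weights are split, because they and Discoverability together account for the remaining 80%.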
Run a Free Agent Simulation
The AX Audit sends a real LLM to attempt tasks on your site and reports what succeeded, what failed, and why. See how your site performs when the visitor is an AI agent.
Free audit with live LLM simulation. No signup required.