The difference between the AI that planned our test crawls in April and the one that plans them in May is night and day.

In April, a crawl could wander, hallucinate a step, or quietly “succeed” having done nothing useful. In May we rebuilt the planner to ground every step in what’s actually on the page — and to tell you honestly when something failed instead of faking a green check.

That was the spine of the month: make the thing more trustworthy. Here’s what shipped.

A rebuilt AI crawl planner

The new planner is grounded in the live page, recovers from stalls, falls back gracefully when an element shifts, and reports honest failures. Every planned step now maps to something the planner can actually see in the current DOM snapshot. If a click changes the page, it re-plans against the new state instead of barrelling ahead on a stale map. The result: fewer flaky tests, fewer surprises, and no more quiet “passes” that did nothing.
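To make the grounding idea concrete, here is a minimal sketch of a plan-act-observe loop. It is not QualityMax's implementation: `Step`, `CrawlResult`, `run_grounded_crawl`, and the three callbacks are hypothetical names, and the "DOM snapshot" is simplified to a set of visible selectors.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass(frozen=True)
class Step:
    action: str    # e.g. "click" (hypothetical vocabulary)
    selector: str  # element the step targets


@dataclass
class CrawlResult:
    executed: List[Step] = field(default_factory=list)
    failure: Optional[str] = None  # set when the crawl could not proceed

    @property
    def passed(self) -> bool:
        # A run that executed nothing is never reported as a pass.
        return self.failure is None and bool(self.executed)


def run_grounded_crawl(
    snapshot: Callable[[], Set[str]],
    plan: Callable[[Set[str], List[Step]], Optional[Step]],
    act: Callable[[Step], None],
    max_steps: int = 20,
) -> CrawlResult:
    result = CrawlResult()
    for _ in range(max_steps):
        visible = snapshot()  # re-read the page before every step
        step = plan(visible, result.executed)
        if step is None:
            break  # planner has nothing more to do
        if step.selector not in visible:
            # The step refers to something that is not on the page:
            # report it instead of acting on a stale map.
            result.failure = f"ungrounded step: {step.selector!r} not on page"
            break
        act(step)
        result.executed.append(step)
    else:
        # Loop ran out of budget without the planner finishing.
        result.failure = "step budget exhausted"
    if not result.executed and result.failure is None:
        result.failure = "no steps executed"
    return result
```

Because the snapshot is taken again at the top of every iteration, a click that changes the page is automatically reflected in the next planning call, and an empty run surfaces as a failure rather than a green check.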

Platform stability

A public status page and a lot of quiet work on graceful degradation. When our infra providers had a rough month, our users mostly didn’t notice: the platform reroutes work across providers and benches any unhealthy one until it recovers. The full story is in “When Claude Goes Down, Your Tests Shouldn’t.”
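The reroute-and-bench behaviour can be sketched in a few lines. This is an illustration under assumptions, not the production router: `ProviderRouter`, the cooldown value, and the `send` callback are all hypothetical.

```python
import time
from typing import Callable, Dict, List


class ProviderRouter:
    """Try providers in order; bench any that fail for a cooldown period."""

    def __init__(
        self,
        providers: List[str],
        cooldown_s: float = 30.0,  # assumed value, purely illustrative
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._providers = list(providers)
        self._cooldown_s = cooldown_s
        self._clock = clock
        self._benched_until: Dict[str, float] = {}

    def healthy(self) -> List[str]:
        # A provider is healthy if it was never benched or its bench expired.
        now = self._clock()
        return [p for p in self._providers
                if self._benched_until.get(p, 0.0) <= now]

    def call(self, send: Callable[[str], str]) -> str:
        errors = []
        for provider in self.healthy():
            try:
                return send(provider)
            except Exception as exc:
                # Bench the failing provider and move on to the next one.
                self._benched_until[provider] = self._clock() + self._cooldown_s
                errors.append(f"{provider}: {exc}")
        # Nothing worked: fail loudly rather than pretend the call succeeded.
        raise RuntimeError("all providers unavailable: " + "; ".join(errors))
```

Injecting the clock keeps the sketch testable without sleeping; a real system would likely add health probes and backoff on top of a fixed cooldown.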

Security

Every change we ship runs through AI code review, SAST scanning, and automated security audits before it reaches you. We also open-sourced a supply-chain scanner that catches compromised Python dependencies in seconds. Background on why this matters: the AI coding-agent security gap.

qmax-code is now open source

Our AI testing agent for the terminal is now open source under FSL-1.1-ALv2 — with a Live Browser Feed that streams the actual browser running your tests right into your terminal, and tests that self-heal after they’re generated. Read the launch: qmax-code, open source.

The throughline

If it’s not good enough for us to ship with, it’s not good enough to ship to you. We run an autonomous bot that finds bugs in our own production app and opens a fix PR — every day. We dogfood the product on the product: the same crawl planner, the same self-healing, the same gates that protect your releases protect ours first.

The one line

LLM-agnostic, provider-agnostic, biased toward quality.

That’s the bet, and May was a month of making it more literally true: the model underneath can be swapped for anything; what stays constant is the harness of checks around it that decides whether output is trustworthy enough to keep.
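One way to picture a model-agnostic harness is as a fixed list of checks wrapped around a swappable callable. The sketch below is an assumption-laden illustration: `Model`, `Check`, `Verdict`, `run_with_harness`, and the two example checks are hypothetical, not QualityMax's actual gates.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

# Any model is just "prompt in, text out"; swapping it does not touch the checks.
Model = Callable[[str], str]
# A check returns a failure reason, or None when the output passes.
Check = Callable[[str], Optional[str]]


@dataclass
class Verdict:
    output: str
    kept: bool
    reasons: List[str]


def run_with_harness(model: Model, prompt: str, checks: Sequence[Check]) -> Verdict:
    output = model(prompt)
    reasons: List[str] = []
    for check in checks:
        reason = check(output)
        if reason is not None:
            reasons.append(reason)
    # Output is kept only if every check passed.
    return Verdict(output=output, kept=not reasons, reasons=reasons)


def non_empty(output: str) -> Optional[str]:
    return None if output.strip() else "empty output"


def has_assertion(output: str) -> Optional[str]:
    # Illustrative rule: a generated test must actually assert something.
    return None if "assert" in output else "test contains no assertion"
```

The same `checks` list can be run against any model, which is the point: the harness, not the model, decides what is trustworthy enough to keep.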

More landing in June. DMs open. 👇

Try QualityMax

A grounded AI crawl planner, self-healing tests, multi-model routing, and gates that report honest failures instead of faking green.