Guidance Tracking Scores Every Promise

When we launched Guidance Tracking a year ago, it extracted forward-looking statements and checked metric guidance against reported results. It worked, but two things were missing. Event guidance got captured and then went nowhere, because nothing followed up to see whether the launch shipped or the factory opened. And the same commitment, restated across four quarters, showed up as four separate rows instead of one record.

We rebuilt the feature to fix both. Guidance Tracking now runs on Deep Research Agents and renders as structured tables in which each commitment carries its revision history, a source-linked date, and an outcome. The top of the report grades the company on two scores, then lists every commitment with what was guided, what happened, and the verdict.

A Guidance Tracking report scoring a management team on forecasting accuracy and discipline, with each commitment listed alongside what was expected and what actually happened

Here is a full report start to finish, one commitment after another.

A full Guidance Tracking report, from the accuracy and discipline scores down through every scored commitment

One record per commitment, every time it was restated

Start with how guidance is followed over time. A management team guides revenue growth in January, reaffirms it in April, narrows the range in July, and reports the actual in October. Those are four statements about one underlying commitment.

Guidance Tracking now links them into a single record. A commitment is one metric (or one anticipated event) for a given segment, target period, and currency basis, and every restatement of it is linked back to the same record, even when management shifts the date. The table reads as "January said 12-14%, April reaffirmed, July narrowed to 13%, actual was 13.2%" rather than as four disconnected entries.
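To make the linking concrete, here is a minimal Python sketch of a commitment record keyed on the four fields named above. The names `CommitmentKey`, `Statement`, `CommitmentRecord`, and `link` are hypothetical, as are the dates, "FY2024", and "USD" in the usage lines. The sketch also assumes each statement already carries an exact key, so it does not handle shifted dates or reworded metrics.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class CommitmentKey:
    """Hypothetical identity of one commitment."""
    metric: str          # metric name or anticipated event
    segment: str
    target_period: str   # e.g. "FY2024"
    currency: str        # currency basis of the guidance


@dataclass
class Statement:
    filed: date          # when the statement was made
    action: str          # "guided", "reaffirmed", "narrowed", "actual", ...
    low: float
    high: float


@dataclass
class CommitmentRecord:
    key: CommitmentKey
    history: list = field(default_factory=list)

    def summary(self) -> str:
        """Render the history in filing order, e.g. 'Jan guided 12-14%'."""
        parts = []
        for s in sorted(self.history, key=lambda s: s.filed):
            value = f"{s.low:g}%" if s.low == s.high else f"{s.low:g}-{s.high:g}%"
            parts.append(f"{s.filed:%b} {s.action} {value}")
        return ", ".join(parts)


def link(statements):
    """Group (key, statement) pairs so every restatement lands in one record."""
    records = {}
    for key, stmt in statements:
        records.setdefault(key, CommitmentRecord(key)).history.append(stmt)
    return records


key = CommitmentKey("revenue growth", "group", "FY2024", "USD")
records = link([
    (key, Statement(date(2024, 1, 30), "guided", 12, 14)),
    (key, Statement(date(2024, 4, 30), "reaffirmed", 12, 14)),
    (key, Statement(date(2024, 7, 30), "narrowed", 13, 13)),
    (key, Statement(date(2024, 10, 30), "actual", 13.2, 13.2)),
])
print(records[key].summary())
```

The four statements mirror the January-to-October example and collapse into a single record whose summary reads as one history line.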

A metric guidance table where each row carries a history column tracing how the commitment was first guided, then raised, narrowed, or reaffirmed across later filings

A commitment that was reaffirmed and then met reads very differently from one quietly dropped after a single mention. Linking the restatements is what makes that difference legible, and it is usually where the management-quality signal sits.

The linking holds up even when management changes the words. A target called "revenue growth" one quarter and "top-line growth" the next used to slip through as two separate entries. It now lands in the same record, because the match is on what the guidance means, not how it is phrased.
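The product matches on meaning. A fixed alias table is a much cruder stand-in, but it shows where normalization sits: labels are canonicalized before they become part of a commitment's key. `ALIASES` and `canonical_metric` are hypothetical names, and the entries are illustrative only.

```python
# Hypothetical alias table: a crude stand-in for matching guidance on meaning.
ALIASES = {
    "revenue growth": "revenue_growth",
    "top-line growth": "revenue_growth",
    "sales growth": "revenue_growth",
}


def canonical_metric(label: str) -> str:
    """Lower-case, collapse whitespace, then map known phrasings to one name."""
    cleaned = " ".join(label.lower().split())
    return ALIASES.get(cleaned, cleaned)


print(canonical_metric("Revenue growth"), canonical_metric("Top-line  growth"))
```

A table only catches phrasings someone anticipated; matching on meaning is what lets unanticipated rewordings land in the same record.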

Events get resolved, not just recorded

Metric guidance was always scored against the actual. Event guidance was not. A predicted product launch or regulatory decision was captured and then left unverified, so the data went stale without a conclusion.

Now every event is followed up on a schedule built from its expected date. At each scheduled check, the event is labeled occurred, delayed, pending, or cancelled. Pending means the window is still open, so the event is not yet judgeable and does not count as a miss. Each label is checked against company communications and outside reporting, since a launch or a factory opening is often visible without a formal filing.
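A minimal sketch of that follow-up logic, assuming the expected date marks the end of the window and that checks run at fixed offsets after it. `follow_up_dates`, `resolve_event`, and the 0/30/90-day offsets are hypothetical. So is the rule that a passed date with no evidence reads as delayed, with cancellation requiring an explicit signal.

```python
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class EventStatus(Enum):
    OCCURRED = "occurred"
    DELAYED = "delayed"
    PENDING = "pending"      # window still open: not judgeable, not a miss
    CANCELLED = "cancelled"


def follow_up_dates(expected: date, offsets_days=(0, 30, 90)) -> list:
    """Hypothetical schedule: check on the expected date, then at fixed offsets."""
    return [expected + timedelta(days=d) for d in offsets_days]


def resolve_event(expected: date, today: date,
                  occurred_on: Optional[date] = None,
                  cancelled: bool = False) -> EventStatus:
    """Label an event from evidence gathered in company and outside sources."""
    if cancelled:
        return EventStatus.CANCELLED
    if occurred_on is not None and occurred_on <= today:
        return EventStatus.OCCURRED
    if today <= expected:
        return EventStatus.PENDING
    return EventStatus.DELAYED


launch = date(2024, 6, 30)
print(follow_up_dates(launch))
print(resolve_event(launch, today=date(2024, 5, 1)))    # window still open
print(resolve_event(launch, today=date(2024, 8, 1)))    # past due, no evidence
print(resolve_event(launch, today=date(2024, 8, 1), occurred_on=date(2024, 6, 20)))
```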

Metric commitments resolve as beat, met, or missed against the reported result, on the same basis management guided to. That check now reads the reported financials directly rather than hunting for numbers in document prose, so a revenue or margin target is compared against the actual figure instead of an interpretation of it.
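The metric check can be sketched as a range comparison. The `higher_is_better` flag (for cost-type metrics, where coming in below guidance is good) and the `tolerance` band are assumptions added for illustration; the text does not say how close to a point estimate counts as met.

```python
def resolve_metric(low: float, high: float, actual: float, *,
                   higher_is_better: bool = True,
                   tolerance: float = 0.0) -> str:
    """Compare a reported actual with a guided range stated on the same basis."""
    if low - tolerance <= actual <= high + tolerance:
        return "met"
    above = actual > high + tolerance
    # Above the range is a beat for growth-type metrics, a miss for cost-type ones.
    return "beat" if above == higher_is_better else "missed"


print(resolve_metric(12, 14, 13.2))    # inside the original 12-14% range
print(resolve_metric(13, 13, 13.2))    # above a point estimate with no tolerance
```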

Two scores per company: accuracy and discipline

Once commitments are linked and resolved, they roll up into two scores per company on a 1 to 5 scale.

Accuracy measures how well realized outcomes matched guidance, weighted by impact and miss magnitude. A small miss on a minor metric barely registers. Repeated large misses on headline metrics like revenue, EPS, or margin, or a cancelled flagship event, drive it down. Pending and unresolvable commitments do not count against it.
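One way to express that weighting, as a hedged sketch. Each resolved commitment carries an impact weight and a miss severity between 0 and 1, and the weighted average severity maps linearly onto the 1 to 5 scale. The weights, the 10% saturation point, the half-penalty for delays, and the names `Outcome` and `accuracy_score` are all assumptions, not the product's formula.

```python
from dataclasses import dataclass
from typing import Optional

EXCLUDED = {"pending", "unresolvable"}   # never count against the score


@dataclass
class Outcome:
    status: str            # "beat", "met", "missed", "occurred", "delayed", "cancelled", ...
    impact: float = 1.0    # hypothetical weight: higher for revenue, EPS, margin, flagship events
    miss_pct: float = 0.0  # relative size of a metric miss, e.g. 0.05 for 5%


def severity(o: Outcome) -> float:
    """0 for no miss, rising to 1 for a large miss or a cancelled event."""
    if o.status in ("beat", "met", "occurred"):
        return 0.0
    if o.status == "cancelled":
        return 1.0
    if o.status == "delayed":
        return 0.5
    return min(o.miss_pct / 0.10, 1.0)   # "missed": a 10% miss saturates


def accuracy_score(outcomes) -> Optional[float]:
    scored = [o for o in outcomes if o.status not in EXCLUDED]
    total = sum(o.impact for o in scored)
    if not scored or total == 0:
        return None
    penalty = sum(o.impact * severity(o) for o in scored)
    return 5.0 - 4.0 * penalty / total


print(accuracy_score([
    Outcome("met", impact=3.0),                     # headline metric on target
    Outcome("missed", impact=1.0, miss_pct=0.02),   # small miss on a minor metric
    Outcome("pending"),                             # ignored
]))
```

A small miss on a low-weight metric moves the score only slightly, while a large miss on a heavily weighted metric pulls it toward 1.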

Discipline measures how consistent and accountable the guidance practice is. It rewards commitments that were reaffirmed or explicitly updated each period and whose outcomes can be reconciled. It penalizes guidance that was issued once and then dropped, or guidance that can never be checked because the company stopped disclosing the metric.
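Discipline can be sketched the same way, from each commitment's paper trail: the share of later periods in which it was reaffirmed or explicitly updated, and whether its outcome could be reconciled. The equal weighting of the two parts and the names `Trail` and `discipline_score` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trail:
    periods_open: int        # reporting periods after first issue while the commitment was live
    periods_addressed: int   # of those, periods where it was reaffirmed or explicitly updated
    reconciled: bool         # could the outcome be checked against a disclosed figure?


def discipline_score(trails) -> Optional[float]:
    if not trails:
        return None
    per_commitment = []
    for t in trails:
        follow_through = t.periods_addressed / t.periods_open if t.periods_open else 1.0
        per_commitment.append(0.5 * follow_through + 0.5 * (1.0 if t.reconciled else 0.0))
    # Map the average from [0, 1] onto the 1 to 5 scale.
    return 1.0 + 4.0 * sum(per_commitment) / len(per_commitment)


print(discipline_score([Trail(3, 3, True), Trail(3, 0, False)]))
```

Under this sketch, guidance issued once and then dropped, with the metric no longer disclosed, scores 0 on both parts and pulls the company toward 1.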

The two are independent. A company can hit its numbers while managing guidance erratically, or reconcile every commitment while frequently missing. Keeping the scores separate matches how analysts actually judge management quality more closely than a single hit rate would.

Broader sources, faster turnaround

Guidance now draws on the full set of primary content we ingest. Annual reports, interim reports, and transcripts are always in scope, and press releases are included when they carry forward-looking substance rather than boilerplate. The capture also reads the entire document, including the Q&A, and separates items that share a sentence, so two geographies or two products named together become two commitments you can track independently.

It also runs faster and cheaper than the original, which is what makes scoring guidance across a full 40- to 60-name coverage list practical rather than a once-a-quarter chore.

Where to find it

The redesigned Guidance Tracking runs on the company Guidance page through three agents: Guidance Tracker for the current state of open commitments, Guidance Performance Long-Term for the scored track record, and Guidance Performance Earnings, scoped to a single earnings event. You launch any of them the same way you start any Deep Research Agent, and the results land back on the Guidance page when the run finishes.

The three Guidance Tracking agents on a company's Guidance page in the Marvin Labs platform: Guidance Tracker, Guidance Performance Long-Term, and Guidance Performance Earnings, each with a Run Agent button

The same evaluations surface inside Deep Research Agent earnings reviews and are queryable in AI Analyst Chat.

It is live across our coverage universe, and the free Evaluation Plan covers 15 companies with full access. It also feeds the broader Management Quality Assessment workflow, where the scored track record sits alongside sentiment and earnings analysis.

Pick a name you follow, run the long-term performance agent, and see how its management has actually delivered against what it said.